Generally, Are String (or Varchar) Fields Used As Join Fields?
Solution 1:
It's certainly possible to use a varchar as a key field (or simply something to join on). The main problems with it are based on what you normally store in a varchar field; mutable data. Strictly speaking, it's not advisable to have key fields change. A person's name, telephone number, even their SSN can all change. However, the employee with internal ID 3 will always be ID 3, even if there are two John Smiths.
Second, string comparison is dependent on a number of nit-picky details, such as culture, collation, whitespace translation, etc. that can break a join for no immediately-apparent reason. Say you use a tabspace character \t for a certain string you're joining on. Later, you change your software to replace \t with 3 spaces to reduce character escapes in your raw strings. You have now broken any functionality requiring a string with escaped tabs to be matched to an identical-looking, but differently-composed, string.
Lastly, even given two perfectly identical strings, there is a slight performance benefit to comparing two integer numbers than comparing two strings. Integer comparison is effectively constant-time. String comparison is linear at best, based on the length of the string.
Solution 2:
Is it generally discouraged to use a varchar/string field as a join between two tables?
If there's a natural key to be used (extremely rare in real life, but state/province abbreviations are a good example), then VARCHAR fields are fine.
When is the best case where a string field can be used as a join field?
Depends on the database because of the bits allocated to the data type, but generally VARCHAR(4) or less takes around the same amount of space (less the less number of characters) as INT would.
Solution 3:
Generally speaking you shouldn't use anything that is editable by the end users as a FK as an edit would require not one update, but one update per table which references that key.
Everyone else has already mentioned the potenetial performance implications of a query, but the update cost is also worth noting. I strongly suggest the use of a generated key instead.
Solution 4:
If you're concerned about performance, the best way to know is to create tables that implement your potential design choices, then load them up with massive amounts of data to see what happens.
In theory, very small strings should perform as well as a number in joins. In practice, it would definitely depend upon the database, indexing, and other implementation choices.
Solution 5:
In a relational database, you shouldn't use a string in one table that references the same string in another table. If the second table is a look-up, create an identity column for the table, and then reference the integer value in the first. When displaying the data, use a join to the second table. Just make sure in the second table you never actually delete records.
The only exception would be if you are creating an archive table where you want to store exactly what was chosen at a given time.
Post a Comment for "Generally, Are String (or Varchar) Fields Used As Join Fields?"