Skip to content Skip to sidebar Skip to footer

Rails Activerecord And Db Normalization

What are the pros and cons of breaking out optional 1:1 attributes into their own separate model. For example, I just encountered Rails code like this: class Dogs << ActiveRe

Solution 1:

Including all the records in one table is likely the preferential way to go. It gives you better performance at the cost of size.

The performance is arguable depending on how the tables are actually used. If the systems expensive queries doesn't require spots data (like a mass listing), it could potentially be faster not having them on the table, but if you kept the data on table tailoring your query to only pull in the needed data would eliminate the cost of the query almost entirely.

I've seen the 1:1 releationship implemented because it makes more sense to a developer than any logical difference in code. Left Outer Joins are more expensive than Inner Joins and Joins are more expensive than no joins.

You could potentially decrease the overall database size by not keeping this information on table, but for most small applications (under 1 TB) you should be optimizing for performance.

The only real situation that I would think that this set up would be ideal is if you wanted to constrain the type of spots a user can choose from or if the you did a lot of processing on these elements of a dog separately.

You can get the same constraining feel by using a lookup table of all possible values and only allowed the user to select from values in the lookup table. Then go back and insert raw values in the dog table. This does does have the side effect of potentially polluting the dog table with data not properly scrubbed by the web app, but it would be a quick effort to isolate what data is not inside of the constraints by comparing to a spot look up table and correcting the issue in the web app.

Solution 2:

1:1 relationships are frequently decomposed when the relationship is of the "Is-A" type rather than of the "Has-A" type. In the extended ER model this is called "specialization". In the world of SQL tables, this sometimes goes by the name "Class Table Inheritance". You can look either of these terms up for a pretty good treatment of the corresponding subject.

Class Table Inheritance contrasts with "Single Table Inheritance" which results in a single table with NULLS in places where a value would be irrelevant. This looks like your choice for Dogs and Spots.

In the world of "Cars", a specialization might be a table for "Vehicles", one for "Autos" and one for "Trucks". ("Lorries" are called "Trucks" on my side of the pond.) Autos and Trucks are specialized types of Vehicles. The attributes that are stored in trucks and not in vehicles are attributes that are irrelevant to vehicles that are not trucks.

The purpose is generally not to improve performance, but to improve the form of queries. Queries that are about truck data only can query the trucks table. Queries that are about vehicle data only can query the vehicles table. And queries that involve data about both vehicles and trucks can query a view that joins vehicles and trucks over a common column.

Contrary to what others have opined, narrow tables do outperform wide tables, although the effect is minor compared to indexing and joins.

You can combine Class Table Inheritance with a technique called "Shared Primary Key", which you can look up. You get very fast, and very easy joins compared to other ways of relating the tables to each other. Shared Primary keys involve more work at insert time, because you have to propagate the common value from the generalized table to the appropriate specialized table under program control.

If I were doing your case, I would look for ways to exploit specialization around the various fuel types you mentioned. Some of your queries might end up scanning only one of five specialized tables (aka "subclass tables") thereby running five times as fast.

Solution 3:

The former approach is better any day then the latter one if you have a huge amount of databases.

It is always a plus point to separate the things in a separate table in which we are not sure the value will come or not. like in your case dog and dog spot if only 20 % of dogs have spots then their is no need of keeping it in the same database table as 80 % of the columns will be nill and while retrieving data from the main table a lot of data will be picked which will surely hamper the server performance. If the database is small then perhaps one could do not neccesarily make a second table

If we are worried of the time complexity of joins and all then we can apply indexing which will further decrease our cost.

so the main point

1) if database contains many records apply former technique plus do some indexing to save the server cost

2) If database is small u can go for the latter one

Solution 4:

Pros of de-normalization:

  • Each table takes up memory, less tables means less overhead (once I was dealing with a Database with 30K tables with a total of 100 meg of data, and the database memory usage was enormous)

  • Manual Analysis can be done easily, i.e export 1 table to Excel, Google Spreadsheets or whatever and you use filters

Cons

  • Lots of columns with null values.

Another approach to balance lots of columns vs extra attribute tables is to use serialized arrays for elements that are less common, like iPod interation, and then use a Search server (elasticsearch, websolr and so on), which can handle the queries with full text... This would allow you to add new elements to Cars table without new columns. Like Self Driving Car, Self Parking Car.

Post a Comment for "Rails Activerecord And Db Normalization"