Me

No User

You must log in to access your account.

Options

World in Code

Just another Club Guge weblog

How abstract should my database be?

June 26th, 2008 in Uncategorized by admin

Yesterday, I saw the design of a new database. It was a result of some surprising design choices.

The database contains information about people and companies, some properties about each, and the relations between them. One of the main challenges when designing such a database is that companies and people have common properties. This is because both are persons. Companies are artificial persons and people are natural persons. Both share some legal properties. And then there are differences.

In object oriented languages you can make a class with the shared properties, and then subclass that class into a company class and a person class. The trouble is that SQL doesn’t have inheritance. This leaves you with three distinct choices:

  1. You can make one table for both companies and persons.
  2. You can make a company table and a person table.
  3. You can make a company table, a person table, and a third table with only the common fields.

Notice that each choice has the same number of tables as the ordinal number next to it. Each choice has a camel you have to swallow.

The one-table choice gives you a very wide table. The fields that are only relevant for companies will be blank for the persons and vice versa. And how often do you really want to retrieve both companies and persons with the same query?

If you go for two tables you have to use a union every time you want to retrieve something that could be company or persons or a mix. Actually this doesn’t look so bad, but read on. This is the least abstract solution.

If you go for three tables you will have to use a join every single time you want to get a person or a company. Sometimes you will have to use a union and two joins to retrieve data that were not put in the third table. On the other hand, retrieving data about both persons and companies is fast if they are in the third table.

The database also contains information about relationships between companies and persons. Some of these relationships are chairman and board members, CEO, owners, daughter companies, accountant and so on. Some relationships are between company and person, some are between company and company, some are between company and person or company. Some are singular, some are multiple.

CEO and chairman is a singular relationship. A company cannot have more than one CEO and chairman. The company table could have two fields referencing these persons. How about the mixed multiple relationships, such as owners. Companies can have many or few owners and these ownere may be companies or persons or both. If you have decided on the two table choice you will find that it is hard to create the relationship.

Should you create one relationship table for owners that are persons and another for owners that are companies? You will have to if you want to use foreign key constraints. When retrieving the list of owners you will now have to do a union between two queries with two joins each. Suddenly the two table choice isn’t so nice.

If you go for the one or three table option, you get away with one relationship table for owners. The one table option doesn’t let you constrain the CEO and chairman field. Your database could even store a company as chairman of a person without breaking any constraints. Maybe the three table option is the best after all?

Or should all the realtionships be stored in a single relationship table, with a type of relationship as a field. It could, but now you get a lot of joins of the same table when you need to retrieve a company with names of CEO, chairman and board members. If you have a single relationship table, and the two table option, you can forget about constraints between the relationships and the objects. Bad thing!

One argument often used for the single relationship table is that new types of relationship may have to be implemented, and the single relationship table has flexibility to allow this. This is true in some way, but a new relationship type will have to be implemented other places anyway. Usually in all layers of your architecture. Having to create one more table for the new relationship type is not a real problem. Having to wade through all your existing stored procedures to find out if your new type in your single table will make a mess can be a problem.

My observation is that there is a bottomless chasm between object oriented thinking on one hand and relational databases on the other hand. No matter how you design your database you will have a steady diet of camels to swallow. Use inheritance, try to map it into a relation database system and you will find yourself making concessions with your principles left and right.

I try to figure out what I want to retrieve from my database, and then what I need to store in it. In the case above I would have gone for the two table option. When it comes to the relationship tables, I would have created one table for each kind of relation. But for mixed relations, like ownership, I would have let the foreign key constraints go. I like my database best when they are not so abstract. Ownership in the ownership table, company in the company table and so on. If I need to retrieve several types of relationships in the same select I can always use a union. I think the queries are easy to understand this way.

The design I saw had gone the other way. It had a one table option combined with a singular relationship table with types. Even for the CEO and Chairman relationships. The architect had preferred other camels than I would. This database design is a bit more abstract than I would have made it.

Many years ago I had a look at Microsoft CMS server. It had a database design with one table in the middle. This table had all the content with plenty of references to itself. The object model in the API was very hierarchical. The main table was surrounded by some supporting tables such as tables containing valid values for type fields, user authorizations and so on. It was very abstract. It led me to think that any object model can be abstracted into 3-4 tables. One hierarchically selfreferencing table for the data, one for many-to-many relationships, one for valid values and one for interpretation help.

Then again, databases are usually stored as files on disks. These files can be represented in a table in a database. So the ultimate abstraction is to just store everything in a database in a single table containing chunks of binary data together with an ordinal number so everything is in sequence. No architect in a sound state of mind would suggest something like this, but it could be done.

 


This entry was posted on Thursday, June 26th, 2008 at 5:03 pm and is filed under Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

One Response to “How abstract should my database be?”

  1. BG said:

    Mmmrfh… guess it will be camel-hard to make changes within the database once it`s pumped with binarities anyway.

    The best you can do, is to pray for a Oracle inovation (shall we say, …within some decades?)



Leave a Reply