July 4th, 2008 in Uncategorized by admin
Again, I have had the misfortune of having to work with an Oracle database. Again, I am amazed at how incredibly bad the product is. Please, let me rant on about this.
I install the Oracle 10g client. The installation file is 483MB! I choose the administrator version. I get a folder with 51 subfolders! It’s just a client! It doesn’t use the regular installer, oh no. Oracle needs its own installer.
How does Oracle do name resolution? It has it’s own, of course. Now begins the fun. The DBA sends me sqlnet.ora and tnsnames.ora. And I try to connect. No luck. TNSPING is ok, but any other tool, like SQLPLUS fails. Then they want me to try other sqlnet.ora and tnsnames.ora files. Still nothing. Then some more. Now I can connect to the 32bit Oracle server, but not to the 64bit Oracle server. So they tell me it’s because I run Vista, or it’s the firewall, or it’s because it’s a full moon. Then they want me to uninstall the client. This is done via the Oracle Universal Installer. I start it. I get a console saying it’s preparing the launch of the installer. Then nothing, except my system is running real slow, the mouse is practically creaking as it tries to move accross the screen. Oh, yes, I must be running Oracle software. Then it crashes, of course. No universal installer for me. Should I just delete c:\oracle? No, because my registry is filled with Oracle stuff. The DBA tells me to choose XP over Vista if I’m going to reinstall the OS.
How can it be that I can connect to 32bit but not 64bit servers? What kind of would idiot develop different protocols for 32bit and 64bit servers. Surely, the client doesn’t have to even know if it’s one or the other. What good is TNSPING if it says OK when SQLPLUS says protocol failure? Who in his right mind develops a name resolution system that uses files on the client when DNS is ubiquitous? Why does a client need 51 folders?
These are all interesting questions to go into? They all have a root explanation which is that Oracle sucks. When I say that Oracle sucks, I’m not really saying that it actively ingests the atmosphere. It’s just that everything else has higher pressure.
Another, and more interesting, question is how can Oracle as a company keep on earning money if its products are so bad? Shouldn’t the market rid itself of something so bad?
I guess noone ever got fired for choosing Oracle. It still has a good name in the market. It runs some of the biggest databases in the world. It has clustering technology, scalability, multi platformability and so on. It also has some of the most expensive consultants.
It is often said that Oracle performs better than the competing systems. If you install an Oracle server you get a standard configuration that for most situations is sub-optimal. Oracle servers have to be configured by hand. Someone has to figure out what the shared pool, buffer size, block size, java pool, streams pool and lots of other settings should be. Yes, Oracle can probably give good performance, but only if you can afford someone to configure it for you. On the other hand, there are systems that give good performance right out of the box.
Oracle is more scalable. That is: you can run it on large and expensive servers. Would you want to? Most of us wouldn’t. So does it matter? On the other hand: it doesn’t scale down very well. You wouldn’t want it on a client computer or a handheld device. It runs on many platforms. Well, I’ll bet lunch that MySQL runs on more platforms. Postgres too.
Being a developer isn’t always fun. We have our favourite tools. My favourite database tools are the ones that come with Microsofts SQL Server. Why is it so hard to get acceptance for running a big database on SQL Server? Maybe most of the decision makers are old, over the hill, still thinks SQL Server is version 6.5. Which wasn’t very good. I think SQL Server has been the superior product since version 7.0. The development tools are better, the administration software is smoother, the integration with the OS is better. I’d rather run MySQL than Oracle. But I’ll take SQL Server over them both.
My generation will not stand for the Oracle way of doing things, and we are getting into positions. Enjoy your yachts while you can, Larry!
June 26th, 2008 in Uncategorized by admin
Yesterday, I saw the design of a new database. It was a result of some surprising design choices.
The database contains information about people and companies, some properties about each, and the relations between them. One of the main challenges when designing such a database is that companies and people have common properties. This is because both are persons. Companies are artificial persons and people are natural persons. Both share some legal properties. And then there are differences.
In object oriented languages you can make a class with the shared properties, and then subclass that class into a company class and a person class. The trouble is that SQL doesn’t have inheritance. This leaves you with three distinct choices:
- You can make one table for both companies and persons.
- You can make a company table and a person table.
- You can make a company table, a person table, and a third table with only the common fields.
Notice that each choice has the same number of tables as the ordinal number next to it. Each choice has a camel you have to swallow.
The one-table choice gives you a very wide table. The fields that are only relevant for companies will be blank for the persons and vice versa. And how often do you really want to retrieve both companies and persons with the same query?
If you go for two tables you have to use a union every time you want to retrieve something that could be company or persons or a mix. Actually this doesn’t look so bad, but read on. This is the least abstract solution.
If you go for three tables you will have to use a join every single time you want to get a person or a company. Sometimes you will have to use a union and two joins to retrieve data that were not put in the third table. On the other hand, retrieving data about both persons and companies is fast if they are in the third table.
The database also contains information about relationships between companies and persons. Some of these relationships are chairman and board members, CEO, owners, daughter companies, accountant and so on. Some relationships are between company and person, some are between company and company, some are between company and person or company. Some are singular, some are multiple.
CEO and chairman is a singular relationship. A company cannot have more than one CEO and chairman. The company table could have two fields referencing these persons. How about the mixed multiple relationships, such as owners. Companies can have many or few owners and these ownere may be companies or persons or both. If you have decided on the two table choice you will find that it is hard to create the relationship.
Should you create one relationship table for owners that are persons and another for owners that are companies? You will have to if you want to use foreign key constraints. When retrieving the list of owners you will now have to do a union between two queries with two joins each. Suddenly the two table choice isn’t so nice.
If you go for the one or three table option, you get away with one relationship table for owners. The one table option doesn’t let you constrain the CEO and chairman field. Your database could even store a company as chairman of a person without breaking any constraints. Maybe the three table option is the best after all?
Or should all the realtionships be stored in a single relationship table, with a type of relationship as a field. It could, but now you get a lot of joins of the same table when you need to retrieve a company with names of CEO, chairman and board members. If you have a single relationship table, and the two table option, you can forget about constraints between the relationships and the objects. Bad thing!
One argument often used for the single relationship table is that new types of relationship may have to be implemented, and the single relationship table has flexibility to allow this. This is true in some way, but a new relationship type will have to be implemented other places anyway. Usually in all layers of your architecture. Having to create one more table for the new relationship type is not a real problem. Having to wade through all your existing stored procedures to find out if your new type in your single table will make a mess can be a problem.
My observation is that there is a bottomless chasm between object oriented thinking on one hand and relational databases on the other hand. No matter how you design your database you will have a steady diet of camels to swallow. Use inheritance, try to map it into a relation database system and you will find yourself making concessions with your principles left and right.
I try to figure out what I want to retrieve from my database, and then what I need to store in it. In the case above I would have gone for the two table option. When it comes to the relationship tables, I would have created one table for each kind of relation. But for mixed relations, like ownership, I would have let the foreign key constraints go. I like my database best when they are not so abstract. Ownership in the ownership table, company in the company table and so on. If I need to retrieve several types of relationships in the same select I can always use a union. I think the queries are easy to understand this way.
The design I saw had gone the other way. It had a one table option combined with a singular relationship table with types. Even for the CEO and Chairman relationships. The architect had preferred other camels than I would. This database design is a bit more abstract than I would have made it.
Many years ago I had a look at Microsoft CMS server. It had a database design with one table in the middle. This table had all the content with plenty of references to itself. The object model in the API was very hierarchical. The main table was surrounded by some supporting tables such as tables containing valid values for type fields, user authorizations and so on. It was very abstract. It led me to think that any object model can be abstracted into 3-4 tables. One hierarchically selfreferencing table for the data, one for many-to-many relationships, one for valid values and one for interpretation help.
Then again, databases are usually stored as files on disks. These files can be represented in a table in a database. So the ultimate abstraction is to just store everything in a database in a single table containing chunks of binary data together with an ordinal number so everything is in sequence. No architect in a sound state of mind would suggest something like this, but it could be done.
June 18th, 2008 in Uncategorized by admin
I am currently reading one of Fridthjof Nansen’s books. The book starts with a speech he gave as an honorary rector at a Scottish university. He describes how he conquered the inland ice of Greenland by not having a line of retreat, but instead by burning his bridges and to only have the goal at mind.
Some people had tried to cross Greenland before him, but they started from the civilization on the west coast and headed for the void of the east coast, where they planned to be retrieved by boat. As they ventured far into the ice, they were increasingly tempted by the prospect of returning to the civilized west coast.
If you, on the other hand, begin at the east coast, dropped by a ship that has already left you, all the temptations are on the west side. There is no line of retreat to fall back to. There is nothing tempting you to look back.
How does this apply to coding? Maybe source control isn’t such a hot idea? Maybe it’s easier to rewrite than to dig into old code. I have found that sometimes its easier to understand a problem and thus formulate this problem in code than it is to understand buggy code that should solve the problem.
Let’s say you have a 1000 lines of code supposed to solve a given problem, but it doesn’t work. It seems to be almost there, but it still doesnt work. You understand the problem, but the code is broken so the code can’t really be understood. Somewhere in the process of coding, the coder (maybe even you) took a wrong turn and got lost. What is easiest? Turn back to that place and try to find the way from there, or start all over and head for the goal?
The temptation is that maybe you can change only one line or two and then get it to work. The trouble is that you have to understand all the other lines. I’d rather write 1000 lines fresh than to try to understand why 1000 lines are broken and then fix them.
I’m sure Nansen would agree.