postgresql scalability and mysql woe

general = { about, articles, links, projects }     meta = { date-posted: 2006-12-08 }

Every once in a while I run across an article that I find exciting. Recently, it was this article. The article is about the comparison of two different systems. A Sun UltraSparc T1 vs. a AMD Opteron.

The whole article was interesting enough, but I found myself especially drawn to their scalability comparison of postgresql with mysql. The results they got show that posgres scales much better than mysql. An interesting read. It sounds like the postgres team has put a good deal of work into making postgres scale well.


Shortly after reading that article, a colleague at work started doing some statistical analysis of some raw data. He was presented with a fairly large chunk of logs information to parse, and try to divine some performance and behavioral analysis from it.

He figured dumping it into a database would make it easier to work with. He parsed the logs, and dumped them all into a mysql table. This took quite a while. If I remember correctly he had something on the order of 2 or 3 GB of raw data that he pumped in, per data set. He had two data sets he was dealing with. He simply created a table for each data set, and was using raw sql to try and get meaningful results with.

The import took a while, which was expected. It takes a while to load in 3 million or so rows of data.

The problems started after he got the data imported. First, he couldn't get mysql to perform reasonably. He was running it on a dev box, which had a core duo and was, unfortunately, running a certain Redmond OS. He adjusted some mysql performance variables, and it seemed to run better. He got it to use more memory, which was what he wanted..it was a fairly beefy box.

During the course of his testing, I overheard an exclamation of woe. I came to find out that the DB had crashed, and munched his 4 GIGs of loaded data. When he tried to restart the DB, it wouldn't even start the process back up. It complained about a fatal error. After quite a bit of tinkering and troubleshooting, he eventually just reinstalled. Once he got the DB running again, he had to reimport the data...and wait..

At some point, he noticed that one of his columns was short some data. Instead of complaining when he imported a date with microseconds, it just munched the precision, and put in whole seconds. Let us just say that my colleague was not a happy camper.

I asked him if he was going to retry his testing with postgres, or perhaps retry with mysql running on a linux box. He said that he might. He did have a deadline to meet though, and at that point he was nearing completion of the process.

At a previous employer, we ran into some odd issues with mysql as well. I certainly would never use it for any data that I cared deeply about, and certainly nothing that could be considered 'business critical'.

Update

Cliff has a good article about this too. postgresql-stomps-mysql