Friday, 14. April 2006

MySQL Troubles at twoday.net

To say the least, i've gone through hell the last couple of days :-)
It started monday night, when i once again tried to nail down the performance troubles, which we experienced here at twoday.net. Performance tuning requires knowledge and experience in so many different areas (hardware, debian, mysql-db, helma, twoday), that this task is still quite often attached to me within the company.

So, Monday night I could quickly identify with the help of helma's sql files, that the troubles solely reside at the database-side, and that all other troubles (the infamous "maximum thread count reached"-message) were just symptoms of a slow database-server. I started out with the assumption, that the db-indexes were not working correct anymore, since some of the standard sql-statements took an insane amount of time (~8sec). So i tried to rebuild these indexes, to add some new and remove some unused ones. An iterative task, that takes hours with large databases. Somewhere around 4a.m. i got the deceiving impression, that everything was working fine again, and i went to sleep.

Well, just for a couple of hours, since I was awakened by alerts that twoday.net is extremly slow again, and that it is even worse than before. That everything worked at 4a.m. was simply because there is not much traffic at that hour. Panic! Since we switched a while ago to MySQL 5 (from dotdeb.org), i started blaming this move. So i switched to all kind of MySQL 5-binaries from the MySQL-website. I started switching back to MySQL 4.0. Nothing helped at all, the sql-statements still took far too long. And it wasn't just a certain kind of statement, even a simple SELECT count(*) from AV_TEXT took a second. SHOW PROCESSLIST showed that most of the connections were in "locked"-state, i.e. waiting to be processed, and that around 3-4 statements were actually being processed, but taking couple of seconds. As I mentioned a while ago, we have about 200-300 SQL-statements per second, so this meant troubles for sure.

Switching database-versions obviously didn't help, so i started to blame the hardware. I completely moved the database to another server. It didn't help either. Same symptoms. More Panic!

After I read through tons of MySQL-pages (it was already getting night again), I tried fiddling around with all kind of Server System Variables. It all had no effect at all, nada.

Finally I gave in, and started setting up a mysql-cluster, in order to evenly distribute the queries to two servers. Again it was 4a.m., my mind was not working anymore, and i went to sleep. Just to be awakend couple of hours later, to hear that it didn't help either. The cluster-setup was fine, but the queries still took 4secs instead of 0.04secs, so i would have needed a hundred servers.

What really shocked me was, that i was not able to bring back a twoday.net-setup that worked at least slowly. Whatever i tried, ended up in "maximum thread count reached", and lots of unhappy users, who i can fully understand. Generally speaking I am just not the type of guy, who gives up easily. Hey, i finished 6 marathons until now, and they were all not a piece of cake. But this time I gave in.

I went back to the office wednesday morning, with absolutely no idea of what to do next. I started telling matthias and axel the full story. And while telling them everything chronologically, and with them just asking the right questions, and with a sudden common inspiration, it all became obvious! The twoday-database is growing and growing, and now is about 1.7GB big, with the single AV_TEXT-table accouting for most of this space (btw wikipedia's content is not much bigger). Hmm, so we've got 2GB RAM at the db-server (and also on the other machine i tried out). About 300KB of these are used for key-caching, and that's why i always assumed that we have plenty of RAM for the database. But, what is hardly mentioned anywhere, not even in the otherwise excellent O'Reilly book "High Performance SQL", is that it makes a HUGE difference whether the OS is able to cache your data-files or not. With Oracle (according to Axel) this kind of disk-caching is handled by the db-server. With MySQL it is left to the operating system! Therefore you will never find any configuration parameter to adjust this, resp. no mysql status variable indicates the fact, that the OS can't cache your files anymore! I was just never aware of that the full database has always been completely in memory, and as soon as this is not possible anymore, there is just no chance to handle 300 queries per second. And honestly, I blame MySQL (or Zawodny, author of the forementioned book) for not making this point clearer.

Solution 1: Make your database smaller!
Solution 2: Order more RAM!
Solution 3: Start thinking about Partitioning (a new feature in Mysql 5.1)

@1: Thanks to Axel, who gave me the decisive hint, it was damn easy to cut a twoday-database nearly to half of its size. Simply drop the TEXT_RAWCONTENT-column, and perform the search on the TEXT_TEXT-column. That is the reason, why everthing is running so smoothly here at twoday.net again.
@2: Easy solution, but in our case, the RAM will not arrive before middle of next week. So, we have to wait for that, and hope solution 1 is good enough until then.
@3: Next Wednesday there is a (free) web presentation regarding Partitioning, which might be interesting: see here.

So, to sum things up, (painful) lessons lerned from this nightmare:
* Alyways keep your database in memory!
* Start acting more like a team-player than a marathon-runner! :-)

Search

 

About michi

michi Michi a.k.a. 'Michael Platzer' is one of the Knallgraus, a Vienna-based New Media Agency, that deals more and more with 'stuff' that is commonly termed as Social Software.

Meet my fellow bloggers at Planet Knallgrau.

my delicious

Recent Updates

My Gadgets

Credits

Knallgrau New Media Solutions - Web Agentur f�r neue Medien

powered by Antville powered by Helma


Creative Commons License

xml version of this page
xml version of this page (summary)

twoday.net AGB

Counter



berufliches
blogosphaerisches
privates
spassiges
sportliches
technisches
trauriges
Profil
Logout
Subscribe Weblog