SQL Log File Analysis
Quite some time ago I provided a patch for Helma which allows finer-grained SQL log file analysis by recording the time, the table, the type, and the SQL statement itself. Back then I quickly calculated some stats for twoday.net:
Type of DB requests: 98% selects, 1.9% updates, 0.1% inserts, 0.01% deletes
Used tables: 53% AV_TEXT, 25% AV_SKIN, 14% AV_IMAGE, 8% others
40 SQL statements per second
Regarding the distribution, not much has changed. What has changed is that we are now seeing about 200-300 SQL statements per second, despite an object cache size of 120,000. So we thought it was about time to take a closer look at the statements themselves, and this time at the relative amount of time spent rather than the relative number of requests.
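The breakdowns that follow were done in R with tapply over the parsed log. The same aggregation can be sketched in Python; the log entries below are made-up illustrations, and the field order (type, table, duration in milliseconds, statement) is an assumption based on the columns the R code indexes:

```python
from collections import defaultdict

# Hypothetical log entries: (type, table, duration in ms, statement).
# The patch records these four fields; the concrete values here are invented.
log = [
    ("SELECT_IDS",   "AV_TEXT", 120, "SELECT AV_TEXT.TEXT_ID FROM AV_TEXT ..."),
    ("SELECT_BYKEY", "AV_TEXT",  15, "SELECT AV_TEXT.* FROM AV_TEXT ..."),
    ("SELECT_IDS",   "AV_SKIN",  40, "SELECT AV_SKIN.SKIN_ID FROM AV_SKIN ..."),
]

def time_share(entries, key):
    """Fraction of total SQL time per key, like R's
    round(tapply(time, key, sum) / sum(time), 3)."""
    totals = defaultdict(int)
    for entry in entries:
        totals[key(entry)] += entry[2]  # sum the millisecond column per group
    grand_total = sum(totals.values())
    return {k: round(v / grand_total, 3) for k, v in totals.items()}

print(time_share(log, key=lambda e: e[0]))  # share of time per query type
print(time_share(log, key=lambda e: e[1]))  # share of time per table
```

Grouping by a tuple key (type, table) gives the cross-tabulation shown further down.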
R> round(tapply(data[,3], data[,1], sum) / sum(data[,3]), 3)
SELECT_ACCESSNAMES SELECT_ALL SELECT_BYKEY SELECT_BYRELATION SELECT_IDS SELECT_MAX SELECT_PREFETCH
             0.001      0.000        0.071             0.161      0.731      0.001           0.034

-> A surprisingly high amount of time is being spent on fetching IDs, i.e. on queries that should already have been optimized by db indices.
R> round(tapply(data[,3], data[,2], sum) / sum(data[,3]), 3)
AV_CHOICE AV_FILE AV_IMAGE AV_LAYOUT AV_MEMBERSHIP AV_MODULE AV_POLL AV_SITE AV_SKIN AV_SYSLOG AV_TEXT AV_USER AV_VOTE
    0.001   0.001    0.077     0.006         0.015     0.000   0.000   0.065   0.085     0.001   0.685   0.065   0.000

R> round(tapply(data[,3], data[,1:2], sum) / sum(data[,3]), 3)
                    table
type                 AV_CHOICE AV_FILE AV_IMAGE AV_LAYOUT AV_MEMBERSHIP AV_MODULE AV_POLL AV_SITE AV_SKIN AV_SYSLOG AV_TEXT AV_USER AV_VOTE
  SELECT_ACCESSNAMES        NA   0.000    0.001     0.000         0.000        NA      NA      NA   0.000        NA      NA      NA      NA
  SELECT_ALL                NA      NA       NA        NA            NA        NA      NA      NA   0.000        NA      NA      NA      NA
  SELECT_BYKEY               0      NA    0.007     0.006         0.010         0       0   0.003   0.000        NA   0.032   0.013      NA
  SELECT_BYRELATION         NA   0.001    0.046     0.000         0.003         0      NA   0.012   0.071        NA   0.001   0.027       0
  SELECT_IDS                 0   0.000    0.023     0.000         0.002        NA       0   0.049   0.014     0.001   0.617   0.025       0
  SELECT_MAX                NA      NA    0.000     0.000         0.000        NA      NA   0.000   0.000     0.000   0.001   0.000      NA
  SELECT_PREFETCH           NA      NA    0.000        NA            NA        NA      NA   0.000      NA        NA   0.034      NA      NA

-> So, the biggest part of the SQL time is spent fetching IDs from the AV_TEXT table.
-> Surprisingly, fetching skins already comes in second place, which is especially striking since 75% of our users have never touched a single skin.
-> In fourth place comes fetching images by their name.
So, which are the overall top SQL statements in terms of computation time?
R> sort(tapply(data[,3], data[,4], sum), dec=T)[1:30]
286699 ms  0.178  SELECT AV_TEXT.TEXT_TOPIC FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE > 0) GROUP BY TEXT_TOPIC ORDER BY TEXT_TOPIC
209549 ms  0.130  SELECT AV_TEXT.TEXT_DAY FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE ) GROUP BY TEXT_DAY ORDER BY TEXT_DAY desc
136830 ms  0.085  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story AND TEXT_ISONLINE > 0) ORDER BY TEXT_CREATETIME DESC
 92191 ms  0.057  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_TEXT_STORY AND (TEXT_PROTOTYPE=Comment) ORDER BY TEXT_MODIFYTIME DESC
 74457 ms  0.046  SELECT AV_SITE.SITE_ID FROM AV_SITE WHERE (SITE_ISONLINE > 0 AND SITE_ISBLOCKED AND SITE_SHOW ) ORDER BY SITE_LASTUPDATE desc
 63238 ms  0.039  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_ISONLINE > 0 AND (TEXT_PROTOTYPE = Story OR TEXT_PROTOTYPE = Comment)) ORDER BY TEXT_MODIFYTIME DESC
 56952 ms  0.035  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND AV_TEXT.TEXT_DAY = XXX AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE ) ORDER BY TEXT_CREATETIME desc
 49750 ms  0.031  SELECT AV_TEXT.* FROM AV_TEXT WHERE AV_TEXT.TEXT_ID
 47139 ms  0.029  SELECT AV_IMAGE.* FROM AV_IMAGE WHERE AV_IMAGE.IMAGE_ALIAS = III AND AV_IMAGE.IMAGE_F_SITE AND (IMAGE_PROTOTYPE = Image and IMAGE_F_IMAGE_PARENT is null)
 41664 ms  0.026  SELECT AV_USER.* FROM AV_USER WHERE AV_USER.USER_NAME = UUU AND (USER_AUTH_TYPE = local AND USER_NAME IS NOT NULL)

-> Oha! Big surprise (at least for me): 30% of the database time is spent determining the "dynamic" HopObjects Day and Topic, which are generated by grouping over all stories of a site. 30%! That should be reason enough to normalize/separate them into distinct tables.
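A minimal sketch of that normalization, using sqlite3 in Python. The AV_TEXT columns follow the queries in the log; the AV_TOPIC table and its schema are hypothetical. The idea: maintain the topic list on write, so reads become an indexed lookup instead of a GROUP BY over all stories of a site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified AV_TEXT schema, reduced to the columns used by the slow query.
cur.execute("""CREATE TABLE AV_TEXT (
    TEXT_ID INTEGER PRIMARY KEY, TEXT_F_SITE INTEGER,
    TEXT_TOPIC TEXT, TEXT_PROTOTYPE TEXT, TEXT_ISONLINE INTEGER)""")
# Hypothetical denormalized topic table: one row per (site, topic).
cur.execute("""CREATE TABLE AV_TOPIC (
    TOPIC_F_SITE INTEGER, TOPIC_NAME TEXT,
    PRIMARY KEY (TOPIC_F_SITE, TOPIC_NAME))""")

def add_story(site_id, topic):
    cur.execute("INSERT INTO AV_TEXT (TEXT_F_SITE, TEXT_TOPIC, TEXT_PROTOTYPE,"
                " TEXT_ISONLINE) VALUES (?, ?, 'Story', 1)", (site_id, topic))
    # Maintain the topic table on write instead of grouping on every read.
    cur.execute("INSERT OR IGNORE INTO AV_TOPIC (TOPIC_F_SITE, TOPIC_NAME)"
                " VALUES (?, ?)", (site_id, topic))

add_story(1, "music"); add_story(1, "music"); add_story(1, "film")

# Before: the expensive per-request GROUP BY from the log.
slow = cur.execute("SELECT TEXT_TOPIC FROM AV_TEXT WHERE TEXT_F_SITE = 1"
                   " AND TEXT_PROTOTYPE = 'Story' AND TEXT_ISONLINE > 0"
                   " GROUP BY TEXT_TOPIC ORDER BY TEXT_TOPIC").fetchall()
# After: a simple lookup on the primary key of the topic table.
fast = cur.execute("SELECT TOPIC_NAME FROM AV_TOPIC WHERE TOPIC_F_SITE = 1"
                   " ORDER BY TOPIC_NAME").fetchall()
assert slow == fast  # same result, no full grouping over AV_TEXT
```

Deletes and topic renames would need to update AV_TOPIC as well, which is the usual cost of this kind of denormalization.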
-> The second insight: we no longer add layouts without skins to res.skinpath, which saves us a lot of selects on the skin table.
-> And thirdly, we might want to consider inserting the img tags directly into a story instead of the image macros, as we already did for weblife. Your WYSIWYG editor will thank you, too.
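Such a one-time rewrite could be sketched like this. Both the macro syntax matched here and the URL scheme are illustrative assumptions, not the actual Antville implementation:

```python
import re

# Assumed macro syntax: <% image name="..." %>. The real macro attributes
# and the image URL scheme would have to be taken from the actual app.
MACRO = re.compile(r'<%\s*image\s+name="([^"]+)"\s*%>')

def inline_images(text, site):
    """Replace image macros with static <img> tags, so rendering a story
    no longer has to resolve each macro against AV_IMAGE."""
    return MACRO.sub(
        lambda m: '<img src="/%s/images/%s" alt="%s" />'
                  % (site, m.group(1), m.group(1)),
        text)

print(inline_images('Hello <% image name="logo" %> world', "weblife"))
# -> Hello <img src="/weblife/images/logo" alt="logo" /> world
```

Run once over the stored story texts, this trades the per-render AV_IMAGE selects for slightly less flexible markup.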
michi - 17.Feb 2006 11:06 - technical