SQL Log File Analysis
Quite some time ago I provided a patch for Helma which allows finer-grained SQL log file analysis by recording the time, the table, the type, and the SQL statement itself. Back then I quickly calculated some stats for twoday.net:
Type of DB requests: 98% selects, 1.9% updates, 0.1% inserts, 0.01% deletes
Used tables: 53% AV_TEXT, 25% AV_SKIN, 14% AV_IMAGE, 8% others
40 SQL statements per second
Regarding the distribution, not much has changed. What has changed is that we are now seeing about 200-300 SQL statements per second, despite an object cache size of 120,000. So we thought it was about time to take a closer look at the statements themselves, and this time at the relative amount of time spent rather than the relative number of requests.
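The breakdowns that follow were done in R with tapply over the parsed log. The same aggregation can be sketched in Python; the log entries below are made-up illustrations, and the field order (type, table, duration in milliseconds, statement) is an assumption based on the columns the R code indexes:

```python
from collections import defaultdict

# Hypothetical log entries: (type, table, duration in ms, statement).
# The patch records these four fields; the concrete values here are invented.
log = [
    ("SELECT_IDS",   "AV_TEXT", 120, "SELECT AV_TEXT.TEXT_ID FROM AV_TEXT ..."),
    ("SELECT_BYKEY", "AV_TEXT",  15, "SELECT AV_TEXT.* FROM AV_TEXT ..."),
    ("SELECT_IDS",   "AV_SKIN",  40, "SELECT AV_SKIN.SKIN_ID FROM AV_SKIN ..."),
]

def time_share(entries, key):
    """Fraction of total SQL time per key, like R's
    round(tapply(time, key, sum) / sum(time), 3)."""
    totals = defaultdict(int)
    for entry in entries:
        totals[key(entry)] += entry[2]  # sum the millisecond column per group
    grand_total = sum(totals.values())
    return {k: round(v / grand_total, 3) for k, v in totals.items()}

print(time_share(log, key=lambda e: e[0]))  # share of time per query type
print(time_share(log, key=lambda e: e[1]))  # share of time per table
```

Grouping by a tuple key (type, table) gives the cross-tabulation shown further down.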
R> round(tapply(data[,3], data[,1], sum) / sum(data[,3]), 3)
SELECT_ACCESSNAMES SELECT_ALL SELECT_BYKEY SELECT_BYRELATION SELECT_IDS SELECT_MAX SELECT_PREFETCH
             0.001      0.000        0.071             0.161      0.731      0.001           0.034

-> A surprisingly high amount of time is being spent on fetching IDs, i.e. on queries that should already have been optimized by db indices.
R> round(tapply(data[,3], data[,2], sum) / sum(data[,3]), 3)
AV_CHOICE AV_FILE AV_IMAGE AV_LAYOUT AV_MEMBERSHIP AV_MODULE AV_POLL AV_SITE AV_SKIN AV_SYSLOG AV_TEXT AV_USER AV_VOTE
    0.001   0.001    0.077     0.006         0.015     0.000   0.000   0.065   0.085     0.001   0.685   0.065   0.000

R> round(tapply(data[,3], data[,1:2], sum) / sum(data[,3]), 3)
                    table
type                 AV_CHOICE AV_FILE AV_IMAGE AV_LAYOUT AV_MEMBERSHIP AV_MODULE AV_POLL AV_SITE AV_SKIN AV_SYSLOG AV_TEXT AV_USER AV_VOTE
  SELECT_ACCESSNAMES        NA   0.000    0.001     0.000         0.000        NA      NA      NA   0.000        NA      NA      NA      NA
  SELECT_ALL                NA      NA       NA        NA            NA        NA      NA      NA   0.000        NA      NA      NA      NA
  SELECT_BYKEY               0      NA    0.007     0.006         0.010         0       0   0.003   0.000        NA   0.032   0.013      NA
  SELECT_BYRELATION         NA   0.001    0.046     0.000         0.003         0      NA   0.012   0.071        NA   0.001   0.027       0
  SELECT_IDS                 0   0.000    0.023     0.000         0.002        NA       0   0.049   0.014     0.001   0.617   0.025       0
  SELECT_MAX                NA      NA    0.000     0.000         0.000        NA      NA   0.000   0.000     0.000   0.001   0.000      NA
  SELECT_PREFETCH           NA      NA    0.000        NA            NA        NA      NA   0.000      NA        NA   0.034      NA      NA

-> So, the biggest part of the SQL time is spent fetching IDs from the AV_TEXT table.
-> Surprisingly, fetching skins already comes in second place, which is especially striking since 75% of our users have never touched a single skin.
-> In fourth place comes fetching images by their name.
So, which are the overall top SQL statements in terms of computation time?
R> sort(tapply(data[,3], data[,4], sum), dec=T)[1:30]
286699 ms  0.178  SELECT AV_TEXT.TEXT_TOPIC FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE > 0) GROUP BY TEXT_TOPIC ORDER BY TEXT_TOPIC
209549 ms  0.130  SELECT AV_TEXT.TEXT_DAY FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE ) GROUP BY TEXT_DAY ORDER BY TEXT_DAY desc
136830 ms  0.085  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_PROTOTYPE = Story AND TEXT_ISONLINE > 0) ORDER BY TEXT_CREATETIME DESC
 92191 ms  0.057  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_TEXT_STORY AND (TEXT_PROTOTYPE=Comment) ORDER BY TEXT_MODIFYTIME DESC
 74457 ms  0.046  SELECT AV_SITE.SITE_ID FROM AV_SITE WHERE (SITE_ISONLINE > 0 AND SITE_ISBLOCKED AND SITE_SHOW ) ORDER BY SITE_LASTUPDATE desc
 63238 ms  0.039  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND (TEXT_ISONLINE > 0 AND (TEXT_PROTOTYPE = Story OR TEXT_PROTOTYPE = Comment)) ORDER BY TEXT_MODIFYTIME DESC
 56952 ms  0.035  SELECT AV_TEXT.TEXT_ID FROM AV_TEXT WHERE AV_TEXT.TEXT_F_SITE AND AV_TEXT.TEXT_DAY = XXX AND (TEXT_PROTOTYPE = Story and TEXT_ISONLINE ) ORDER BY TEXT_CREATETIME desc
 49750 ms  0.031  SELECT AV_TEXT.* FROM AV_TEXT WHERE AV_TEXT.TEXT_ID
 47139 ms  0.029  SELECT AV_IMAGE.* FROM AV_IMAGE WHERE AV_IMAGE.IMAGE_ALIAS = III AND AV_IMAGE.IMAGE_F_SITE AND (IMAGE_PROTOTYPE = Image and IMAGE_F_IMAGE_PARENT is null)
 41664 ms  0.026  SELECT AV_USER.* FROM AV_USER WHERE AV_USER.USER_NAME = UUU AND (USER_AUTH_TYPE = local AND USER_NAME IS NOT NULL)

-> Oha! Big surprise (at least for me): 30% of the database time is spent determining the "dynamic" HopObjects Day and Topic, which are generated by grouping over all stories of a site. 30%! That should be reason enough to normalize/separate them into distinct tables.
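A minimal sketch of that normalization, using sqlite3 in Python. The AV_TEXT columns follow the queries in the log; the AV_TOPIC table and its schema are hypothetical. The idea: maintain the topic list on write, so reads become an indexed lookup instead of a GROUP BY over all stories of a site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified AV_TEXT schema, reduced to the columns used by the slow query.
cur.execute("""CREATE TABLE AV_TEXT (
    TEXT_ID INTEGER PRIMARY KEY, TEXT_F_SITE INTEGER,
    TEXT_TOPIC TEXT, TEXT_PROTOTYPE TEXT, TEXT_ISONLINE INTEGER)""")
# Hypothetical denormalized topic table: one row per (site, topic).
cur.execute("""CREATE TABLE AV_TOPIC (
    TOPIC_F_SITE INTEGER, TOPIC_NAME TEXT,
    PRIMARY KEY (TOPIC_F_SITE, TOPIC_NAME))""")

def add_story(site_id, topic):
    cur.execute("INSERT INTO AV_TEXT (TEXT_F_SITE, TEXT_TOPIC, TEXT_PROTOTYPE,"
                " TEXT_ISONLINE) VALUES (?, ?, 'Story', 1)", (site_id, topic))
    # Maintain the topic table on write instead of grouping on every read.
    cur.execute("INSERT OR IGNORE INTO AV_TOPIC (TOPIC_F_SITE, TOPIC_NAME)"
                " VALUES (?, ?)", (site_id, topic))

add_story(1, "music"); add_story(1, "music"); add_story(1, "film")

# Before: the expensive per-request GROUP BY from the log.
slow = cur.execute("SELECT TEXT_TOPIC FROM AV_TEXT WHERE TEXT_F_SITE = 1"
                   " AND TEXT_PROTOTYPE = 'Story' AND TEXT_ISONLINE > 0"
                   " GROUP BY TEXT_TOPIC ORDER BY TEXT_TOPIC").fetchall()
# After: a simple lookup on the primary key of the topic table.
fast = cur.execute("SELECT TOPIC_NAME FROM AV_TOPIC WHERE TOPIC_F_SITE = 1"
                   " ORDER BY TOPIC_NAME").fetchall()
assert slow == fast  # same result, no full grouping over AV_TEXT
```

Deletes and topic renames would need to update AV_TOPIC as well, which is the usual cost of this kind of denormalization.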
-> The second insight: we no longer add layouts without skins to res.skinpath, which saves us a lot of selects on the skin table.
-> And thirdly, we might want to consider inserting the img tags directly into a story instead of the image macros, as we already did for weblife. Your WYSIWYG editor will thank you, too.
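Such a one-time rewrite could be sketched like this. Both the macro syntax matched here and the URL scheme are illustrative assumptions, not the actual Antville implementation:

```python
import re

# Assumed macro syntax: <% image name="..." %>. The real macro attributes
# and the image URL scheme would have to be taken from the actual app.
MACRO = re.compile(r'<%\s*image\s+name="([^"]+)"\s*%>')

def inline_images(text, site):
    """Replace image macros with static <img> tags, so rendering a story
    no longer has to resolve each macro against AV_IMAGE."""
    return MACRO.sub(
        lambda m: '<img src="/%s/images/%s" alt="%s" />'
                  % (site, m.group(1), m.group(1)),
        text)

print(inline_images('Hello <% image name="logo" %> world', "weblife"))
# -> Hello <img src="/weblife/images/logo" alt="logo" /> world
```

Run once over the stored story texts, this trades the per-render AV_IMAGE selects for slightly less flexible markup.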
michi - 17.Feb 2006 11:06 - technical