Caching In On Performance Boosts
Posted by Kurt on Tuesday, June 16, 2009
Part of my intent in making this blog was to be able to talk about ongoing development in my various projects. Unfortunately, a lot of the work I typically do is simple HTML and CSS and it’s not challenging me to learn something new. Today, however, I have something to report.
I received an email from Dreamhost last week informing me that they were throttling connections to my video hosting site because recent traffic spikes from the Sonic Unleashed Let’s Play were causing problems for the Apache server. Now I’ll be the first to admit that my video hosting solution is not elegant or efficient. I started it small and hacked features on as I went. Yeah, one of those. Everyone makes one eventually, but in this case mine got popular, so I needed to fix it.
I determined that I needed to implement a form of caching so that my spaghetti code runs as infrequently as possible. Professional-level caching solutions require control of the server the site is running on, both hardware and software. These same solutions are also rather complicated, and they selectively cache only those parts of the site (sometimes down to pieces within individual pages) that change rarely or never. For my site, I know that once a video is added, the page that lets you view it almost never changes, so this is the part of the site that will be cached. This is also the part of the site that sees ~80% of total traffic and closer to 100% of the traffic during the spikes. Fantastic.
The rewrite involved a few steps. First I pulled all the code used to draw the page contents out of the page and put it in a function. The function, printVideoPage, takes the ID of the video in the database and returns the page content as a string. I also took this opportunity to optimize the way I use ADOdb because I have a better understanding of it now than when I wrote this three years ago.
The second function I wrote creates a cache file. This function, updateCache, also takes an ID. It opens the cache file at the configured location, with the configured file name, and starts writing. What it actually writes is a PHP file that sets two variables, $name and $contents. When show.php includes the cache file for the requested ID, it can put the name of the video in the title and the contents where they belong.
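To make the idea concrete, here is a minimal sketch of a cache writer along these lines. It is not the site's actual code: writeCacheFile is a hypothetical helper, the video_<id>.php naming scheme is an assumption, and the name and contents are passed in directly, whereas the real updateCache looks them up from the ID.

```php
<?php
// Hypothetical helper, not the site's real updateCache(): the name and the
// rendered page contents are passed in instead of being loaded by ID.
function writeCacheFile(int $id, string $name, string $contents, string $cacheDir): string
{
    // Assumed naming scheme: one PHP file per video ID.
    $path = $cacheDir . '/video_' . $id . '.php';

    // var_export() emits a valid PHP string literal, so quotes, backslashes
    // and dollar signs in the content cannot break the generated file.
    $php = "<?php\n"
         . '$name = ' . var_export($name, true) . ";\n"
         . '$contents = ' . var_export($contents, true) . ";\n";

    // Write to a temporary file and rename it into place, so a request
    // never includes a half-written cache file.
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $php) === false) {
        throw new RuntimeException('Could not write cache file: ' . $tmp);
    }
    if (!rename($tmp, $path)) {
        throw new RuntimeException('Could not move cache file into place: ' . $path);
    }

    return $path;
}
```

Including the returned file defines $name and $contents in whatever scope does the include, which is all show.php needs in order to fill in the title and the page body.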
For the average visitor to my site, this eliminates all SQL calls and a great deal of very bad PHP. If a cache file does not exist, the site displays an error. This means that the cache must be a correct and accurate representation of the database at all times, which annoys me, but as long as I call updateCache any time I change or add a video record in the database, everything should be fine. I do not think there are any bugs, but certain aspects have not yet been tested as much as they should be. show.php is functioning as expected, which for now is the most important thing.
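The read side might look something like the sketch below. Again, this is an illustration under assumptions rather than the real show.php: loadCachedPage is a hypothetical function, and the video_<id>.php file name is an assumed convention. It returns null when there is no usable cache file, which is the case where the site would show its error.

```php
<?php
// Hypothetical lookup for a show.php-style page; returns null when no
// usable cache file exists for the requested ID.
function loadCachedPage($rawId, string $cacheDir): ?array
{
    // Accept only plain digits, so a crafted request cannot point the
    // include at some other file on the server.
    if (!is_string($rawId) || preg_match('/^[0-9]+$/D', $rawId) !== 1) {
        return null;
    }

    // Assumed naming scheme: one PHP file per video ID.
    $path = $cacheDir . '/video_' . $rawId . '.php';
    if (!is_file($path)) {
        return null;
    }

    // The cache file sets $name and $contents in this function's scope.
    include $path;
    if (!isset($name, $contents)) {
        return null;
    }

    return ['name' => $name, 'contents' => $contents];
}
```

A page script could call this with the ID from the query string and print the error page on null. No database connection is opened on this path, so a missing cache file is the only way a visitor can hit a failure here.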
I cannot show you empirical evidence that my measures to reduce load have worked. I’m on shared hosting; I don’t get access to that kind of stuff. I can only hope it has made a difference and that Dreamhost sees that and unthrottles me. If that’s not good enough for them (and it really should be, Sonic 2006 had way more traffic and this didn’t happen), I am at the end of my bag of tricks on the current codebase and I will have to rewrite. Ughh.