So, yeah, I was Dugg a few weeks ago, and like many other sites, mine crashed and burned. I'm on shared hosting and my site was consuming 90%+ of system resources, so EasyCGI had to shut me down to prevent interruption of service to their other users. They suggested that I figure out a way to reduce server load should this continue to happen.
Here's my setup: PHP/MySQL/IIS. [Aside: Why IIS? Because I want to be able to use both ASP/ASP.NET when I feel like it as well as PHP. I use MySQL because that's what came free with my hosting.] This is probably not a typical setup, but it doesn't matter: the concept I describe below should work for PHP/ASP, MySQL/MS-SQL, etc. Friendly URLs are apparently easier to do with Apache, but I don't know about caching, and in this example, the two go hand-in-hand.
I want my site to be Digg-proof. Well, I may not ever get dugg again, but it's certainly worth a little effort to try and reduce server load. How about some sort of caching? That way the PHP and MySQL calls are only run one time, and a static page is used from then on. A static page certainly takes less server effort than a PHP page with MySQL calls.
While I'm at it, why not toss in "friendly URLs" (or "cool URLs") as well? That is to say, "http://www.mgroves.com/I-like-balloons" is a friendly URL, where "http://www.mgroves.com/blog_archive.php?blogID=123" is not. For an example, check out the URLs at Wikipedia. They look better, describe the link better, and I've heard they're better for search engine indexing and what not.
For some people, these two things might be as easy as changing a couple of server settings and bammo, it's done. For me, not so much. So I thought about it a little bit and I came up with a plan and a step-by-step description of the process I wanted to implement when someone requests a page from mgroves.com:
- Receive request for a page (let's say, http://www.mgroves.com/I-like-balloons)
- Find out if the page is already cached.
- Find out if the page is more than 6 hours old.
- If it's cached and 'young', redirect to it. Done.
- Otherwise, make sure it's a page that should exist. If not, 404. Done.
- Delete the old cached page (if it exists).
- Dynamically generate a young static version of the page they want.
- Redirect to it. Done.
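The steps above boil down to a three-way decision. Here is a minimal sketch of that decision as a pure function, assuming PHP 7.1 or later; decideAction() and its return strings are hypothetical names made up for illustration, not part of the actual handler. The database check is passed in as a callback so that it only runs when the cache can't answer the request.

```php
<?php
// Hypothetical helper: decide what the 404 handler should do for one request.
// $cachedAt is the cached file's modification time (null if there is no file),
// $pageExists is a callback that asks the database whether the page is real.
function decideAction(?int $cachedAt, int $now, int $maxAge, callable $pageExists): string
{
    if ($cachedAt !== null && ($now - $cachedAt) <= $maxAge) {
        return 'serve-cache';   // cached and young: no database work at all
    }
    if (!$pageExists()) {
        return 'not-found';     // not a page that should exist: send the real 404
    }
    return 'regenerate';        // missing or stale: rebuild the static page, then serve it
}

// Example: a page cached one minute ago, with a 6 hour limit.
echo decideAction(time() - 60, time(), 21600, function () { return true; }), "\n";
```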
Seems pretty straightforward. Notice that in most cases, the request will never even reach a point where SQL queries or lengthy PHP scripts need to be run. Here are the tools we'll need:
- A "cache" subfolder with read/write/delete permission.
- Some way to grab whatever URL the user is passing, even if it's not a 'real' destination.
- Some way to tell how old a cached page is.
- Some way to create/delete static pages based on dynamic content from a database.
If the user types in "http://www.mgroves.com/I-like-balloons", they would normally get the default 404 error. But what if the 404 error page was a PHP script itself? Then we could grab the querystring server variable and process from that point on. Easy enough. I created a handle404.php PHP script and set it as the 404 page using EasyCGI's hosting tools (hopefully your hosting service has similar functionality). Here's how I get the "I-like-balloons" part from the querystring:
$aQueryString = explode("/",$_SERVER['QUERY_STRING']);
$strFriendlyName = strtolower($aQueryString[count($aQueryString)-1]);
$strPath = "cache/" . $strFriendlyName . ".html";
The ASP equivalent should look very similar. So now $strPath is "cache/i-like-balloons.html" (strtolower() lowercases the name). Now we gotta figure out if this file exists in the cache directory and if it's young or not. Note that you should definitely use a cache subfolder, otherwise there will be read/write/delete access on your root. And that's bad.
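Since the friendly name comes straight from the visitor and ends up in a file path, it may be worth validating it before going any further. The sketch below is one way to do that, not the code this site actually runs. cachePathFromQueryString() is a hypothetical helper, the sample input assumes IIS hands the 404 page something like "404;http://www.mgroves.com:80/I-like-balloons", and the allowed character set is an assumption about what friendly names look like.

```php
<?php
// Hypothetical helper: turn the 404 handler's query string into a safe cache path.
// Returns null when the last URL segment is empty or contains unexpected characters.
function cachePathFromQueryString(string $queryString, string $cacheDir = 'cache'): ?string
{
    // Drop any query string that was attached to the friendly URL itself.
    $queryString = explode('?', $queryString, 2)[0];

    // Keep only the last path segment, lowercased, as the snippet above does.
    $parts = explode('/', $queryString);
    $name  = strtolower(end($parts));

    // Assumption: friendly names use only letters, digits, hyphens and underscores.
    // Rejecting everything else blocks tricks such as "..\" on a Windows host.
    if (!preg_match('/^[a-z0-9_-]+\z/', $name)) {
        return null;
    }
    return $cacheDir . '/' . $name . '.html';
}

var_dump(cachePathFromQueryString('404;http://www.mgroves.com:80/I-like-balloons'));
```

A null result can be treated the same way as a page that doesn't exist.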
$expireTime = 21600; // 21600 seconds = 6 hours
if(file_exists($strPath)) {
    // if so, check to see if it's expired
    if(time() - filemtime($strPath) <= $expireTime) {
        // not expired, let's go to the content
        require($strPath);
        exit();
    }
}
filemtime() just returns the UNIX timestamp (seconds since epoch) of when this file was last modified. filectime() didn't quite work right with IIS for some reason, but in this case, they will be the same 99% of the time. If the file exists, and it's young, then I simply require() it. This will display the static page, and it will do so without changing the user's address bar to "http://www.mgroves.com/cache/i-like-balloons.html", much like Server.Transfer in ASP. I think that the FileSystemObject should be able to handle the task in ASP, and ASP.NET has similar functionality.
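The age check can also be pulled into a small function, which makes it easy to reuse for the front page and the RSS feed. This is only a sketch: isCacheFresh() is a hypothetical name, and the optional $now parameter exists purely to make the function easy to test. The usage comment swaps require() for readfile(), which sends the file as-is instead of running it through the PHP parser; that is a substitution of mine, not what the snippet above does.

```php
<?php
// Hypothetical helper: true when $path exists and was modified within $maxAge seconds.
function isCacheFresh(string $path, int $maxAge, ?int $now = null): bool
{
    clearstatcache(true, $path);   // avoid a stale stat result within one request
    if (!is_file($path)) {
        return false;
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) <= $maxAge;
}

// Usage sketch inside the 404 handler:
// if (isCacheFresh($strPath, 21600)) {
//     readfile($strPath);   // streams the static page without parsing it as PHP
//     exit();
// }
```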
So that will take care of the simplest, most common case. 8 lines or so of PHP and one page of static content. That sure beats multiple PHP scripts (header/footer/content) and multiple MySQL calls. But alas, that isn't the hard part. Now we have to worry about the requested page not being in the cache, not being young, or not even being a valid page in the first place.
To do this, I finally have to make a MySQL call.
// check to see if the title of this page is anywhere
$sqlquery = mysql_query("SELECT bID, theTitle, some_content, friendly_name
    FROM myblogs
    WHERE friendly_name = " . PrepSQL($strFriendlyName) . "
    LIMIT 1");
if(!($rs = mysql_fetch_assoc($sqlquery))) {
    // show the real 404
    header("location: 404.php");
    exit();
} else {
    $varBlogID = $rs['bID'];
    $varTitle = $rs['theTitle'];
    $varContent = $rs['some_content'];
    $varFriendlyName = $rs['friendly_name'];
}
All this is doing is grabbing the title, content, etc from the table using the $strFriendlyName we had earlier. Each of my blogs has a normal text title and a hyphenated version of the title called "friendly_name". [Aside: sometimes you'll have some funky characters in your blog title that aren't valid in filenames, so be careful and maybe put some error checking in here just in case.] I also use "LIMIT 1" on this query because I read someplace that if you are only expecting one row, you can improve performance by only asking for one row. You can do "SELECT TOP 1" if you are using MS-SQL.
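The mysql_* functions used above were removed in PHP 7, so on a newer host the same lookup needs a different API. Here is a rough sketch using PDO with a bound parameter in place of the PrepSQL() call. findPostByFriendlyName() is a hypothetical name, and the demo uses an in-memory SQLite database as a stand-in for MySQL so it can run on its own (this assumes the pdo_sqlite extension is enabled). With MySQL, only the connection string passed to new PDO() would change.

```php
<?php
// Hypothetical helper: fetch one post by its friendly name, or null if none matches.
function findPostByFriendlyName(PDO $db, string $friendlyName): ?array
{
    $stmt = $db->prepare(
        'SELECT bID, theTitle, some_content, friendly_name
           FROM myblogs
          WHERE friendly_name = :name
          LIMIT 1'
    );
    $stmt->execute([':name' => $friendlyName]);   // the value is bound, never concatenated
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Demo data: an in-memory SQLite table shaped like the myblogs table above.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE myblogs (bID INTEGER PRIMARY KEY, theTitle TEXT, some_content TEXT, friendly_name TEXT)');
$db->exec("INSERT INTO myblogs (theTitle, some_content, friendly_name)
           VALUES ('I like balloons', 'Balloons are great.', 'i-like-balloons')");

$post = findPostByFriendlyName($db, 'i-like-balloons');
echo $post === null ? "404\n" : $post['theTitle'] . "\n";
```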
If no rows are found, then we can do a regular redirect to the actual 404 page. If it does find a row, I just put the data into variables and move on. Delete the old cache file, if it exists:
// now we must delete the expired cache file, (if it exists)
if(file_exists($strPath)) {
    unlink($strPath);
}
Now I have to write a new static page. I could just do an fopen and write everything I want in the static file line by line in PHP, but I still wanted it to be fairly modular and use a header.php, footer.php, etc. So I stumbled upon these ob_ (output buffer) functions in PHP that turned out to be quite handy! If you turn on output buffering, anything you print/echo will go to a buffer instead. You can then read that whole buffer into a variable and then write it to a file. This makes it much cleaner to write and much easier to maintain the header, footer, etc. separately. I'm not sure if there is an easy way to do this same sort of thing in ASP/ASP.NET, but you could certainly take the line-by-line approach instead.
ob_start();
require("include/header.php"); // include the header graphics and layout and what not
// write this blog's content
echo($varTitle);
echo($varContent);
require("include/comments.php"); // comments fieldset, etc
require("include/footer.php"); // footer graphics, layout, etc
$strWholePage = ob_get_contents(); // copy the output buffer into a variable
ob_end_clean(); // empty the output buffer
$fp = fopen($strPath, "w"); // write the entire thing to a file in the cache directory
fwrite($fp, $strWholePage);
fclose($fp);
// now send the user to the newly created file
// and that's it!
require($strPath);
exit();
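The buffer-then-write step can be wrapped up in one function if you want to reuse it for several page types. The following is a sketch under a few assumptions: renderToCache() is a hypothetical name, it needs PHP 7 for Throwable, and it uses file_put_contents() with LOCK_EX in place of the fopen/fwrite/fclose trio so that two requests rebuilding the same page at the same moment are less likely to interleave their writes.

```php
<?php
// Hypothetical helper: run $render with output buffering on, save whatever it
// printed to $path, and return the captured HTML.
function renderToCache(string $path, callable $render): string
{
    ob_start();
    try {
        $render();                // header, content and footer all echo as usual
        $html = ob_get_clean();   // grab the buffer and turn buffering off
    } catch (Throwable $e) {
        ob_end_clean();           // never leave a half-built page in the buffer
        throw $e;
    }
    // LOCK_EX asks for an exclusive lock on the file while it is being written.
    if (file_put_contents($path, $html, LOCK_EX) === false) {
        throw new RuntimeException("Could not write cache file: $path");
    }
    return $html;
}

// Usage sketch:
// echo renderToCache($strPath, function () use ($varTitle, $varContent) {
//     require 'include/header.php';
//     echo $varTitle, $varContent;
//     require 'include/footer.php';
// });
```

One thing to watch for: files pulled in with require inside the closure only see the variables passed in via use, not the globals of the calling script.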
That is all there is to it. You now have friendly URLs and cached content. I repeated this process for the main page (index.php), the RSS feed (mdg_rss.php), and a page to handle legacy URLs (blog_archive.php). A typical visitor to a single blog will set off zero MySQL calls and only ~8 lines of PHP script. If there is a better way to reduce server load programmatically (i.e. not by twiddling server settings), then I'll eat my hat.
Some additional notes:
I haven't quite figured out how to handle comments yet. The easiest way, of course, is for them to 'submit' to an additional processing page which then destroys the cached page the comments would be displayed on. The only snag here is validating the comments form. I would very much prefer that any error messages be displayed on the same page. We'll see how that goes.
You will probably need to consider caching when working with your back-end tools as well. For instance, when you add a new blog, you should probably destroy and/or recreate the cached version of the front page. If you make a change to the header/footer, you might as well go ahead and delete the entire contents of the cache directory. It might also be a good idea to clear out really old pages in the cache directory as well if hard drive space is a concern.
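Both kinds of cleanup (destroying one page after a change, or clearing the whole directory) could look something like the sketch below. invalidateCachedPage() and purgeCache() are hypothetical names, and the code assumes every cached page lives directly in the cache folder with an .html extension, as in the examples above.

```php
<?php
// Hypothetical helper: delete one cached page; returns true if a file was removed.
function invalidateCachedPage(string $cacheDir, string $friendlyName): bool
{
    $path = $cacheDir . '/' . $friendlyName . '.html';
    return is_file($path) && unlink($path);
}

// Hypothetical helper: delete every cached .html file, or only those at least
// $olderThan seconds old when that argument is given. Returns the number removed.
function purgeCache(string $cacheDir, ?int $olderThan = null): int
{
    $removed = 0;
    foreach (glob($cacheDir . '/*.html') ?: [] as $file) {
        if ($olderThan !== null && (time() - filemtime($file)) < $olderThan) {
            continue;   // still recent enough to keep
        }
        if (unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}

// Usage sketch:
// invalidateCachedPage('cache', 'index');   // after adding a new blog
// purgeCache('cache');                      // after changing the header/footer
// purgeCache('cache', 30 * 24 * 3600);      // drop pages untouched for ~30 days
```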
While I may bill this as a "Digg-proof" solution, I really hesitate to say so, because if there's anything that shouldn't be underestimated, it's the creativity and focused firepower that a horde of Digg users is capable of. There are very likely better ways to do this that revolve around server settings and I'm sure someone will say "Don't use IIS, you M$-loving fanboy!" or some such nonsense, but I think the solution I've outlined here is a very serviceable, programming-based solution that does the job just fine.

1
bryan of http://www.staga.net
What did you use to provide IIS with the capability to use friendly URLs without extensions? I am very interested in this type of functionality for IIS.
Posted at October 8, 2006 on 5:16pm