Solving the WordPress Traffic Overload Problem

Anyone who’s been around WordPress for a “Digg effect” or other massive influx of traffic knows that it can be a real problem. From a technical standpoint, the problem is that PHP is entirely loaded into memory for every pageload. That includes the 99% of PHP that is not being used to actually render the page.

On low-traffic sites, this problem largely goes unnoticed; it doesn’t have a huge impact. However, when hundreds of requests hit a server in a single second, that kind of overhead builds up very fast.

There are solutions to this sort of thing, and depending on the scale of the environment, some may be overkill. The WP Super Cache plugin is a quick fix: it caches the pages WordPress renders so that subsequent requests can pull the HTML straight from the cache without loading all of that PHP overhead again, and everyone wins. On the more extreme end, servers can be configured to route requests for different types of content (images, for instance) to specialized servers optimized for that content type.
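
To make that concrete, here is a rough sketch, in the spirit of what page-caching plugins do rather than WP Super Cache’s actual code: serve a saved HTML copy when a fresh one exists, otherwise let WordPress render the page and store the output for the next visitor. The cache directory, expiry time, and file naming below are placeholder choices for illustration.

    <?php
    // Illustrative sketch only: the paths, expiry window, and naming are hypothetical.
    $cache_dir  = dirname(__FILE__) . '/wp-content/cache/pages/';
    $cache_file = $cache_dir . md5($_SERVER['REQUEST_URI']) . '.html';

    // If a cached copy exists and is still fresh, serve it and skip WordPress entirely.
    if (is_readable($cache_file) && (time() - filemtime($cache_file)) < 3600) {
        readfile($cache_file);
        exit;
    }

    // Otherwise capture the page WordPress renders...
    ob_start();
    require dirname(__FILE__) . '/wp-blog-header.php'; // the normal WordPress bootstrap
    $html = ob_get_contents();
    ob_end_flush();

    // ...and store it so the next request can be answered from the static copy.
    if (!is_dir($cache_dir)) {
        mkdir($cache_dir, 0755, true);
    }
    file_put_contents($cache_file, $html);

WP Super Cache goes a step further and adds mod_rewrite rules so that Apache can hand out those HTML files without ever invoking PHP.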

Very geeky stuff. It’s important to note that WordPress gets a black eye all the time for its caching mechanisms and its ability to handle the load of a “Digg effect”, etc. In fact, Instapundit Glenn Reynolds is the latest to take a stinging swipe at WordPress, and trust me when I say, we heard it loud and clear.

At b5media (which I’ll soon be leaving as Director of Technology), we’ve had to deal with this as well and have managed to develop really sound solutions to some of these problems. For WordPress as a whole, however, it is a well-recognized problem that not everyone can solve by following in our footsteps (or WordPress.com’s footsteps).

We’re going to do what we can to help solve this problem once and for all: two of our developers, Mark Jaquith and Brian Layman, will be mentoring a Google Summer of Code intern to develop a robust caching engine for WordPress. We hope that this exercise will result in a more reliable (and sane!) caching mechanism.

Integrated Caching Solutions will improve WordPress’s speed and reliability out of the box and allow people to “Digg Proof” their sites without the struggle of installing plugins on a site that is virtually unreachable. (Source: WordPress Google Summer of Code 2008)

Glenn, I hope that the work Mark, Brian and our intern will be doing improves the situation for WordPress. In the meantime, let me know if I can help you with anything (though I believe you are using Movable Type). It’s a known issue, it needs to be solved, and hopefully this summer some real steps can be made toward solving it.

Comments

  1. says

    From a technical standpoint, the problem is that PHP is entirely loaded into memory for every pageload. That includes the 99% of PHP that is not being used to actually render the page.

    I don’t think PHP itself is the problem. The most benefit when dealing with high-load situations comes from not having to do a dynamic WordPress load — even if that caching solution requires PHP. The WP-Super-Cache solution is nice, and it certainly gives you some incremental benefit above a PHP-powered output cache, but it requires mod_rewrite, and thus excludes a certain segment of WordPress users. Our solution is going to have to be much more generally usable.

  2. says

    PHP is only really a problem when there’s a very, very high load. Even a cached page served through PHP will eventually cause the load to rise faster than a static HTML file served with a mod_rewrite rule.
    Of greater concern is the load generated by queries on the database. A rampaging scraper bot hitting 10 uncached archive pages simultaneously can bring my reasonably well-equipped VPS to its knees. When those pages are cached as HTML files, the server hardly notices, of course!

  3. Jacob Santos says

    Indeed Mark, it is a problem if you are using a CGI environment where PHP has to start up and shut down on each request, but server environments that run the PHP startup and shutdown only once have very little overhead when processing scripts. The remaining overhead is pulling in the PHP source files, which is generally small unless your server configuration is slow, and opcode caching pretty much solves that problem as well.

    So generally, the functions are compiled, which is a fairly quick process for everything PHP has to do; if you think about it, PHP is really fast at compiling large PHP-based web applications. So the functions that aren’t used don’t add much overhead. You might save a millisecond or two by skipping them, but that is only a millisecond or two, a tiny fraction of the total time a request takes.

    The real overhead comes when you have to enter the function stack, and that overhead is interesting. During the performance tests for the Plugin API fix, I found that the Plugin API is faster if the code is duplicated in multiple places instead of being placed in one function that each of the other functions calls (a rough illustration of this call overhead appears after the comments). You see it because the Plugin API is used very often, so avoiding the extra call saves time, but it requires updating more than one area, and past commits and patches show that not every duplicated area gets fixed. So if you have three places and only fix one or two, then you still have one area that needs updating. Anyway, that is a tangent, so I’ll get back on topic.

    Basically, the PHP Engine is not the problem, unless you really, really have a lot of visitors, like Yahoo, where every millisecond counts. Then you’re better off writing an extension for the slow parts.

  4. says

    From a hosting standpoint, the issue is available Apache connections, or available SQL connections. There are only so many that a server is configured to answer at a given time, and the configuration depends on the power of the server (e.g. a dual Xeon vs. a single P4, amount of RAM, etc.). We’ve had servers absolutely LUNCHED by sites running Movable Type when a spammer runs a script that hammers at the comment or trackback scripts. Similarly, we’ve had servers lunched by sites running WordPress that use plugins that make constant queries against the database.

    There is no magic bullet for an Instalanche or a Slashdotting. When Apache/SQL are overwhelmed, they’ll restart, which will result in 404s or database connection errors during the restart process, yes. Failover setups are *expensive,* and reserved for those with excessively deep pockets, a la cnn.com, etc. The rest of us have to make do.

  5. says

    I don’t use WordPress but I suspect that MySQL goes into a trance with large numbers of hits long before Apache. Caching will certainly help that.

    I’m surprised that an Instalanche or a Slashdotting raises the number of hits above the background level from spammers. I don’t get instalanched often, but the last time I got one, it was good for about 8K pageviews, which was less than 10% of the total hits on the server for the same period.

    By contrast, a recent spam storm on the trackback link DID bring the same server to its knees. I implemented some countermeasures and things seem to be OK for now (fingers crossed).

  6. says

    On the issue of integrating a solution into the core of WordPress, I think it’s pretty interesting that this measure is being taken. The vast majority of blogs would never need such protection, and the ones that do have probably already prepared for the increase in traffic. Is it worth the development time? We’ll see.

    On the plus side, having an integrated system as opposed to a plugin can only improve the situation, I’m sure, but is WP Super Cache not good enough?
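
A brief editorial footnote on Jacob’s point in comment 3 about the cost of entering the function stack: the sketch below is a hypothetical microbenchmark, timing the same trivial work done inline and behind a function call. The helper name and iteration count are arbitrary, and the actual numbers will vary with the PHP version and whether an opcode cache is in use.

    <?php
    // Hypothetical microbenchmark: function-call overhead vs. inlined work.
    function add_one($n) {
        return $n + 1;
    }

    $iterations = 1000000;

    // Inlined: the increment happens directly in the loop body.
    $start = microtime(true);
    $x = 0;
    for ($i = 0; $i < $iterations; $i++) {
        $x = $x + 1;
    }
    $inline_time = microtime(true) - $start;

    // Via a function call: the same increment behind add_one().
    $start = microtime(true);
    $y = 0;
    for ($i = 0; $i < $iterations; $i++) {
        $y = add_one($y);
    }
    $call_time = microtime(true) - $start;

    printf("inline: %.4fs  function call: %.4fs\n", $inline_time, $call_time);

The per-call difference is tiny; it only adds up on code paths, such as the Plugin API hooks, that run a great many times per request, which is exactly the trade-off Jacob describes.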