Advanced Caching with Nginx and memcached

Ilya Grigorik (see his bio in Contributors)

Problem

I’ve built a popular destination (or an API server), and now I need to handle several thousand requests a second, but I don’t have time to rearchitect my code or, worse, rewrite it in a faster language.

Solution

Memcached, the darling of every web developer, can turn almost any application into a speed demon. No matter which language you’re working in, the application server is usually the slowest link in the chain: no application server is faster than a web server, even if yours is written in C.

Nginx, a very popular HTTP and reverse-proxy server, ships with a memcached module by default, which allows us to bypass the application server and serve cached responses directly from memcached. With minimal code changes, we implemented this technique on the AideRSS API servers and immediately saw request throughput improve by roughly 400%: from 800 req/s to 3,700 req/s!

Discussion

Most popular open source web servers can be configured to serve cached data quickly and directly from one or more memcached server instances, rather than from your filesystem or an application server. Apache (see http://code.google.com/p/modmemcachecache/) and Lighttpd (http://trac.lighttpd.net/trac/wiki/Docs) require additional modules to enable this functionality, whereas Nginx (http://wiki.codemongers.com/Main) comes with native support and offers the most flexible implementation. A relative newcomer to the field, it is quickly gaining in popularity, and is currently the fourth most popular web server (http://survey.netcraft.com/Reports/200806/).

To get started, download the latest Nginx release and run the usual configure and make install steps; the build takes less than a minute and has no additional dependencies. Also, make sure to browse the wiki and look at the sample configuration files. If you’re coming from Apache or Lighttpd, you’ll be pleased to see that the configuration syntax is virtually identical.

Nginx comes with a built-in memcached module, which allows it to query the cache directly before forwarding the request to the application server. If the cache does not contain the requested key, the memcached module returns a 404 Not Found status, which we catch and redirect to the application server for processing:

upstream appserver  { server 127.0.0.1:9010; }
server {
    location / {
        set  $memcached_key    $uri;
        memcached_pass    127.0.0.1:11211;
        error_page        404 = @dynamic_request;
    }
    location @dynamic_request {
        proxy_pass  http://appserver;
    }
}

You set the key with which Nginx queries memcached by assigning the $memcached_key variable directly in your configuration file. Any Nginx variable can be used to build the key: $uri, $args, $http_user_agent, and so on.
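For example, to cache each query-string variant of a page separately, the key can combine the URI with the arguments. This is a sketch of the cache-lookup location only; the memcached address and the @dynamic_request fallback are assumed to match the earlier example:

```nginx
server {
    location / {
        # One cache entry per URI + query-string combination.
        set  $memcached_key  "$uri?$args";
        memcached_pass  127.0.0.1:11211;
        error_page  404 = @dynamic_request;
    }
}
```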

More complex keys can also be created with the help of the embedded Perl module, which allows you to execute Perl code directly within Nginx. To enable this module, specify --with-http_perl_module when running configure. Once enabled, you can run arbitrary code against the incoming request. For example, you can create an MD5 hash of the request URI and set it as your memcached key:

perl_set $md5_uri '
    sub {
        use Digest::MD5 qw(md5_hex);
        my $r = shift;
        return md5_hex($r->uri);
    }
';

server {
    location / {
        set  $memcached_key    $md5_uri;
        ...
    }
}
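Whatever key scheme you choose, the application server must compute the identical key when it writes a response into memcached. As a sketch, the Perl handler above can be mirrored in Python using the standard-library hashlib module (the function name cache_key is only an illustration):

```python
import hashlib

def cache_key(uri: str) -> str:
    # Mirror the perl_set handler: hex-encoded MD5 of the request URI.
    return hashlib.md5(uri.encode("utf-8")).hexdigest()
```

A request for /posts/42 then maps to the same 32-character hex key on both the Nginx side and the application side.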

If the cached item is not found in memcached, the request is passed on to your application server, which in turn should construct a response and send it to memcached so that Nginx can serve future requests directly.

It is good practice to set a time to live (TTL) on the memcached record so that stale entries expire on their own, without explicit cache invalidation. Once the TTL expires, memcached evicts the record, the next lookup misses, Nginx returns the 404, and the application server repeats the pattern.
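The whole read-through cycle can be sketched in a few lines of Python. Here a small dict-backed class stands in for a real memcached client (a genuine client library exposes a similar set/get interface), and render stands for whatever your framework uses to build a response; both names are illustrative:

```python
import time

class FakeCache:
    """In-memory stand-in for a memcached client (sketch only)."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl):
        # Store the value alongside its absolute expiry time.
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.time() >= expires:
            # Expired: behave like a cache miss, as memcached would.
            del self._store[key]
            return None
        return value

def handle_request(uri, cache, render):
    """Read-through: on a miss, render the page and repopulate the cache."""
    cached = cache.get(uri)
    if cached is not None:
        return cached              # in production, Nginx serves this case itself
    body = render(uri)
    cache.set(uri, body, ttl=300)  # 5-minute TTL; no explicit invalidation needed
    return body
```

The second request for the same URI is served from the cache without calling render, which is exactly the work Nginx takes off the application server in the configuration above.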
