High Performance Drupal by Nathaniel Catchpole, Narayan Newton, Jeff Sheltren

Chapter 3. Drupal Performance Out of the Box

Drupal provides several features and configuration options, both in the core install and in contributed modules, that can affect a site’s performance and scalability. Making use of these can provide dramatic improvements in site performance compared to Drupal’s default settings. While many of these settings are essential when running a large Drupal website in production, they are not enabled by default on new installs and can easily be forgotten when moving a site from development to production. It’s therefore quite common to see newly launched sites with one or more of these options disabled, leading to performance and scalability issues that could have been avoided with, in many cases, just a few minutes’ work.

In addition to modules and configuration options that provide quick wins for improving performance, we’ll also discuss some common pitfalls.

Page Caching

The majority of requests served by a Drupal site will either be requests for full HTML pages served to browsers or read-only requests for content in other formats, such as RSS or JSON-LD. Serving a request from Drupal involves the following:

  • Parsing the request
  • Loading various necessary services and modules
  • Locating the correct route controller and executing it
  • Rendering in the desired format

The single biggest improvement to application performance that can be made is simply to skip as many of these steps as possible via page caching. When a request comes in, the URL itself (and other request context in Drupal 8) is used as a cache identifier. If there’s a cache hit, the output is sent from the cache rather than built from scratch in PHP.

While cached pages are served in a fraction of the time of a “normal” Drupal request, how much benefit a particular site might get from page caching varies greatly based on site usage. Understanding the strengths and limitations of page caching is important when considering more advanced optimization techniques.

When Should You Use Page Caching?

As a general rule of thumb, page caching is effective as long as the time saved by cache hits exceeds the overhead of having page caching enabled for cache misses.

Let’s take an example of a site with a very low cache hit rate—say, a 1:30 hit/miss ratio. Note that all the numbers here are entirely for illustration purposes and don’t necessarily reflect any real websites:

Time to serve a page without caching: 300 ms
Overhead of page caching on cache misses: 2 ms
Time to serve a page from cache: 5 ms

The 30 cache misses add an additional 60 ms across all requests (time spent checking and then writing back to the cache).

However, the single cache hit saves 295 ms compared to building the page from scratch, meaning that there is a net gain of 235 ms across all requests even with such a low hit rate.

The numbers will vary dramatically depending on the site, although 300 ms is a fairly conservative estimate of the time needed to generate a full page on a complex site.
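
The arithmetic in this example can be checked with a short calculation. The following Python sketch is for illustration only: the function name `net_gain_ms` and its parameters are invented here, and the figures are the made-up ones from the example above.

```python
def net_gain_ms(hits, misses, uncached_ms, miss_overhead_ms, cached_ms):
    """Total time saved (positive) or lost (negative) by enabling page caching."""
    saved_by_hits = hits * (uncached_ms - cached_ms)
    cost_of_misses = misses * miss_overhead_ms
    return saved_by_hits - cost_of_misses


# Figures from the example: one cache hit for every 30 misses.
print(net_gain_ms(hits=1, misses=30, uncached_ms=300, miss_overhead_ms=2, cached_ms=5))
# prints 235
```

With these particular figures, each hit saves 295 ms and each miss costs 2 ms, so caching keeps paying for itself until there are more than 147 misses for every hit.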

There are various types of sites and traffic patterns that can lower hit rates or make page caching unviable:

Authenticated traffic
Page caching does not work if a visitor has an authenticated PHP session. By default, Drupal customizes pages for authenticated users, for example, displaying their username or administrative links based on their roles. A site that has 100% authenticated traffic—for example, a private intranet or ticket tracker—will not get any benefit from full page caching.
Breadth of content
If a site has a large number of articles or similar content and regularly gets traffic to this content via search engine referrals, external links, crawlers, etc., page caching can be of limited value. To show this contrast, consider that one page visited 1,000 times within the length of the cache TTL will give 999 cache hits, whereas 1,000 pages visited once each during the same period will give 0 cache hits. Many sites will have traffic patterns that encompass both of these extremes. Due to the relatively low cost of writing a page to cache versus building it each time, it’s usually worth enabling page caching.
Frequent updates
By default the page cache is invalidated every time content is posted, deleted, or updated on the site. This means you can enable page caching without being concerned that site visitors will see out-of-date content. However, it also means that a site that is updated every minute will invalidate the entire page cache every minute, vastly reducing the chance of a cache hit. On the other hand, if you have infrequently posted content, flurries of activity with long pauses in between, or updates at particular times of the day, page caching will be effective for the bulk of the time. This situation may be improved for both cases in Drupal 8, which has introduced cache tags for smarter cache invalidation. Cache tags allow cache entries to be associated with the specific content entities that are rendered so that they can be invalidated when those entities are updated or deleted; however, at the time of writing, this has not been integrated with the page cache.
PHP sessions for anonymous users
The page cache is bypassed for any anonymous users with a PHP session. Since Drupal 7, PHP sessions are initialized on demand when something is written to $_SESSION, so whether a user has a session depends on enabled code and user activity. Actions such as adding an item to a shopping cart often trigger a PHP session, and this is something to be generally aware of when writing code for custom or contributed modules.
Customized content based on request parameters
Some sites customize the user experience for anonymous users at the same path. This may involve using browser settings for preferred language to determine which translation of a text to show, showing region-specific content based on IP address, changing rendered output based on a cookie, switching to a mobile-specific theme based on user agent, or showing content in different formats based on Accept headers. Since the path is used as the cache key, Drupal is only able to cache and serve one copy of the content, meaning that users see incorrect content when such a feature coexists with core page caching. Drupal 8 natively handles Content-Type Accept headers as part of the page cache key, so that different versions of a page will be saved for different content types, but it does not handle the other cases yet.
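
The effect of the cache key on what visitors see can be illustrated with a toy model. The Python sketch below is not Drupal's cache implementation; the `PageCache` class, the request dictionaries, and the key functions are all hypothetical names chosen for this example. It compares a cache keyed on the path alone with one keyed on the path plus the preferred language.

```python
class PageCache:
    """Toy page cache; key_fn decides which parts of a request form the cache key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.store = {}
        self.hits = 0
        self.misses = 0

    def serve(self, request, build_page):
        key = self.key_fn(request)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = build_page(request)
        return self.store[key]


def build_page(request):
    # Stand-in for a full page build whose output depends on the language.
    return f"{request['path']} rendered in {request['language']}"


def path_only(request):
    return request["path"]


def path_and_language(request):
    return (request["path"], request["language"])


english = {"path": "/about", "language": "en"}
german = {"path": "/about", "language": "de"}

naive = PageCache(path_only)
naive.serve(english, build_page)
print(naive.serve(german, build_page))    # /about rendered in en (wrong language)

varied = PageCache(path_and_language)
varied.serve(english, build_page)
print(varied.serve(german, build_page))   # /about rendered in de
```

The same toy class reproduces the breadth-of-content contrast described above: serving one path 1,000 times records 999 hits, while serving 1,000 distinct paths once each records none.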

Internal Page Caching

Drupal core provides its own internal page cache. The configuration option is accessed via admin/config/development/performance and allows the full rendered HTML output to be stored using Drupal’s own cache API. When the option is enabled, Drupal loads and executes the minimum possible PHP code to check the cache item and serve the page request. This can require as little as one database lookup, meaning pages can be served from PHP in a matter of a few milliseconds.

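The configuration settings $conf['page_cache_invoke_hooks'] = FALSE and $conf['page_cache_without_database'] = TRUE allow Drupal 7 to skip even more of its usual bootstrap when serving cached pages, so that pages may be served without any database or cache lookups except for the page cache item itself. Note that with page_cache_invoke_hooks set to FALSE, hook_boot() and hook_exit() are not invoked for cached pages, so check that no enabled module depends on them.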

This can make the difference between a site being able to serve tens of requests per second or hundreds, including in shared hosting environments.

Drupal also provides an option to compress cached pages. This makes use of gzip compression when the client supports it, which can dramatically reduce the payload of HTML sent to the browser. If you have control over your server configuration, however, you may want to enable compression within your web server or reverse proxy instead of from within Drupal. Drupal’s own page compression only works for pages served from the internal page cache, whereas mod_deflate and similar work for all requests to the site, whether cached or not.

Reverse Proxy Caching

The “Expiration of cached pages” option is located at admin/config/development/performance. Setting this option affects the max-age value of the Cache-Control header sent by Drupal, which allows reverse proxies to cache pages. The most common reverse proxy used for Drupal sites is Varnish, so we’ll use that as the example here; however, many of these assumptions also apply to other caching options such as serving pages via a CDN, or Nginx proxy caching.

Using a reverse proxy such as Varnish to serve cached pages has advantages over the internal page cache, since Varnish is able to serve the entire page request without having to call back to Apache and PHP. This significantly reduces server load by completely avoiding the web server, PHP, and the database. Note that Varnish is not typically available in a shared hosting environment and may not be an option for everyone, although many Drupal-specific hosting providers do offer it.

When serving cached pages, there is one limitation that Varnish has compared to Drupal’s internal page cache: Drupal, by default, can’t expire pages from Varnish when content is updated.

There are two options for handling this:

  1. Set up Drupal to purge Varnish entries via the command interface or a PURGE HTTP request based on updates to the site. This requires a custom Varnish configuration, so it may not be available to all site owners. Assuming you have this option, though, contributed projects such as the Varnish HTTP Accelerator Integration module or the Purge module make it easy to set up your Drupal site to purge items in Varnish, and more granular purging can be enabled via projects such as the Expire, Cache Actions, or CacheTags modules.
  2. Set the max-age to a low value, such as five minutes, while keeping the internal page cache enabled. This keeps pages fresh in Varnish at the cost of a lower cache hit rate, while ensuring that Drupal only builds a full page from scratch when necessary. However, it requires some additional storage since pages are cached in two locations.
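
The trade-off in the second option can be made concrete with a small simulation. This Python sketch is a simplified model, not Varnish or Drupal code: the `TwoTierCache` class and its methods are hypothetical, times are plain seconds, and it assumes the internal cache is cleared on every content update while the proxy only expires entries by age.

```python
class TwoTierCache:
    """Toy model: a proxy cache with a fixed max-age in front of an internal page cache."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.proxy = {}      # path -> (page, expiry time in seconds)
        self.internal = {}   # path -> page, cleared on every content update
        self.revision = 1

    def content_updated(self):
        # The internal cache is invalidated immediately; the proxy is not told.
        self.revision += 1
        self.internal.clear()

    def request(self, path, now):
        """Return (page, source), where source is 'proxy', 'internal', or 'built'."""
        cached = self.proxy.get(path)
        if cached and now < cached[1]:
            return cached[0], "proxy"
        if path in self.internal:
            page, source = self.internal[path], "internal"
        else:
            page, source = f"{path} at revision {self.revision}", "built"
            self.internal[path] = page
        self.proxy[path] = (page, now + self.max_age)
        return page, source


cache = TwoTierCache(max_age=300)
print(cache.request("/news", now=0))     # built from scratch
print(cache.request("/news", now=10))    # served by the proxy
print(cache.request("/news", now=301))   # proxy entry expired, internal cache answers
cache.content_updated()
print(cache.request("/news", now=400))   # proxy still serves revision 1 until it expires
print(cache.request("/news", now=700))   # rebuilt, now at revision 2
```

In this model a visitor can see content that is at most `max_age` seconds out of date, and a full rebuild only happens when both caches miss.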

CSS and JavaScript Aggregation

Frontend performance best practices recommend combining page resources into as few requests as possible, and Drupal core provides an option to do exactly this out of the box. CSS and JavaScript may be added to pages by Drupal core; any enabled core, contrib, and custom modules; and themes. By default, each file is added to the page individually in the HTML markup, meaning potentially dozens of HTTP requests on each page as each file is requested individually by the browser. Aggregation in Drupal has particular challenges that make it more complex to get this right than it might be for a custom web application. The assets added to the page depend on:

  • Which modules are enabled
  • Whether the enabled modules define global assets to be added to every page and/or conditional assets added only on certain types of request
  • Which theme is active for the request, and whether that theme defines global or conditional assets

Therefore, when assets are added to the page, they’re added with particular metadata, and with information about whether they’re part of the base application, from a module, or from the theme. The aggregation logic in Drupal 7 breaks these into the following groups:

  • Assets from System module added on every page
  • Assets from System module added conditionally
  • Module assets added on every page
  • Module assets added conditionally
  • Theme assets added on every page
  • Theme assets added conditionally

In Drupal 8 these are being consolidated into two groups, a change that may be backported to Drupal 7:

  • Assets added on every page
  • Assets added conditionally

Files will not be aggregated if they define custom attributes or a specific media type.

Separating files that are added to every page from those added conditionally reduces the potential that users will download multiple large aggregates containing lots of duplicate assets as they browse around different pages of the site. This was the case with Drupal 6’s aggregation strategy, which relied on a single aggregate per page.
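
The two-group strategy and the exclusion rule described above can be sketched as follows. This is an illustrative Python model only, not Drupal's aggregation code: the function `plan_aggregates`, the asset dictionary keys, and the file names are all hypothetical.

```python
from collections import defaultdict


def plan_aggregates(assets):
    """Split assets into aggregates and files that must be served individually.

    Each asset is a dict with hypothetical keys: 'file', 'every_page' (bool),
    'media' (str, with 'all' meaning no specific media type), 'attributes' (dict).
    """
    aggregates = defaultdict(list)
    standalone = []
    for asset in assets:
        # Custom attributes or a specific media type exclude a file from aggregation.
        if asset.get("attributes") or asset.get("media", "all") != "all":
            standalone.append(asset["file"])
            continue
        group = "every_page" if asset["every_page"] else "conditional"
        aggregates[group].append(asset["file"])
    return dict(aggregates), standalone


assets = [
    {"file": "base.css", "every_page": True},
    {"file": "theme.css", "every_page": True},
    {"file": "forum.css", "every_page": False},
    {"file": "print.css", "every_page": True, "media": "print"},
    {"file": "widget.js", "every_page": False, "attributes": {"async": "async"}},
]
aggregates, standalone = plan_aggregates(assets)
print(aggregates)   # {'every_page': ['base.css', 'theme.css'], 'conditional': ['forum.css']}
print(standalone)   # ['print.css', 'widget.js']
```

Because the "every page" aggregate is the same on every request, a browser downloads it once and only fetches the smaller conditional aggregate when the set of page-specific assets changes.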

Two other behaviors are enabled when CSS and JavaScript aggregation is switched on. First, Drupal will write gzipped versions of each file and try to serve them to clients that accept gzipped content via default .htaccess rules. You may want to consider disabling this behavior in .htaccess if you are already using mod_gzip/mod_deflate or an equivalent.

Additionally, CSS files are stripped of whitespace and comments. No preprocessing is done for JavaScript files, but several core JavaScript files are already minified, and the Speedy module helps by replacing those that aren’t with minified versions.
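
To give a sense of what stripping whitespace and comments involves, here is a deliberately naive Python sketch. It is not the routine Drupal uses; the function name `strip_css` is invented for this example, and the regular expressions ignore edge cases such as comment markers or punctuation inside quoted strings.

```python
import re


def strip_css(css):
    """Naively remove comments and redundant whitespace from a stylesheet."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop /* ... */ comments
    css = re.sub(r"\s+", " ", css)                         # collapse runs of whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)            # no spaces around punctuation
    return css.strip()


source = """/* Header styles */
h1, h2 {
  color: red;
  margin: 0;
}
"""
print(strip_css(source))
```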

Logging

Also provided by core but requiring a certain level of control over your hosting environment is the syslog module. Drupal enables the database logging (dblog) module by default, which directs all watchdog() calls to the database. Modules that log verbosely or that generate PHP notices and warnings can cause a large number of database writes. Verbose logging and PHP errors should be fixed at source, by auditing the logs periodically and fixing custom code or submitting patches to contributed code to avoid the logging or errors. Switching to syslog allows any remaining or unexpected messages to be logged by the operating system rather than the database, which can help to reduce overall load on an overworked database server.

The Cache and Other Swappable Storage

Drupal’s cache API (used for internal page caching, as well as many other things needed during the course of a request) uses the database storage implementation by default. As with logging, simply setting up the cache to write to somewhere else will take some of the load off the database server. Additionally, some cache backends have further benefits over database caching, such as improved performance or the ability to scale horizontally. Less frequently accessed but equally swappable are the queue and lock storage backends.

Core doesn’t provide a useful alternative storage implementation (except for a null implementation useful for development, or if you believe the YouTube video “MongoDB Is Web Scale”), but contributed projects are available providing support for Memcache, Redis, MongoDB, APC, and Files.
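
The idea of swappable storage can be sketched in a few lines. The Python model below is purely illustrative: the classes `DatabaseCache` and `MemoryCache`, the `serve_requests` helper, and the query counter are hypothetical stand-ins, not Drupal's cache API or any contributed module's interface. It shows that the calling code stays the same while the database load changes with the backend.

```python
class DatabaseCache:
    """Stand-in for the default backend: every operation costs a database query."""

    def __init__(self, database):
        self.database = database

    def get(self, cid):
        self.database["queries"] += 1
        return self.database["cache_table"].get(cid)

    def set(self, cid, data):
        self.database["queries"] += 1
        self.database["cache_table"][cid] = data


class MemoryCache:
    """Stand-in for an external key-value store; the database is never touched."""

    def __init__(self):
        self.items = {}

    def get(self, cid):
        return self.items.get(cid)

    def set(self, cid, data):
        self.items[cid] = data


def serve_requests(cache, count):
    """Calling code is identical whichever backend is configured."""
    for _ in range(count):
        if cache.get("menu") is None:
            cache.set("menu", "rendered menu")


database = {"queries": 0, "cache_table": {}}
serve_requests(DatabaseCache(database), 100)
print(database["queries"])   # 101: one miss, one write, then 99 hits

database = {"queries": 0, "cache_table": {}}
serve_requests(MemoryCache(), 100)
print(database["queries"])   # 0
```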

Cron

Drupal core and many contributed modules rely on hook_cron() for tasks such as indexing or garbage collection. Up until Drupal 7, site administrators were required either to set up a cron job to execute hook_cron() on their servers or to install the Poormanscron module, which triggers the cron job automatically via PHP upon the first request after a certain time limit. If neither of these was set up, garbage collection didn’t run, which could lead to watchdog and cache tables growing indefinitely as expired items were never cleared up.

From Drupal 7, the functionality of the Poormanscron module was moved into core and is enabled by default. Drupal will execute these periodic cron jobs inline during a page request every three hours, meaning the user that triggered the cron run may have page serving delayed by seconds or minutes while the various jobs finish.

To avoid both of these scenarios, ensure that Drupal cron is configured to run frequently. This can be done using a cron job or a more advanced job scheduler, such as Jenkins. Cron also has high resource/memory requirements, so it should be run via drush to avoid taking up a web server process and artificially inflating PHP memory limit requirements with mod_php.
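
The interaction between the built-in three-hour trigger and an external scheduler can be modelled with a short simulation. This Python sketch is an assumption-laden illustration, not Drupal code: the names `cron_due` and `inline_cron_runs` are invented, times are seconds since an arbitrary start at which cron is assumed to have just run, and a threshold of zero is treated as "inline cron disabled".

```python
CRON_THRESHOLD_SECONDS = 3 * 60 * 60  # the three-hour interval described above


def cron_due(last_run, now, threshold=CRON_THRESHOLD_SECONDS):
    """True when a page request at `now` would trigger an inline cron run."""
    return threshold > 0 and now - last_run >= threshold


def inline_cron_runs(request_times, scheduled_every=None):
    """Count page requests that end up running cron inline.

    scheduled_every is the interval in seconds of an external scheduler,
    or None when no external job is configured.
    """
    last_run = 0
    inline = 0
    for now in request_times:
        if scheduled_every:
            # The external job last fired at the most recent multiple of its interval.
            last_run = max(last_run, now - now % scheduled_every)
        if cron_due(last_run, now):
            inline += 1
            last_run = now
    return inline


one_day_of_requests = range(0, 24 * 60 * 60, 600)  # one request every ten minutes
print(inline_cron_runs(one_day_of_requests))                        # 7
print(inline_cron_runs(one_day_of_requests, scheduled_every=900))   # 0
```

In this model, as long as the external job runs more often than the threshold, no visitor's request ever has to wait for cron.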

Views

Views (both the Drupal 8.x core version and the Drupal 7 contributed module) ships with a built-in time-based caching system, while additional modules can also provide alternative caching implementations.

Caching settings are located under the advanced section in the Views UI. After enabling caching, there are two settings available:

Query results
This caches only the results of the main listing query configured in the View, using the query itself as the cache key. Views allows very complex queries to be created, and caching the results is the quickest way to reduce the performance impact of the queries on a site.
Rendered output
This caches the rendering of the items in the View, once the results have been retrieved. This can also be expensive—it may involve loading entities, running additional queries for field values, as well as invoking the theme system. Since the cache is time-based, an entity update such as changing a node title won’t be reflected in the cache until the items have expired.

Where possible, both of these should be set to the maximum possible time. If you’re concerned about cache coherency, setting a longer value for query caching and a shorter value for rendered output is a good compromise.
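
The compromise of a longer lifetime for query results and a shorter one for rendered output can be illustrated with a toy model. The Python sketch below is not the Views caching code: `TimedCache`, `CachedView`, and the lifetimes used are hypothetical, and times are plain seconds.

```python
class TimedCache:
    """Single-value cache that is rebuilt once its lifetime has passed."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.value = None
        self.expires = None

    def get(self, now, rebuild):
        if self.expires is None or now >= self.expires:
            self.value = rebuild()
            self.expires = now + self.ttl
        return self.value


class CachedView:
    """Toy listing with separate lifetimes for query results and rendered output."""

    def __init__(self, titles, query_ttl, output_ttl):
        self.titles = titles          # node id -> title, standing in for stored content
        self.query_runs = 0
        self.results = TimedCache(query_ttl)
        self.output = TimedCache(output_ttl)

    def _query(self):
        self.query_runs += 1          # stands in for an expensive listing query
        return sorted(self.titles)

    def _render(self, now):
        ids = self.results.get(now, self._query)
        return ", ".join(self.titles[i] for i in ids)

    def display(self, now):
        return self.output.get(now, lambda: self._render(now))


titles = {1: "Old title"}
view = CachedView(titles, query_ttl=3600, output_ttl=300)
print(view.display(0))      # Old title
titles[1] = "New title"
print(view.display(100))    # Old title: rendered output has not expired yet
print(view.display(400))    # New title: output re-rendered from cached query results
print(view.query_runs)      # 1
```

The title change becomes visible once the shorter output lifetime has passed, while the expensive query still runs only once within its longer lifetime.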

Configuring caching for Views is often forgotten in the process of site building, and this is one of the first simple changes to look at making (after page caching) when a site runs into performance issues.

Note

If you want to ensure that you never forget to enable caching for a View, consider installing the Views Cache Bully module, which enforces time-based caching on any View where it’s not configured.
