O'Reilly logo

High Performance Drupal by Nathaniel Catchpole, Narayan Newton, Jeff Sheltren

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6. Verifying Changes

When optimizing a site that’s under development or in production, the most important step before any other is to gather data on what the performance and scalability bottlenecks are. Without data, a lot of effort can be expended on optimizing code that’s actually performing OK, and conversely, poorly performing code might be overlooked. Without being able to identify and quantify specific performance issues, it’s not possible to verify that improvements have been made. A site might seem “faster” after a change was rolled out to production, but you’ll never know whether this was because of the change itself, or perhaps because crawler traffic or even a distributed denial-of-service (DDoS) attack subsided at the same time.

Analyzing Frontend Performance

Frontend performance monitoring falls into roughly four categories:

  • Getting a summary of a page’s performance based on rules for high-performance websites
  • Waterfall and timeline charts for individual pages
  • Real user monitoring
  • Profiling of CSS and JavaScript

YSlow and Google PageSpeed

Often the first step toward analyzing a site’s frontend performance is looking at the summary report produced by a tool such as YSlow and PageSpeed. Given a URL, or via a browser plug-in, they’ll analyze a request to an individual web page and give you scores for specific areas such as number of HTTP requests, cacheable headers, etc. Drilling down on these items will show specific recommendations. Both of these tools are very easy to use and have great documentation, so doing a walkthrough here would be redundant. However, the reports they generate are only useful if you have an understanding of the underlying principles of frontend performance and the effort it’s likely to take to fix the issues that are identified. Chapter 2 goes into detail about Drupal-specific advice on how to deal with many of these issues, based loosely on the categories that are highlighted by these reports.

While generating a summary report from YSlow or PageSpeed is a very quick task, this is commonly treated as a manual step to be taken by an individual developer and so can often be overlooked during the development process, or done much less frequently than changes are made that impact the frontend performance of the site. However good your intentions may be, automating frontend performance testing as part of your continuous integration process helps to ensure that your site is regularly put through its paces and helps you discover changes that may negatively impact performance before they’re deployed.

Note

PhantomJS is a headless WebKit implementation including a JavaScript API for which there is a YSlow plug-in. This allows you to trigger a YSlow report from the command line and write the output in a structured format such as JUnit.

Once you have a summary of a page’s performance, it’s often necessary to drill down to individual requests to see what’s going on.

Waterfall Charts

Waterfall charts show the sequential process of loading a web page. Primarily this shows page resources as they’re requested and downloaded by the browser, as well as timings for “start render” and “document complete” events, represented by the green and blue vertical bars on the chart. See Figure 6-1 for an example waterfall chart, created for a load of http://drupal.org.

Each horizontal bar on the waterfall chart represents a separate request. Each request is further broken down into the following components:

DNS lookup
The address lookup for each domain that resources are served from. This occurs once per domain/subdomain rather than once per resource on the page, and may be skipped if the browser implements DNS caching and has a warm cache for the domains on the page.
Waterfall chart for the Drupal home page

Figure 6-1. Waterfall chart for the Drupal home page

Initial connection
The upstream connection to request the resource from the server. This is constrained by upstream bandwidth, so tends to be worst for mobile connections. It’s also affected by the number/size of cookies that the browser needs to send with each request—these are sent upstream for every request the browser makes that it has cookies set for.
SSL negotiation
The process of checking SSL certificates (only for SSL requests, of course).
Time to first byte
The time until the browser downloads the very first part of the resource. This depends on the speed of response of the server (is it under load, and was the file served by Varnish from memory or by Apache from disk, or even generated by PHP dynamically?).
Content download
This depends mainly on file size—whether the file is optimized to the correct size, gzipped, etc.

Drupal.org is quite well optimized at 23 requests on the front page, but looking closely at the waterfall chart, it’s still possible to see room for improvement. For example, btn-search.png is a single-use image for the search button, which could be consolidated into one of the existing sprites since the same styling is used for other buttons on the page. Any request for an image that’s not associated with content should be treated with suspicion! Even when images are consolidated into sprites, consider whether it might be possible to replace them with typographic or CSS features instead.

Note

Follow along with the issue we filed for this while writing this chapter.

While webpagetest was used for generating this example, you’ll normally get richer information from local browsers using tools such as Firebug for Firefox, Chrome Developer Tools or Safari Developer Tools. These provide the ability to view all the HTTP headers for a request, as well as a timeline of the browser rendering layer itself, which can help to identify issues such as how many reflows and repaints happen on the page (i.e., due to images included in the page without dimensions).

Real User Monitoring

When testing frontend performance, a key point to bear in mind is that as a developer (or Jenkins job) running synthetic tests on a site that you’re working on, you represent absolute best-case conditions for frontend performance in nearly all cases. Developers are most likely to be:

  • On fast Internet connections
  • Running powerful hardware
  • Using recent operating system and browser versions
  • From a location geographically close to the origin server

Whereas visitors to a site could be from multiple continents, running old hardware with out-of-date browser versions over public WiFi or 3G networks.

While an improvement made for the best-case scenario is usually an improvement for the worst-case scenario, too, particularly in the case of network latency and browser rendering speeds, differences in visitors’ connections and hardware can mean extreme variation in results for the same page. A visitor with a slow 3G connection is likely to spend a much greater proportion of the page load time making upstream requests to the server (so watch those cookies!). A visitor with an older smartphone is much more likely to experience sluggish rendering performance due to poorly optimized markup, CSS, and JavaScript, because that device will have significantly less CPU available than the average laptop.

In response to this, there is increased reliance on real user monitoring (RUM) to collect performance metrics from actual visits to sites in addition to “clean” tests.

RUM tends to be dominated by commercial Software as a Service (SaaS) solutions, since it requires capturing, storing, and presenting large volumes of information that many organizations do not have the ability to maintain in-house. However, there are open source alternatives, such as Jiffy (open sourced by Netflix) and Steve Souders’ Web Episodes, and free commercial alternatives such as the page timing feature of Google Analytics. There’s also a Drupal contributed module for the navigation timing API that can be used to directly collect information from requests to the site within Drupal, although this is not suitable for production sites.

Regardless of the mechanism used, the end result is collecting real requests to the site by visitors, including timings such as time to first byte and page complete, via a snippet of JavaScript included in the web page. By recording this data for all requests made to a site, it’s possible to generate reports that show overall trends for end user performance on the site—which pages are slow, whether particular browsers have a particularly slower experience than others, which countries are getting the best and worst response times, etc.

Depending on the tool used, it may be possible to collect diagnostic information as well. However, more often this information can be used to feed back into synthetic tests to make them better reflect reality, or to target them on specific areas of the site that need improvement or that experience the most traffic.

Analyzing Application Performance

Application performance is often the black sheep of site optimization, with efforts focused primarily on frontend performance and infrastructure. In our case, application performance means… Drupal! More specifically, we want to look at the PHP code and database queries that Drupal needs to run to serve a request. Application performance is not the same as the performance of your site overall. If 90% of requests are served from Varnish, then Drupal is only involved in the other 10%—but these 10% will be the slowest, and they may come all at once if the Varnish cache is cleared.

Measuring and understanding application performance is not at all the same thing as load testing. While load testing frameworks often include performance measurements, this is measuring the end-to-end performance, including infrastructure and network layers. Load tests are likely to only find the most glaring application performance issues, since there can be a high variation in response times between tests. See Chapter 20 for more discussion on how to run load tests and what to look for.

By contrast, understanding application performance means examining how the code that runs your application performs, more or less in isolation from external factors. This is done by profiling the code.

The Devel Module

The Devel module provides a number of debugging tools for developers. Amongst these are tools for monitoring page timing, memory consumption, and the query log, and for XHProf profiling. Most of the features discussed here apply across the Drupal 6, 7, and 8 versions of the module.

Page timing

Devel’s own page timing uses microtime to measure the time from drupal_bootstrap_configuration() to PHP shutdown and requires no PHP extensions to be enabled: just check the box on admin/config/development/devel. Microtime across a full page request can be extremely variable, so to ensure a like for like comparison, refresh the pages several times to ensure all application caches (as well as the opcode cache, MySQL query cache, etc.) have been primed, or expect very different results each time. There is one exception to this: when timing cache misses, you should ensure that all caches really are empty. The page timer can be used for a quick sanity check as to whether a page is taking dozens, hundreds, or thousands of milliseconds to load and is useful in situations where there is no opportunity to install lower-level tools, but usually it’s better to go straight to one of the other techniques for measuring page execution time discussed in the following sections.

Memory usage

Memory usage in the Devel module is measured using memory_get_usage() and memory_get_peak_usage() during shutdown. As with page timings, no PHP extension is required, and the option can be enabled at admin/config/development/devel. Since this information is taken from PHP itself, it can be relied upon more than the page timings, but there are still some caveats. If you’re trying to find out how much memory a page is going to use, ensure that other debugging tools like Xdebug, XHProf, and the query log are disabled, since these will increase memory usage. Also ensure that any opcode caches are enabled (unless specifically testing memory usage without an opcode cache), since these can impact memory usage by several megabytes. Finally, bear in mind that Drupal’s memory usage varies widely depending on whether caches are primed or not. Often for memory usage, the worst case should be measured as opposed to the best case—an Apache process will always have a memory footprint of the worst PHP page it served from memory, not the best.

Query log

Devel’s query log measures database queries from devel_boot() until PHP shutdown. Since it uses the APIs within Drupal’s own database layer, it’s able to present the exact queries that were triggered on the page in question. This makes it a very useful alternative to mytop.

Note

Drupal may execute some database queries before hook_boot() runs, depending on whether the database cache backend is used and which Drupal version is being tested.

Like the options for measuring page timing and memory usage, the query log option can be enabled at admin/config/development/devel. There are two options that allow customization of the query log display:

Sort Query Log
The query log is sorted “by source” by default, showing the queries in the order they were executed. Changing this to “by duration” shows the queries in the order of execution time, slowest first. This is useful to quickly identify slow queries on the page, but bear in mind that query timings may change on each request due to the influence of the MySQL query cache, server load, or just margin of error, and this can make it difficult to identify the same query across different page requests since it may change position each time.
Slow Query Highlighting
This defaults to 5 ms, and any query taking longer will be highlighted in red in the query log. Five milliseconds is not particularly slow, but most queries executed by Drupal should be well indexed, may be in the query cache, and are likely to have small result sets, so they should execute in well under 1 ms in most cases. Therefore even leaving this option at the default setting should not result in any highlighted queries on a stock install.

While not specific to the query log, there’s one other Devel setting that affects the query log and other performance measurements taken by Devel: “Display redirection page.” Drupal form submissions make a $_POST request to the same page as the form. Once the form has been processed, the page gets redirected. Since the query log only collects queries during the request, the redirect prevents the log from ever being rendered. “Display redirection page” provides an intermediary page that allows the output to be printed, along with a link to the page that was going to be redirected to. This is an invaluable option when trying to examine queries executed after submitted content or searches, which would otherwise be invisible except at the MySQL level.

Apart from these settings, note that the query log will only be displayed to users with the “Access developer information” permission. On a development site it should be fine to enable this for anonymous or authenticated users. The Devel module should never be enabled on a production site.

Once the option is enabled and permission granted, visiting any page will result in the queries being shown at the bottom. Figure 6-2 shows an example from the front page of a stock Drupal 7 install with no caches primed.

Devel query log with an empty cache

Figure 6-2. Devel query log with an empty cache

Figure 6-3 shows the same page load, now with a warm cache.

Devel query log with a warm cache

Figure 6-3. Devel query log with a warm cache

At the top of the first query log is the summary:

Executed 91 queries in 29.05 ms. Queries exceeding 5 ms are highlighted.

And the second query log has the summary:

Executed 33 queries in 10.27ms. Queries exceeding 5 ms are highlighted.

While brief, this summary provides two fundamental pieces of information that indicate what problems might be present on the page:

Total time
This is the combined time taken for all queries on the page. In this example, the time is approximately 10 ms with a warm cache, which is likely to be a relatively small percentage of the overall time taken for the page request. An unindexed query with filesorts and table scans on a large data set could take hundreds of milliseconds or several seconds, so a high number here should be an immediate red flag.
Number of queries
Depending on the contributed and custom modules installed on a site, and the complexity of the page being rendered, the number of database queries executed on a Drupal page can vary from under 10 to several hundred or even thousands. A slow query taking seconds is more likely to be making a website hard to scale, but having to make hundreds of round trips to the database is likely to be a major contributor to slow page-building performance and will increase the overall load on the database server.

By dividing the number of queries by the total time spent executing queries on these pages, we can see that each query took approximately 0.3 ms on average. This kind of performance is expected with a stock core install and a very small data set.

After the summary, each query is listed in a table with the following columns:

ms
The number of milliseconds taken to execute the query.
#
The number of times the query was executed during the request. Note that from Drupal 7, the Devel module ignores placeholder values when counting queries, so two queries with different placeholder values are treated as the same query. Therefore, this may not show actual duplicates, but it will indicate when lots of queries are running that are very similar.
where
The caller that executed the query. This links to api.drupal.org by default but can be set up to link to a custom API site if desired.
ops and query
Added in Drupal 7, the ops column provides three ways to view the query itself, shown in the neighboring (query) column. The default is P, meaning PHP Data Object (PDO) placeholders (for example, :cids) are displayed:
SELECT cid, data, created, expire, serialized FROM cache WHERE cid IN (:cids_0)

Note

A query run via db_query() will have named placeholders as written in the code calling db_query(). When queries are run via db_select(), Drupal’s query builder generates the placeholders dynamically, so they end up being called db_condition_placeholder.

Selecting the A link
Shows the actual query executed, with placeholders replaced. This is ideal for copying and pasting into a MySQL client to check the result set or to try different variations:
SELECT cid, data, created, expire, serialized FROM cache_bootstrap WHERE cid IN
('foo')
Selecting the E link
Shows an EXPLAIN for the query, saving you the step of copying and pasting the query into a MySQL client in order to get this information. EXPLAIN shows the steps that MySQL goes through to solve a given query and can be very useful to track down why a query may suffer from performance issues. Table 6-1 shows Devel’s EXPLAIN output for a query.

Table 6-1. Devel module query EXPLAIN

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

1

SIMPLE

cache_bootstrap

const

PRIMARY

PRIMARY

767

const

1

Target
This is the database target for the query, which could either be the primary database or a separate database being accessed via the database API.

Xdebug

Xdebug is a PHP profiling and debugging extension authored by Derick Rethans. It’s very stable and as a debugger is more or less required for developing in PHP. Xdebug profiling generates cachegrind files, which can then be viewed in an interface such as KCacheGrind, which is available for most operating systems; or webgrind, which is written in PHP and has a web interface, as the name suggests.

Enabling profiling with Xdebug is done via one of two methods. This enables profiling for every request:

xdebug.profiler_enable =1

This allows profiling to be triggered via an XDEBUG_PROFILE cookie or a $_GET or $_POST parameter:

xdebug.profiler_enable_trigger =1

Xdebug will then write cachegrind files to xdebug.profiler_output_dir.

There are two more php.ini settings to be aware of:

xdebug.profiler_output_name
configures the filename to be written to. By default this is the process ID, which means a single Apache process will write to the same filename with every request.
xdebug.profiler_append
configures whether cachegrind files written to the same location will be appended to the existing content or overwrite it.

The combination of these two configuration options is very important to bear in mind the first time you try profiling. Since the default configuration is to write with the process ID and overwrite files each time, visiting a Drupal page that triggers an AJAX request (which several Drupal core modules do) is likely to result in the cachegrind file holding the profile of the AJAX request rather than the initial one you intended to profile. Including %r (random string) as part of xdebug.profile_output_name guarantees a unique file each time, and %R ($_SERVER['request_uri']) means that page requests can easily be associated to the profile output. See this page for more filename options.

Profiling with Xdebug carries a lot of overhead, so it’s important to only leave it enabled when profiling. Also, just having the Xdebug extension enabled (without profiling) has a performance impact, so ensure you disable it entirely before doing other kinds of performance testing.

XHProf

XHProf isa profiler for PHP, developed and open sourced by Facebook. XHProf does not contain a debugger and exists solely as a profiler. As a profiler, XHProf has several advantages over Xdebug and is our preferred profiling tool for Drupal. These include:

  • The ability to collect data on memory usage as well as CPU usage and wall time while profiling. Xdebug has the capability to collect memory usage data via function traces, but this is less convenient since it’s a separate mechanism to collecting cachegrind, and there is not the range of UIs available for examining the output.
  • Less overhead from being enabled. Unlike Xdebug, XHProf does not add significant overhead simply by being enabled, allowing it to be installed on production web servers.
  • Considerably less overhead when profiling. In addition, xhprof_enable() has XHPROF_FLAGS_CPU and XHPROF_FLAGS_MEMORY options that determine whether CPU and memory usage data will be collected, allowing a custom balance between the amount of data collected and performance overhead. While profiling does have a measurable overhead, this is low enough that XHProf can be considered for profiling a sample of requests on production sites if desired.

XHProf is available via PECL, so assuming you have PECL available already, it can be installed with:

$ sudo pecl install xhprof

Or, if you have PECL set to stable releases only:

$ sudo pecl install channel://pecl.php.net/xhprof-0.9.3

Note

Many resources on the Internet recommend compiling XHProf from source because the previous 0.9.2 release would not install from PECL successfully. However, this was fixed with version 0.9.3, released in May 2013, so compiling manually is no longer necessary.

You may also need to enable the xhprof extension yourself in php.ini:

extension=xhprof.so

XHProf ships with a web interface built in PHP. This is found in the xhprof_html folder (usually found in /usr/share or /usr/local/share). To use this, set up a virtual host and /etc/hosts entry so that it’s available at http://xhprof.localhost or similar. There are also third-party user interfaces available, but we’ll focus on the default interface for the purposes of this discussion.

XHProf profiling is enabled via the xhprof_enable() and xhprof_disable() functions. This allows profiling to be conditionally enabled or disabled in runtime code—for example, if you were trying to sample requests in a QA or production environment, profiling could be restricted to a particular Drupal user or path, or to a 1/1,000 sample. This also allows profiling to be restricted to a particular section of code if desired.

Most commonly, however, when profiling a website, you’ll want to profile the entire request each time, in order to put the numbers in context. There are several ways to do this:

  • The Devel module provides an xhprof setting at admin/config/development/devel. Enabling XHProf profiling, adding the path to xhprof_lib, and adding the URL to the web interface is all that’s necessary to begin profiling requests. The Devel module uses hook_boot(), so it will not profile code that runs prior to that. Once profiling is enabled, all pages will have a link to “XHProf Output” at the bottom.
  • The XHProf module is similar to Devel in that it allows XHProf profiling to be enabled via configuration; however, it also provides a Drupal interface for examining the output.
  • auto_prepend_file and auto_append_file allow arbitrary PHP files to be included at the beginning and end of each request via .htaccess when using mod_php. This allows the entire Drupal request to be profiled, since XHProf is enabled before any Drupal code runs.

The last method requires some manual steps compared to use of the Devel and XHProf modules, so we’ll detail those here. Two files are required:

<?php
  include '/path/to/xhprof/xhprof_lib/utils/xhprof_lib.php';
  include '/path/to/xhprof/xhprof_lib/utils/xhprof_runs.php';
  xhprof_enable(XHPROF_FLAGS_CPU + XHPROF_FLAGS_MEMORY);
?>
<?php
  $host = $_SERVER['HTTP_HOST'];
  $xhprof_data = xhprof_disable();
  $xhprof_runs = new XHProfRuns_Default();
  $run_id = $xhprof_runs->save_run($xhprof_data, $host);
  $source = urlencode($host);
  $url = "http://xhprof.localhost/index.php?run=$run_id&source=$source";
  echo '<a href="'. $url .'" target="_blank">Profiler output</a>';
?>

Then, in the .htaccess file at the root of your Drupal install, add these two lines:

php_value auto_prepend_file 'var/www/xhprof/header.php'
php_value auto_append_file '/var/www/xhprof/footer.php'

When you request a page in your Drupal installation, you should now see the text “Profiler output” at the bottom of the page.

Now that XHProf is up and running, you can profile a request, look for issues, then evaluate a way to fix them.

Starting with an 8.x install, with the standard profile, create a single node, then visit node/1 as an anonymous user. Drupal has several caches to build on that page, so ensure you visit it two to three times before looking at the profiler output (unless you’re explicitly looking to see what happens on a cache miss).

When viewing the default XHProf UI, the first thing you’ll see is a summary like the one in Table 6-2.

Table 6-2. XHProf overall summary

Total Incl. Wall Time (microsec):

257,290 microsecs

Total Incl. CPU (microsecs):

233,884 microsecs

Total Incl. MemUse (bytes):

19,927,040 bytes

Total Incl. PeakMemUse (bytes):

20,004,696 bytes

Number of Function Calls:

51,151

Each category is worth looking at:

Total Incl. Wall Time
This is the actual elapsed time (i.e., time on the clock, which might be on the wall).
Total Incl. CPU
This represents time spent by the CPU. Time not spent by the CPU would include network round trips—external requests, database and cache queries, etc.
Total Incl. MemUse
This is the total memory used by the script by the time XHProf profiling ends. It includes the impact of loading files as well as any objects, static caches, etc. created during the request.
Total Incl. PeakMemUse
Roughly equivalent to memory_get_peak_usage(); this is the peak memory allocated at any time during the request.
Number of Function Calls
The total number of function and method calls recorded by XHProf.

Just the summary gives us quite a bit of information to work with. Total wall time is 257 ms, and total CPU time is 233 ms. That shows us that the vast majority of time is spent executing PHP. The remaining time is likely to be spent in the database, in the cache system, or on HTTP requests, since those external calls are not counted as PHP execution. Since this site is using the database cache layer, that only leaves two options.

To verify this, we can jump ahead a bit to check an individual function. Searching the page for “PDO” finds PDOStatement::execute(), the lowest-level function in Drupal 8 that executes database queries. Table 6-3 shows a truncated version of the results displayed on the metrics page.

Table 6-3. XHProf results for the PDOStatement::execute() method

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl.MemUse (bytes)IMemUse%

PDOStatement::execute

119

94.4%

29,269

11.4%

6,400

2.7%

728,144

3.7%

As you can see, 29 ms was spent overall, of which 6 ms was CPU time, leaving approximately 23 ms remaining. This confirms that the 23 ms of “missing” time was actually spent on the round trip to the MySQL server, which XHProf doesn’t count as CPU time.

Note

Note that these numbers are quite optimistic for the database, because the database server is local and is not under load. Longer network round trip times or a server under load could dramatically increase the impact of those database queries, as well as larger table sizes once the site has some content, due to slower queries. Also remember that while it’s easy to add additional web servers, scaling MySQL is considerably more complicated, so the sheer number of queries executed per second could end up being an issue on a high-traffic site.

However, in terms of performance, more than 90% of the time spent serving this request was in PHP, so we need to keep looking to find the main source of slowness.

The default XHProf ordering by inclusive wall time shows which functions take the most time. XHProf and Xdebug both make a distinction between inclusive time (time spent in the function and any functions it calls) and exclusive time (time spent only in the function itself). A function that has a high inclusive wall time may not do anything particularly expensive, but may just call functions that do. However, once you’re familiar with Drupal’s bootstrap and rendering process, it gets easier to spot anomalies.

Let’s look at the top few functions for inclusive wall time for this request. Table 6-4 shows what the table on the metrics page looks like, again truncated to fit the width of the page.

Table 6-4. Functions with the highest inclusive wall time for our request

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl.MemUse (bytes)IMemUse%

main()

1

0.0%

257,290

100.0%

105

0.0%

233,884

100.0%

drupal_handle_request

1

0.0%

257,012

99.9%

67

0.0%

233,605

99.9%

Drupal\Core\DrupalKernel::handle

1

0.0%

219,670

85.4%

17

0.0%

201,148

86.0%

Drupal\Core\HttpKernel::handle

1

0.0%

218,776

85.0%

32

0.0%

200,254

85.6%

Symfony\Component\HttpKernel\HttpKernel::handle

1

0.0%

218,640

85.0%

6

0.0%

200,118

85.6%

Symfony\Component\HttpKernel\HttpKernel::handleRaw

1

0.0%

218,634

85.0%

131

0.1%

200,111

85.6%

Symfony\Component\EventDispatcher\ContainerAwareEventDispatcher::dispatch

34

0.1%

193,842

75.3%

230

0.1%

176,492

75.5%

call_user_func

46

0.1%

181,808

70.7%

241

0.1%

163,983

70.1%

Symfony\Component\EventDispatcher\EventDispatcher::dispatch

34

0.1%

178,259

69.3%

360

0.1%

161,182

68.9%

Symfony\Component\EventDispatcher\EventDispatcher::doDispatch

19

0.0%

177,054

68.8%

238

0.1%

159,968

68.4%

Drupal\Core\EventSubscriber\ViewSubscriber::onView

1

0.0%

133,762

52.0%

30

0.0%

124,379

53.2%

Drupal\Core\EventSubscriber\ViewSubscriber::onHtml

1

0.0%

133,667

52.0%

21

0.0%

124,283

53.1%

drupal_render_page

1

0.0%

133,135

51.7%

130

0.1%

123,751

52.9%

call_user_func_array

81

0.2%

72,108

28.0%

406

0.2%

65,780

28.1%

drupal_render

2

0.0%

67,728

26.3%

55

0.0%

62,663

26.8%

theme

4

0.0%

67,318

26.2%

332

0.1%

62,258

26.6%

block_page_build

1

0.0%

64,377

25.0%

63

0.0%

59,428

25.4%

block_get_blocks_by_region

17

0.0%

63,782

24.8%

56

0.0%

58,844

25.2%

Symfony\Component\ClassLoader\ClassLoader::loadClass

242

0.5%

59,264

23.0%

2,878

1.1%

59,593

25.5%

_block_get_renderable_region

4

0.0%

51,872

20.2%

190

0.1%

48,477

20.7%

entity_view

6

0.0%

47,140

18.3%

80

0.0%

43,696

18.7%

Note

This profiling was done with a Drupal 8.x alpha relese; exact profiling results will differ for later releases.

Looking down the list, we can see that most of the functions are called only one or two times. While no guarantee, this usually suggests that they’re high-level functions that have a deep call stack, and may not be easy to optimize or may need caching added. However, also in the list is Symfony\Component\ClassLoader\ClassLoader::loadClass, called 242 times and taking 59 ms. Clicking on the function name will show all the parent functions/callers of this function, as well as all the child functions/callees.

Drupal 8 (as of Alpha 2) provides two options for class loading: Symfony\Component\ClassLoader\ClassLoader and Symfony\Component\ClassLoader\ApcClassLoader.

Let’s see what happens when we swap in ApcClassLoader and compare the results. Open up setting.php and uncomment the following line:

<?php
$settings['class_loader'] = 'apc';
?>

Then visit node/1 again (twice, to allow the APC cache to be populated).

Searching the page for “loadClass” reveals that it’s dropped down nearly to the bottom of the top 100 functions, nested between node_access() and _menu_translate(), and has taken only 21 ms this time (as illustrated in Table 6-5). That’s nearly a two-thirds reduction in the time spent on this function.

Table 6-5. Results for the loadClass() method with the APC class loader enabled

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl.MemUse (bytes)IMemUse%

node_access

7

0.0%

21,506

10.0%

107

0.0%

20,736

10.9%

Symfony\Component\ClassLoader\ApcClassLoader::loadClass

242

0.6%

21,443

9.9%

2,957

1.4%

21,845

11.4%

_menu_translate

6

0.0%

21,318

9.9%

218

0.1%

18,549

9.7%

To see where the remaining time is spent, click on loadClass and scroll down to the “Child functions” section. Table 6-6 shows the top few.

Table 6-6. loadClass() child functions

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl. MemUse (bytes)IMemUse%

Symfony\Component\ClassLoader\ApcClassLoader::findFile

242

33.8%

2,026

9.4%

2,253

10.3%

54,312

0.7%

run_init::Entity/Role.php

1

0.1%

562

2.6%

563

2.6%

263,016

3.5%

run_init::mysql/Select.php

1

0.1%

411

1.9%

412

1.9%

193,624

2.6%

run_init::Entity/EntityDisplay.php

1

0.1%

337

1.6%

339

1.6%

176,624

2.4%

run_init::Entity/Node.php

1

0.1%

317

1.5%

320

1.5%

185,192

2.5%

run_init::Field/Field.php

1

0.1%

250

1.2%

251

1.1%

90,368

1.2%

run_init::user/RoleStorageController.php

1

0.1%

242

1.1%

243

1.1%

97,360

1.3%

run_init::formatter/TextDefaultFormatter.php

1

0.1%

229

1.1%

231

1.1%

72,272

1.0%

run_init::Entity/EntityManager.php

1

0.1%

222

1.0%

223

1.0%

51,416

0.7%

The top function is findFile(), also with 242 calls and taking 2 ms. findFile() looks like this in the code base:

<?php
   /**
     * Finds a file by class name while caching lookups to APC.
     *
     * @param string $class A class name to resolve to file
     *
     * @return string|null
     */
    public function findFile($class)
    {
        if (false === $file = apc_fetch($this->prefix.$class)) {
            apc_store($this->prefix.$class,
                                  $file = $this->decorated->findFile($class));
        }

        return $file;
    }
?>

Clicking on findFile() confirms that it only calls apc_fetch(). Comparing before/after, in Table 6-7 we can see that there were 429 calls to file_exists(), 12,164 to strpos(), 838 to str_replace(), 834 to substr(), and 428 to strrpos() using ClassLoader, all of which have been replaced with 428 calls to apc_fetch() with the ApcClassLoader, with a reduction of 41 ms down to 3 ms (see Table 6-8).

Table 6-7. findFile() details with ClassLoader

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl. MemUse (bytes)IMemUse%

Symfony\Component\ClassLoader\ClassLoader::findFile

428

2.8%

41,274

16.0%

41,662

17.8%

108,000

0.5%

Exclusive Metrics for Current Function

35,456

85.9%

22,436

53.9%

-83,256

-77.1%

Child functions

file_exists

429

2.9%

2,571

6.2%

2,918

7.0%

1,128

1.0%

strpos

12,164

82.9%

1,448

3.5%

13,131

31.5%

1,120

1.0%

str_replace

838

5.7%

918

2.2%

1,378

3.3%

126,648

117.3%

substr

820

5.6%

506

1.2%

1,162

2.8%

61,240

56.7%

strrpos

428

2.9%

375

0.9%

637

1.5%

1,120

1.0%

Table 6-8. findFile() details with ApcClassLoader

Function NameCallsCalls%Incl. Wall Time (microsec)IWall%Incl. CPU (microsecs)ICpu%Incl. MemUse (bytes)IMemUse%

Symfony\Component\ClassLoader\ApcClassLoader::findFile

428

49.6%

3,164

1.5%

3,572

1.9%

100,936

0.5%

Exclusive Metrics for Current Function

1,908

60.3%

1,965

55.0%

41,776

41.4%

Child function

apc_fetch

428

100.0%

1,256

39.7%

1,607

45.0%

59,160

58.6%

The remaining records are all for file execution (run_init), which can’t be optimized within the application itself.

strace

Sometimes it’s necessary to go to the system level to understand what’s happening with your application. This is particularly true for system calls made by PHP, which won’t show up in Xdebug or XHProf at all. strace is a great tool for looking at what happens here, and we’ll use it as an example to examine how Drupal includes module files.

Note

In Drupal 7, file existence is checked before the module is loaded in drupal_load(), via a file_exists() check in drupal_get_filename(). This avoids errors being thrown when a module is moved in the filesystem without the module list being updated in the database. Drupal 8’s \Drupal\Core\Extension\ModuleHandlerInterface::load() assumes that the module list is correct, and attempts to load the file with include_once() regardless (a PHP warning will be triggered if the file doesn’t exist, but this is an error condition so is considered valid).

PHP’s documentation states that file_exists() is cached when the file is present on the file system, but not cached when it’s missing. A file_exists() will show up in XHProf whether it caches internally or not, so strace is an ideal tool to test the behavior at a lower level.

First, we need to attach strace to Apache. strace can be attached to an individual process ID or to Apache itself when it’s started. On a local machine the only traffic should be your own testing, and restarting Apache isn’t going to cause a site outage, so this is the simplest mechanism to get strace output:

$ sudo /etc/init.d/apache2 stop
$ sudo strace -f -o /tmp/strace.txt /etc/init.d/apache2 start

Creating the following test file in the root of a Drupal 8 directory covers the three variations:

<?php
define('DRUPAL_ROOT', getcwd());

// Just include_once
include_once DRUPAL_ROOT . '/core/modules/system/system.module';

// A file_exists() where the file is present on the file system:
if (file_exists(DRUPAL_ROOT . '/core/modules/tracker/tracker.module')) {
  include_once DRUPAL_ROOT . '/core/modules/tracker/tracker.module';
}

// A file exists where the file is not present on the file system:
file_exists(DRUPAL_ROOT . '/core/modules/foo/foo.module');
?>

Once Apache is restarted, visiting the page in the browser should lead to some strace output in /tmp/strace.txt, which you can then open up in your browser.

Let’s first of all focus on the request to system.module via include_once(), with no file_exists() check. Since this is the first request to Apache after a restart, the realpath cache is empty, meaning there are some additional lstat calls to be made:

26378 lstat("/var/www/8/core/modules/system/system.module",
            {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26378 lstat("/var/www/8/core/modules/system",
            {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
26378 lstat("/var/www/8/core/modules",
            {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
26378 open("/var/www/8/core/modules/system/system.module", O_RDONLY) = 12
26378 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26378 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26378 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26378 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26378 mmap(NULL, 12685, PROT_READ, MAP_SHARED, 12, 0) = 0x7f31caf9e000
26378 stat("/var/www/8/core/modules/system/system.module",
           {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0

Hitting the page again, those lstat calls disappear:

26606 open("/var/www/8/core/modules/system/system.module", O_RDONLY) = 12
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 mmap(NULL, 12685, PROT_READ, MAP_SHARED, 12, 0) = 0x7fc6b917d000
26606 stat("/var/www/8/core/modules/system/system.module",
           {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0

With the realpath cache warmed, let’s take a look at the existing tracker.module file wrapped in a file_exists():

26606 access("/var/www/8/core/modules/tracker/tracker.module", F_OK) = 0
26606 open("/var/www/8/core/modules/tracker/tracker.module", O_RDONLY) = 12
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 mmap(NULL, 12685, PROT_READ, MAP_SHARED, 12, 0) = 0x7fc6b917d000
26606 stat("/var/www/8/core/modules/tracker/tracker.module",
           {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0

The file_exists() is represented by this line:

26606 access("/var/www/8/core/modules/tracker/tracker.module", F_OK) = 0

Remember, the PHP documentation for file_exists() states that the results of file_exists() are cached in the stat cache when the file is present on the file system. What it doesn’t mention is that the stat cache is per-request, meaning each request to Drupal starts with an empty cache. This differs from the realpath cache, which is per-process.

When the file_exists() fails, no inclusion happens, so the only line in the strace output is the access call. Note that the return value changes from 0 to -1 ENOENT (No such file or directory):

26378 access("/var/www/8/core/modules/foo/foo.module", F_OK) =
                                           -1 ENOENT (No such file or directory)

As a further example, enable apc.stat = 0 in php.ini, then test the include_once() again.

Without apc.stat = 0, notice the stat call at the end:

26606 open("/var/www/8/core/modules/system/system.module", O_RDONLY) = 12
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 mmap(NULL, 12685, PROT_READ, MAP_SHARED, 12, 0) = 0x7fc6b917d000
26606 stat("/var/www/8/core/modules/system/system.module",
           {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0

With apc.stat = 0, that stat call disappears, as you’d expect:

26606 open("/var/www/8/core/modules/system/system.module", O_RDONLY) = 12
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 fstat(12, {st_mode=S_IFREG|0664, st_size=12685, ...}) = 0
26606 mmap(NULL, 12685, PROT_READ, MAP_SHARED, 12, 0) = 0x7fc6b917d000

Note that apc.stat has some administrative overhead—you’ll need to be able to clear the APC cache when code is deployed, for example. See Chapter 18 for more information on opcode caches.

Use of the tools in this chapter allows you to see exactly what’s happening when code is executed, and therefore to verify accurately whether changes made for performance purposes are successfully reducing the amount of work being done at different levels of the application. While we cover load testing in Chapter 20, load testing tools should almost never be used for before/after performance testing of specific changes, since they’re unable to provide the accuracy, detail, or introspection necessary to verify the impact of what are often quite small individual changes.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required