So you’re ready to jump in and start improving your website’s performance. This can be a daunting task. There are so many services, underlying technologies, and possible problems that it can be difficult to pick a starting point. It is easy to run around in circles, checking and fixing many small issues but never addressing your major problems (or even discovering what they are). Knowing where to start and which issues are of high priority can be one of the most difficult parts of optimizing a site.
Due to these common issues of discovery and prioritization, good performance engineers and system administrators tend to do a lot more gathering of metrics and statistics than most people think. A complete understanding of the problem points of a website (problem pages, blocks, or views) and server metrics during average- and high-load situations is a requirement for making good decisions. Whenever we approach a new infrastructure or website project, the investigation and metrics collection period is often the most important time and will determine how effective the entire optimization project is.
We will discuss tools and methodologies for collecting performance information in later chapters. For now, let us assume we have a spreadsheet of problem pages or requests and some server information (CPU, load, I/O usage, etc.) collected during peak load periods. The next important step in optimizing a site is defining goals and usage patterns. This matters for much the same reason that accurate metrics do: it prevents you from endlessly fixing issues that may be legitimate but are not the problems preventing the site from meeting its goals. For example, if you have a site that needs to serve 10,000 pages a day to anonymous users only, you could review every View on the site and ensure they all perform well, but that would be a waste of time when you could get better performance faster by ensuring the page cache is working effectively.
Everything we have discussed so far may seem quite pedantic, little more than bookkeeping. As technical people, we like to walk into a bad situation, immediately pinpoint the problem, and fix it in a few minutes. It’s nice when this works, but often it fails or succeeds only partially, or worse, temporarily. The methodology we are proposing, which is to perform a robust discovery phase and gather plenty of quality information about the site (metrics, expected site usage, and goals), is much better for both the long-term sustainability of the site and your own longer-term sanity. You cannot always immediately pinpoint the problem, but a method based on information and metrics will consistently be effective.
There are a number of approaches that can be used to collect this information and develop a performance plan. However, we typically follow a straightforward approach that attempts to focus on low-hanging fruit and the real site problems. We also tend to focus on iteration, as often when you solve one large problem, it uncovers other issues that used to be hidden.
Let’s outline the steps involved in this process—we will go into more detail on each step later in this chapter:
Define a list of potential improvements based on the site goals and requirements, using the information gathered in the performance baseline and your review. The list should be prioritized based on a few factors:
If you are working for a client, step 4 is particularly important. However, even if you are working for yourself or for your company, it’s incredibly important to develop a list of potential improvements and ensure they are both prioritized and tracked for effectiveness. Returning to a site two or three weeks later without a good record of what was done previously and the impact of those changes will make your job much more difficult.
As to the prioritization of fixes, there is no hard and fast rule, but a good approach is to work on items that will give you the most bang for your buck—that is, those fixes that either don’t take much effort compared to their impact or provide a vast improvement.
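One way to make the “bang for your buck” idea concrete is to give each candidate fix a rough impact estimate and a rough effort estimate, then sort by the ratio. The following is a minimal sketch under stated assumptions: the `Improvement` class, the 1 to 10 scales, and the sample entries are all hypothetical and chosen only for illustration. A plain ratio can also underrate a costly fix that provides a vast improvement, so treat the resulting order as a starting point for discussion rather than a rule.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    name: str
    impact: int  # estimated benefit, 1 (minor) to 10 (vast); an assumed scale
    effort: int  # estimated cost, 1 (minutes) to 10 (weeks); an assumed scale

    @property
    def score(self) -> float:
        # Higher is better: a large impact for little effort.
        return self.impact / self.effort


def prioritize(items):
    """Return the items sorted from best to worst impact per unit of effort."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Hypothetical entries; the names and numbers are made up for illustration.
candidates = [
    Improvement("Reduce Apache directory lookups", impact=2, effort=3),
    Improvement("Rewrite slow home page query", impact=9, effort=5),
    Improvement("Enable page cache for anonymous users", impact=8, effort=1),
]

for item in prioritize(candidates):
    print(f"{item.score:5.2f}  {item.name}")
```

With these made-up numbers, the page cache item sorts first, the query rewrite second, and the Apache tuning last. Keeping the estimates next to each item also gives you a record to compare against once the real effect of a change has been measured.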
Measuring current website performance will give you a baseline that you can compare to the performance after making a change. Knowing how the site was performing initially makes it easy to tell whether changes have had the expected effect, or when they resulted in only a minor improvement—or worse, decreased performance! Depending on your needs, determining the performance baseline could be as simple as tracking full page load times for a selection of pages on your site, or as intricate as tracking memory and CPU usage for key functions used to display one or more pages on your site. What’s important here is that you decide what measurements are important to you. If this is a first pass at improving the performance of a site, generally it will be sufficient to choose one or two pages of each type that you have on your site (e.g. “article category display,” “article,” “author bio,” “forum overview page,” “forum topic page”). For each of those types, you’ll want to measure some predefined set of data—what data is tracked will vary based on your needs, but if you’re looking simply to improve page load time, there are a few data points that can be focused on to start:
Before fully understanding the performance implications, many people assume that the full page load time will be the most important factor in the site feeling fast to a user. In fact, the time to first byte can be much more important (there are exceptions, of course), because it’s at that point that the user’s browser starts working on displaying the data sent from your site. That’s not to say you should focus entirely on the time to first byte; at the very least, it’s important to look at both of these measurements.
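To illustrate the difference between the two measurements, here is a rough sketch that times a single request using only the Python standard library. The function name `measure_page` is our own, and the approach rests on some assumptions: the clock starts before the connection is opened, so DNS lookup and connection setup are included; the arrival of the response headers is treated as an approximation of the first byte; redirects are not followed; and the second figure covers only the download of the HTML itself, not the images, CSS, and JavaScript that a browser would go on to fetch. It is not a replacement for the dedicated tools covered later.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_page(url, timeout=10.0):
    """Return (time_to_first_byte, download_time) in seconds for one GET."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)

    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    start = time.perf_counter()
    try:
        # request() opens the connection if needed, then sends the request.
        conn.request("GET", path)
        # getresponse() returns once the status line and headers have arrived;
        # we use that moment as an approximation of time to first byte.
        response = conn.getresponse()
        ttfb = time.perf_counter() - start
        response.read()  # download the rest of the response body
        total = time.perf_counter() - start
    finally:
        conn.close()
    return ttfb, total
```

Calling `measure_page` several times against one page of each type on your site and recording the median of each figure gives a simple baseline. A single run is easily skewed by network noise or a cold cache.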
Once you have a good understanding of the website’s baseline performance and have started to track down some of the current bottlenecks, it will be possible to start setting some well-defined and attainable performance goals for the site. Setting realistic goals is important for a number of reasons:
Potential improvements could include (but are not limited to) the following:
Once you’ve created a list of performance goals for the site, you can start to look at the specific tasks that will help you achieve those goals, and at the problems with the current site that are preventing you from reaching them. Much of the rest of this book is dedicated to giving specific examples of common slow points on Drupal websites and ways to improve performance for those specific issues. As you start to dive in and make adjustments to the site, always keep an eye on the goals and requirements that you have developed. As you work, some of the goals may need to be adjusted, either because they were too optimistic or because they didn’t take into account certain aspects of your site or infrastructure that you are unable to change.
All of these items contribute to the big picture of how a website performs. By breaking down requests and analyzing the performance of each of these various pieces, we can isolate the worst-performing parts of the site and focus our improvement efforts on those in order to get the most benefit from our work. In addition, understanding where the performance bottlenecks are can save you from blindly working on general performance improvements that may not have much effect on the overall performance of the site.
For example, consider a page that takes five seconds to deliver the first byte of data to a client browser. Let’s say that one second of that is spent on the web server serving the request and executing PHP, 3.75 seconds are spent on database queries for the page, and 0.25 seconds are spent pulling items from cache storage. Now, it’s pretty clear that there is not much benefit to be had by working on the caching layer. The best place to start performance work in this case would be to look at the queries that are being run on the database to figure out which of them are slow—we may be able to improve the query speed by changing the logic, or figure out a way to better cache the query results to avoid running queries repeatedly. Had we not broken down the different components, we could have wasted a lot of time trying to improve PHP execution time or trying to increase the speed of cached requests when those are not likely to give us much overall improvement in the performance of the page.
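The arithmetic behind this example can be sketched in a few lines. The timings are the ones given above; the component labels and the `breakdown` helper are our own naming, used only for illustration.

```python
# Timings from the example above, in seconds.
timings = {
    "web server and PHP": 1.00,
    "database queries": 3.75,
    "cache storage": 0.25,
}


def breakdown(timings):
    """Return (component, seconds, share of total) rows, slowest first."""
    total = sum(timings.values())
    rows = [(name, secs, secs / total) for name, secs in timings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, secs, share in breakdown(timings):
    print(f"{name:<20} {secs:5.2f}s  {share:6.1%}")
```

Database queries account for 75% of the five seconds, the web server and PHP for 20%, and cache storage for 5%. Even if cache lookups were made instantaneous, the page would gain only a quarter of a second, whereas halving the query time would save almost two seconds.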
We’ll get into more specifics on how to measure and analyze performance for various aspects of a site later. It is a complicated topic, and one that much of this book is devoted to. For now, it’s just important to understand that there are multiple pieces contributing to overall page load performance. Understanding where the bottlenecks are makes it possible to focus performance improvements on areas that will have the greatest effect on the overall page load time.
During a performance review or site analysis, it is important to either have very detailed notes or to build your “prioritized list of improvements” during the review. As we have already explained, a single page load is a complicated matter. We are all “standing on the shoulders of giants” in the computer industry; those giants created the subsystems, drivers, architectures, services, caching daemons, httpd daemons, and opcode caches we rely on, and even Drupal itself. Although many were not particularly tall (Dries—the founder of Drupal pictured in Figure 1-1—is a notable exception), they are not called giants for nothing—each layer is immensely complex, some more so than others.
Due to this complexity, if you don’t consistently keep priority in mind and look for the “low-hanging fruit,” it is very easy to lose your way or forget something you’ve found. Perhaps while instrumenting the Apache process of your site, you noted that too many directory lookups are happening. However, if you have SQL queries on your home page that are taking five to six seconds to execute, are Apache’s foibles your highest priority? For every performance engineer solving client-facing problems, there is at least one other optimizing something entirely pointless.
Not only does keeping a priority list or priority-driven notes force you to focus on real problems, but it also allows you to cross-reference and remember what you’ve seen. The issues you observe in different subsystems may be related, but it can be hard to draw the correct correlations unless the issues have been written down. Not everyone can connect the dots entirely in their head.
Once you have your list of issues, you can review and prioritize them according to which issues you believe are the most important, or which will be very easy to solve relative to their impact. Because you started building the list during the review, most of its items should be fairly detailed and actionable.