So you’re ready to jump in and start improving your website’s performance. This can be a daunting task. There are so many services, underlying technologies, and possible problems that it can be difficult to pick a starting point. It is easy to run around in circles, checking and fixing many small issues but never addressing your major problems (or even discovering what they are). Knowing where to start and which issues are of high priority can be one of the most difficult parts of optimizing a site.
Due to these common issues of discovery and prioritization, good performance engineers and system administrators tend to do a lot more gathering of metrics and statistics than most people think. A complete understanding of the problem points of a website (problem pages, blocks, or views) and server metrics during average- and high-load situations is a requirement for making good decisions. Whenever we approach a new infrastructure or website project, the investigation and metrics collection period is often the most important time and will determine how effective the entire optimization project is.
We will discuss tools and methodologies for collecting performance information in later chapters. For now, let us assume we have a spreadsheet of problem pages or requests and some server information (CPU, load, I/O usage, etc.) during some peak load periods. The next important step in optimizing a site is defining goals and usage patterns. This is important for similar reasons to having accurate metrics: it prevents you from endlessly fixing issues that may be legitimate, but are not the problems preventing the site from meeting its goals. For example, if you have a site that needs to serve 10,000 pages a day only to anonymous users, you could review all of the Views for this site and ensure they are all performing well, but that would be a waste of time when you could get better performance faster by ensuring the page cache is working effectively.
Everything we have discussed so far may seem quite pedantic and little more than bookkeeping. As technical people, we like to walk into a bad situation, immediately pinpoint the problem, and fix it in a few minutes. It’s nice when this works, but often it fails or succeeds only partially, or worse, temporarily. The methodology we are proposing, which involves performing a robust discovery phase and gathering a lot of quality information (metrics, expected site usage, and goals) for the site, is much better for both the long-term sustainability of the site and your own longer-term sanity. You cannot always immediately pinpoint the problem, but a method based on information and metrics is always going to be effective.
There are a number of approaches that can be used to collect this information and develop a performance plan. However, we typically follow a straightforward approach that attempts to focus on low-hanging fruit and the real site problems. We also tend to focus on iteration, as often when you solve one large problem, it uncovers other issues that used to be hidden.
Let’s outline the steps involved in this process—we will go into more detail on each step later in this chapter:
- Measure and record the current site performance. This is your “performance baseline,” which will be used to analyze potential performance improvements. Document any known issues with the site, such as individual pages or groups of pages that are consistently slow, servers that are always under high load, or anything else that might have an effect on performance or scaling. We will go into the tools and methods for doing this in later chapters, as it’s a very broad topic and can be a somewhat nebulous task.
- Define goals and requirements for the site. For example, “The front page must load in under two seconds for anonymous traffic,” and “A site search must not take more than three seconds on average to return results.” The “must” and “should” wording in these statements is important, as it separates requirements (“must”) and goals (“should”)—more on this in the next section.
- Actually perform your review. This often involves running a load test, reviewing configuration files, profiling pages, and reviewing slow query logs. Many engineers consider this the only step, but the problem with such an approach is that it lacks baseline information and a structured list of goals, as defined in the previous two steps. There will be many chapters in this book on the various topics that this step encompasses.
- Define a list of potential improvements based on the site goals and requirements, using the information gathered in the performance baseline and your review. The list should be prioritized based on a few factors:
  - Does the item contribute to achieving a requirement or goal for the website?
  - What is the expected benefit of the change?
  - What is the cost of the improvement, both in terms of staff time and any hardware or software purchases that may be necessary for the change?
  - Once an improvement has been made, what impact does it have?
If you are working for a client, step 4 is particularly important. However, even if you are working for yourself or for your company, it’s incredibly important to develop a list of potential improvements and ensure they are both prioritized and tracked for effectiveness. Returning to a site two or three weeks later without a good record of what was done previously and the impact of those changes will make your job much more difficult.
As to the prioritization of fixes, there is no hard and fast rule, but a good approach is to work on items that will give you the most bang for your buck—that is, those fixes that either don’t take much effort compared to their impact or provide a vast improvement.
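The prioritization factors above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Improvement` fields, the candidate items, and every number are hypothetical, and a single benefit-per-hour ratio is one simple way to express "bang for your buck," not a rule. Hardware and software purchase costs are left out for brevity, and `cost_hours` is assumed to be greater than zero.

```python
from dataclasses import dataclass


@dataclass
class Improvement:
    name: str
    supports_goal: bool      # does it help meet a stated requirement or goal?
    expected_benefit: float  # rough estimate, e.g. seconds saved per page load
    cost_hours: float        # rough estimate of staff time (must be > 0)


def prioritize(items):
    """Order items: goal-related first, then by estimated benefit per hour."""
    return sorted(
        items,
        key=lambda i: (i.supports_goal, i.expected_benefit / i.cost_hours),
        reverse=True,
    )


# Hypothetical candidates; the figures are illustrative, not measurements.
candidates = [
    Improvement("Rewrite slow front-page query", True, 3.0, 8),
    Improvement("Enable page cache for anonymous users", True, 2.0, 1),
    Improvement("Reduce Apache directory lookups", False, 0.1, 4),
]

for item in prioritize(candidates):
    print(item.name)
```

With these made-up estimates, the page cache change sorts first because it is cheap relative to its benefit, and the Apache tweak sorts last because it does not serve a stated goal. Recording the measured impact next to each estimate after the change is made keeps the list useful on later passes.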
Measuring current website performance will give you a baseline that you can compare to the performance after making a change. Knowing how the site was performing initially makes it easy to tell whether changes have had the expected effect, or when they resulted in only a minor improvement—or worse, decreased performance! Depending on your needs, determining the performance baseline could be as simple as tracking full page load times for a selection of pages on your site, or as intricate as tracking memory and CPU usage for key functions used to display one or more pages on your site. What’s important here is that you decide what measurements are important to you. If this is a first pass at improving the performance of a site, generally it will be sufficient to choose one or two pages of each type that you have on your site (e.g. “article category display,” “article,” “author bio,” “forum overview page,” “forum topic page”). For each of those types, you’ll want to measure some predefined set of data—what data is tracked will vary based on your needs, but if you’re looking simply to improve page load time, there are a few data points that can be focused on to start:
- Time to first byte
- Time for a full page load
  - This is how long it takes for an entire page to be loaded in a user’s browser window.
- Frontend display times
Before fully understanding the performance implications, many people assume that the full page load time will be the most important factor in the site feeling fast to a user. In fact, the time to first byte can be much more important (there are exceptions, of course), because it’s at that point that the user’s browser starts working on displaying the data sent from your site. That’s not to say you should focus entirely on the time to first byte, though; it’s important to look at both of these measurements at the very least.
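As a rough illustration of the first two measurements, the following Python sketch times a single request using only the standard library. The `measure` function is a hypothetical helper, not part of any tool discussed in this book, and it makes two simplifying assumptions: it treats the arrival of the first body byte as the time to first byte, and it times only the download of the HTML document itself, not the images, stylesheets, and scripts a browser would also fetch and render. Treat its numbers as an approximation for comparing against a baseline.

```python
import time
import urllib.request


def measure(url, timeout=30):
    """Return (time_to_first_byte, total_time) in seconds for one GET request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)  # blocks until the first byte of the body arrives
        first_byte = time.perf_counter() - start
        response.read()   # download the rest of the document
        total = time.perf_counter() - start
    return first_byte, total


# Example call (requires network access):
#   ttfb, total = measure("https://example.com/")
#   print(f"first byte: {ttfb:.3f}s, full document: {total:.3f}s")
```

A single request is noisy, so it is usually worth repeating the measurement several times for each page type and recording the spread as well as the average.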
Once you have a good understanding of the website’s baseline performance and have started to track down some of the current bottlenecks, it will be possible to start setting some well-defined and attainable performance goals for the site. Setting realistic goals is important for a number of reasons:
- Performance improvements on a website are a continual process. Setting concrete goals allows for work to be split up incrementally.
- Defining a set of goals with site developers can help prevent the addition of features that may be “nice to have” but have a serious adverse effect on performance. If goals have been well defined and have buy-in from all involved parties, they can be referred to later as a reason to implement, or not to implement, certain features and technologies on the site.
- If goals are arbitrarily set without knowing the current performance of the site or the actual near-term requirements, you may set yourself up to fail with goals that are impossible to achieve with the resources you have at your disposal. Always focus on reality, not what you would like reality to be.
Potential improvements could include (but are not limited to) the following:
- Reducing average page load time
  - This could be set as an overall goal for the site, and also more specifically for certain page types or common entry points into the site (the front page, marketing landing pages, etc.). Example goals: “Decrease the average page load time for all pages across the site from five seconds to three seconds. Average page load time for the front page should be under two seconds.”
- Decreasing maximum page load time
  - Again, this goal could be set overall for the site as well as for specific pages or page types. Example goals: “The maximum page load time across the entire site should always remain below eight seconds. Article pages should have a maximum page load time of five seconds. The front page of the site should have a maximum page load time of three seconds.”
- Improving page load times for first-time visitors
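Goals phrased this way are easy to check mechanically against collected measurements. The sketch below is one possible way to do that in Python; the `check` helper and the sample timings are hypothetical, and the thresholds are taken from the front page and article example goals above.

```python
from statistics import mean


def check(times, avg_target=None, max_target=None):
    """Return a list of human-readable failures for one set of samples."""
    failures = []
    if avg_target is not None and mean(times) >= avg_target:
        failures.append(f"average {mean(times):.2f}s is not under {avg_target}s")
    if max_target is not None and max(times) > max_target:
        failures.append(f"maximum {max(times):.2f}s exceeds {max_target}s")
    return failures


# Hypothetical samples in seconds; real values would come from your baseline.
front_page = [1.4, 1.8, 2.6, 1.7]
article = [2.9, 3.4, 5.6, 3.1]

print(check(front_page, avg_target=2.0, max_target=3.0))  # []
print(check(article, max_target=5.0))  # ['maximum 5.60s exceeds 5.0s']
```

Running a check like this after each change makes it obvious which goals are already met and which still need work.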
Once you’ve created a list of performance goals for the site, you can start to look at specific tasks that will help you achieve those goals, and at the problems with the current site that are preventing you from reaching them. Much of the rest of this book is dedicated to giving specific examples of common slow points on Drupal websites and ways to improve performance for those specific issues. As you start to dive in and make adjustments to the site, always keep an eye on the goals and requirements that you have developed. As you work, some of the goals may need to be adjusted because they were either too optimistic or didn’t take into account certain aspects of your site or infrastructure that you are unable to change.
A single page load involves many separate pieces, including:
- Frontend performance: page rendering time in a site visitor’s browser
- PHP execution time on the web server
- Time spent for the web server to serve a request
- Time spent fetching and storing items in the cache
- Database query execution time
- Network traffic for each link along the path of a request: user→web server→cache server→database server, etc.
- External requests, either server-side or client-side—for example, code that calls an external API (think Twitter, Facebook, etc.) or pulls in external files or images
All of these items contribute to the big picture of how a website performs. By breaking down requests and analyzing the performance of each of these various pieces, we can isolate the worst-performing parts of the site and focus our improvement efforts on those in order to get the most benefit from our work. In addition, understanding where the performance bottlenecks are can save you from blindly working on general performance improvements that may not have much effect on the overall performance of the site.
For example, consider a page that takes five seconds to deliver the first byte of data to a client browser. Let’s say that one second of that is spent on the web server serving the request and executing PHP, 3.75 seconds are spent on database queries for the page, and 0.25 seconds are spent pulling items from cache storage. Now, it’s pretty clear that there is not much benefit to be had by working on the caching layer. The best place to start performance work in this case would be to look at the queries that are being run on the database to figure out which of them are slow—we may be able to improve the query speed by changing the logic, or figure out a way to better cache the query results to avoid running queries repeatedly. Had we not broken down the different components, we could have wasted a lot of time trying to improve PHP execution time or trying to increase the speed of cached requests when those are not likely to give us much overall improvement in the performance of the page.
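The arithmetic behind this example can be written out in a short Python sketch. The component names are just labels for the figures given above, and `best_case_gain` is a hypothetical helper showing the most that could be saved if a component cost nothing at all.

```python
# Time-to-first-byte breakdown from the example above, in seconds.
breakdown = {
    "web server and PHP": 1.0,
    "database queries": 3.75,
    "cache fetches": 0.25,
}

total = sum(breakdown.values())  # 5.0 seconds


def best_case_gain(component):
    """Fraction of total time saved if this component took no time at all."""
    return breakdown[component] / total


# Print the components from slowest to fastest.
for name, seconds in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.2f}s ({best_case_gain(name):.0%} of total)")

# Output:
# database queries: 3.75s (75% of total)
# web server and PHP: 1.00s (20% of total)
# cache fetches: 0.25s (5% of total)
```

Even a perfect caching layer could save at most 5% of the time to first byte here, while halving the query time alone would save almost 40%.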
We’ll get into more specifics on how to measure and analyze performance for various aspects of a site later. It is a complicated topic, and one that much of this book is devoted to. For now, it’s just important to understand that there are multiple pieces contributing to overall page load performance. Understanding where the bottlenecks are makes it possible to focus performance improvements on areas that will have the greatest effect on the overall page load time.
During a performance review or site analysis, it is important to either have very detailed notes or to build your “prioritized list of improvements” during the review. As we have already explained, a single page load is a complicated matter. We are all “standing on the shoulders of giants” in the computer industry; those giants created the subsystems, drivers, architectures, services, caching daemons, httpd daemons, and opcode caches we rely on, and even Drupal itself. Although many were not particularly tall (Dries—the founder of Drupal pictured in Figure 1-1—is a notable exception), they are not called giants for nothing—each layer is immensely complex, some more so than others.
Due to this complexity, if you don’t consistently keep priority in mind and look for the “low-hanging fruit,” it is very easy to lose your way or forget something you’ve found. Perhaps while instrumenting the Apache process of your site, you noted that too many directory lookups are happening. However, if you have SQL queries on your home page that are taking five to six seconds to execute, are Apache’s foibles your highest priority? For every performance engineer solving client-facing problems, there is at least one other optimizing something entirely pointless.
Not only does keeping a priority list or priority-driven notes force you to focus on real problems, but it also allows you to cross-reference and remember what you’ve seen. The issues you observe in different subsystems may be related, but it can be hard to draw the correct correlations without the issues noted down. Not everyone can connect the dots entirely in their head.
Once you have your list of issues, you can review and prioritize them based on which you believe are the most important, or which will be easiest to solve relative to their impact. Because you started building the list during the review, most of its items should be fairly detailed and actionable.