Concerns about web performance aren’t new. Let’s jump into the Wayback Machine and make a quick visit to 1999, when Zona released a report¹ warning online retailers that they risked losing $4.35 billion per year if they didn’t optimize their websites’ loading times. (Interesting aside: Zona recommended an optimal loading time of 8 seconds for ecommerce sites. Times sure have changed. Today, we know that for most ecommerce sites, the performance sweet spot is around 2.5 seconds. After 4 seconds, conversion rates dip sharply.)
Over the years, solutions have become increasingly refined, from building out the bulk of the infrastructure of the Internet to homing in on performance issues within the browser. This chapter offers a quick fly-over tour—designed expressly for nongeeks—of the acceleration and optimization landscape over the past 20 or so years.
The goal here is to show you how solutions have evolved from relatively simple (i.e., throw more servers at the problem) to increasingly complex and nuanced as we’ve learned more about the root causes of performance issues and how to address them.
Before we get to solutions, let’s first look at two of the biggest problems faced by anyone who cares about making the Internet faster:
Before we start talking about solutions, it’s helpful to identify some of the main problems. If you ask anyone in the performance industry to name the biggest obstacles to delivering faster user experiences, “latency” would probably be one of their top three answers.
Let’s put this in real-world terms. Say you visit a web page and that page contains 100 resources. Your browser has to make 100 individual requests to the site’s host server (or more likely, multiple servers) in order to pull those objects. Each of those requests experiences 75–140 milliseconds of latency. This may not sound like much, but it adds up fast. When you consider that a page can easily contain 300 or more resources, and that latency can reach a full second for some mobile users, you can see where latency becomes a major performance problem.
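To make the arithmetic concrete, here is a rough sketch in Python. It is a back-of-the-envelope model, not a measurement: the function name is made up for illustration, and the figure of six parallel connections is an assumption about how a browser might spread its requests, not something taken from the tests in this chapter.

```python
import math


def total_latency_ms(num_requests, latency_ms, parallel_connections=1):
    """Estimate the cumulative latency for a page's requests.

    Assumes requests go out in waves of `parallel_connections` at a time
    and that every request in a wave pays the same latency. Real browsers
    are messier than this, so treat the result as a rough estimate.
    """
    if num_requests < 0 or latency_ms < 0 or parallel_connections < 1:
        raise ValueError("need non-negative inputs and at least one connection")
    waves = math.ceil(num_requests / parallel_connections)
    return waves * latency_ms


if __name__ == "__main__":
    # 100 resources at the low and high ends of the 75-140 ms range.
    for latency in (75, 140):
        serial = total_latency_ms(100, latency)
        # Six connections at once is an assumed value for illustration.
        parallel = total_latency_ms(100, latency, parallel_connections=6)
        print(f"{latency} ms per request: {serial / 1000:.1f} s one at a time, "
              f"{parallel / 1000:.2f} s with 6 parallel connections")
```

Fetched strictly one at a time, 100 requests at 75 ms each add up to 7.5 seconds of waiting. Parallel connections shrink that number considerably, but they don't make it go away.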
One of the big problems with latency is that it’s unpredictable and inconsistent. It can be affected by factors ranging from the weather to what your neighbors are downloading.
Tackling latency is a top priority for the performance industry. There are several ways to do this:
While latency isn’t the only performance challenge, it’s a major one. Throughout the rest of this chapter, I’ll explain how various performance-boosting technologies address latency (and other issues).
We’ve come a long way since 1993, when the World Wide Web was introduced. Back then, we were too busy being excited and amazed about the simple fact that the Internet existed to complain about the slowness of the network. Not that faster networks would have made much of a difference to how we experienced the Web. For most of us, dial-up Internet access was limited to 56 Kbps modems connecting via phone lines.
Not surprisingly, even the minimalist web pages of those times—when the average page was around 14 KB in size and contained just two resources—could take a while to load. (True story: I have a friend who taught herself to play the guitar while waiting for pages to load in her early months of using the Web.)
Today, you’d be forgiven for believing that, between our faster networks and superior connectivity, we’ve fully mitigated our early performance problems—and not a moment too soon. Modern web pages can easily reach 3 or 4 MB in size. When I hear people rationalize why this kind of page bloat isn’t a serious performance issue, one of the most common arguments that comes up is the belief that our ever-evolving networks mitigate the impact.
While yes, it’s true that networks and connectivity have improved, there are some misconceptions about what those improvements mean in real-world usage. To illustrate, let’s consider the results of a set of performance tests of the Etsy.com home page, using WebPagetest.org, a synthetic performance measurement tool that simulates different realistic connection speeds and latencies. (If you want to jump straight to the key findings, skip ahead to the end of this section.)
The page was tested across five different desktop and mobile connection types (RTT stands for round-trip time—that is, the amount of time it takes for the host server to receive, process, and deliver on a request for a page resource such as images, CSS files, and so on; “latency” is another word for the delay in RTT):
Taking a glance at Figure 4-1, you can see that the light blue bars representing load times are not nearly as dramatically stacked as the darker blue bars that indicate bandwidth numbers.
Figure 4-2 is another way of looking at these numbers. If people’s supposition that bandwidth improvements correlate with proportionately faster load times were correct, then the two sides of this second graph would mirror each other. Clearly they do not.
A few observations:
In 2015, the Federal Communications Commission updated its definition of broadband from 4 Mbps to 25 Mbps. According to this new definition, roughly one out of five Internet users—approximately 50 million people—in the United States suddenly did not have broadband access.
Whether you’re a site owner, developer, designer, or any other member of the Internet-using population, chances are you fall into the general category of urban broadband user. And there’s also a chance that you believe your own speedy user experience is typical of all users. This isn’t the case.
For much of the history of the Internet, performance problems have been blamed on servers. “Server overload” was commonly cited as the culprit behind everything from sluggish response times to poor page rendering. So the catch-all cure for performance pains emerged: throw more servers at the problem.
The myth that server load was the cause of most performance issues began to be put to rest in 2007, when Steve Souders’s book High Performance Web Sites (O’Reilly)—which remains the bible for frontend developers and performance engineers—was released. In 2007, Steve famously said:
80%–90% of the end user response time is spent on the frontend. Start there.
This finding has proven consistent over the years. For the majority of sites, only 10%–20% of response time happens at the backend. To illustrate, Figure 4-4 shows the proportion of backend time (in blue) compared to frontend time (in green). In this specific instance, 86% of response time happened at the frontend.
The takeaway from this: yes, you do need to ensure that your servers are up to the task of hosting your site and meeting traffic demands, but chances are you’re already covered in this area.
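If you want to work out this split for one of your own pages, the calculation is simple. The sketch below assumes a common convention: the time until the first byte of the page arrives counts as backend time, and everything after that counts as frontend time. The function name and the sample numbers are hypothetical, chosen only to land on the 86% figure mentioned above.

```python
def frontend_backend_split(time_to_first_byte_ms, load_time_ms):
    """Return (backend_share, frontend_share) as fractions of total load time.

    Assumption: time to first byte stands in for backend time, and the
    remainder of the load time is attributed to the frontend.
    """
    if load_time_ms <= 0 or not 0 <= time_to_first_byte_ms <= load_time_ms:
        raise ValueError("time to first byte must fall within the load time")
    backend_share = time_to_first_byte_ms / load_time_ms
    return backend_share, 1.0 - backend_share


if __name__ == "__main__":
    # Hypothetical page: first byte at 420 ms, fully loaded at 3,000 ms.
    backend, frontend = frontend_backend_split(420, 3000)
    print(f"backend {backend:.0%}, frontend {frontend:.0%}")
```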
A major advance occurred in the late 1990s, when server engineers recognized that, just as routing enabled more wires to carry more messages, load balancing could enable more servers to handle more requests. Individual web servers gave way to server farms and datacenters. These server farms equipped themselves with load balancers—technology that (as its name suggests) balances traffic load across multiple servers, preventing overload caused by traffic surges.
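The core idea of a load balancer fits in a few lines. The sketch below uses round-robin, which simply hands each new request to the next server in line. That choice is an assumption made for simplicity (real load balancers offer many strategies), and the class and server names are invented for the example.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a repeating cycle."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def route(self, request_id):
        # The request itself is ignored here; round-robin only takes turns.
        return next(self._servers)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    # Simulate a small surge of nine requests and count where they land.
    assignments = Counter(balancer.route(i) for i in range(9))
    print(dict(assignments))
```

No single server absorbs the whole surge, which is the point of the technique.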
By 2007, load balancers had evolved into sophisticated application delivery controllers (ADCs). I should mention here that the issue of whether ADCs evolved from load balancers is the subject of some hairsplitting, with many folks arguing that ADCs are much more than just advanced load balancers. But for our purposes here, it’s sufficient to know that the two technologies are frequently connected, even if they’re evolutionarily light years apart.
In addition to simple load balancing, modern ADCs optimize database queries, helping speed up the dynamic construction of pages using stored data. ADCs also monitor server health, implement advanced routing strategies, and offload server tasks such as SSL termination and TCP connection management.
As websites continued to compete for attention, page content shifted from being mostly styled text to containing a huge variety of images and other media. Support for stylesheets and client-side scripts further added to the number of objects that browsers needed to fetch in order to render each page. This multiplying of requests per page also multiplied the impact of network latency, which led to the next big innovation: content delivery networks.
While CDNs also bring other benefits, such as improving global availability and reducing bandwidth consumption, the main problem they address is latency: the amount of time it takes for the host server to receive, process, and deliver on a request for a page resource (images, CSS files, etc.). Latency depends largely on how far away the user is from the server, and it’s compounded by the number of resources a web page contains.
For example, if all your resources are hosted in a server farm somewhere in Iowa, and a user is coming to your page from Berlin, then each request has to make a long round-trip from Berlin to Iowa and back to Berlin. If your web page contains 100 resources (which is at the low end of normal), then your visitor’s browser has to make 100 individual requests to your server in order to retrieve those objects.
A CDN caches static resources in distributed servers (AKA edge caches, points of presence, or PoPs) throughout a region or worldwide, thereby bringing resources closer to users and reducing round-trip time (Figure 4-5).
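Stripped to its essentials, an edge cache keeps a nearby copy of each resource and makes the long trip to the origin server only the first time. The sketch below is a toy model under that assumption. The class name and the stand-in origin function are hypothetical, and it leaves out everything a real CDN has to handle, such as expiry and invalidation.

```python
class EdgeCache:
    """Serve resources from a local store, visiting the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin
        self._store = {}
        self.origin_trips = 0  # how many long round-trips were needed

    def get(self, url):
        if url not in self._store:
            # Cache miss: pay for the long round-trip once, then keep a copy.
            self.origin_trips += 1
            self._store[url] = self._fetch_from_origin(url)
        return self._store[url]


if __name__ == "__main__":
    def origin_in_iowa(url):
        # Stand-in for the distant origin server; returns fake content.
        return f"contents of {url}".encode("utf-8")

    berlin_edge = EdgeCache(origin_in_iowa)
    for _ in range(3):
        berlin_edge.get("/images/logo.png")
    print("trips to origin:", berlin_edge.origin_trips)
```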
Like any technology, CDNs have evolved over the years. First-generation CDNs, which were introduced in the late 1990s, focused simply on caching page resources. More recent iterations allow you to cache dynamic content and even develop on the edge.
While using a CDN is a must for many sites, it’s not necessary for every site. For example, if you’re hosting locally and if your users are also primarily local, a CDN won’t make you much faster (though it can still help lighten your bandwidth bill).
A CDN is not a standalone performance solution. There are a number of performance pains a CDN can’t cure, such as:
CDNs address the performance middle mile by bringing resources closer to users—shortening server round-trips, and as a result, making pages load faster. Frontend optimization (FEO) tackles performance at the frontend so that pages render more efficiently in the browser.
Frontend optimization addresses performance at the browser level, and has emerged in recent years as an extremely effective way to supplement server build-out and CDN services. One way that FEO alleviates latency is by consolidating page objects into bundles. Fewer bundles means fewer trips to the server, so the total latency hit is greatly reduced. FEO also leverages the browser cache and allows it to do a better job of storing files and serving them again where relevant, so that the browser doesn’t have to make repeat calls to the server.
The four main FEO strategies for improving performance are:
Reduce the number of HTTP requests required to fetch the resources for each page (by consolidating resources).
Reduce the size of the payload needed to fulfill each request (by compressing resources).
Optimize client-side processing priorities and improve script execution efficiency (by ensuring that critical page resources load first, and deferring noncritical resources).
Target the specific capabilities of the client browser making each request (such as by leveraging the unique caching capabilities of each browser).
All of these strategies require changes to the HTML of the web page and changes to the objects being fetched by the page.
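The first two strategies are easy to demonstrate. The sketch below bundles several stylesheets into one (fewer requests) and then compresses the bundle with gzip (a smaller payload). The helper names and the sample stylesheets are made up for illustration, and sorting by filename is only there to keep the example predictable. In a real bundle, the order of stylesheets matters and has to be chosen deliberately.

```python
import gzip


def bundle(resources):
    """Concatenate several text resources into one, in sorted-name order."""
    return "\n".join(resources[name] for name in sorted(resources))


def gzip_size(text):
    """Return the size in bytes of the text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


if __name__ == "__main__":
    # Hypothetical stylesheets, padded with repeated rules for the demo.
    stylesheets = {
        "base.css": "body { margin: 0; padding: 0; }\n" * 40,
        "layout.css": ".column { float: left; width: 50%; }\n" * 40,
        "theme.css": "a { color: #336699; text-decoration: none; }\n" * 40,
    }
    combined = bundle(stylesheets)
    print("requests before:", len(stylesheets), "after: 1")
    print("bytes uncompressed:", len(combined.encode("utf-8")))
    print("bytes gzipped:", gzip_size(combined))
```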
Steve Souders brought attention to frontend optimization with his book High Performance Web Sites. At the time, the only way to optimize your pages was by hand, via highly talented developers. Over the years, FEO has evolved into a highly sophisticated set of practices, some of which can only be performed by hardware- or software-based solutions.
While frontend optimization can be performed manually by developers, many site owners have turned to products and services that automate the process of page optimization. These tools implement FEO best practices by automatically modifying HTML in real time as pages are being served. Some CDNs now offer FEO as a value-added service.
Today, roughly one out of four people worldwide owns a smartphone. By 2020, that number is expected to increase to four out of five (see Figure 4-6). That’s more than six billion mobile devices all connected to this massive infrastructure. Stop and think about that for a minute.
With the proliferation of mobile devices has come a host of unique issues that deeply affect mobile performance, from low-horsepower devices to network slowdowns.
One of the biggest performance issues (and one that is not unique to mobile) is page bloat. Pages served to mobile continue to balloon beyond our networks’ ability to serve them (see Figure 4-7).
Despite these constraints, user expectations continue to grow: a typical mobile user expects a site to load as fast—or faster!—on their tablet or smartphone as it does on desktop.
Our use of mobile devices is exploding. Our expectations for performance are unrelenting. And modern websites place ever-increasing levels of strain on mobile networks, devices, and browsers. Mobile is arguably the greatest battlefield for performance today.
Getting an accurate measurement for how long it took a web page to load used to be difficult and somewhat imperfect. In the olden days (you know, 2005), if you built a website, you had zero ability to look outside your own datacenters to get an understanding of performance. As websites evolved, the digital experiences we were able to serve over the Web became more and more complex. Images, video, and other rich content added more and more delay, which ultimately hurt the user’s experience.
Unfortunately, in the past, none of this could be captured with simple backend measurements. This lack of visibility into the user experience drove website owners to look at different ways of measuring performance. This yielded advanced measurement capabilities like synthetic measurement and, later, real user monitoring. Finally, with these tools, site owners could see beyond the walls of their organization to get a real sense of how their applications were performing in the wild.
Website monitoring solutions fall into two types: synthetic and real user monitoring (RUM). Each of these types offers invaluable insight into how your site performs, but neither is a standalone solution. Rather, they’re highly complementary and can be used to gain a 360-degree view of performance.
Synthetic performance measurement (which you may sometimes hear called “active monitoring”) is a simulated health check of your site. You create scripts that simulate an action or path that an end user would take on your site, and those paths are monitored at set intervals.
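In code, a synthetic test boils down to a scripted list of pages and a stopwatch. The sketch below is a minimal illustration under that assumption, not how any particular monitoring product works. The function name, the paths, and the stand-in fetch are all hypothetical, and the scheduling that would rerun the script at set intervals is left out.

```python
import time


def run_synthetic_check(steps, fetch, clock=time.perf_counter):
    """Visit each step of a scripted path and time it in milliseconds.

    `fetch` is whatever loads a page (passed in so it can be swapped out),
    and `clock` returns the current time in seconds.
    """
    results = []
    for url in steps:
        started = clock()
        fetch(url)
        elapsed_ms = (clock() - started) * 1000
        results.append((url, elapsed_ms))
    return results


if __name__ == "__main__":
    def pretend_fetch(url):
        # Stand-in for a real page load; just waits briefly.
        time.sleep(0.01)

    # A hypothetical shopper's path: home page, product page, cart.
    path = ["/", "/products/guitar", "/cart"]
    for url, elapsed in run_synthetic_check(path, pretend_fetch):
        print(f"{url}: {elapsed:.0f} ms")
```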
Synthetic performance tests offer a unique set of capabilities that complement RUM extremely well. In addition to offering page-level diagnostics, synthetic tools allow you to measure a number of metrics (such as response time, load time, number of page assets, and page size) from a variety of different connection types. You can also test your site in preproduction to find problems before the site goes live.
These are just a few of the questions that synthetic measurement can answer for you:
Synthetic measurement can tell you a great deal about how a page is constructed, but there are gaps in what it can tell you:
Real user monitoring is a form of passive monitoring that “listens” to all your traffic as users move through your site. Because RUM never sleeps, it gathers data from every user using every browser across every network, anywhere in the world. We’re talking about petabytes of data collected over billions of page views.
The word “passive” is a misnomer, because modern RUM is anything but passive. Today, the best RUM tools have powerful analytics engines that allow you to slice and dice your data in endless ways.
In addition to the usual page metrics, such as load time, real user monitoring can teach you a great deal about how people use your site, uncovering insights that would otherwise be impossible to obtain.
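To give a flavor of what slicing and dicing means in practice, the sketch below groups a handful of page-view records by one attribute and reports the median load time for each group. The record format, the field names, and the numbers are all invented for the example. Real RUM tools collect far richer data at a vastly larger scale.

```python
from collections import defaultdict
from statistics import median


def median_load_time_by(page_views, dimension):
    """Group page views by one attribute and return each group's median load time."""
    groups = defaultdict(list)
    for view in page_views:
        groups[view[dimension]].append(view["load_time_ms"])
    return {key: median(times) for key, times in groups.items()}


if __name__ == "__main__":
    # Hypothetical page views reported by real users' browsers.
    page_views = [
        {"browser": "Chrome", "connection": "cable", "load_time_ms": 2100},
        {"browser": "Firefox", "connection": "cable", "load_time_ms": 2500},
        {"browser": "Chrome", "connection": "cable", "load_time_ms": 2300},
        {"browser": "Chrome", "connection": "3G", "load_time_ms": 4800},
        {"browser": "Safari", "connection": "3G", "load_time_ms": 5200},
    ]
    print(median_load_time_by(page_views, "connection"))
    print(median_load_time_by(page_views, "browser"))
```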
Here are just a few questions your RUM data can answer:
Just as synthetic measurement has its strengths and weaknesses, so does RUM:
I’d be remiss if I left this chapter without touching on browser performance. Throughout the entire history of the World Wide Web, developers have been chugging away in the background, building the most essential app of all—the browser—without which none of us would be able to access the Internet as we know it.
Being a browser developer is thankless work. When we have slow online experiences, we tend to curse the sites we visit, the networks we use, and our poor, hard-working browsers. Yet it’s arguable that browser evolution has done more than any other technology to mitigate the performance impact caused by badly designed, poorly optimized sites. (Let’s not forget that our comrades at Google were pioneering performance best practices when this issue was just a glimmer on the horizon for most of us.)
From at least the mid-2000s, browser vendors have factored performance into every new release, with the focus on performance becoming stronger with every year.
More recently, web browsers have been getting some pretty major performance upgrades, such as:
The W3C’s Web Performance Working Group has also created a substantial handful of real user monitoring specifications (e.g., navigation timing, resource timing, user timing) for in-browser performance diagnostics. This gives site owners the unprecedented ability to gather much more refined and nuanced performance measurements based on actual user behavior.
Individually, each of these enhancements solves a meaningful performance problem. Collectively, these browser enhancements have the potential to fundamentally move the needle on performance in a way that hasn’t been seen in years.
Most people have a basic understanding of Moore’s law: the observation that the number of transistors on a chip, and with it computer processing power, doubles roughly every two years. Fewer people are familiar with Wirth’s law, which states that software is getting slower more rapidly than hardware is getting faster.
Wirth’s law was coined back in 1995, but it’s a fairly accurate summary of the web performance conundrum. Regardless of how much money we invest in building out the infrastructure of the Internet, latency will continue—well into the foreseeable future—to be one of the greatest obstacles to optimal web performance. This is due to a couple of issues:
Web pages are not likely to become smaller and less complex, so it’s fortunate that a huge—and rapidly growing—industry has grown out of the need for making pages more performant.