Chapter 4. Rule 2: Use a Content Delivery Network
The average user’s bandwidth increases every year, but a user’s proximity to your web server still has an impact on a page’s response time. Web startups often have all their servers in one location. If they survive the startup phase and build a larger audience, these companies face the reality that a single server location is no longer sufficient—it’s necessary to deploy content across multiple, geographically dispersed servers.
As a first step to implementing geographically dispersed content, don’t attempt to redesign your web application to work in a distributed architecture. Depending on the application, a redesign could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this redesign step.
The correct first step is found by recalling the Performance Golden Rule, described in Chapter 1:
Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page.
If the application web servers are closer to the user, the response time of one HTTP request is improved. On the other hand, if the component web servers are closer to the user, the response times of many HTTP requests are improved. Rather than starting with the difficult task of redesigning your application in order to disperse the application web servers, it’s better to first disperse the component web servers. This not only achieves a bigger reduction in response times, it’s also easier thanks to content delivery networks.
Content Delivery Networks
A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content to users more efficiently. This efficiency is typically discussed as a performance issue, but it can also result in cost savings. When optimizing for performance, the server selected for delivering content to a specific user is based on a measure of network proximity. For example, the CDN may choose the server with the fewest network hops or the server with the quickest response time.
Some large Internet companies own their own CDN, but it’s cost effective to use a CDN service provider. Akamai Technologies, Inc. is the industry leader. In 2005, Akamai acquired Speedera Networks, the primary low-cost alternative. Mirror Image Internet, Inc. is now the leading alternative to Akamai. Limelight Networks, Inc. is another competitor. Other providers, such as SAVVIS Inc., specialize in niche markets such as video content delivery.
Table 4-1 shows 10 top Internet sites in the U.S. and the CDN service providers they use.
You can see that:
Five use Akamai
One uses Mirror Image
One uses Limelight
One uses SAVVIS
Four either don’t use a CDN or use a homegrown CDN solution
Smaller and noncommercial web sites might not be able to afford the cost of these CDN services. There are several free CDN services available. Globule (http://www.globule.org) is an Apache module developed at Vrije Universiteit in Amsterdam. CoDeeN (http://codeen.cs.princeton.edu) was built at Princeton University on top of PlanetLab. CoralCDN (http://www.coralcdn.org) is run out of New York University. They are deployed in different ways. Some require that end users configure their browsers to use a proxy. Others require developers to change the URL of their components to use a different hostname. Be wary of any that use HTTP redirects to point users to a local server, as this slows down web pages (see Chapter 13).
In addition to improved response times, CDNs bring other benefits. Their services include backups, extended storage capacity, and caching. A CDN can also help absorb spikes in traffic, for example, during times of peak weather or financial news, or during popular sporting or entertainment events.
One drawback to relying on a CDN is that your response times can be affected by traffic from other web sites, possibly even those of your competitors. A CDN service provider typically shares its web servers across all its clients. Another drawback is the occasional inconvenience of not having direct control of the content servers. For example, modifying HTTP response headers must be done through the service provider rather than directly by your ops team. Finally, if your CDN service provider’s performance degrades, so does yours. In Table 4-1, you can see that eBay and MySpace each use two CDN service providers, a smart move if you want to hedge your bets.
CDNs are used to deliver static content, such as images, scripts, stylesheets, and Flash. Serving dynamic HTML pages involves specialized hosting requirements: database connections, state management, authentication, hardware and OS optimizations, etc. These complexities are beyond what a CDN provides. Static files, on the other hand, are easy to host and have few dependencies. That is why a CDN is easily leveraged to improve the response times for a geographically dispersed user population.
The two online examples discussed in this section demonstrate the response time improvements gained from using a CDN. Both examples include the same test components: five scripts, one stylesheet, and eight images. In the first example, these components are hosted on the Akamai Technologies CDN. In the second example, they are hosted on a single web server.
The example with components hosted on the CDN loaded 18% faster than the page with all components hosted from a single web server (1013 milliseconds versus 1232 milliseconds). I tested this over DSL (~900 Kbps) from my home in California. Your results will vary depending on your connection speed and geographic location. The single web server is located near Washington, DC. The closer you live to Washington, DC, the less of a difference you’ll see in response times in the CDN example.
If you conduct your own response time tests to gauge the benefits of using a CDN, it’s important to keep in mind that the location from which you run your test has an impact on the results. For example, based on the assumption that most web companies choose a data center close to their offices, your web client at work is probably located close to your current web servers. Thus, if you run a test from your browser at work, the response times without using a CDN are often best case. It’s important to remember that most of your users are not located that close to your web servers. To measure the true impact of switching to a CDN, you need to measure the response times from multiple geographic locations. Services such as Keynote Systems (http://www.keynote.com) and Gomez (http://www.gomez.com) are helpful for conducting such tests.
At Yahoo!, this factor threw us off for awhile. Before switching Yahoo! Shopping to Akamai, our preliminary tests were run from a lab at Yahoo! headquarters, located near a Yahoo! data center. The response time improvements gained by switching to Akamai’s CDN—as measured from that lab—were less than 5% (not very impressive). We knew the response time improvements would be better when we exposed the CDN change to our live users, spread around the world. When we exposed the change to end users, there was an overall 20% reduction in response times on the Yahoo! Shopping site, just from moving all the static components to a CDN.
Use a content delivery network.