Web Performance Tuning, Second Edition: Speeding up the Web

By Patrick Killelea
Price: $44.95 USD
£31.95 GBP



Table of Contents

Chapter 1: The Quick and the Dead
While this book contains a lot of detailed information about monitoring, load testing, problem analysis, and background about how things work, I often find myself referring to this small set of questions and answers I wrote up to quickly diagnose and treat the most common problems. Since a majority of problems can be solved by simply reading through this list and checking things off, I provide it here right up front. There are many references to concepts that have not been discussed yet, but they are explained later in the book.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Questions for the Browser Side
First, here are the things you might try if your browser seems slow or unresponsive:
If you have an external modem, the power light should be lit to indicate that the modem is on.
If you have an external modem, make sure the modem cable is connected to your computer. Then try manually sending something to the modem. From a Linux shell you can do this:
% echo AT > /dev/modem
               
From a DOS prompt on a Windows machine you can do this:
C:\> echo AT > COM1
               
If your modem is connected, you will see the send and read lights flash as the modem responds OK to the AT command. If the lights do not flash, either the modem is not connected, or you have configured it for the wrong COM port, PCMCIA slot, or other attachment point.
External modems should have a light labeled CD (Carrier Detect) to indicate whether there is a carrier signal; that is, whether you are online. If it is not lit, it may be that the remote end hung up on you, or you lost your connection through too much noise on the line or an inactivity timeout.
Look at external modem lights when you request a web page. The read and send lights should be flashing. This is also true for DSL modems, cable modems, hubs, and other network equipment. The send light will tell you that your modem is trying to send data out to the Internet. The read light will tell you if your modem is getting anything back from the network. If you cannot see these lights flashing, there is no data flowing through the modem.
Additional content appearing in this section has been removed.
Questions for the Server Side
Now let's look at things from the server side. Here's what you should look at if your web server seems sluggish.
If you are running a web site from a PC, be sure to disable the power conservation features that spin down the disk and go into sleep mode after a period of inactivity. Sleep mode will slow down the first user who hits your site while it is sleeping, because it takes a few moments for the disk to spin up again. Some operating systems — for example, Mac OS X — are capable of quickly serving pages in their sleep; but even they will eventually have to wake up to log to disk, so it is best to turn off sleep mode.
DNS servers can become overloaded like anything else on the Internet. Since DNS lookups block the calling process, a slow DNS server can have a big impact on perceived performance. Check whether your DNS server's CPU or network load is nearing its capacity by monitoring that machine's hardware statistics. See Chapter 4 for more information on monitoring.
If you determine that your DNS server is a problem, consider setting up additional servers or simply pointing your DNS resolver to another DNS server. Using a different DNS server is done by modifying /etc/resolv.conf under Linux or using the Network Control Panel on Windows.
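One rough way to see whether name resolution is the slow step is to time a lookup from the affected machine. The following is a minimal sketch, assuming Python's standard socket module; the function name time_lookup and its resolver parameter are illustrative and do not come from the book.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the seconds spent resolving hostname (hypothetical helper).

    The resolver call blocks the caller, just as a browser or server
    process is blocked while it waits on a slow DNS server.
    """
    start = time.perf_counter()
    resolver(hostname, None)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "localhost" is only a placeholder; use a name your users really look up.
    print(f"lookup took {time_lookup('localhost') * 1000:.1f} ms")
```

Running it before and after pointing the resolver at a different DNS server gives a crude comparison. Results cached by the operating system can make repeat lookups look faster than a first lookup.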
Netscape browsers do not display a page at all until all image sizes are known. If you do not include the image sizes in your HTML, the browser must actually download all the images before it knows the sizes, resulting in a long delay before the user sees anything at all. Many users also do not download images for one reason or another, but would like to know what kind of image it is they are missing, especially if you use images for navigation tools. So for best performance and usability, make sure all your images have size parameters in the HTML like this:
Additional content appearing in this section has been removed.
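The book's own HTML example is not part of this excerpt. As a separate illustration, the check for missing image sizes could be automated roughly as follows. This is a sketch assuming Python's standard html.parser module; the names ImageSizeChecker and images_missing_size are made up for the example.

```python
from html.parser import HTMLParser


class ImageSizeChecker(HTMLParser):
    """Collect <img> tags that lack a width or height attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src values of images without both dimensions

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        names = {name for name, _ in attrs}
        if not {"width", "height"} <= names:
            self.missing.append(dict(attrs).get("src", "(no src)"))


def images_missing_size(html_text):
    """Return the src of every image that does not declare both sizes."""
    checker = ImageSizeChecker()
    checker.feed(html_text)
    return checker.missing
```

Feeding each page of a site through images_missing_size would list the images a browser must download before it can lay out the page.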
Key Recommendations
  • Turn off images on the client.
  • Turn off Java on the client.
  • Turn off cache validation on the client.
  • Put more RAM on the server.
  • Put more RAM on the client.
  • Buy a better connection to the Internet.
  • On a LAN, if you can cache static content in RAM, you can probably serve it at full network speed. If you can't cache content, then your disk is probably the bottleneck.
  • On the Internet, the Internet is usually the bottleneck; the next bottlenecks are dynamic content generation and database queries.
  • If you have other suggestions for quick checks, please write p@patrick.net.
Additional content appearing in this section has been removed.
Chapter 2: Web Site Architecture
There are many trade-offs to make in designing a web site, which involve many possible components and configurations. What you need depends on what you're trying to do; one size does not fit all. This chapter goes over the fundamental problems that everyone runs into.
Additional content appearing in this section has been removed.
Trade-offs
There are a number of trade-offs to make in designing a web site architecture: state versus scalability, replication versus simplicity, synchronous versus asynchronous, connectionful versus connectionless, speed of development versus planning, and procedural versus object-oriented programming.
Left to its own devices, a web site has no ability to remember individual users from one web transaction to the next. In such "stateless" web sites, users have no particular information that needs to be tracked. The web site has complete amnesia about your previous visits. It delivers the page without considering whether you've asked for it before or what other pages you've viewed.
Web sites that have no user state have no problem with scalability. Stateless web sites are easily replicated for scalability by load balancing across many servers, even if the content is dynamic (for example, a site that serves stock quotes or weather information), as long as the source of that dynamic data can be replicated. Since the web servers are all functionally the same, it does not matter if a user gets the home page from one server, then hits a different server when he clicks on a link on that home page.
This is different for a transactional site where users have state, such as being logged in or out, having items in a shopping cart, or having a balance. User state is the origin of most bottlenecks on transactional sites, limiting scalability by limiting how fast the state can be retrieved or updated, or forcing servers to constantly share state. The "system of record" is the database in which the transactions are legally recorded, and it is inevitably a bottleneck. There are some simple ways to cope with the conflict between state and scalability:
  • Keep the state explicit and compact, perhaps in a single cookie so that the state of a transaction is in one convenient package.
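To make the single-cookie idea concrete, here is a minimal sketch of packing a small piece of user state into one cookie value. The signing step, the SECRET_KEY value, and the function names are assumptions added for illustration; the text itself only suggests keeping the state explicit and compact.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key for illustration


def _sign(payload_bytes):
    return hmac.new(SECRET_KEY, payload_bytes, hashlib.sha256).hexdigest()


def encode_state(state):
    """Pack a small dict into one signed cookie value."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True)
    payload = base64.urlsafe_b64encode(raw.encode("utf-8"))
    return payload.decode("ascii") + "." + _sign(payload)


def decode_state(cookie_value):
    """Return the dict, or None if the cookie was altered."""
    payload, _, signature = cookie_value.rpartition(".")
    expected = _sign(payload.encode("utf-8"))
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the whole state travels with each request, any server behind a load balancer can handle it without consulting the others. The cost is that the cookie is sent on every hit, so it should stay small.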
Additional content appearing in this section has been removed.
Elements
In this section, we take a look at the basic elements that make up web site architecture.
The web browser is a nearly ideal Graphical User Interface (GUI) because it is standard, simple, and ubiquitous. Most of what you need to show users is available: text, graphics, buttons, fill-in boxes, etc. The design of a GUI with HTML is about as easy as it gets. Other GUIs, such as those created with Visual Basic or Java, require much more training to create, and usually interface with the back-end in a non-HTTP way, greatly complicating testing and requiring each user to download a GUI just for that one application. Pretty much every PC in the world now has a browser, and they all read HTML, so it's crazy not to take advantage of that.
It is a particularly bad idea to make any site "Optimized for Internet Explorer" or "Optimized for Netscape." The entire value of the Web lies in its ubiquity and portability. If you start imposing requirements on users that are not strictly necessary, you not only alienate those who do not use your recommended browser, but you also expose yourself to the danger of platform dependence. Once you are dependent on a particular browser, you have given up the freedom of your users as well as your own freedom to look at your content in any other way. Why give up your freedom?
With the emerging Document Object Model (DOM) standard, browsers can do essentially everything you can do with other GUIs, such as column sorting, downloading just data or fractions of HTML pages, and various widgets that would otherwise require a Java applet. DOM is supported by Internet Explorer (IE), Netscape, and Opera, though support for DOM varies in quality and standards compliance. IE and Netscape use different versions of the DOM, and within Netscape, the DOM is different between Versions 4 and 6.
Additional content appearing in this section has been removed.
Example Web Site Architectures
Now that you know some of the trade-offs and elements of a web site architecture, you need to decide how to put it all together. There are an infinite number of possible arrangements of web site machines and software, but they fall into a few broad categories, which are easily described. Serving of static content is extremely scalable simply by load balancing across more web servers, so that is not very interesting. Instead, I concentrate on transactional web sites or those composed largely of dynamic content.
A number of exact descriptions of architectures are available by reading the details of vendor performance tests. See the Web Bench tests from http://www.spec.org/ for examples. The following contain some common themes.
Most small web sites run entirely from one box, typically using Linux, Apache, MySQL, and Perl (from which we have the acronym LAMP). One box means there will be no network traffic between server-side components, and perhaps even the use of extremely fast shared-memory communication rather than loopback network communication. Though the single server will have to context-switch between the various processes, the single box may still have better performance than dedicated boxes connected by Ethernet. It's not clear whether a single box is more reliable, or less.
Larger web sites can also be run from a single box. In fact, Oracle in particular would love it if you ran an Oracle Web Server and executed Java servlets directly in the database. The downside of this approach is scalability: you're limited to the capacity of that one machine, unless you can successfully cluster this approach, which is difficult.
A similar approach is to have exactly two boxes: one for static content such as images, and one for dynamic content generated by servlets or CGIs. This has the advantage that the boxes can be independently optimized. The static content box should have enough memory to hold all the static content, while the dynamic content box should have multiple fast CPUs.
Additional content appearing in this section has been removed.
Trends
Though bandwidth is getting steadily better, latencies are not. Packets move at pretty close to the speed of light right now, and that's not going to improve. This means that an exchange of a thousand small packets takes an amount of time proportional to distance. So geography still matters, and always will.
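The proportionality between distance and exchange time can be sketched with a little arithmetic. The calculation below assumes signals travel through fiber at roughly two-thirds the speed of light in a vacuum and ignores routing, queuing, and processing delays, so it gives only a lower bound; the function names are illustrative.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: signals in fiber travel at about 2/3 c


def round_trip_seconds(distance_km, fiber_factor=FIBER_FACTOR):
    """Lower bound on one round trip over distance_km of cable."""
    one_way = distance_km / (SPEED_OF_LIGHT_KM_PER_S * fiber_factor)
    return 2 * one_way


def serial_exchange_seconds(distance_km, round_trips):
    """Time for round_trips strictly sequential request/response pairs."""
    return round_trips * round_trip_seconds(distance_km)


if __name__ == "__main__":
    # Hypothetical case: 1,000 sequential exchanges across 4,000 km.
    print(f"{serial_exchange_seconds(4000, 1000):.1f} seconds")
```

Under these assumptions, a thousand strictly sequential exchanges across 4,000 km take roughly 40 seconds no matter how much bandwidth is available. Doubling the distance doubles the time.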
To compensate for the latencies inherent in the Internet, there is a trend toward pushing things out towards users in advance of their requests. Static content has been distributed to servers around the country and the world by companies like Akamai for some time now. A logical next step is to distribute the page-generation applications themselves.
Finally, I can foresee moving dynamic page generation to the browser itself. In fact, this has already happened to some degree, since the latest browsers can cache XSL and images and update and reformat XML data fragments within a single "page" by clever use of the emerging DOM standard for data structures within the browser and JavaScript to modify those structures. It has been possible for some time to request a fragment of a web document with the HTTP Byterange request, which most web servers support. The problem has been getting the browser to integrate that new fragment into a web page already cached in the browser. These things are already possible with applets, but would require a huge amount of custom work. With DOM, it is far easier. This means that a great deal of middleware code can simply go away. It is also conceivable that browsers might just interact directly with relational databases, issuing their own SQL calls to retrieve and update pages.
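As an illustration of the byte-range request mentioned above, the following sketch builds (but does not send) an HTTP request for part of a document using Python's standard urllib.request. The helper name build_range_request and the example URL are placeholders.

```python
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Build (but do not send) a request for bytes first_byte..last_byte."""
    request = urllib.request.Request(url)
    # Range values are inclusive byte offsets, e.g. "bytes=0-499".
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


if __name__ == "__main__":
    req = build_range_request("http://www.example.com/page.html", 0, 499)
    print(req.get_header("Range"))
```

Passing the request to urllib.request.urlopen would send it. A server that honors ranges answers with status 206 (Partial Content) and only the requested bytes. Merging that fragment into an already displayed page is the part the text describes as the hard problem.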
At this point, you might say, well, if everything else is in the browser, why not a database too? It's always been there. It's called the browser's cache. Caches are not relational, but the basics of storing and looking up data are there. The biggest problem is that the user has no explicit control over the cache. Users cannot directly query the cache, insert things, or update or delete them. If you happen to know how the cache works in your browser, you can modify it, but that's not at all the same as referring to it programmatically from a web page. Once we have that ability, page and application updates can be much smaller and faster.
Additional content appearing in this section has been removed.
Sample Configurations
Now let's examine some sample configurations of web site architectures for low-, medium-, and high-volume sites.
A low-volume site gets one to ten thousand hits per day. Such a site can easily be run out of your home. A typical configuration for this level is good PC hardware ($2,000) running Linux 2.2 (free), Apache 1.3 (free), with connectivity through a cable modem with 100kbps upstream ($100 per month).
For database functionality, you may use flat files, or read all of the elements into a Perl hash table or array in a CGI, and not see any performance problems for a moderate number of users if the database is smaller than, say, a few thousand items. Once you start getting more than one hit per second, or when the database gets bigger than a few thousand items or has multiple tables, you may want to move to the MySQL free relational database.
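The flat-file approach is described in terms of a Perl hash. The same idea is sketched here in Python with a dict, to keep the examples on this page in one language. The pipe-delimited record format and the name load_flat_file are assumptions made for the example.

```python
import io


def load_flat_file(lines, delimiter="|"):
    """Read 'key|value' records into a dict, skipping blanks and comments."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(delimiter)
        table[key] = value
    return table


if __name__ == "__main__":
    # A hypothetical three-line catalog standing in for a real file on disk.
    sample = io.StringIO("# sku|description\n1001|blue widget\n1002|red widget\n")
    catalog = load_flat_file(sample)
    print(catalog["1002"])
```

Lookups in the resulting dict need no disk access, which is why this works well for a few thousand items. A CGI would repeat the whole load on every hit, which is one reason to move to a real database as traffic grows.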
The database and connectivity are the weak links here. Apache and Linux, on the other hand, are capable of handling large sites.
A medium-volume site gets ten thousand to one million hits per day. A typical configuration for a medium-volume site is a Sun Ultra or an Intel Pentium Pro machine with 128 MB for the operating system and filesystem buffer overhead plus 2 to 4 MB per server process. Of course, more memory is better if you can afford it. Such workstation-class machines cost anywhere between $2,000 and $20,000.
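The memory rule of thumb in this paragraph is easy to turn into a quick estimate. The sketch below uses the 128 MB base and the 2 to 4 MB per process quoted in the text; the function name and the process count in the usage example are hypothetical.

```python
def server_memory_range_mb(processes, base_mb=128, per_process_mb=(2, 4)):
    """Return (low, high) RAM estimates in MB for a number of server processes.

    base_mb covers the operating system and filesystem buffers;
    per_process_mb is the assumed low and high cost of each server process.
    """
    low, high = per_process_mb
    return base_mb + processes * low, base_mb + processes * high


if __name__ == "__main__":
    low, high = server_memory_range_mb(50)  # 50 processes is an arbitrary example
    print(f"{low} to {high} MB")
```

The estimate is only a starting point; measuring actual per-process memory on a running server gives a better figure.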
You should have separate disks for serving content and for logging hits (and consider a separate disk for swap space), but the size of the content disk really depends on how much content you are serving. Striped disk arrays get better random access performance because multiple seeks can happen in parallel.
Additional content appearing in this section has been removed.
Key Recommendations
  • Be aware of the trade-offs you have to make.
  • For every architecture, ask yourself "If I decide I don't like this, can I migrate away from it easily after having implemented it?"
  • Plan for future scalability, not just for your immediate needs.
Additional content appearing in this section has been removed.
Chapter 3: Capacity Planning
Most processes can be divided into two classes: I/O-bound and CPU-bound. The serving of static HTML is usually I/O-bound. It is limited by the rate at which a file can be retrieved from disk (if not already in memory) and the speed at which the file can be moved out the network interface. Disk and network are I/O devices, far slower than CPU, so CPU power does not play a significant role.
Generation of dynamic HTML is just the opposite. It is usually CPU-bound, meaning that it takes longer to create the page than it does to move the page out the network interface. CPU is critical here, especially if you're using CGIs or Java servlets to create your dynamic pages. Most of that CPU processing is string manipulation. On the other hand, dynamic content that depends on database queries is usually limited by the speed of the database, which in turn is usually I/O-bound because it needs to retrieve data from disk. So how to plan for capacity depends entirely on how your site works.
Additional content appearing in this section has been removed.
Do the Math . . .
When you evaluate a potential architecture, the most critical part of the job is to compare your required latency and bandwidth to the rated capacity of every link in your proposed configuration. Each component should meet those requirements with an additional margin for component interaction inefficiencies and increasing load over the life of the architecture. You could skip the calculations and forecasting, buy something that satisfies your immediate requirements, and forge ahead, planning to upgrade when necessary — but there are a few reasons why you're well advised to do a little math and think about where you want the system to go in the future.
First of all, management likes to have a good idea of what they're going to get for the money you're spending. If you spend money on a system that cannot deliver because you didn't do a few calculations, you then have the embarrassing task of explaining why you need to spend more. You may not even be able to use what you have already bought if it's not compatible with the higher-performance equipment you need.
Second, unplanned growth has penalties associated with it—for example, unforeseen barriers to scalability, upgrades, or platform changes. You'll probably need more capacity next year than you do this year. If you cannot easily migrate your content and applications to higher-performance equipment, you will suffer.
Third, unplanned systems are more difficult to manage well because they are more difficult to comprehend. Management is inevitably a larger cost than the equipment itself, so whatever you can do to make management easier is worthwhile.
Additional content appearing in this section has been removed.
. . . But Trust Your Eyes More than the Math
It is easy, however, to plan too much. Requirements change and new technologies are making older ones obsolete, so you can't know for sure what you'll need in a year or two. It is a good idea to choose a few pieces of flexible, scalable equipment of adequate rated capacity and try them out together, knowing you can add capacity or alter the architecture as you collect real-world data and as new alternatives become available. Choose components that "play nice" with products from other manufacturers, rather than proprietary components. Starting this way has the substantial advantage of giving you continuous feedback on the performance and reliability of live, interacting equipment.
Don't bet the farm on vendor specifications and advertising. They are less reliable sources of information than firsthand experience or the experience of trusted friends. It is shocking, but true, that some vendors have fudged benchmark and scalability tests in their quest for sales. A real system to build on also gives you a gut-level feel for the kind of performance you can expect. You can use this feel to check your analytical model against reality.
Remember that component ratings are the maximum the vendor can plausibly claim, not what you will get in practice. 10Mbps Ethernet will give you a maximum of about 8Mbps of data throughput in practice. Cause a few problems yourself, just to see what's going to happen next year. Better that your server crashes right in front of you, for known reasons, than when you're in bed at 4 a.m. for unknown reasons. Try the load generation tools mentioned in the load testing chapter, but be sure that the load and network match your production environment. Do some tests over 28.8 kbps modems if your customers will be using them.
Generating relevant and complete tests is tricky. For example, no one has tens of thousands of modems just for generating realistic load; you have to use modem-emulation features of load testing software for that. Watch to be sure that latency remains bounded to some reasonable value when you test at very high throughput. Also watch to see what happens when latency does go up. Many applications are sensitive to latency and simply give up if they have to wait too long for a response.
Additional content appearing in this section has been removed.
Questions to Ask
The first step in capacity planning is to clarify your requirements and get them down on paper. Here are some questions that will help you pin down what you need.
Unlike the client/server paradigm in which the most important sizing parameter is the number of concurrent users, the relevant parameter for web servers is HTTP operations per second, also referred to as hits per second. Few web sites receive more than 25 hits per second.
Web servers do not maintain a dedicated connection to the browser because HTTP 1.0 is a connectionless protocol. The user connects, requests a document, receives it, and disconnects. HTTP was implemented in this way to keep the protocol simple, to conserve bandwidth, and to allow a web page to consist of components from multiple servers. Even though the user has the impression that he or she has been connected during an entire session of reading pages from a web site, from the server's point of view, the user disappears after each request and reappears only when requesting a new page and associated content (such as images).
This loading characteristic of web servers is changing because HTTP 1.1 does allow the user to remain connected for more than one request. Although most hits are very short in duration, the use of HTTP 1.1 persistent connections can make the number of simultaneous connections relevant to your site. Also, Java applets sometimes open a connection back to the web server they came from and can then keep it open.
Because of the simple nature of HTTP, it is easy to make overly simplified assumptions about what "connections per second" actually means. For example, we usually assume that HTTP requests are fulfilled serially and that the connection time is very short. These assumptions are valid if we are serving relatively few users on a fast LAN connection, but not if we have many users on slow modem connections. In the case of many users with slow access, connections are likely to last more than a second. Each connection will require buffer space and processor time, so the server load calculations should measure the load in concurrent users, which is the typical form of client-server load.
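The link between hits per second and concurrent connections can be approximated with Little's law: average concurrency equals arrival rate times average time in the system. This is a sketch under a steady-state assumption; the connection durations in the usage example are invented for illustration.

```python
def concurrent_connections(hits_per_second, seconds_per_connection):
    """Average number of connections open at once (Little's law, steady state)."""
    return hits_per_second * seconds_per_connection


if __name__ == "__main__":
    # Same hit rate, two hypothetical client populations.
    print(concurrent_connections(25, 0.1))  # fast LAN clients
    print(concurrent_connections(25, 5))    # slow modem clients
```

At the same hit rate, connections that last fifty times longer mean fifty times as many open at once, each needing buffer space. That is why slow clients push the sizing question back toward concurrent users.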
Additional content appearing in this section has been removed.
How Much Bandwidth Do You Need?
Server bandwidth is the single most important factor in the performance of your web site. The math to determine what bandwidth you need is, in essence, very simple:
hits/second * average size of a hit in bits = bits/second
That is, you need some estimate of the number of hits per second you want to be able to serve. Then you need to know the average size of one of these hits. From this, you know what sort of network bandwidth you need.
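The formula translates directly into code. In this sketch the average hit size is given in bytes and converted to bits; the 10,000-byte average used in the usage example is an assumption, not a figure from the text.

```python
def required_bits_per_second(hits_per_second, average_hit_bytes):
    """hits/second * average size of a hit in bits = bits/second."""
    return hits_per_second * average_hit_bytes * 8


if __name__ == "__main__":
    bps = required_bits_per_second(25, 10_000)  # hypothetical load
    print(f"{bps / 1_000_000:.1f} Mbps")
```

This counts payload only. Protocol overhead and peaks above the average argue for leaving some margin on top of the result.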
It has become clear that the number of packets is a more significant determinant of web performance than raw bandwidth once users are beyond ordinary dial-up modems. This is because each packet must be acknowledged, and the speed of light is fixed, while bandwidth is increasing. It may take 20 milliseconds to send a 1500-byte packet to a PC on a DSL line, but only 12 milliseconds to get it from the network into the PC. It will take another 20 milliseconds for the acknowledgment to get back to the sender. So the 40 milliseconds of latency is more than three times as important as bandwidth in this case, and it will only get more important later.
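The arithmetic behind the "more than three times" comparison can be checked with a short sketch, using the 20 ms one-way latency and 12 ms transfer time from the paragraph above as defaults. The function name is illustrative.

```python
def packet_timing_ms(one_way_latency_ms=20, transfer_ms=12):
    """Return (round-trip latency, transfer time, ratio) for one acknowledged packet."""
    round_trip_latency = 2 * one_way_latency_ms  # packet out, acknowledgment back
    return round_trip_latency, transfer_ms, round_trip_latency / transfer_ms


if __name__ == "__main__":
    latency, transfer, ratio = packet_timing_ms()
    print(f"latency {latency} ms, transfer {transfer} ms, ratio {ratio:.2f}")
```

A faster line shrinks only the transfer term, so the ratio grows as bandwidth improves while the latency term stays put.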
This is why it is so important to keep the number of individual items on a page to a minimum. Still, because most browsers are multithreaded, some latencies can happen in parallel. It turns out through experimentation that the best number of embedded images on a page is about the same as the number of threads the browser uses. For example, Netscape uses four threads, and you may get best performance by breaking a single large image into four smaller ones, in which acknowledgments can proceed in parallel rather than being strictly serial. But this holds only where the browser uses HTTP persistent connections ("keepalives") to avoid the overhead of setting up a TCP connection for each of the four smaller images.
Additional content appearing in this section has been removed.
How Fast a Server Do You Need?
Given a certain network bandwidth, how fast a server do you need? More server disk speed, bus speed, and CPU speed all cost money. From the network bandwidth, you have an upper limit on the HTTP server hardware you need for serving static content such as HTML and images. A 250 MHz Pentium machine serving static files from Apache is capable of filling a 10 Mbps Ethernet line, because static pages are I/O-bound, not CPU-bound. On the other hand, sites that do a lot of servlets, CGI, or other dynamic content generation, are typically CPU-bound. If you have any dynamic content, you should size your server around that.
Another point to keep in mind is that performance in serving static files is not a bottleneck for most sites. Server performance is plenty fast enough with pretty much every web server, where "fast enough" means "faster than your outgoing line." Remember that there's no point in buying server hardware that has vastly more throughput capacity than the network it's connected to because you can't use that server's throughput. The web server software and operating system determine how efficiently you can use your server hardware.
The whole connection time for an Internet HTTP transfer is typically 1 to 10 seconds, most of which is usually caused by modem and Internet bandwidth and latency limitations. While this may be frustrating, it does leave quite a bit of breathing room for the server. It makes little sense to ensure that a lightly loaded server can generate a response to an HTTP request in one millisecond if the network is going to consume thousands of milliseconds.
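The ceiling the outgoing line places on a static-content server can be estimated by inverting the bandwidth formula. This is a rough sketch; the 10,000-byte average hit is an assumed figure and the function name is illustrative.

```python
def max_hits_per_second(line_bits_per_second, average_hit_bytes):
    """Upper bound on hits/second that a line of the given speed can carry."""
    return line_bits_per_second / (average_hit_bytes * 8)


if __name__ == "__main__":
    # Hypothetical: a 10 Mbps line and 10,000-byte average hits.
    print(max_hits_per_second(10_000_000, 10_000))
```

Server capacity for static files beyond this rate goes unused, because the line saturates first.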
This is not to say that web server performance on the Internet is not important; without planning and tuning, a site that performs very well at low volume can degrade dramatically at high volume, overwhelming network considerations, especially if dynamically generated content is involved. But you can easily set up a server and get reasonable performance at light loads without any tuning, giving you some time to figure out what you want to do in the long term for the inevitable increase in load as the web expands.
Additional content appearing in this section has been removed.
How Much Memory Do You Need?
The answer is "more." It's a rule of thumb that you always need more memory. The worst thing short of total failure that can happen to your server is a memory shortage serious enough to start the swapping of entire processes out to disk. When that happens, performance will quickly drop, and users will wonder if it's worth their time to wait for your content. It is better to refuse the excess connections you cannot handle well than for all of your users to get unacceptable performance.
Servers that run as multiple processes, such as Apache, have a configurable limit to the number of processes and simultaneous connections per process. Multithreaded servers provide limits to the number of active threads. See Chapter 18 for details. You can also limit incoming connections by setting the TCP listen queue small enough so that you are assured of being able to service the users who have connected.
A server that is short of memory may show high CPU utilization because it constantly needs to scan for pages of memory to move out to disk. In such a case, adding CPU power won't help; you need to add more memory or reduce memory usage. Look at the rate of page scanning with vmstat under Solaris or with the Performance Monitor under NT. Under Solaris, the sr column of vmstat will tell you the scan rate. Sustained scanning for memory is an indication of a memory shortage. Under NT, the clue is that your processor time will be high and almost entirely "privileged" time. This means that the CPU is doing almost no work on behalf of the web server, but only on behalf of the OS itself.
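If you collect vmstat output from a script, the scan rate can be picked out by column name rather than by position. This is a minimal sketch, assuming the header line contains a column labeled sr as described above; the sample lines are simplified (real output has additional columns) and hold made-up values, not measurements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value of the "sr" (scan rate) column from one vmstat data
# line. The column is located by name in the header line, so the code
# does not depend on a fixed column position.
sub scan_rate {
    my ($header_line, $data_line) = @_;
    my @names  = split ' ', $header_line;
    my @values = split ' ', $data_line;
    for my $i (0 .. $#names) {
        return $values[$i] if $names[$i] eq 'sr';
    }
    return;    # no sr column in this header
}

# Illustrative lines only: simplified layout, invented values.
my $header = 'r b w swap free re mf pi po fr de sr in sy cs us sy id';
my $data   = '0 0 0 412304 20168 3 11 2 40 65 0 214 310 520 180 12 30 58';

print "scan rate: ", scan_rate($header, $data), "\n";
```

A single nonzero reading means little; it is a scan rate that stays high across many samples that points to a memory shortage.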
There is a limit to the amount of memory any particular machine physically has room for. Be aware that this is a hard limit on scalability for that machine. When you hit that limit, you will have to replace the machine or offload some of the processing — for example, by running servlets on a middleware box rather than the web server itself.
Additional content appearing in this section has been removed.
Key Recommendations
  • Write down your requirements.
  • Remember that the network constrains the server output.
  • Size your server first for the back-end applications, because these are almost always heavier than simple web serving.
  • For every architecture, ask yourself "If I decide I don't like this, can I migrate away from it easily after having implemented it?"
  • Plan for future scalability, not just for your immediate needs.
  • Keep performance records so you know whether your site is meeting expectations.
Additional content appearing in this section has been removed.
Chapter 4: Performance Monitoring
The first thing you should do in tuning a web site is to monitor that site so you can see patterns and trends. From this you'll know whether you're helping or not. And as we will see later, the same programs we write for monitoring can also be used for load testing.
In this chapter, we first define some parameters of performance. Then we show how to monitor them with free software from http://patrick.net/, without installing anything on production machines.
Additional content appearing in this section has been removed.
Parameters of Performance
There are four classic parameters describing the performance of any computer system: latency, throughput, utilization, and efficiency. Tuning a system for performance can be defined as minimizing latency and maximizing the other three parameters. Though the definition is straightforward, the task of tuning itself is not, because the parameters can be traded off against one another and will vary with the time of day, the sort of content served, and many other circumstances. In addition, some performance parameters are more important to an organization's goals than others.
Additional content appearing in this section has been removed.
Latency and Throughput
Latency is the time between making a request and beginning to see a result. Some define latency as the time between making a request and the completion of the response, but this definition does not clearly distinguish the psychologically significant time spent waiting, not knowing whether a request has been accepted or understood. You will also see latency defined as the inverse of throughput, but this is not useful because latency would then give you the same information as throughput. Latency is measured in units of time, such as seconds.
Throughput is the number of items processed per unit time, such as bits transmitted per second, HTTP operations per day, or millions of instructions per second (MIPS). It is conventional to use the term "bandwidth" when referring to throughput in bits per second. Throughput is found by adding up the number of items and dividing by the sample interval. This calculation may produce correct but misleading results because it ignores variations in processing speed within the sample interval.
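To see how an average over the whole sample interval can be correct but misleading, consider this small sketch. The counts are hypothetical: 600 HTTP operations arrive in the first 10 seconds of a minute and none after that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Throughput = items processed / length of the sample interval.
sub throughput {
    my ($items, $seconds) = @_;
    return $items / $seconds;
}

# Hypothetical counts of HTTP operations in six consecutive
# 10-second slices: one burst, then nothing.
my @ops_per_slice = (600, 0, 0, 0, 0, 0);
my $slice_seconds = 10;

my $total = 0;
$total += $_ for @ops_per_slice;

# Averaged over the whole minute, the burst disappears.
printf "whole minute: %.1f ops/second\n",
    throughput($total, $slice_seconds * scalar(@ops_per_slice));

# Per-slice figures show what the server actually had to handle.
for my $i (0 .. $#ops_per_slice) {
    printf "slice %d: %.1f ops/second\n",
        $i + 1, throughput($ops_per_slice[$i], $slice_seconds);
}
```

The one-minute figure is 10 operations per second, yet the server had to sustain 60 operations per second during the burst and was idle for the rest of the interval.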
The following examples help clarify the difference between latency and throughput:
  • An overnight (24-hour) shipment of 1000 different CDs holding 500 megabytes each has terrific throughput but lousy latency. The throughput is (500 × 2^20 × 8 × 1000) bits/(24 × 60 × 60) seconds = about 49 million bits/second, which is better than a T3's 45 million bits/second. The difference is that the overnight shipment's bits are delayed for a day and then arrive all at once, while T3 bits begin to arrive immediately, so the T3 has much better latency, even though both methods have approximately the same throughput when considered over the interval of a day. We say that the overnight shipment is bursty traffic. This example was adapted from Computer Networks by Andrew S. Tanenbaum (Prentice Hall, 1996).
  • Trucks have great throughput because you can carry so much on them, but they are slow to start and stop. Motorcycles have low throughput because you can't carry much on them, but they start and stop more quickly and can weave through traffic so they have better latency.
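The arithmetic in the CD example can be checked with a few lines of Perl. This is just a worked calculation, taking a megabyte as 2^20 bytes; the function name shipment_bps is made up for the illustration, and the T3 figure is the rounded one used in the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Throughput in bits per second of a physical shipment of media.
#   $count:     number of CDs
#   $megabytes: capacity of each CD, with 1 megabyte = 2**20 bytes
#   $hours:     delivery time
sub shipment_bps {
    my ($count, $megabytes, $hours) = @_;
    my $bits    = $count * $megabytes * 2**20 * 8;
    my $seconds = $hours * 60 * 60;
    return $bits / $seconds;
}

my $t3_bps = 45_000_000;    # rounded T3 rate from the text

printf "overnight shipment: %.1f million bits/second\n",
    shipment_bps(1000, 500, 24) / 1e6;
printf "T3 line:            %.1f million bits/second\n", $t3_bps / 1e6;
```

The shipment works out to roughly 48.5 million bits per second, which the text rounds to 49 million. Its latency, however, is the full 24 hours, since no bits arrive until the truck does.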
Additional content appearing in this section has been removed.
Utilization
Utilization is simply the fraction of the capacity of a component that you are actually using. You might think that you want all your components at close to 100 percent utilization in order to get the most bang for your buck, but this is not necessarily how things work. Remember that for disk drives and Ethernet, latency suffers greatly at high utilization. A rule of thumb is that many components can run at their best performance up to about 70 percent utilization. The perfmeter tool that comes with many versions of Unix is a good graphical way to monitor the utilization of your system.
Additional content appearing in this section has been removed.
Efficiency
Efficiency is usually defined as throughput divided by utilization. When comparing two components, if one has a higher throughput at the same level of utilization, it is regarded as more efficient. If both have the same throughput but one has a lower level of utilization, that one is regarded as more efficient. While useful as a basis for comparing components, this definition is otherwise irrelevant, because it is only a division of two other parameters of performance.
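As a quick illustration of that definition, here is a sketch comparing two hypothetical disks that deliver the same throughput at different levels of utilization. The disk names and all of the figures are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Efficiency as defined above: throughput divided by utilization,
# where utilization is a fraction between 0 and 1.
sub efficiency {
    my ($throughput, $utilization) = @_;
    return $throughput / $utilization;
}

# Two hypothetical disks with the same throughput.
my %disks = (
    A => { ops_per_sec => 400, utilization => 0.5 },
    B => { ops_per_sec => 400, utilization => 0.8 },
);

for my $name (sort keys %disks) {
    my $d = $disks{$name};
    printf "disk %s: %d ops/second at %.0f%% utilization, efficiency %.0f\n",
        $name, $d->{ops_per_sec}, $d->{utilization} * 100,
        efficiency($d->{ops_per_sec}, $d->{utilization});
}
```

Disk A comes out ahead because it does the same work while using a smaller fraction of its capacity.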
A more useful measure of efficiency is performance per unit cost. This is usually called cost efficiency. Performance tuning is the art of increasing cost efficiency: getting more bang for your buck. In fact, the Internet itself owes its popularity to the fact that it is much more cost-efficient than previously existing alternatives for transferring small amounts of information. Email is vastly more cost-efficient than a letter. Both send about the same amount of information, but email has near-zero latency and near-zero incremental cost; it doesn't cost you any more to send two emails rather than one.
Web sites providing product information have lower latency and are cheaper than printed brochures. As the throughput of the Internet increases faster than its cost, entire portions of the economy will be replaced with more cost-efficient alternatives, especially in the business-to-business market, which has little sentimentality for old ways. First, relatively static information such as business paperwork, magazines, books, CDs, and videos will be virtualized. Second, the Internet will become a real-time communications medium.
The cost efficiency of the Internet for real-time communications threatens not only the obvious target of telephone carriers, but also the automobile and airline industries. That is, telecommuting threatens physical commuting. Most of the workforce simply moves bits around, either with computers, on the phone, or in face-to-face conversations (which are, in essence, gigabit-per-second, low-latency video connections). It is only these face-to-face conversations that currently require workers to buy cars for the commute to work. Cars are breathtakingly inefficient, and telecommuting represents an opportunity to save money. Look at the number of cars on an urban highway during rush hour. It's a slow river of metal, fantastically expensive in terms of car purchase, gasoline, driver time, highway construction, insurance, and fatalities. Then consider that most of those cars spend most of the day sitting in a parking lot. Just think of the lost interest on that idle capital. And consider the cost of the parking lot itself, and the office.
Additional content appearing in this section has been removed.
Monitoring Web Performance Using Perl
We can easily expand on the Perl example shown earlier to create a useful monitoring system. This section shows how I set up an automated system to monitor web performance using Perl and gnuplot.
There are some commercial tools that can drive a browser, which are useful in some cases, but they have many drawbacks. They usually require you to learn a proprietary scripting language. They are usually Windows-only programs, so they are hard to run from a command line. This also means you generally cannot run them through a firewall, or from a Unix cron job. They are hard to scale up to become load tests because they drive individual browsers, meaning you have to load the whole browser on a PC for each test client. Most do not display their results on the Web. Finally, they are very expensive. A Perl and gnuplot solution overcomes all these problems.
Perl was chosen over Java partly because of its superior string-handling abilities and partly because of the nifty LWP library, but mostly because free SSL implementations for Perl exist. When I started monitoring, there were no free SSL libraries in Java, though at least one free Java implementation is now available.
gnuplot, from http://www.gnuplot.org/ (no relation to the GNU project), was chosen for plotting because you can generate Portable Network Graphics (PNG) images from its command line. The availability of the http://www.gnuplot.org/ site has been poor recently, but I keep a copy of gnuplot for Linux on my web site at http://patrick.net/software/. There is a mirror of the gnuplot web site at http://www.ucc.ie/gnuplot/.
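To give a feel for how little Perl is needed, here is a minimal sketch of timing a request with LWP. It is not the monitoring script from this book: the function names, the 30-second timeout, and the example URL and file name are assumptions for illustration, and it times the whole request rather than only the wait for the first byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second timestamps (core module)

# Time one fetch. $fetch is a code reference that retrieves $url and
# returns true on success; passing it in keeps the timing logic
# independent of any particular HTTP library.
sub time_fetch {
    my ($fetch, $url) = @_;
    my $start = time;
    my $ok    = $fetch->($url);
    my $end   = time;
    return ($ok, $end - $start);
}

# A fetcher built on LWP::UserAgent, loaded only when it is called.
sub lwp_fetch {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 30);
    return $ua->get($url)->is_success;
}

# Append one "date elapsed" line to a data file.
sub log_sample {
    my ($file, $date, $elapsed) = @_;
    open(my $fh, '>>', $file) or die "cannot open $file: $!";
    print $fh $date, " ", $elapsed, "\n";
    close($fh);
}

# Example use (commented out because it needs network access):
# use POSIX qw(strftime);
# my ($ok, $elapsed) = time_fetch(\&lwp_fetch, 'http://www.example.com/');
# log_sample('example.dat', strftime('%Y-%m-%d-%H:%M', localtime), $elapsed)
#     if $ok;
```

Run from cron every few minutes, a script along these lines builds up a data file with one timestamped sample per line.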
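Because gnuplot reads its commands as plain text, a Perl script can generate them. The sketch below builds a command sequence that plots a two-column data file as a PNG. The file names, the timestamp format, and the function name are assumptions for illustration, and the png terminal must be compiled into your copy of gnuplot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build gnuplot commands that plot "timestamp seconds" lines from
# $data_file into $png_file. Timestamps are assumed to look like
# 2000-01-31-23:59.
sub gnuplot_commands {
    my ($data_file, $png_file) = @_;
    return join "\n",
        'set terminal png',
        qq{set output "$png_file"},
        'set xdata time',
        'set timefmt "%Y-%m-%d-%H:%M"',
        'set format x "%H:%M"',
        'set ylabel "seconds"',
        qq{plot "$data_file" using 1:2 title "response time" with lines},
        '';
}

print gnuplot_commands('example.dat', 'example.png');

# To render the image, pipe the commands to gnuplot instead:
# open(my $gp, '|-', 'gnuplot') or die "cannot run gnuplot: $!";
# print $gp gnuplot_commands('example.dat', 'example.png');
# close($gp);
```

Saving the printed commands to a file and running gnuplot on that file has the same effect as the pipe.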
At first I used Tom Boutell's GIF library linked to gnuplot to generate GIF images, but Tom has withdrawn the GIF library from public circulation, presumably because of an intellectual property dispute with Unisys, which has a patent on the compression scheme used in the GIF format. PNG format works just as well as GIF and has no such problems, though older browsers may not understand the PNG format. The
Additional content appearing in this section has been removed.
Automatically Generating Monitoring Scripts Using Sprocket
Now that you know the basics of how to manually write a performance-monitoring script in Perl, I'm going to tell you that you don't really need to do that. I've modified and added to Randal Schwartz's Perl web proxy server so that it automatically generates monitoring scripts, albeit with some limitations. I call the modified proxy server sprocket. The code is Perl that generates Perl, so it may be hard to follow, but you can download it from http://patrick.net/software/sprocket/sprocket and use it even if you don't understand exactly how it works.
Here's how to use it:
  1. First, you'll need to have the same pieces listed above, all downloadable from http://patrick.net/software/.
  2. Once you have those things installed, get sprocket from http://patrick.net/software/sprocket/sprocket. It is very small and should only take a second or two to download. Put sprocket in a directory from which you can view the resulting PNG images. Your web server's public_html directory is a good choice.
  3. Now set your web browser's HTTP proxy to the machine where sprocket runs and to port 8008, the port sprocket uses by default. In Netscape 4, choose Edit → Preferences → Advanced → Proxies → Manual Proxy Configuration → View → HTTP Proxy.
Once your proxy is set, start up sprocket with the -s option for scripting, redirecting the output to the script you want to create. For example:
% sprocket -s > myscript.pl
            
You'll see feedback on standard error as your script is being created. For example:
# scripting has started (^C when done)
# set your proxy to <URL:http://localhost:8008/>
# then surf to write a script
# scripted a request
# scripted a request
# scripted a request
In this case, we surfed two pages, resulting in three HTTP requests because one of the pages also contained an image. When you have surfed through the pages you want to monitor, enter Ctrl-C to exit
Additional content appearing in this section has been removed.
Using a Relational Database to Store and Retrieve Your Monitoring Data
The normal thing to do with your monitoring data is to keep it in a file, but you may want to keep it in a relational database instead. It takes a bit more work to set up, and relational data is not as easily accessible as data in files, but the advantages are huge:
  • First, you have all of your data in one place, so you don't have to go hunting for files when you need to find out what happened to the performance of a particular page when a new feature was introduced last month. Of course, having all the data in one place also makes you more vulnerable to losing it all at once.
  • You have ease of querying. Rather than manually poking or grepping through a huge file, you can simply make SQL queries for the time range in which you're interested.
  • SQL has built-in math functions for relatively easy comparisons and manipulation of the data.
  • If you can connect to the database over a network, you have access to the data remotely, which is not necessarily true for flat files.
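The querying and math advantages can be sketched with the Perl DBI. Everything specific here is hypothetical: the table name perf, its columns url, sampled_at (seconds since the epoch), and elapsed (seconds), and the SQLite driver in the commented usage. Adapt the names to your own schema and database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average and worst response time for one URL within a time range.
# $dbh is a connected DBI database handle. The table "perf" and its
# columns are hypothetical names chosen for this example.
sub response_stats {
    my ($dbh, $url, $from, $to) = @_;
    my ($avg, $max) = $dbh->selectrow_array(
        'SELECT AVG(elapsed), MAX(elapsed) FROM perf
          WHERE url = ? AND sampled_at BETWEEN ? AND ?',
        undef, $url, $from, $to);
    return ($avg, $max);
}

# Example use (needs DBI and a driver such as DBD::SQLite installed):
# use DBI;
# my $dbh = DBI->connect('dbi:SQLite:dbname=perf.db', '', '',
#                        { RaiseError => 1 });
# my $now = time;
# my ($avg, $max) = response_stats($dbh, 'http://www.example.com/',
#                                  $now - 24 * 60 * 60, $now);
# print "last 24 hours: average $avg, worst $max seconds\n";
```

The placeholders (?) let the database driver handle quoting, so the same function works for any URL and time range without building SQL strings by hand.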
If you're using Perl for monitoring, you should try the Perl DBI (Database Interface) for storing data. You'll need to download and install the Perl DBI package and a driver for your database. Here's some example Perl code to do the database insertion.
Instead of doing this, as in the previous script:
print FILE $date, " ", $end - $start, "\n";
You could do the following, assuming that you have a table defined called