In the four years since the first edition of this book, enormous fortunes were made and lost in a speculative bubble based on the potential of the Web. Thousands of web companies were founded with very thin business plans, and most are now floundering. Older companies, such as Cisco, Sun, and Oracle, rose to dizzying heights as the primary suppliers of equipment and software that was to revolutionize our lives, but they too have fallen greatly from their peaks. Microsoft continues its near-monopoly of the desktop, and yet it finds that monopoly increasingly irrelevant in a networked world. Meanwhile, mainstream media has had success in bringing much of television’s low quality and intrusive advertising to the Web.
Now that the revolution is over, what has really changed? The change is that the Web has moved from novelty to an essential utility for the distribution of information. URLs are everywhere, and they are understood everywhere. Phone lines now carry more data than voice traffic. Almost every company and government agency has a web presence, along with millions of individuals. The Web is now taken for granted, though it has huge beneficial effects on our lives. Thanks to the Web, it is cheaper, faster, and easier to communicate than ever before.
Yet web performance is a bigger problem than it was four years ago, because of the ever larger volume of information and the critical nature of modern transactions. Fortunately, we know much more about web performance, what works and what does not, how to watch for problems, and how to fix them. That is what this book is about.
This edition contains far more software that you can use to monitor, load test, and analyze web site performance than did the first edition. All of the software is available at http://patrick.net/software/. I’ve also included many more graphs and analyses of real performance problems that I have come across in the last few years.
This book is good for improving web site performance, estimating web site hardware and software requirements, and clarifying scaling issues. It covers client and network issues as well as server-side issues, because many web sites are on intranets, where a system administrator has control over the client and network, as well as the server. While most web performance discussion centers on the HTTP server, the server itself is not usually the performance bottleneck; rather, client connection speed, dynamic content generation, and database performance are. To improve performance, we must also look at these and other issues.
The performance I care about is from the end user’s point of view: how quickly the Web satisfies the user’s request. There are other kinds of performance, such as throughput, but this book focuses mainly on the user’s perception of speed. I have also included a chapter on reliability, which is yet another kind of performance.
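To make the distinction between response time and throughput concrete, here is a minimal Python sketch. It is not part of the book's software; the function name `summarize` and the sample timings are hypothetical, chosen only for illustration. It computes both measures from the start and end times of completed requests.

```python
from statistics import mean

def summarize(requests):
    """Summarize completed requests given as (start_seconds, end_seconds) pairs."""
    if not requests:
        raise ValueError("need at least one request")
    # Response time as the user sees it: how long each request took.
    latencies = sorted(end - start for start, end in requests)
    # Throughput: requests completed per second over the whole observation window.
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    # Nearest-rank 90th percentile: the latency that 90% of requests met or beat.
    rank = max(1, -(-len(latencies) * 90 // 100))  # ceiling of 0.9 * n
    return {
        "mean_latency": mean(latencies),
        "p90_latency": latencies[rank - 1],
        "throughput": len(requests) / window if window > 0 else float("inf"),
    }

# Hypothetical site A: ten requests served one after another, one second each.
site_a = summarize([(i, i + 1) for i in range(10)])
# Hypothetical site B: ten requests accepted at once, all finishing after ten seconds.
site_b = summarize([(0, 10)] * 10)
```

Both made-up sites report a throughput of one request per second, but the mean response time is 1 second for the first and 10 seconds for the second. A throughput figure alone would not reveal that difference, which is why the user's view is the one emphasized here.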
Although this book presents some general principles of performance tuning, it concentrates on practical advice much more than on theory. I hope to make performance tuning simple by providing the algorithms to follow and tools to use that have helped me in real-life situations. Another goal is to present a clear picture of the chain of events involved in viewing a web page. Having a clear mental model of exactly what happens is critical to reasoning through new performance problems and finding solutions.
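As one way to picture that chain of events, the following Python sketch times the main stages of a single plain-HTTP request: name lookup, TCP connection setup, the wait for the first bytes of the reply, and the rest of the download. It is a rough illustration under stated assumptions, not one of the book's tools: the function name `time_stages` and the stage labels are hypothetical, it speaks only unencrypted HTTP/1.0, and it ignores redirects, rendering, and embedded images.

```python
import socket
import time

def time_stages(host, port=80, path="/", timeout=10.0):
    """Time the main stages of one plain-HTTP GET; returns seconds per stage."""
    t0 = time.perf_counter()
    # Stage 1: name lookup (DNS, unless the host is already an IP address).
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()

    # Stage 2: TCP connection setup (roughly one network round trip).
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        t2 = time.perf_counter()

        # Stage 3: send the request and wait for the first bytes of the reply.
        # HTTP/1.0 is used so the server closes the connection when it is done.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunk = sock.recv(4096)
        t3 = time.perf_counter()

        # Stage 4: read the rest of the response until the server closes.
        size = 0
        while chunk:
            size += len(chunk)
            chunk = sock.recv(4096)
        t4 = time.perf_counter()
    finally:
        sock.close()

    return {"lookup": t1 - t0, "connect": t2 - t1,
            "first_byte": t3 - t2, "download": t4 - t3, "bytes": size}
```

Calling `time_stages("www.example.com")` (a placeholder host) returns a dictionary of per-stage timings. A long `connect` time tends to point at the network, a long `first_byte` time at the server or whatever sits behind it, and a long `download` time at bandwidth or content size, though a single sample is only a hint and should be repeated before drawing conclusions.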
The best tool for improving the performance of your web site is a good understanding of your own application and your architecture. Software tools can help, but their value is proportional to your understanding of what they do. In the end, performance tuning is about spending money and time wisely to get the most out of your resources. Much of life is like that.
Web Performance Tuning will be of interest to anyone working on a web site, from a personal site running off a Linux PC at home to a large corporate site with multiple enterprise-class servers and redundant Internet connections. The book assumes you are familiar with the fundamentals of setting up a web site and getting connected to the Internet. If you need advice on setting up a web server, see Apache: The Definitive Guide by Ben Laurie and Peter Laurie (O’Reilly Media). If you need advice on how to get connected to the Internet, see Getting Connected by Kevin Dowd (O’Reilly Media).
This is a book of practical advice about the configuration and application-level programming of commodity components. In other words, the book covers what you can change right now, including design content, system administration, and application programming.
To some degree, you are at the mercy of the market to supply good building blocks. Since the performance of a web site is a function not only of tuning parameters and options, but also of the raw hardware and software products involved, this book includes information on how to select the appropriate products. I also cover the issues of scalability and conformance with open standards.
Here are some representative titles of people who might have an interest in this book:
Web applications programmer
Web content developer
This book assumes a basic familiarity with the technical components of the Web. Throughout the book I’ve included descriptions of the events that occur in a typical HTTP operation. There are also references in the text and the appendixes to other books and web sites for those who need more background or want to explore a subject in more depth.
The server examples are drawn from the Unix world because about 75 percent of all web servers use a Unix operating system or clone, and an even higher percentage of commercial web sites use Unix. Most of the other 25 percent are Windows-based web servers, so I tried to include more information about Windows in this edition. I’ve assumed that the reader has some programming experience with C, Java, or Perl, but this is not a requirement for using this book.
The first part of this book covers topics of general interest to anyone running a web site, including quick and simple performance boosts, estimating what hardware and software you need for a given load and level of performance, common measures of web site performance, real case studies of some web sites, and principles of performance tuning.
The structure of the second part of this book is modeled on what actually happens when the user of a web browser requests an HTML page from a web server (see Figure P-1). We’ll follow an HTML request from client to network to server to middleware to database. From the browser’s point of view, after the request is sent, the answer magically appears on the network. From the network’s point of view, the answer magically appears at the connection to the server, and so on. We’ll trace the process back one stage at a time to point out performance issues along the way and to eliminate the unknown. I’ll also give tips for finding out which side of each interface is slower so that you can figure out where the bottleneck is and how to bring the performance of that section into line with the rest of your web site. Here is a breakdown by chapter.
- Chapter 1
Presents a set of questions for finding common performance problems, along with quick tips to increase your site’s performance.
- Chapter 2
Helps you decide what kind of hardware and software you’ll need for your site to perform well and scale for the future, and describes major commercial web sites, including the hardware and software they use.
- Chapter 3
Describes how to estimate how much hardware you’ll need.
- Chapter 4
Gives software and examples for how to watch your site’s performance.
- Chapter 5
Helps you design and run relevant load tests of your web site.
- Chapter 6
Discusses how to figure out where the bottleneck is.
- Chapter 7
Provides many examples of problems that can crash your site.
- Chapter 8
Explains the performance you give up in exchange for security.
- Chapter 9
Gives some real examples of performance problems and solutions.
- Chapter 10
Describes some general principles to keep in mind when thinking about the performance of your web site.
- Chapter 11
Tells you what’s going on in your browser and how to help it along, especially when it seems to be hanging.
- Chapter 12
Gives tips on the differences between the various operating systems and how these affect browser performance.
- Chapter 13
Describes what the bottlenecks are on the client hardware and what you can do about them.
- Chapter 14
Describes the hardware of the Internet. There’s not a lot you can do about hardware that belongs to someone else, but you can at least choose the parts of the Internet you use. If you’re running your own intranet, you can modify many parameters to tune performance.
- Chapter 15
Describes the protocols at the core of the Web and gives you tips on how the protocols interact and how to get them to play nicely together.
- Chapter 16
Describes issues constraining the server, such as disk bottlenecks.
- Chapter 17
Gives tuning hints for the typical Unix web server.
- Chapter 18
Discusses the free and commercial HTTP server software available.
- Chapter 19
Goes over the various kinds of data you return to the user and the performance implications of each.
- Chapter 20
Gives you tips and tricks for reducing the amount of time spent generating dynamic content.
- Chapter 21
Goes over some issues in optimizing your Java applications.
- Chapter 22
Describes the performance and cost of some database systems.
- Appendix A
Offers my opinion on many performance products.
Italic
Is used for URLs, filenames, program names, commands, hostnames, and for emphasizing words.
Constant width
Is used for HTTP headers, text to be typed literally, and function and system call names.
Constant width bold
Is used for user input.
We have tested and verified all the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
There is a web page for this book, which lists errata and any additional information. You can access this page at:
To comment or ask technical questions about this book, send email to:
For more information about books, conferences, Resource Centers, and the O’Reilly Network, see the O’Reilly web site at:
Please write the author with comments, criticism, and suggestions at:
Be warned that web pages frequently change without regard to references to them. For the latest corrections and collection of my links, and also for the book’s code examples, see http://patrick.net/. You can also find the code examples at http://www.oreilly.com/catalog/webpt2/.
While this book is an excellent place to start learning how to improve your web performance, by no means is it the last word. So here I’ve listed some books, URLs, and newsgroups that you should examine if this book doesn’t answer all your questions, or if you simply want to know more.
In reading this book, you’ll find that I frequently refer to other books that explain concepts more completely than I can (at least, not without making this book twice its size). Here are books with good explanations of the details.
Albitz, Paul and Cricket Liu, DNS and BIND (O’Reilly Media, 1997).
Bach, Maurice, Design of the Unix Operating System (Prentice Hall, 1986).
Ballew, Scott, Managing IP Networks with Cisco Routers (O’Reilly Media, 1997).
Blake, Russ, Optimizing Windows NT (Microsoft Press, out of print).
Brooks, Frederick P., Jr., The Mythical Man-Month (Addison Wesley, 1995).
Chapman, Brent and Elizabeth Zwicky, Building Internet Firewalls, Second Edition (O’Reilly Media, 2001).
Cockcroft, Adrian and Richard Pettit, Sun Performance and Tuning, Second Edition (Prentice Hall, 1998). Everything about tuning Solaris and Sun hardware.
Cockcroft, Adrian and Will Walker, Capacity Planning for Internet Services (Prentice Hall, 2001).
Dowd, Kevin, Getting Connected (O’Reilly Media, 1996).
Frisch, Æleen, Essential System Administration (O’Reilly Media, 1995).
Gancarz, Mike, The Unix Philosophy (Digital Press, 1996). Wonderful explanation of what makes Unix Unix.
Garfinkel, Simson, PGP: Pretty Good Privacy (O’Reilly Media, 1995).
Gray, Jim, The Benchmark Handbook for Database and Transaction Processing Systems (Morgan Kaufmann Publishers, 1993).
Guelich, Scott, Shishir Gundavaram, and Gunther Birznieks, CGI Programming with Perl (O’Reilly Media, 2000).
Gurry, Mark and Peter Corrigan, Oracle Performance Tuning, Second Edition (O’Reilly Media, 1996).
Harold, Elliotte Rusty, Java Network Programming, Second Edition (O’Reilly Media, 2000).
Laurie, Ben and Peter Laurie, Apache: The Definitive Guide, Second Edition (O’Reilly Media, 1999).
Musumeci, Gian-Paolo D. and Mike Loukides, System Performance Tuning, Second Edition (O’Reilly Media, 2002). The standard text on Unix system performance.
Nassar, Daniel J., Ethernet and Token Ring Optimization (M&T Books, out of print). The accumulated experience of a network tuner. Includes TCP/IP tips.
Orfali, Robert and Dan Harkey, Client Server Programming with Java and CORBA (John Wiley & Sons, 1998).
Partridge, Craig, Gigabit Networking (Addison Wesley, 1994).
Stern, Hal, Mike Eisler, and Ricardo Labiaga, Managing NFS and NIS, Second Edition (O’Reilly Media, 2001).
Stern, Hal and Evan Marcus, Blueprints for High Availability (John Wiley & Sons, 2000).
Stevens, Richard, Advanced Programming in the Unix Environment (Addison Wesley, 1993).
Stevens, Richard, TCP/IP Illustrated, Volumes 1, 2, and 3 (Addison Wesley, 1994).
Stevens, Richard, Unix Network Programming (Prentice Hall, 1998).
Tanenbaum, Andrew S., Computer Networks (Prentice Hall, 1996). The canonical networking book.
Tanenbaum, Andrew S., Modern Operating Systems (Prentice Hall, 1992).
Ware, Scott, Michael Tracy, Louis Slothouber, and Robert Barker, Professional Web Site Optimization (Wrox Press, Inc., 1997).
Wall, Larry, Tom Christiansen, and John Orwant, Programming Perl, Third Edition (O’Reilly Media, 2000).
Wong, Brian L., Configuration and Capacity Planning for Solaris Servers (Prentice Hall, 1997). See especially Chapter 4, which is about configuring web services.
Wong, Clinton, Web Client Programming with Perl (O’Reilly Media, 1997).
The following URLs also include indispensable performance information:
A Netscape-tuning page.
The Apache home page. See especially http://www.apache.org/docs/misc/perf.html.
Tips for running Apache.
The Computer Measurement Group’s home page.
Jonathan Hardwick’s Java optimization page.
Very popular page packed with information on optimizing PCs.
Tom’s Hardware Guide. Rightly famous for PC hardware information.
Excellent review of performance measurement of the Internet.
Includes some papers on application performance tuning.
The definitive site for Winsock tuning.
Has the RFCs on which the web is based.
Reducing the Disk IO of Web Proxy Server Caches.
Open System Testing Architecture “The completely open way to test your systems.”
I hate to yell in all caps, but here it is:
THE INFORMATION IS PROVIDED “AS-IS” AND WITHOUT WARRANTY OF ANY KIND, EXPRESS, IMPLIED OR OTHERWISE, INCLUDING WITHOUT LIMITATION, ANY WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR, CONTRIBUTORS, OR THEIR EMPLOYERS BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT, OR CONSEQUENTIAL DAMAGES OF ANY KIND, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA, OR PROFITS, WHETHER OR NOT ADVISED OF THE POSSIBILITY OF DAMAGE, AND ON ANY THEORY OF LIABILITY, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS INFORMATION.
Not a single suggestion in this book is guaranteed to help any particular situation. In fact, if you simply change configurations and parameters without analyzing the situation and understanding what you are changing and why, you may experience hardware damage, data loss, hair loss, dizziness, and nausea. Back up everything, don’t work directly on production servers, and be careful.
Also note that the opinions expressed in this book are those of the author and have nothing to do with the author’s employer or with the book’s publisher, O’Reilly Media, Inc.
Thank you again to Linda Mui, my editor at O’Reilly, for her patience. Thanks to my father, Thomas, for instilling ambition, and to my mother, Diane, for saying I ought to write a book. My wife Leah and son Jacob and daughter Genevieve deserve enormous credit for giving me enough time to finish. I told Robert Hellwig I’d mention him here. And thanks to everyone on the Internet who is willing to share what they know just because it’s a nice thing to do. Thanks to Dean Gaudet and Jens-S. Voeckler for letting me include their material as appendices in the first edition. Much of the information from those appendices has been integrated into this second edition.
Second edition thanks to Sam Brodkin for a Java script tip and Daniel Lewart for suggestions and errata. Tony Pugliese provided many interesting leads and papers. I learned as much about performance working with John Nevins and Tori Walsh as I did before ever meeting them. Brian Robinson of Harvard University provided an excellent case study. Thanks to Adrian Cockcroft, Dave Loughlin, John Mani, and Joey Trevino for their useful comments. Thanks to Ron Walters for showing me the Perl Telnet module and to Naf Furman for introducing me to the Perl DBI. Pavel Semfield also provided many links to the new information.