The Custom Pasta Corporation routinely gets most of its orders via its web site on Tuesday evenings. Unfortunately, the performance of its site also slows to a crawl at that same time every week. The webmaster is asked to check out the server configuration and fix the problem.
She figures, first of all, that the server is being overloaded somehow, because many clients have reported this problem, and there is no obvious reason for a surge in overall Internet traffic on Tuesday evenings. On Tuesday evening, she tries some tests. First, she connects from a client machine she knows to have good performance and places a test order for pasta. It does indeed take a very long time to get back to her with an acknowledgment page. So even a good client on a LAN connection to the web server has problems.
Perhaps the problem is that the LAN is overloaded. The webmaster logs in to the web server itself, a Sun Ultra, and runs the snoop utility. She stops the output and looks through it. There are a few HTTP requests and replies going back and forth, but they don’t seem excessive, and there is not a particularly high number of TCP retransmits. There is no NFS traffic because all of the content is on this machine’s local disk. The netstat -i command shows very few collisions, and netstat without any options shows that 20 or so concurrent connections are in the ESTABLISHED state. The network doesn’t seem overloaded; rather, it seems that clients are waiting on the server.
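The connection check above can be repeated mechanically. The following is only a sketch in Python, assuming netstat output has been captured to a string and that the connection state is the last column of each line; the sample text is hypothetical and column layouts differ between operating systems.

```python
from collections import Counter

# TCP states as netstat commonly prints them (assumed spellings; some systems differ).
TCP_STATES = {
    "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
    "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING",
}

def count_tcp_states(netstat_text):
    """Count connections per TCP state in captured netstat output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumption: the state is the last column of a connection line.
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts

# Hypothetical sample, not captured from the server in the story.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address     Foreign Address    State
tcp        0      0 www.example:80    client1:1034       ESTABLISHED
tcp        0      0 www.example:80    client2:1290       ESTABLISHED
tcp        0      0 www.example:80    client3:2211       TIME_WAIT
"""

states = count_tcp_states(SAMPLE)
print(states["ESTABLISHED"], states["TIME_WAIT"])
```

A count of 20 or so ESTABLISHED connections, as in the story, is modest; the number only becomes meaningful when compared with what the server normally handles.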
Now the webmaster notices that she can hear the disk running continuously. This is odd, since she set up a separate disk for the log file and she knows that the HTML content is pretty lightweight: just a catalog of pasta types, simple HTML forms for ordering, and a few small graphics. The disk should easily be able to serve out this content for 20 users. She runs top and sees that pageout is taking 50% of the CPU time and that there are 20 or so instances of the pasta ordering CGI running along with numerous other processes spawned from the CGIs. The CGI is a rather large shell script. “Spaghetti code,” she thinks, while trying to read it. Worse, shell scripts are very inefficient as CGIs because of the number of additional processes they usually start.
Using top, she sees that the running processes are using more than 100% of physical memory. At least one problem is now clear: the server just doesn’t have enough memory to run so many concurrent CGIs. The server is paging processes out to disk, that is, using virtual memory to try to give all of the processes a chance to run. But, since virtual memory is on disk, it is several thousand times slower than physical memory, and performance has degraded dramatically. The vmstat command confirms that paging is happening continuously: vmstat’s pi and po fields, for page in and page out rates, are both over 10 per second.
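The vmstat check lends itself to a small script. This is a minimal sketch, assuming vmstat output has been saved to a string with a header line that names the pi and po columns; the sample rows are invented for illustration, and the threshold of 10 simply mirrors the figure in the story rather than any general rule.

```python
def paging_rates(vmstat_text):
    """Return a list of (pi, po) pairs from captured vmstat output.

    Assumes a header line naming the columns (including 'pi' and 'po')
    followed by numeric sample rows; real layouts vary by operating system.
    """
    pi_idx = po_idx = None
    rates = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "pi" in fields and "po" in fields:
            pi_idx, po_idx = fields.index("pi"), fields.index("po")
        elif pi_idx is not None and len(fields) > max(pi_idx, po_idx):
            try:
                rates.append((int(fields[pi_idx]), int(fields[po_idx])))
            except ValueError:
                continue  # not a numeric sample row
    return rates

def is_paging_heavily(rates, threshold=10):
    """True when every sample shows both pi and po above the threshold."""
    return bool(rates) and all(pi > threshold and po > threshold
                               for pi, po in rates)

# Hypothetical vmstat capture; the numbers are made up for this example.
SAMPLE = """\
 r b w   swap  free  re  mf pi po fr de sr
 0 0 0  10240  1200   0  35 42 57 60  0 12
 1 0 0  10112  1100   0  40 38 61 70  0 15
"""

rates = paging_rates(SAMPLE)
print(rates, is_paging_heavily(rates))
```

Requiring every sample to exceed the threshold is a deliberate choice here: a single burst of paging is normal, while sustained paging is the symptom described above.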
The webmaster exits the X Window System and checks the situation from terminal mode. It is slightly better because more memory is available, but there is still a great deal of paging going on. The CGIs were written by a contractor who has long since gone, so rather than try to decipher the code and rewrite it in a more efficient language, the webmaster simply buys more RAM to tide her over until she can plan how to get away from ordinary CGIs, perhaps by using NSAPI, FastCGI, or servlets. In any case, the server performance improves dramatically when the new memory is installed, and the webmaster has time to plan for the future.
Telecommuter Tom lives in San Francisco, but he works as a marketer for Dev Null Drivers, Inc., in San Jose. It’s an arduous commute, but Tom likes his job and he likes living in San Francisco. He suggests to his boss that he be allowed to telecommute a few days a week. Tom’s boss likes the idea, because it frees up some expensive office space a few days a week, and because she wants to keep Tom happy.
Tom is responsible for a lot of content on the company’s web site, and he is a fan of a Java chat applet that is very popular within the company. Most Dev Null employees leave it running so they can have private conversations in a crowded office. Tom also creates or edits many files every day. He goes out and buys a 56K modem, and signs up for service with Oversold ISP, which charges only $10 per month for unlimited access and a static IP address. When he logs in for the first time, his modem control program tells him that he’s actually getting nearly 56kbps between his modem and the modem at his ISP in San Francisco, but the interactive response time between his home and work is awful. Tom has both a Mac and a PC, but the situation is the same with both of them. It’s no better early in the morning or late at night. When he starts a telnet session to edit files at work, he finds he can type ahead for several words before they are echoed back to him. The chat applet is just as bad. This is very disappointing, because he is used to great response time from the LAN at work. The boss asks how it’s going, and Tom has to tell her that he’s actually having a hard time being productive because of the poor performance of the ISP.
Tom’s boss describes the problem to Dev Null’s system administrator and asks for advice. The sysadmin asks Tom for his IP address and runs a traceroute to it from Dev Null in San Jose. The traceroute shows huge latencies, on the order of five seconds, to intermediate routers with domain names ending in .na and one with the word satellite in the name. Not having seen the .na domain before, the sysadmin looks up the country assigned the .na ending. It’s Namibia. Having no idea where Namibia is, he looks on a web search engine for some mention of the country. Ah, it’s in southwest Africa. It seems that Oversold is routing all of Tom’s traffic through Namibia via satellite because Oversold’s only connection to the Internet is in Windhoek, its home town. The sysadmin silently thanks whatever Unix god wrote traceroute and suggests that Tom use the same ISP that Dev Null is using because Tom’s traffic would then be on that ISP’s private network between San Francisco and San Jose and would not have to traverse any other part of the Internet. Here’s the traceroute output from devnull to Tom’s computer in San Francisco:
% traceroute 184.108.40.206
traceroute to 220.127.116.11 (18.104.22.168), 30 hops max, 40 byte packets
 1 router.devnull.com (22.214.171.124) 22.557 ms 24.554 ms 10.07 ms
 2 sj103.mediatown.com (126.96.36.199) 37.033 ms 16.912 ms 79.436 ms
 3 sf000.mediatown.com (188.8.131.52) 29.382 ms 66.754 ms 14.688 ms
 4 bordercore2-hssi0-0.SanFrancisco.mci.net (184.108.40.206) 134.24 ms 38.762 ms 18.445 ms
 5 core4.SanFrancisco.mci.net (220.127.116.11) 165.704 ms 210.167 ms 125.343 ms
 6 sl-stk-1-H9-0-T3.sprintlink.net (18.104.22.168) 30.076 ms 33.985 ms 23.287 ms
 7 gip-stock-1-fddi1-0.gip.net (22.214.171.124) 48.501 ms 30.192 ms 19.385 ms
 8 gip-penn-stock.gip.net (126.96.36.199) 501.154 ms 244.529 ms 382.76 ms
 9 188.8.131.52 (184.108.40.206) 503.631 ms 488.673 ms 498.388 ms
10 220.127.116.11 (18.104.22.168) 505.937 ms 680.696 ms 491.25 ms
11 22.214.171.124 (126.96.36.199) 1046.61 ms 1057.79 ms 1168.45 ms
12 oversold.com.na (188.8.131.52) 1074.49 ms 1086.45 ms 1257.85 ms
13 satellite.oversold.com.na (184.108.40.206) 1174.49 ms 1186.45 ms 1157.85 ms
14 usroutersf.oversold.com (220.127.116.11) 4074.49 ms 5086.45 ms 4257.85 ms
15 18.104.22.168 (22.214.171.124) 5293.84 ms 5230.90 ms 5148.39 ms
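Reading a long traceroute by eye works, but the hop where latency jumps can also be picked out with a short script. The sketch below is one possible approach in Python, not part of traceroute itself: it parses the last five hops of the output above, takes the minimum of the three probe times at each hop (an assumption meant to discount momentary congestion), and reports the largest increase between consecutive hops.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(")   # hop number and host name
RTT_RE = re.compile(r"([\d.]+) ms")              # each probe's round-trip time

def parse_hops(traceroute_text):
    """Return (hop_number, host, best_rtt_ms) for each hop line."""
    hops = []
    for line in traceroute_text.splitlines():
        match = HOP_RE.match(line)
        rtts = [float(x) for x in RTT_RE.findall(line)]
        if match and rtts:
            hops.append((int(match.group(1)), match.group(2), min(rtts)))
    return hops

def biggest_jump(hops):
    """Return (hop_number, host, increase_ms) for the largest rise in latency."""
    worst = None
    for prev, cur in zip(hops, hops[1:]):
        delta = round(cur[2] - prev[2], 2)
        if worst is None or delta > worst[2]:
            worst = (cur[0], cur[1], delta)
    return worst

# Hops 11 through 15 of the trace to Tom's Oversold account.
TRACE = """\
11 22.214.171.124 (126.96.36.199) 1046.61 ms 1057.79 ms 1168.45 ms
12 oversold.com.na (188.8.131.52) 1074.49 ms 1086.45 ms 1257.85 ms
13 satellite.oversold.com.na (184.108.40.206) 1174.49 ms 1186.45 ms 1157.85 ms
14 usroutersf.oversold.com (220.127.116.11) 4074.49 ms 5086.45 ms 4257.85 ms
15 18.104.22.168 (22.214.171.124) 5293.84 ms 5230.90 ms 5148.39 ms
"""

hops = parse_hops(TRACE)
print(biggest_jump(hops))
```

On these five hops the largest increase falls at hop 14, the first router after the satellite link, which matches what the sysadmin concluded from the host names.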
Tom switches accounts to Dev Null’s ISP and finds that he can now get nearly 56kbps in download times from work, and that interactive response is nearly indistinguishable from that on the LAN at work. He has to pay an extra $10 per month, but it is money well spent. Here’s the traceroute output from work to his new account:
devnull> traceroute tom.mediatown.com
traceroute to tom.mediatown.com (126.96.36.199), 30 hops max, 40 byte packets
 1 router.devnull.com (188.8.131.52) 22.557 ms 24.554 ms 10.07 ms
 2 sj103.mediatown.com (184.108.40.206) 37.033 ms 16.912 ms 79.436 ms
 3 sf000.mediatown.com (220.127.116.11) 29.382 ms 66.754 ms 14.688 ms
 4 tom.mediatown.com (18.104.22.168) 29.382 ms 66.754 ms 14.688 ms
Note that I have nothing against Namibia. I did a little research and found they happen to have an ISP at http://www.iwwn.com.na/, which has enough throughput that I can surf it quite comfortably in California. Latencies are just over one second, which isn’t bad given the distance to the other side of the world. Running traceroute www.iwwn.com.na shows it really is over there and not just mirrored somewhere in the U.S.
The webmaster of the Antique Fruitcake website is experiencing a frustrating performance problem and having a hard time tracking it down. The website has a catalog of almost 10,000 fruitcakes searchable by year, model, serial number, and current owner in a database connected to a web server. This is not a large database by most standards, but customers are complaining that it can take 10 seconds to return the results of a simple query, while complex searches can take even longer.
The machines are overconfigured for their tasks, so the performance problem is that much more of a mystery. The web server machine is a dual-CPU Pentium 166MHz Compaq with 256MB of RAM, a 4GB SCSI II disk used for both content and logging, and a 100Mbps Ethernet card. The database machine is a single-CPU Pentium 200MHz Compaq with 164MB of RAM running SQL Server, a 5GB RAID 5 disk set, and again a 100Mbps Ethernet card. The two machines are not used for any other applications. The database machine is behind a firewall and is talking to the web server outside the firewall using just TCP/IP.
The configuration mostly uses the out-of-the-box defaults, but a few optimizations have been performed. On the web server, all unnecessary services have been disabled, the memory available for Active Server Page (ASP) caching has been increased, and database connection pooling has been turned on so that the overhead of creating new ODBC connections to SQL Server is minimized. Pooling means connections are recycled rather than created new for each query. On the database machine, the data has been indexed by each of the search variables to reduce search time; tempdb, a temporary storage area in the database that gets heavy use, has been put in RAM rather than left on disk; and the cache size for SQL Server has been increased.
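To make the idea of pooling concrete, here is a minimal sketch in Python. It is not how ODBC or ASP implements pooling; the ConnectionPool and FakeConnection classes are hypothetical stand-ins that only illustrate connections being recycled rather than created for each query.

```python
import queue

class ConnectionPool:
    """Hand out a fixed set of connections and take them back for reuse."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())   # pay the connection cost once, up front

    def acquire(self, timeout=None):
        # Blocks until a connection is free (or raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)            # recycle instead of closing

class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how many open."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, sql):
        return f"results for {sql!r}"

pool = ConnectionPool(FakeConnection, size=2)
for _ in range(5):
    conn = pool.acquire()
    try:
        conn.query("SELECT 1")
    finally:
        pool.release(conn)

print(FakeConnection.opened)   # 2: five queries were served by two connections
```

Without the pool, the same five queries would have opened five connections, and it is that per-query setup cost that pooling is meant to avoid.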
The NT performance monitor tool shows that the bandwidth between the database and the firewall and between the firewall and the web server is not highly utilized, and that memory utilization on both machines is also low. Simple tests from a web browser show that static pages are being delivered quite quickly, but pages that require database access are slow. When a query is submitted to the web server, its CPU utilization jumps for an instant as it interprets the query and passes it through the firewall to the database machine, which then shows a CPU jump for about a second. The ISQL utility on the database machine also shows that even complex queries are actually executing in less than two seconds. The mysterious part is that both machines are then mostly idle for about seven seconds before the data starts coming back to the browser, and this happens even for identical queries, which should be cached by the database and therefore returned extremely quickly on the second call.
Simply by process of elimination, the webmaster decides the problem must be with the firewall. A close examination of network traffic shows that the delay is indeed happening almost entirely within the firewall machine. The firewall is a 486 machine with an excessive default set of rules that includes blocking many kinds of application traffic in both directions, forcing the firewall to look far within each packet rather than just at the headers. An upgrade to a packet filtering router with a few simple rules for blocking inbound traffic from everywhere except the web server solves the performance problem. Unfortunately, this also makes the database less secure, because a break-in to the web server would mean that the database is exposed. Some security has been traded for better performance.
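The examination of network traffic mentioned above amounts to timestamping the same request at several points along its path and seeing where the time goes. The sketch below shows that bookkeeping in Python. The observation points and timestamps are purely illustrative, chosen only to be roughly consistent with the timings in the story; they are not measurements from the Antique Fruitcake site.

```python
def segment_delays(observations):
    """Given (event, timestamp_seconds) pairs in path order, return
    (later_event, seconds_since_previous_event) for each step."""
    delays = []
    for (_, t_prev), (event, t_cur) in zip(observations, observations[1:]):
        delays.append((event, round(t_cur - t_prev, 3)))
    return delays

def slowest_step(observations):
    """Return the step that consumed the most time."""
    return max(segment_delays(observations), key=lambda step: step[1])

# Illustrative timestamps only, in seconds after the browser's request.
OBSERVATIONS = [
    ("request reaches web server", 0.0),
    ("query leaves web server", 0.2),
    ("query reaches database", 0.4),
    ("reply leaves database", 2.0),
    ("reply reaches firewall", 2.1),
    ("reply leaves firewall", 9.1),
    ("reply reaches web server", 9.2),
]

for event, seconds in segment_delays(OBSERVATIONS):
    print(f"{seconds:6.3f}s  {event}")
print(slowest_step(OBSERVATIONS))
```

With numbers like these, nearly all of the elapsed time sits between the reply entering and leaving the firewall, which is the kind of evidence that justifies replacing it.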
The doctors at Budget Surgeon would like to use the Web to research surgical journals. They subscribe to a commercial service offering access to a variety of surgical journals via the Web, but they are immediately disappointed by the performance when they try it out. The doctors see that Netscape indicates that it is downloading the articles at 2KB/s and assume that they need to upgrade their 28.8kbps modems. They ask the office computer expert for his recommendation, and he suggests a new cable modem service being offered in their area. The cable modem ads claim 500kbps downstream, which is 62.5KB/s (500kbps/8 bits per byte).
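The conversion between advertised line speed and the rate a browser reports is easy to get wrong, so here is the arithmetic as a small Python sketch. The 100KB article size is a hypothetical figure chosen only to show the effect on download time.

```python
BITS_PER_BYTE = 8

def kbps_to_kilobytes_per_sec(kbps):
    """Convert a line speed in kilobits/s to kilobytes/s (no protocol overhead)."""
    return kbps / BITS_PER_BYTE

def download_seconds(size_kb, rate_kb_per_sec):
    """Ideal time to transfer size_kb kilobytes at the given rate."""
    return size_kb / rate_kb_per_sec

cable = kbps_to_kilobytes_per_sec(500)    # advertised cable modem rate
modem = kbps_to_kilobytes_per_sec(28.8)   # the doctors' existing modems

print(cable)                          # 62.5
print(modem)                          # 3.6
print(download_seconds(100, 2))       # 50.0 seconds at the observed 2KB/s
print(download_seconds(100, cable))   # 1.6 seconds at the advertised rate
```

Note that the observed 2KB/s is below even the 3.6KB/s ceiling of a 28.8kbps modem, which already hints that the modem may not be the only bottleneck.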
The doctors subscribe to the cable modem service but are astonished to find, when they try it out, that they are still getting only about 4KB/s, rather than the 62.5KB/s advertised, and it still takes forever before any useful information shows up in their browser. They go back to the computer expert and complain that the cable modem is a fraud. The expert decides to try it for himself. He has an old Pentium 75MHz laptop with a 512KB L2 cache and 24MB of RAM running Linux and Netscape. He hooks up the cable modem and views a few well-known sites. He sees that he is, in fact, getting better than 50KB/s viewing the surgical journals over the cable modem. The expert tries out one of their Macintosh PowerBook 5300cs laptops and finds that it does indeed get only 4KB/s. A little research reveals that this series of PowerBooks has no L2 cache at all, meaning that no executable code is cached; rather, every instruction must be fetched from RAM, which is about ten times slower than L2 cache. This accounts for some of the problem. The doctors seem to have enough RAM overall, and enough of it assigned to Netscape, because the disk is not particularly busy during downloads. The Macintosh has no place on its motherboard to put cache, so there’s not much the doctors can do about that problem if they want to keep the same machines.
The expert also notices that the Macs are faster when not plugged into an outlet but running off the battery. This is really odd: plugged-in laptops should perform better, because there is no special need to conserve power. Most laptop BIOSes or OSes are configured to turn off power conservation features (which slow down the machine) when wall power is available. It turns out that some of the first 5300s shipped ran slower when plugged in because of electromagnetic interference (EMI) within the machine. An EMI shroud is ordered from Apple to solve this problem.
The Macs are running MacOS 7. An upgrade to MacOS 8 provides another immediately noticeable improvement, partly because more of the MacOS 8 code is native to the PowerPC CPU rather than written for the previous CPU family, the Motorola 68K series. Code written for the 68K requires a 68K emulator to run on the PowerPC, and this slows down the machine.
Finally, the expert tries Speed Doubler by Connectix. Speed Doubler improves performance in several ways, but the most important feature is a higher-performance replacement for the 68K emulator in the PowerMac. Not only do 68K applications run faster, but because parts of MacOS 8 are still emulated, even native PowerMac applications see some benefit. In the end, the doctors get better than 25KB/s and are satisfied with the performance.