Latency and Throughput

Latency is the time between making a request and beginning to see a result. Some define latency as the time between making a request and the completion of the response, but this definition does not clearly distinguish the psychologically significant time spent waiting, not knowing whether a request has been accepted or understood. You will also see latency defined as the inverse of throughput, but this is not useful because latency would then give you the same information as throughput. Latency is measured in units of time, such as seconds.

Throughput is the number of items processed per unit time, such as bits transmitted per second, HTTP operations per day, or millions of instructions per second (MIPS). It is conventional to use the term “bandwidth” when referring to throughput in bits per second. Throughput is found by adding up the number of items and dividing by the sample interval. This calculation may produce correct but misleading results because it ignores variations in processing speed within the sample interval.
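To see how an average can be correct but misleading, consider the following minimal sketch. The per-second counts are invented for illustration and do not come from any real server; the point is only that the same average can hide both idle periods and bursts.

```python
def throughput(items, seconds):
    """Items processed per second over one sample interval."""
    return items / seconds

# Hypothetical per-second counts of HTTP operations over a 10-second sample:
# nothing happens for eight seconds, then the work arrives in a burst.
per_second_counts = [0, 0, 0, 0, 0, 0, 0, 0, 500, 500]

overall = throughput(sum(per_second_counts), len(per_second_counts))
peak = max(per_second_counts)
idle_seconds = per_second_counts.count(0)

print(f"average over the interval: {overall:.0f} ops/sec")
print(f"peak one-second rate:      {peak} ops/sec")
print(f"seconds with no work done: {idle_seconds}")
```

Shrinking the sample interval, or reporting the peak alongside the average, is one way to expose variation that a single figure conceals.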

The following examples help clarify the difference between latency and throughput:

  • An overnight (24-hour) shipment of 1000 different CDs holding 500 megabytes each has terrific throughput but lousy latency. The throughput is (500 × 2^20 × 8 × 1000) bits/(24 × 60 × 60) seconds = about 49 million bits/second, which is better than a T3’s 45 million bits/second. The difference is that the overnight shipment’s bits are delayed for a day and then arrive all at once, but T3 bits begin to arrive immediately, so the T3 has much better latency, even though both methods have approximately the same throughput when considered over the interval of a day. We say that the overnight shipment is bursty traffic. This example was adapted from Computer Networks by Andrew S. Tanenbaum (Prentice Hall, 1996).

  • Trucks have great throughput because you can carry so much on them, but they are slow to start and stop. Motorcycles have low throughput because you can’t carry much on them, but they start and stop more quickly and can weave through traffic so they have better latency.

  • Supermarkets would like to achieve maximum throughput per checkout clerk because they can then get by with fewer clerks. One way for them to do this is to increase your latency — that is, to make you wait in line, at least up to the limit of your tolerance. In his book Configuration and Capacity Planning for Solaris Servers (Prentice Hall), Brian Wong phrased this dilemma well by saying that throughput is a measure of organizational productivity while latency is a measure of individual productivity. The supermarket may not want to waste your individual time, but it is even more interested in maximizing its own organizational productivity.

  • One woman has a throughput of one baby per nine months, barring twins, triplets, etc. Nine women may be able to bear nine babies in nine months, giving the group a throughput of one baby per month, even though the latency cannot be decreased (i.e., even nine women cannot produce one baby in one month). This mildly offensive but unforgettable example is from The Mythical Man-Month by Frederick P. Brooks (Addison Wesley).
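The arithmetic in the overnight shipment example can be checked with a few lines of Python. This is only a sketch of the calculation: the function name is made up for illustration, and a megabyte is taken to be 2^20 bytes, as in the example.

```python
MEGABYTE = 2 ** 20      # bytes per megabyte, as in the CD example
BITS_PER_BYTE = 8

def shipment_throughput_bps(num_disks, megabytes_per_disk, hours):
    """Average bits per second delivered by a shipment that takes `hours`."""
    total_bits = num_disks * megabytes_per_disk * MEGABYTE * BITS_PER_BYTE
    return total_bits / (hours * 60 * 60)

cd_bps = shipment_throughput_bps(num_disks=1000, megabytes_per_disk=500, hours=24)
T3_BPS = 45_000_000     # the T3 figure quoted in the example

print(f"overnight shipment: {cd_bps / 1e6:.1f} million bits/sec")
print(f"T3 line:            {T3_BPS / 1e6:.1f} million bits/sec")
```

Note that the function says nothing about latency: every bit in the shipment arrives a full day after it was sent.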

Although high throughput systems often have low latency, there is no causal link. You’ve just seen how an overnight shipment can have high throughput with high latency. Large disks tend to have better throughput but worse latency; the disk is physically bigger, so the arm has to seek longer to get to any particular place. The latency of packet network connections also tends to increase with throughput. As you approach your maximum throughput, there are simply more or larger packets to put on the wire, so a packet will have to wait longer for an opening, increasing latency. This is especially true for Ethernet, which allows packets to collide and simply retransmits them if there is a collision, hoping that it retransmitted them into an open slot. It seems obvious that increasing throughput capacity will decrease latency for packet switched networks. However, while latency due to traffic congestion can be reduced, increasing bandwidth will not help in cases in which the latency is imposed by routers or sheer physical distance.

Finally, you can also have low throughput with low latency: a 14.4 kbps modem may get the first of your bits back to you reasonably quickly, but its relatively low throughput means it will still take a tediously long time to get a large graphic to you. With respect to the Internet, the point to remember is that latency can be more significant than throughput. For small HTML files, say under 2K, more of a 28.8 kbps modem user’s time is spent between the request and the beginning of a response than waiting for the file to complete its arrival.
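A rough calculation shows why latency dominates for small files. The sketch below assumes a latency of one second, which is a made-up figure chosen only for illustration, and it treats the modem's nominal 28.8 kbps as the actual transfer rate, ignoring protocol overhead and compression.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Time to move the bytes once they have started to flow."""
    return size_bytes * 8 / bits_per_second

def total_seconds(size_bytes, bits_per_second, latency_seconds):
    """Latency before the first byte plus the transfer time itself."""
    return latency_seconds + transfer_seconds(size_bytes, bits_per_second)

MODEM_BPS = 28_800
ASSUMED_LATENCY = 1.0   # seconds; hypothetical, not a measurement

for size in (2 * 1024, 100 * 1024):
    transfer = transfer_seconds(size, MODEM_BPS)
    total = total_seconds(size, MODEM_BPS, ASSUMED_LATENCY)
    share = ASSUMED_LATENCY / total
    print(f"{size:>7} bytes: transfer {transfer:6.2f} s, "
          f"total {total:6.2f} s, latency share {share:.0%}")
```

Under these assumptions the 2K file spends most of its time waiting for the first byte, while the 100K file spends nearly all of its time in transfer.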

A graph of latency versus load is very different from a graph of throughput versus load. Latency will go up exponentially, making a characteristic “backwards L”-shaped graph. Throughput will go up linearly at first, then level out to become nearly flat. Simply by looking at a graph of load test results, you can immediately have a good idea whether it is a latency or throughput graph.
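One simple way to reproduce the general shape of both curves is the single-server queueing model known as M/M/1. This model is an assumption brought in here for illustration, not a description of any particular web server, and its latency curve is hyperbolic rather than strictly exponential; what it shares with real load test results is the sharp upward turn as load approaches capacity. The capacity of 100 requests per second is hypothetical.

```python
def mm1_latency(offered_load, capacity):
    """Mean time in system for an M/M/1 queue: 1 / (capacity - load).

    Only meaningful below capacity; at or above capacity the queue
    grows without bound, which is reported here as infinity.
    """
    if offered_load >= capacity:
        return float("inf")
    return 1.0 / (capacity - offered_load)

def delivered_throughput(offered_load, capacity):
    """Throughput tracks load until the server saturates, then stays flat."""
    return min(offered_load, capacity)

CAPACITY = 100.0    # hypothetical server capacity, in requests/sec

print(" load  throughput  latency (s)")
for load in (10, 50, 80, 90, 95, 99, 120):
    print(f"{load:5}  {delivered_throughput(load, CAPACITY):10.0f}  "
          f"{mm1_latency(load, CAPACITY):11.3f}")
```

Reading down the table, the throughput column rises in step with load and then stops at capacity, while the latency column stays small for most of the range and then climbs steeply.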

Network Latency

Each step on the network from client to server and back contributes to the latency of an HTTP operation. It is difficult to figure out where in the network most of the latency originates, but there are two commonly available Unix tools that can help. (Note that we’re considering network latency here, not application latency, which is the time the applications running on the server itself take to begin to put a result back out on the network.)

If your web server is accessed over the Internet, then much of your latency is probably due to the store-and-forward nature of routers. Each router must accept an incoming packet into a buffer, look at the header information, and make a decision about where to send the packet next. Even once the decision is made, the router will often have to wait for an open slot to send the packet. The latency of your packets will therefore depend strongly on the number of router hops between the web server and the user. Routers themselves will have connections to each other that vary in latency and throughput.

The odd but essential characteristic of the Internet is that the path between two end-points can change automatically to accommodate network trouble, so your latency may vary from packet to packet. Packets can even arrive out of order. You can see the current path your packets are taking and the time between router hops by using the traceroute utility that comes with most versions of Unix. (See the traceroute manpage for more information.) A number of kind souls have made traceroute available from their web servers back to the requesting IP address, so you can look at path and performance to you from another point on the Internet, rather than from you to that point. One page of links to traceroute servers is at Also see for continuous measurements of ISP latency as measured from one point on the Internet.

Note that by default traceroute does a reverse DNS lookup on all intermediate IPs so you can see their names, but this delays the display of results. You can skip the DNS lookup with the -n option and you can do fewer measurements per router (the default is three) with the -q option. Here’s an example of traceroute usage:

% traceroute -q 2
traceroute to (, 30 hops max, 40 byte packets
1 ( 22.779 ms 139.675 ms
2 ( 18.714 ms 145.161 ms
3 ( 23.789 ms 141.473 ms
4 ( 29.091 ms 39.856 ms
5 ( 63.16 ms 62.75 ms
6 ( 82.212 ms 76.774 ms
7 ( 80.474 ms 76.875 ms
8 ( 81.611 ms *

If you are not concerned with intermediate times and want only to know the current time it takes to get a packet from your machine to another machine on the Internet (or on an intranet) and back to you, you can use the Unix ping utility. ping sends Internet Control Message Protocol (ICMP) packets to the named host and reports the round-trip latency between you and the named host in milliseconds. A latency of 25 milliseconds is pretty good, while 250 milliseconds is not good. See the ping manpage for more information. Here’s an example of ping usage:

% ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=248 time=112.2 ms
64 bytes from icmp_seq=1 ttl=248 time=83.9 ms
64 bytes from icmp_seq=2 ttl=248 time=82.2 ms
64 bytes from icmp_seq=3 ttl=248 time=80.6 ms
64 bytes from icmp_seq=4 ttl=248 time=87.2 ms
64 bytes from icmp_seq=5 ttl=248 time=81.0 ms

--- ping statistics ---

6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 80.6/87.8/112.2 ms
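If you collect ping output over time, you may want to pull out the individual round-trip times rather than rely on the summary line. The following is a minimal sketch that parses the sample output above; the regular expression assumes the "time=... ms" format shown here, which varies between ping implementations.

```python
import re

# The reply lines from the sample session above, copied as they appear.
PING_OUTPUT = """\
64 bytes from icmp_seq=0 ttl=248 time=112.2 ms
64 bytes from icmp_seq=1 ttl=248 time=83.9 ms
64 bytes from icmp_seq=2 ttl=248 time=82.2 ms
64 bytes from icmp_seq=3 ttl=248 time=80.6 ms
64 bytes from icmp_seq=4 ttl=248 time=87.2 ms
64 bytes from icmp_seq=5 ttl=248 time=81.0 ms
"""

TIME_PATTERN = re.compile(r"time=([\d.]+) ms")

def round_trip_times(ping_text):
    """Return every round-trip time found in the text, in milliseconds."""
    return [float(value) for value in TIME_PATTERN.findall(ping_text)]

def summarize(times):
    """Return (min, mean, max) of a non-empty list of times."""
    return min(times), sum(times) / len(times), max(times)

times = round_trip_times(PING_OUTPUT)
low, mean, high = summarize(times)
print(f"{len(times)} replies, min/avg/max = {low:.1f}/{mean:.1f}/{high:.1f} ms")
```

Keeping the individual times lets you look at the spread as well as the average; in this session the first reply took noticeably longer than the rest.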

Measuring Network Latency and Throughput

When ping measures the latency between you and some remote machine, it sends ICMP messages, which routers handle differently than the TCP segments used to carry HTTP. ICMP packets get lower priority. Routers are sometimes configured to ignore ICMP packets entirely. Furthermore, by default, ping sends only a very small amount of information, 56 data bytes, although some versions of ping let you send packets of arbitrary size. For these reasons, ping is not necessarily accurate in measuring HTTP latency to the remote machine, but it is a good first approximation. Using telnet and the Unix talk program will give you a manual feel for the latency of a connection.
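Because HTTP is carried over TCP, timing a TCP connection setup is one way to get a latency figure that does not depend on ICMP handling. The sketch below is an illustration, not a replacement for ping: it measures only how long the connect call takes. So that it runs without network access, the demonstration connects to a throwaway listener on the local machine; to measure a real server you would substitute its hostname and port 80.

```python
import socket
import time

def tcp_connect_seconds(host, port, timeout=5.0):
    """Time how long it takes to establish one TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed

# Demonstration only: a local listener stands in for the web server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # port 0 lets the system pick a free port
listener.listen(5)
host, port = listener.getsockname()

samples = [tcp_connect_seconds(host, port) for _ in range(5)]
listener.close()

median = sorted(samples)[len(samples) // 2]
print(f"median TCP connect time: {median * 1000:.3f} ms")
```

As with ping, a single sample means little; take several and look at the spread.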

The simplest ways to measure web latency and throughput are to clear your browser’s cache and time how long it takes to get a particular page from your server, have a friend get a page from your server from another point on the Internet, or log in to a remote machine and run: time lynx -source>/dev/null. This method is sometimes referred to as the stopwatch method of web performance monitoring.
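The stopwatch method is easy to script. The following sketch uses Python's standard urllib to time one retrieval, recording both the time until the first byte of the body arrives and the time until the whole body has arrived. The function name and the opener parameter are inventions for this example; the parameter exists only so the function can be exercised without a network.

```python
import time
import urllib.request

def time_fetch(url, opener=urllib.request.urlopen):
    """Fetch url and return (seconds_to_first_byte, total_seconds, size_bytes)."""
    start = time.perf_counter()
    with opener(url) as response:
        first = response.read(1)                    # wait for the first body byte
        first_byte = time.perf_counter() - start
        rest = response.read()                      # then drain the remainder
    total = time.perf_counter() - start
    return first_byte, total, len(first) + len(rest)

# Example use against your own server (hypothetical URL):
#
#   first_byte, total, size = time_fetch("http://www.example.com/index.html")
#   print(f"latency {first_byte:.3f} s, total {total:.3f} s, "
#         f"{size * 8 / total:.0f} bits/sec")
```

Like any stopwatch measurement, the result mixes network time with server time, and a cache anywhere along the path will distort it.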

Using FTP

Another way to get an idea of network throughput is to use FTP to transfer files to and from a remote system. FTP is like HTTP in that it is carried over TCP. There are some hazards to this approach, but if you are careful, your results should reflect your network conditions.

First, do not put too much stock in the numbers the FTP program reports to you. While the first significant digit or two will probably be correct, the FTP program internally makes some approximations, so the number reported is only approximately accurate.

More importantly, what you do with FTP will determine exactly which part of the system is the bottleneck. To put it another way, what you do with FTP will determine what you’re measuring. To ensure that you are measuring the throughput of the network and not of the disk of the local or remote system, you want to eliminate any requirements for disk access that could be caused by the FTP transfer. For this reason, you should not FTP a collection of small files in your test; each file creation requires a disk access.

Similarly, you need to limit the size of the file you transfer because a huge file will not fit in the filesystem cache of either the transmitting or receiving machine, again resulting in disk access. To make sure the file is in the cache of the transmitting machine when you start the FTP, you should do the FTP at least twice, throwing away the results from the first iteration. Also, do not write the file to the disk of the receiving machine. You can avoid the write with some versions of FTP by directing the result to /dev/null. Altogether, we have something like this:

ftp> get bigfile /dev/null
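The same test can be scripted with Python's standard ftplib module. The sketch below follows the advice above: it retrieves the file more than once and keeps only the last result, and it discards the data as it arrives instead of writing it to disk. The function name is made up for this example, and the host, login, and filename in the usage comment are placeholders.

```python
import ftplib
import time

def ftp_throughput(ftp, remote_name, runs=2):
    """Retrieve remote_name `runs` times and return bytes/sec for the last run.

    Earlier runs serve only to get the file into the sender's cache.
    """
    rate = 0.0
    for _ in range(runs):
        received = 0

        def discard(block):
            # Count the bytes but never store them, like writing to /dev/null.
            nonlocal received
            received += len(block)

        start = time.perf_counter()
        ftp.retrbinary("RETR " + remote_name, discard)
        elapsed = time.perf_counter() - start
        rate = received / elapsed if elapsed > 0 else float("inf")
    return rate

# Example use (placeholder host and filename):
#
#   ftp = ftplib.FTP("ftp.example.com")
#   ftp.login()                       # anonymous login
#   print(f"{ftp_throughput(ftp, 'bigfile') / 1024:.0f} Kbytes/sec")
#   ftp.quit()
```

Because the timing wraps the whole retrieval, the figure includes the time to set up the data connection, so it will understate the network's throughput for small files.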

Try using the FTP hash command to get an interactive feel for latency and throughput. The hash command prints hash marks (#) after the transfer of a block of data. The size of the block represented by the hash mark varies with the FTP implementation, but FTP will tell you the size when you turn on hashing:

ftp> hash
Hash mark printing on (1024 bytes/hash mark).
ftp> get ers.27may
200 PORT command successful.
150 Opening BINARY mode data connection for ers.27may (362805 bytes).
226 Transfer complete.
362805 bytes received in 15 secs (24 Kbytes/sec)
ftp> bye
221 Goodbye.

You can use Perl or the Expect scripting language to automatically run an FTP test at regular intervals. Other scripting languages have a difficult time controlling the terminal of a spawned process; if you start FTP from within a shell script, for example, execution of the script halts until FTP returns, so you cannot continue the FTP session. Expect is designed to deal with this exact problem. Expect is well documented in Exploring Expect, by Don Libes (O’Reilly & Associates). The autoexpect program can be used to automatically record your test.

Other Performance Measures

You can of course also retrieve content via HTTP from your server to test network performance, but this does not clearly distinguish network performance from server performance.

Here are a few more network testing tools:


  • ttcp is an old C program, circa 1985, for testing TCP connection speed. It makes a connection on port 2000 and transfers zeroed buffers or data copied from STDIN. It is available from and distributed with some Unix systems. Try which ttcp and man ttcp on your system to see if the binary and documentation are already there.

  • A more recent tool, circa 1992, is Nettest, available at nettest was used to generate some performance statistics for vBNS, the very high-performance backbone network service (see

  • bing attempts to measure bandwidth between two points on the Internet. See

  • The chargen service, defined in RFC 864 and implemented by most versions of Unix, simply sends back nonsense characters to the user at the maximum possible rate. This can be used along with some measuring mechanism to determine what that maximum rate is. The TCP form of the service sends a continuous stream, while the UDP form sends a packet of random size for each packet received. Both run on well-known port 19. chargen does not give reliable readings because it cannot distinguish packets that were dropped on the sending machine from packets dropped at the receiving machine due to buffer overflows.

  • NetSpec simplifies network testing by allowing users to control processes across multiple hosts using a set of daemons. It can be found at
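As an example of the "measuring mechanism" that the chargen service needs, the following sketch reads from an already connected TCP socket for a fixed period and reports the receive rate. The function name and its parameters are made up for illustration, and the host in the usage comment is a placeholder. Many systems disable chargen by default, so you may have to enable it on a test machine first.

```python
import socket
import time

def receive_rate(sock, seconds=1.0, bufsize=65536):
    """Read from a connected socket for about `seconds`; return bytes/sec."""
    sock.settimeout(0.1)            # wake up regularly to check the clock
    received = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            continue                # nothing arrived in this slice; keep waiting
        if not data:
            break                   # the sender closed the connection
        received += len(data)
    elapsed = time.perf_counter() - start
    return received / elapsed if elapsed > 0 else 0.0

# Example use against the TCP chargen service (placeholder host):
#
#   sock = socket.create_connection(("testhost.example.com", 19))
#   print(f"{receive_rate(sock, seconds=10) * 8 / 1e6:.2f} million bits/sec")
#   sock.close()
```

The result is the rate at which data reached this program, which is subject to the caveat about dropped packets noted for chargen above.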

