In the last chapter, we attempted to answer a fundamental question, “Do we have a working network connection?” We used tools such as ping to verify basic connectivity. But simple connectivity is not enough for many purposes. For example, an ISP can provide connectivity but not meet your needs or expectations. If your ISP is not providing the level of service you think it should, you will need something to base your complaints on. Or, if the performance of your local network isn’t adequate, you will want to determine where the bottlenecks are located before you start implementing expensive upgrades. In this chapter, we will try to answer the question, “Is our connection performing reasonably?”
We will begin by looking at ways to determine which links or individual connections compose a path. This discussion focuses on the tool traceroute. Next, we will turn to several tools that allow us to identify those links along a path that might cause problems. Once we have identified individual links of interest, we will examine some simple ways to further characterize the performance of those links, including estimating the bandwidth of a connection and measuring the available throughput.
This section describes traceroute, a tool used to discover the links along a path. While this is the first step in investigating a path’s behavior and performance, it is useful for other tasks as well. In the previous discussion of ping, it was suggested that you work your way, hop by hop, toward a device you can’t reach to discover the point of failure. This assumes that you know the path.
Path discovery is also an essential step in diagnosing routing problems. While you may fully understand the structure of your network and know what path you want your packets to take through your network, knowing the path your packets actually take is essential information and may come as a surprise.
Once packets leave your network, you have almost no control over the path they actually take to their destination. You may know very little about the structure of adjacent networks. Path discovery can provide a way to discover who their ISP is, how your ISP is connected to the world, and other information such as peering arrangements. traceroute is the tool of choice for collecting this kind of information.
The traceroute program was written by Van Jacobson and others. It is based on a clever use of the Time-To-Live (TTL) field in the IP packet’s header. The TTL field, described briefly in the last chapter, is used to limit the life of a packet. When a router fails or is misconfigured, a routing loop or circular path may result. The TTL field prevents packets from remaining on a network indefinitely should such a routing loop occur. A packet’s TTL field is decremented each time the packet crosses a router on its way through a network. When its value reaches 0, the packet is discarded rather than forwarded. When discarded, an ICMP TIME_EXCEEDED message is sent back to the packet’s source to inform the source that the packet was discarded. By manipulating the TTL field of the original packet, the program traceroute uses information from these ICMP messages to discover paths through a network.
traceroute sends a series of UDP packets with the destination address of the device you want a path to. By default, traceroute sends sets of three packets to discover each hop. traceroute sets the TTL field in the first three packets to a value of 1 so that they are discarded by the first router on the path. When the ICMP TIME_EXCEEDED messages are returned by that router, traceroute records the source IP address of these ICMP messages. This is the IP address of the first hop on the route to the destination.
Next, three packets are sent with their TTL field set to 2. These will be discarded by the second router on the path. The ICMP messages returned by this router reveal the IP address of the second router on the path. The program proceeds in this manner until a set of packets finally has a TTL value large enough so that the packets reach their destination.
Typically, when the probe packets finally have an adequate TTL and reach their destination, they will be discarded and an ICMP PORT_UNREACHABLE message will be returned. This happens because traceroute sends all its probe packets with what should be invalid port numbers, i.e., port numbers that aren’t usually used. To do this, traceroute starts with a very large port number, typically 33434, and increments this value with each subsequent packet. Thus, each of the three packets in a set will have three different unlikely port numbers. The receipt of ICMP PORT_UNREACHABLE messages is the signal that the end of the path has been reached. Here is a simple example of using traceroute:
bsd1# traceroute 188.8.131.52 traceroute to 184.108.40.206 (220.127.116.11), 30 hops max, 40 byte packets 1 18.104.22.168 (22.214.171.124) 1.162 ms 1.068 ms 1.025 ms 2 cisco (126.96.36.199) 4.249 ms 4.275 ms 4.256 ms 3 188.8.131.52 (184.108.40.206) 4.433 ms 4.521 ms 4.450 ms 4 e0.r01.ia-gnwd.Infoave.Net (220.127.116.11) 5.178 ms 5.173 ms 5.140 ms 5 18.104.22.168 (22.214.171.124) 13.171 ms 13.277 ms 13.352 ms 6 126.96.36.199 (188.8.131.52) 18.395 ms 18.238 ms 18.210 ms 7 atm12-0-10-mp.r01.ia-clma.infoave.net (184.108.40.206) 18.816 ms 18.934 ms 18.893 ms 8 Serial5-1-1.GW1.RDU1.ALTER.NET (220.127.116.11) 26.658 ms 26.484 ms 26.855 ms 9 Fddi12-0-0.GW2.RDU1.ALTER.NET (18.104.22.168) 26.692 ms 26.697 ms 26.490 ms 10 smatnet-gw2.customer.ALTER.NET (22.214.171.124) 27.736 ms 28.101 ms 27.738 ms 11 rcmt1-S10-1-1.sprintsvc.net (126.96.36.199) 33.539 ms 33.219 ms 32.446 m s 12 rcmt3-FE0-0.sprintsvc.net (188.8.131.52) 32.641 ms 32.724 ms 32.898 ms 13 gwd1-S3-7.sprintsvc.net (184.108.40.206) 46.026 ms 50.724 ms 45.960 ms 14 gateway.ais-gwd.com (220.127.116.11) 47.828 ms 50.912 ms 47.823 ms 15 pm3-02.ais-gwd.com (18.104.22.168) 63.786 ms 48.432 ms 48.113 ms 16 user58.ais-gwd.com (22.214.171.124) 200.910 ms 184.587 ms 202.771 ms
The results should be fairly self-explanatory. This particular path was 16 hops long. Reverse name lookup is attempted for the IP address of each device, and, if successful, these names are reported in addition to IP addresses. Times are reported for each of the three probes sent. They are interpreted in the same way as times with ping. (However, if you just want times for one hop, ping is generally a better choice.)
Although no packets were lost in this example, should a packet be lost, an asterisk is printed in the place of the missing time. In some cases, all three times may be replaced with asterisks. This can happen for several reasons. First, the router at this hop may not return ICMP TIME_EXCEEDED messages. Second, some older routers may incorrectly forward packets even though the TTL is 0. A third possibility is that ICMP messages may be given low priority and may not be returned in a timely manner. Finally, beyond some point of the path, ICMP packets may be blocked.
Other routing problems may exist as well. In some
instances traceroute will append
additional messages to the end of lines in the form of an exclamation
point and a letter.
indicate, respectively, that the host, network, or protocol is
!F indicates that
fragmentation is needed.
a source route failure.
Two options control how much information is printed. Name resolution can be disabled with the -n option. This can be useful if name resolution fails for some reason or if you just don’t want to wait on it. The -v option is the verbose flag. With this flag set, the source and packet sizes of the probes will be reported for each packet. If other ICMP messages are received, they will also be reported, so this can be an important option when troubleshooting.
Several options may be used to alter the behavior of traceroute, but most are rarely needed. An example is the -m option. The TTL field is an 8-bit number allowing a maximum of 255 hops. Most implementations of traceroute default to trying only 30 hops before halting. The -m option can be used to change the maximum number of hops tested to any value up to 255.
As noted earlier, traceroute usually receives a PORT_UNREACHABLE message when it reaches its final destination because it uses a series of unusually large port numbers as the destination ports. Should the number actually match a port that has a running service, the PORT_UNREACHABLE message will not be returned. This is rarely a problem since three packets are sent with different port numbers, but, if it is, the -p option lets you specify a different starting port so these ports can be avoided.
Normally, traceroute sends three probe packets for each TTL value with a timeout of three seconds for replies. The default number of packets per set can be changed with the -q option. The default timeout can be changed with the -w option.
Additional options support how packets are routed. See the manpage for details on these if needed.
The information traceroute supplies has its limitations. In some situations, the results returned by traceroute have a very short shelf life. This is particularly true for long paths crossing several networks and ISPs.
You should also recall that a router, by definition, is a computer with multiple network interfaces, each with a different IP address. This raises an obvious question: which IP address should be returned for a router? For traceroute, the answer is dictated by the mechanism it uses to discover the route. It can report only the address of the interface receiving the packet. This means a quite different path will be reported if traceroute is run in the reverse direction.
Here is the output when the previous example is run again from what was originally the destination to what was originally the source, i.e., with the source and destination exchanged:
C:\>tracert 126.96.36.199 Tracing route to 188.8.131.52 over a maximum of 30 hops 1 132 ms 129 ms 129 ms pm3-02.ais-gwd.com [184.108.40.206] 2 137 ms 130 ms 129 ms sprint-cisco-01.ais-gwd.com [220.127.116.11] 3 136 ms 129 ms 139 ms 18.104.22.168 4 145 ms 150 ms 140 ms rcmt3-S4-5.sprintsvc.net [22.214.171.124] 5 155 ms 149 ms 149 ms sl-gw2-rly-5-0-0.sprintlink.net [126.96.36.199] 6 165 ms 149 ms 149 ms sl-bb11-rly-2-1.sprintlink.net [188.8.131.52] 7 465 ms 449 ms 399 ms sl-gw11-dc-8-0-0.sprintlink.net [184.108.40.206] 8 155 ms 159 ms 159 ms sl-infonet-2-0-0-T3.sprintlink.net [220.127.116.11] 9 164 ms 159 ms 159 ms atm4-0-10-mp.r01.ia-gnvl.infoave.net [18.104.22.168] 10 164 ms 169 ms 169 ms atm4-0-30.r1.scgnvl.infoave.net [22.214.171.124] 11 175 ms 179 ms 179 ms 126.96.36.199 12 184 ms 189 ms 195 ms e0.r02.ia-gnwd.Infoave.Net [188.8.131.52] 13 190 ms 179 ms 180 ms 184.108.40.206 14 185 ms 179 ms 179 ms 220.127.116.11 15 174 ms 179 ms 179 ms 18.104.22.168 Trace complete.
There are several obvious differences. First, the format is slightly different because this example was run using Microsoft’s implementation of traceroute, tracert. This, however, should present no difficulty.
A closer examination shows that there are more fundamental differences. The second trace is not simply the first trace in reverse order. The IP addresses are not the same, and the number of hops is different.
There are two things going on here. First, as previously mentioned, traceroute reports the IP number of the interface where the packet arrives. The reverse path will use different interfaces on each router, so different IP addresses will be reported. While this can be a bit confusing at first glance, it can be useful. By running traceroute at each end of a connection, a much more complete picture of the connection can be created.
Figure 4-1 shows the first six hops on the path starting from the source for the first trace as reconstructed from the pair of traces. We know the packet originates at 22.214.171.124. The first trace shows us the first hop is 126.96.36.199. It leaves this router on interface 188.8.131.52 for 184.108.40.206. The second of these addresses is just the next hop in the first trace. The first address comes from the second trace. It is the last hop before the destination. It is also reasonable in that we have two addresses that are part of the same class C network. With IP networks, the ends of a link are part of the link and must have IP numbers consistent with a single network.
From the first trace, we know packets go from the 220.127.116.11 to 18.104.22.168. From the reverse trace, we are able to deduce that the other end of the 22.214.171.124 link is 126.96.36.199. Or, equivalently, the outbound interface for the 188.8.131.52 router has the address 184.108.40.206.
In the same manner, the next router’s inbound interface is 220.127.116.11, and its outbound interface is 18.104.22.168. This can be a little confusing since it appears that these last three addresses should be on the same network. On closer examination of this link and adjacent links, it appears that this class B address is using a subnet mask of /20. With this assumption, the addresses are consistent.
We can proceed in much the same manner to discover the next few links. However, when we get to the seventh entry in the first trace (or to the eighth entry working backward in the second trace), the process breaks down. The reason is simple—we have asymmetric paths across the Internet. This also accounts for the difference in the number of hops between the two traces.
In much the same way we mapped the near end of the path, the remote end can be reconstructed as well. The paths become asymmetric at the seventh router when working in this direction. Figure 4-2 shows the first four hops. We could probably fill in the remaining addresses for each direction by running traceroute to the specific machine where the route breaks down, but this probably isn’t worth the effort.
One possible surprise in Figure 4-2 is that we have the same IP number, 22.214.171.124, on each interface at the first hop. The explanation is that dial-in access is being used. The IP number 126.96.36.199 is assigned to the host when the connection is made. 188.8.131.52 must be the access router. This numbering scheme is normal for an access router.
Although we haven’t constructed a complete picture of the path(s) between these two computers, we have laid out the basic connection to our network through our ISP. This is worth working out well in advance of any problems. When you suspect problems, you can easily ping these intermediate routers to pinpoint the exact location of a problem. This will tell you whether it is your problem or your ISP’s problem. This can also be nice information to have when you call your ISP.
To construct the bidirectional path using the technique just described, you need access to a second, remote computer on the Internet from which you can run traceroute. Fortunately, this is not a problem. There are a number of sites on the Internet, which, as a service to the network community, will run traceroute for you. Often called looking glasses, such sites can provide a number of other services as well. For example, you may be able to test how accessible your local DNS setup is by observing how well traceroute works. A list of such sites can be found at http://www.traceroute.org. Alternately, the search string “web traceroute” or “traceroute looking glass” will usually turn up a number of such sites with most search engines.
In theory, there is an alternative way to find this type of information with some implementations of traceroute. Some versions of traceroute support loose source routing, the ability to specify one or more intermediate hops that the packets must go through. This allows a packet to be diverted through a specific router on its way to its destination. (Strict source routing may also be available. This allows the user to specify an exact path through a network. While loose source routing can take any path that includes the specified hops, strict source routing must exactly follow the given path.)
To construct a detailed list of all devices on a path, the approach is to use traceroute to find a path from the source host to itself, specifying a route through a remote device. Packets leave the host with the remote device as their initial destination. When the packets arrive at the remote device, that device replaces the destination address with the source’s address, and the packets are redirected back to the source. Thus, you get a picture of the path both coming and going. (Of course, source routing is not limited to just this combination of addresses.)
At least, that is how it should work in theory. In practice, many devices no longer support source routing. Unfortunately, source routing has been used in IP spoofing attacks. Packets sent with a spoofed source address can be diverted so they pass through the spoofed device’s network. This approach will sometimes slip packets past firewalls since the packet seems to be coming from the right place.
This is shown in Figure 4-3. Without source routing, the packet would come into the firewall on the wrong interface and be discarded. With source routing, the packet arrives on the correct interface and passes through the firewall. Because of problems like this, source routing is frequently disabled.
One final word of warning regarding traceroute—buggy or nonstandard implementations exist. Nonstandard isn’t necessarily bad; it just means you need to watch for differences. For example, see the discussion of tracert later in this chapter. Buggy implementations, however, can really mislead you.
Once you have a picture of the path your traffic is taking, the next step in testing is to get some basic performance numbers. Evaluating path performance will mean doing three types of measurements. Bandwidth measurements will give you an idea of the hardware capabilities of your network, such as the maximum capacity of your network. Throughput measurements will help you discover what capacity your network provides in practice, i.e., how much of the maximum is actually available. Traffic measurements will give you an idea of how the capacity is being used.
My goal in this section is not a definitive analysis of performance. Rather, I describe ways to collect some general numbers that can be used to see if you have a reasonable level of performance or if you need to delve deeper. If you want to go beyond the quick-and-dirty approaches described here, you might consider some of the more advanced tools described in Chapter 9. The tools mentioned here should help you focus your efforts.
Two factors determine how long it takes to send a packet or frame across a single link. The amount of time it takes to put the signal onto the cable is known as the transmission time or transmission delay. This will depend on the transmission rate (or interface speed) and the size of the frame. The amount of time it takes for the signal to travel across the cable is known as the propagation time or propagation delay. Propagation time is determined by the type of media used and the distance involved. It often comes as a surprise that a signal transmitted at 100 Mbps will have the same propagation delay as a signal transmitted at 10 Mbps. The first signal is being transmitted 10 times as fast, but, once it is on a cable, it doesn’t propagate any faster. That is, the difference between 10 Mbps and 100 Mbps is not the speed the bits travel, but the length of the bits.
Once we move to multihop paths, a third consideration enters the picture—the delay introduced from processing packets at intermediate devices such as routers and switches. This is usually called the queuing delay since, for the most part, it arises from the time packets spend in queues within the device. The total delay in delivering a packet is the sum of these three delays. Transmission and propagation delays are usually quite predictable and stable. Queuing delays, however, can introduce considerable variability.
The term bandwidth is typically used to describe the capacity of a link. For our purposes, this is the transmission rate for the link. If we can transmit onto a link at 10 Mbps, then we say we have a bandwidth of 10 Mbps.
Throughput is a measure of the amount of data that can be sent over a link in a given amount of time. Throughput estimates, typically obtained through measurements based on the bulk transfer of data, are usually expressed in bits per second or packets per second. Throughput is frequently used as an estimate of the bandwidth of a network, but bandwidth and throughput are really two different things. Throughput measurement may be affected by considerable overhead that is not included in bandwidth measurements. Consequently, throughput is a more realistic estimator of the actual performance you will see.
Throughput is generally an end-to-end measurement. When dealing with multihop paths, however, the bandwidths may vary from link to link. The bottleneck bandwidth is the bandwidth of the slowest link on a path, i.e., the link with the lowest bandwidth. (While introduced here, bottleneck analysis is discussed in greater detail in Chapter 12.)
Additional metrics will sometimes be needed. The best choice is usually task dependent. If you are sending real-time audio packets over a long link, you may want to minimize both delay and variability in the delay. If you are using FTP to do bulk transfers, you may be more concerned with the throughput. If you are evaluating the quality of your link to the Internet, you may want to look at bottleneck bandwidth for the path. The development of reliable metrics is an active area of research.
The preceding discussion should make clear that the times returned by ping, although frequently described as propagation delays, really are the sum of the transmission, propagation, and queuing delays. In the last chapter, we used ping to calculate a rough estimate of the bandwidth of a connection and noted that this treatment is limited since it gives a composite number.
We can refine this process and use it to estimate the bandwidth for a link along a path. The basic idea is to first calculate the path behavior up to the device on the closest end of the link and then calculate the path behavior to the device at the far end of the link. The difference is then used to estimate the bandwidth for the link in question. Figure 4-4 shows the basic arrangement.
This process requires using ping four times. First, ping the near end of a link with two different packet sizes. The difference in the times will eliminate the propagation and queuing delays along the path (assuming they haven’t changed too much) leaving the time required to transmit the additional data in the larger packet. Next, use the same two packet sizes to ping the far end of the link. The difference in the times will again eliminate the overhead. Finally, the difference in these two differences will be the amount of time to send the additional data over the last link in the path. This is the round-trip time. Divide this number by two and you have the time required to send the additional data in one direction over the link. The bandwidth is simply the amount of additional data sent divided by this last calculated time. 
Table 4-1. Raw data
Time for 100 bytes
Time for 1100 bytes
Table 4-2 shows the calculated results. The time difference was divided by two (RRT correction), then divided into 8000 bits (the size of the data in bits), and then multiplied by 1000 (milliseconds-to-seconds correction.). The results, in bps, were then converted to Mbps. If several sets of packets are sent, the minimums of the times can be used to improve the estimate.
Table 4-2. Calculated bandwidth
Clearly, doing this manually is confusing, tedious, and prone to errors. Fortunately, several tools based on this approach greatly simplify the process. These tools also improve accuracy by using multiple packets.
One tool that automates this process is pathchar. This tool, written by Van Jacobson several years ago, seems to be in a state of limbo. It has, for several years, been available as an alpha release, but nothing seems to have been released since. Several sets of notes or draft notes are available on the Web, but there appears to be no manpage for the program. Nonetheless, the program remains available and has been ported to several platforms. Fortunately, a couple of alternative implementations of the program have recently become available. These include bing, pchar, clink, and tmetric.
One strength of pathchar and its variants is that they can discover the bandwidth of each link along a path using software at only one end of the path. The method used is basically that described earlier for ping, but pathchar uses a large number of packets of various sizes. Here is an example of running pathchar :
bsd1# pathchar 184.108.40.206 pathchar to 220.127.116.11 (18.104.22.168) mtu limited to 1500 bytes at local host doing 32 probes at each of 45 sizes (64 to 1500 by 32) 0 22.214.171.124 (126.96.36.199) | 4.3 Mb/s, 1.55 ms (5.88 ms) 1 cisco (188.8.131.52) | 1.5 Mb/s, -144 us (13.5 ms) 2 184.108.40.206 (220.127.116.11) | 10 Mb/s, 242 us (15.2 ms) 3 e0.r01.ia-gnwd.Infoave.Net (18.104.22.168) | 1.2 Mb/s, 3.86 ms (32.7 ms) 4 22.214.171.124 (126.96.36.199) | ?? b/s, 2.56 ms (37.7 ms) 5 188.8.131.52 (184.108.40.206) | 45 Mb/s, 1.85 ms (41.6 ms), +q 3.20 ms (18.1 KB) *4 6 atm1-0-5.r01.ncchrl.infoave.net (220.127.116.11) | 17 Mb/s, 0.94 ms (44.3 ms), +q 5.83 ms (12.1 KB) *2 7 h10-1-0.r01.ia-chrl.infoave.net (18.104.22.168) | ?? b/s, 89 us (44.3 ms), 1% dropped 8 dns1.InfoAve.Net (22.214.171.124) 8 hops, rtt 21.9 ms (44.3 ms), bottleneck 1.2 Mb/s, pipe 10372 bytes
As pathchar runs, it first displays a message describing how the probing will be done. From the third line of output, we see that pathchar is using 45 different packet sizes ranging from 64 to 1500 bytes. (1500 is the local host’s MTU.) It uses 32 different sets of these packets for each hop. Thus, this eight-hop run generated 11,520 test packets plus an equal number of replies.
The bandwidth and delay for each link is given. pathchar may also include information on the queuing delay (links 5 and 6 in this example). As you can see, pathchar is not always successful in estimating the bandwidth (see the links numbered 4 and 7) or the delay (see link numbered 1). With this information, we could go back to Figure 4-1 and fill in link speeds for most links.
As pathchar runs, it shows a countdown as it sends out each packet. It will display a line that looks something like this:
1: 31 288 0 3
1: refers to
the hop count and will be incremented for each successive hop on the
path. The next number counts down, giving the number of sets of
probes remaining to be run for this link. The third number is the
size of the current packet being sent. Both the second and third
numbers should be changing rapidly. The last two numbers give the
number of packets that have been dropped so far on this link and the
average round-trip time for this link.
When the probes for a hop are complete, this line is replaced with a line giving the bandwidth, incremental propagation delay, and round-trip time. pathchar uses the minimum of the observed delays to improve its estimate of bandwidth.
Several options are available with pathchar. Of greatest interest are those that control the number and size of the probe packet used. The option -q allows the user to specify the number of sets of packets to send. The options -m and -M control the minimum and maximum packet sizes, respectively. The option -Q controls the step size from the smallest to largest packet sizes. As a general rule of thumb, more packets are required for greater accuracy, particularly on busy links. The option -n turns off DNS resolution, and the option -v provides for more output.
pathchar is not without problems. One problem for pathchar is hidden or unknown transmission points. The first link reports a bandwidth of 4.3 Mbps. From traceroute, we only know of the host and the router at the end of the link. This is actually a path across a switched LAN with three segments and two additional transmission points at the switches. The packet is transmitted onto a 10-Mbps network, then onto a 100-Mbps backbone, and then back onto a 10 Mbps network before reaching the first router. Consequently, there are three sets of transmission delays rather than just one, and a smaller than expected bandwidth is reported.
You will see this problem with store-and-forward switches, but it is not appreciable with cut-through switches. (Types of Switches if you are unfamiliar with the difference between cut-through and store-and-forward switches.) In a test in which another switch, configured for cut-through, was added to this network, almost no change was seen in the estimated bandwidth with pathchar. When the switch was reconfigured as a store-and-forward switch, the reported bandwidth on the first link dropped to 3.0 Mbps.
This creates a problem if you are evaluating an ISP. For example, it might appear that the fourth link is too slow if the contract specifies T1 service. This might be the case, but it could just be a case of a hidden transmission point. Without more information, this isn’t clear.
Finally, you should be extremely circumspect about running pathchar. It can generate a huge amount of traffic. The preceding run took about 40 minutes to complete. It was run from a host on a university campus while the campus was closed for Christmas break and largely deserted. If you are crossing a slow link and have a high path MTU, the amount of traffic can effectively swamp the link. Asymmetric routes, routes in which the path to a device is different from the path back, changing routes, links using tunneling, or links with additional padding added can all cause problems.
One alternative to pathchar is bing, a program written by Pierre Beyssac. Where pathchar gives the bandwidth for every link along a path, bing is designed to measure point-to-point bandwidth. Typically, you would run traceroute first if you don’t already know the links along a path. Then you would run bing specifying the near and far ends of the link of interest on the command line. This example measures the bandwidth of the third hop in Figure 4-1:
bsd1# bing -e10 -c1 126.96.36.199 188.8.131.52 BING 184.108.40.206 (220.127.116.11) and 18.104.22.168 (22.214.171.124) 44 and 108 data bytes 1024 bits in 0.835ms: 1226347bps, 0.000815ms per bit 1024 bits in 0.671ms: 1526080bps, 0.000655ms per bit 1024 bits in 0.664ms: 1542169bps, 0.000648ms per bit 1024 bits in 0.658ms: 1556231bps, 0.000643ms per bit 1024 bits in 0.627ms: 1633174bps, 0.000612ms per bit 1024 bits in 0.682ms: 1501466bps, 0.000666ms per bit 1024 bits in 0.685ms: 1494891bps, 0.000669ms per bit 1024 bits in 0.605ms: 1692562bps, 0.000591ms per bit 1024 bits in 0.618ms: 1656958bps, 0.000604ms per bit --- 126.96.36.199 statistics --- bytes out in dup loss rtt (ms): min avg max 44 10 10 0% 3.385 3.421 3.551 108 10 10 0% 3.638 3.684 3.762 --- 188.8.131.52 statistics --- bytes out in dup loss rtt (ms): min avg max 44 10 10 0% 3.926 3.986 4.050 108 10 10 0% 4.797 4.918 4.986 --- estimated link characteristics --- estimated throughput 1656958bps minimum delay per packet 0.116ms (192 bits) average statistics (experimental) : packet loss: small 0%, big 0%, total 0% average throughput 1528358bps average delay per packet 0.140ms (232 bits) weighted average throughput 1528358bps resetting after 10 samples.
The output begins with the addresses and packet sizes followed by lines for each pair of probes. Next, bing returns round-trip times and packet loss data. Finally, it returns several estimates of throughput.
In this particular example, we have specified the options -e10 and -c1, which limit the probe to one cycle using 10 pairs of packets. Alternatively, you can omit these options and watch the output. When the process seems to have stabilized, enter a Ctrl-C to terminate the program. The summary results will then be printed. Interpretation of these results should be self-explanatory.
bing allows for a number of fairly standard options. These options allow controlling the number of packet sizes, suppressing name resolution, controlling routing, and obtaining verbose output. See the manpage if you have need of these options.
Because bing uses the same mechanism as pathchar, it will suffer the same problems with hidden transmission points. Thus, you should be circumspect when using it if you don’t fully understand the topology of the network. While bing does not generate nearly as much traffic as pathchar, it can still place strains on a network.
One alternative approach that is useful for measuring bottleneck bandwidth is the packet pair or packet stretch approach. With this approach, two packets that are the same size are transmitted back-to-back. As they cross the network, whenever they come to a slower link, the second packet will have to wait while the first is being transmitted. This increases the time between the transmission of the packets at this point on the network. If the packets go onto another faster link, the separation is preserved. If the packets subsequently go onto a slower link, then the separation will increase. When the packets arrive at their destination, the bandwidth of the slowest link can be calculated from the amount of separation and the size of the packets.
It would appear that getting this method to work requires software at both ends of the link. In fact, some implementations of packet pair software work this way. However, using software at both ends is not absolutely necessary since the acknowledgment packets provided with some protocols should preserve the separation.
One assumption of this algorithm is that packets will stay together as they move through the network. If other packets are queued between the two packets, the separation will increase. To avoid this problem, a number of packet pairs are sent through the network with the assumption that at least one pair will stay together. This will be the pair with the minimum separation.
Several implementations of this algorithm exist. bprobe and cprobe are two examples. At the time this was written, these were available only for the IRIX operating system on SGI computers. Since the source code is available, this may have changed by the time you read this.
Compared to the pathchar approach, the packet pair approach will find only the bottleneck bandwidth rather than the bandwidth of an arbitrary link. However, it does not suffer from the hidden hop problem. Nor does it create the levels of traffic characteristic of pathchar. This is a technology to watch.
Estimating bandwidth can provide a quick overview of hardware performance. But if your bandwidth is not adequate, you are limited in what you can actually do—install faster hardware or contract for faster service. In practice, it is often not the raw bandwidth of the network but the bandwidth that is actually available that is of interest. That is, you may be more interested in the throughput that you can actually achieve.
Poor throughput can result not only from inadequate hardware but also from architectural issues such as network design. For example, a broadcast domain that is too large will create problems despite otherwise adequate hardware. The solution is to redesign your network, breaking apart or segmenting such domains once you have a clear understanding of traffic patterns.
Equipment configuration errors may also cause poor performance. For example, some Ethernet devices may support full duplex communication if correctly configured but will fall back to half duplex otherwise. The first step toward a solution is recognizing the misconfiguration. Throughput tests are the next logical step in examining your network.
Throughput is typically measured by timing the transfer of a large block of data. This may be called the bulk transfer capacity of the link. There are a number of programs in this class besides those described here. The approach typically requires software at each end of the link. Because the software usually works at the application level, it tests not only the network but also your hardware and software at the endpoints.
Since performance depends on several parts, when you identify that a problem exists, you won’t immediately know where the problem is. Initially, you might try switching to a different set of machines with different implementations to localize the problem. Before you get too caught up in your testing, you’ll want to look at the makeup of the actual traffic as described later in this chapter. In extreme cases, you may need some of the more advanced tools described later in this book.
One simple quick-and-dirty test is to use an application like FTP. Transfer a file with FTP and see what numbers it reports. You’ll need to convert these to a bit rate, but that is straightforward. For example, here is the final line for a file transfer:
1294522 bytes received in 1.44 secs (8.8e+02 Kbytes/sec)
Convert 1,294,522 bytes to bits by multiplying by 8 and then dividing by the time, 1.44 seconds. This gives about 7,191,789 bps.
One problem with this approach is that the disk accesses required may skew your results. There are a few tricks you can use to reduce this, but if you need the added accuracy, you are better off using a tool that is designed to deal with such a problem. ttcp, for example, overcomes the disk access problem by repeatedly sending the same data from memory so that there is no disk overhead.
One of the oldest bulk capacity measurement tools is ttcp. This was written by Mike Muuss and Terry Slattery. To run the program, you first need to start the server on the remote machine using, typically, the -r and -s options. Then the client is started with the options -t and -s and the hostname or address of the server. Data is sent from the client to the server, performance is measured, the results are reported at each end, and then both client and server terminate. For example, the server might look something like this:
bsd2# ttcp -r -s ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp ttcp-r: socket ttcp-r: accept from 184.108.40.206 ttcp-r: 16777216 bytes in 18.35 real seconds = 892.71 KB/sec +++ ttcp-r: 11483 I/O calls, msec/call = 1.64, calls/sec = 625.67 ttcp-r: 0.0user 0.9sys 0:18real 5% 15i+291d 176maxrss 0+2pf 11478+28csw
The client side would look like this:
bsd1# ttcp -t -s 220.127.116.11 ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp -> 18.104.22.168 ttcp-t: socket ttcp-t: connect ttcp-t: 16777216 bytes in 18.34 real seconds = 893.26 KB/sec +++ ttcp-t: 2048 I/O calls, msec/call = 9.17, calls/sec = 111.66 ttcp-t: 0.0user 0.5sys 0:18real 2% 16i+305d 176maxrss 0+2pf 3397+7csw
The program reports the amount of information transferred, indicates that the connection is being made, and then gives the results, including raw data, throughput, I/O call information, and execution times. The number of greatest interest is the transfer rate, 892.71 KB/sec (or 893.26 KB/sec). This is about 7.3 Mbps, which is reasonable for a 10-Mbps Ethernet connection. (But it is not very different from our quick-and-dirty estimate with FTP.)
These numbers reflect the rate at which data is transferred, not the raw capacity of the line. Relating these numbers to bandwidth is problematic since more bits are actually being transferred than these numbers would indicate. The program reports sending 16,777,216 bytes in 18.35 seconds, but this is just the data. On Ethernet with an MTU of 1500, each buffer will be broken into 6 frames. The first will carry an IP and TCP header for 40 more bytes. Each of the other 5 will have an IP header for 20 more bytes each. And each will be packaged as an Ethernet frame costing an additional 18 bytes each. And don’t forget the Ethernet preamble. All this additional overhead should be included in a calculation of raw capacity.
Poor throughput numbers typically indicate congestion but that may not always be the case. Throughput will also depend on configuration issues such as the TCP window size for your connection. If your window size is not adequate, it will drastically affect performance. Unfortunately, this problem is not uncommon for older systems on today’s high-speed links.
The -u option allows you to check UDP throughput. A number of options give you some control over the amount and the makeup of the information transferred. If you omit the -s option, the program uses standard input and output. This option allows you to control the data being sent.
The nice thing about ttcp is that a number of implementations are readily available. For example, it is included as an undocumented command in the Enterprise version of Cisco IOS 11.2 and later. At one time, a Java version of ttcp was freely available from Chesapeake Computer Consultants, Inc., (now part of Mentor Technologies, Inc.). This program would run on anything with a Java interpreter including Windows machines. The Java version supported both a Windows and a command-line interface. Unfortunately, this version does not seem to be available anymore, but you might want to try tracking down a copy.
Another program to consider is netperf, which had its origin in the Information Networks Division of Hewlett-Packard. While not formally supported, the program does appear to have informal support. It is freely available, runs on a number of Unix platforms, and has reasonable documentation. It has also been ported to Windows. While not as ubiquitous as ttcp, it supports a much wider range of tests.
Unlike with ttcp, the client and server are two separate programs. The server is netserver and can be started independently or via inetd. The client is known as netperf. In the following example, the server and client are started on the same machine:
bsd1# netserver Starting netserver at port 12865 bsd1# netperf TCP STREAM TEST to localhost : histogram Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 16384 16384 16384 10.00 326.10
This tests the loop-back interface, which reports a throughput of 326 Mbps.
In the next example, netserver is started on one host:
bsd1# netserver Starting netserver at port 12865
Then netperf is run with the -H option to specify the address of the server:
bsd2# netperf -H 22.214.171.124 TCP STREAM TEST to 126.96.36.199 : histogram Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 16384 16384 16384 10.01 6.86
This is roughly the same throughput we saw with ttcp. netperf performs a number of additional tests. In the next test, the transaction rate of a connection is measured:
bsd2# netperf -H 188.8.131.52 -tTCP_RR TCP REQUEST/RESPONSE TEST to 184.108.40.206 : histogram Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 16384 1 1 10.00 655.84 16384 16384
The program contains several scripts for testing. It is also possible to do various stream tests with netperf. See the document that accompanies the program if you have these needs.
If ttcp and netperf don’t meet your needs, you might consider iperf. iperf comes from the National Laboratory for Applied Network Research (NLANR) and is a very versatile tool. While beyond the scope of this chapter, iperf can also be used to test UDP bandwidth, loss, and jitter. A Java frontend is included to make iperf easier to use. This utility has also been ported to Windows.
Here is an example of running the server side of iperf on a FreeBSD system:
bsd2# iperf -s -p3000 ------------------------------------------------------------ Server listening on TCP port 3000 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 4] local 172.16.2.236 port 3000 connected with 220.127.116.11 port 1133 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 5.6 MBytes 4.5 Mbits/sec ^C
Here is the client side under Windows:
C:\>iperf -c18.104.22.168 -p3000 ------------------------------------------------------------ Client connecting to 22.214.171.124, TCP port 3000 TCP window size: 8.0 KByte (default) ------------------------------------------------------------ [ 28] local 126.96.36.199 port 1133 connected with 188.8.131.52 port 3000 [ ID] Interval Transfer Bandwidth [ 28] 0.0-10.0 sec 5.6 MBytes 4.5 Mbits/sec
Notice the use of Ctrl-C to terminate the server side. In TCP mode, iperf is compatible with ttcp so it can be used as the client or server.
iperf is a particularly convenient tool for investigating whether your TCP window is adequate. The -w option sets the socket buffer size. For TCP, this is the window size. Using the -w option, you can step through various window sizes and see how they impact throughput. iperf has a number of other strengths that make it worth considering.
bsd2# treno 184.108.40.206 MTU=8166 MTU=4352 MTU=2002 MTU=1492 .......... Replies were from sloan.lander.edu [220.127.116.11] Average rate: 3868.14 kbp/s (3380 pkts in + 42 lost = 1.2%) in 10.07 s Equilibrium rate: 0 kbp/s (0 pkts in + 0 lost = 0%) in 0 s Path properties: min RTT was 13.58 ms, path MTU was 1440 bytes XXX Calibration checks are still under construction, use -v
treno is part of a larger Internet traffic measurement project at NLANR. treno servers are scattered across the Internet.
In the ideal network, throughput numbers, once you account for overhead, will be fairly close to your bandwidth numbers. But few of us have our networks all to ourselves. When throughput numbers are lower than expected, which is usually the case, you’ll want to account for the difference. As mentioned before, this could be hardware or software related. But usually it is just the result of the other traffic on your network. If you are uncertain of the cause, the next step is to look at the traffic on your network.
There are three basic approaches you can take. First, the quickest way to get a summary of the activity on a link is to use a tool such as netstat. This approach is described here. Or you can use packet capture to look at traffic. This approach is described in Chapter 5. Finally, you could use SNMP-based tools like ntop. SNMP tools are described in Chapter 7. Performance analysis tools using SNMP are described in Chapter 8.
The program netstat was introduced in Chapter 2. Given that netstat’s role is to report network data structures, it should come as no surprise that it might be useful in this context. To get a quick picture of the traffic on a network, use the -i option. For example:
bsd2# netstat -i Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll lp0* 1500 <Link> 0 0 0 0 0 ep0 1500 <Link> 00.60.97.06.22.22 13971293 0 1223799 1 0 ep0 1500 205.153.63 bsd2 13971293 0 1223799 1 0 tun0* 1500 <Link> 0 0 0 0 0 sl0* 552 <Link> 0 0 0 0 0 ppp0* 1500 <Link> 0 0 0 0 0 lo0 16384 <Link> 234 0 234 0 0 lo0 16384 127 localhost 234 0 234 0 0
The output shows the number of packets processed for
each interface since the last reboot. In this example, interface
ep0 has received 13,971,293
Ipkts) with no errors
Ierrs), has sent 1,223,799 packets
Opkts) with 1 error (
Oerrs), and has experienced no collisions
Coll). A few errors are generally
not a cause for alarm, but the percentage of either error should be
quite low, certainly much lower than 0.1% of the total packets.
Collisions can be higher but should be less than 10% of the traffic.
The collision count includes only those involving the interface. A
high number of collisions is an indication that your network is too
heavily loaded, and you should consider segmentation. This particular
computer is on a switch, which explains the absence of collision.
Collisions are seen only on shared media.
bsd2# netstat -Iep0 Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll ep0 1500 <Link> 00.60.97.06.22.22 13971838 0 1223818 1 0 ep0 1500 205.153.63 bsd2 13971838 0 1223818 1 0
(This was run a couple of minutes later so the numbers are slightly larger.)
lnx1# netstat -i Kernel Interface table Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg eth0 1500 0 7366003 0 0 0 93092 0 0 0 BMRU eth1 1500 0 289211 0 0 0 18581 0 0 0 BRU lo 3924 0 123 0 0 0 123 0 0 0 LRU
As you can see, Linux breaks down lost packets into three categories—errors, drops, and overruns.
Unfortunately, the numbers netstat returns are cumulative from the last reboot of the system. What is really of interest is how these numbers have changed recently, since a problem could develop and it would take a considerable amount of time before the actual numbers would grow enough to reveal the problem.
One thing you may want to try is stressing the system in question to see if this increases the number of errors you see. You can use either ping with the -l option or the spray command. (spray is discussed in greater detail in Chapter 9.)
First, run netstat to get a current set of values:
bsd2# netstat -Iep0 Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll ep0 1500 <Link> 00.60.97.06.22.22 13978296 0 1228137 1 0 ep0 1500 205.153.63 bsd2 13978296 0 1228137 1 0
Next, send a large number of packets to the destination. In this example, 1000 UDP packets were sent:
bsd1# spray -c1000 18.104.22.168 sending 1000 packets of lnth 86 to 22.214.171.124 ... in 0.09 seconds elapsed time 464 packets (46.40%) dropped Sent: 11267 packets/sec, 946.3K bytes/sec Rcvd: 6039 packets/sec, 507.2K bytes/sec
Notice that this exceeded the capacity of the network as 464 packets were dropped. This may indicate a congested network. More likely, the host is trying to communicate with a slower machine. When spray is run in the reverse direction, no packets are dropped. This indicates the latter explanation. Remember, spray is sending packets as fast as it can, so don’t make too much out of dropped packets.
Finally, rerun nestat to see if any problems exist:
bsd2# netstat -Iep0 Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll ep0 1500 <Link> 00.60.97.06.22.22 13978964 0 1228156 1 0 ep0 1500 205.153.63 bsd2 13978964 0 1228156 1 0
No problems are apparent in this example.
If problems are indicated, you can get a much more detailed report with the -s option. You’ll probably want to pipe the output to more so it doesn’t disappear off the top of the screen. The amount of output data can be intimidating but can give a wealth of information. The information is broken down by protocol and by error types such as bad checksums or incomplete headers.
bsd2# netstat -s -s ip: 255 total packets received 255 packets for this host 114 packets sent from this host icmp: ICMP address mask responses are disabled igmp: tcp: 107 packets sent 81 data packets (8272 bytes) 26 ack-only packets (25 delayed) 140 packets received 77 acks (for 8271 bytes) 86 packets (153 bytes) received in-sequence 1 connection accept 1 connection established (including accepts) 77 segments updated rtt (of 78 attempts) 2 correct ACK header predictions 62 correct data packet header predictions udp: 115 datagrams received 108 broadcast/multicast datagrams dropped due to no socket 7 delivered 7 datagrams output
A summary for a single protocol can be obtained with the -p option to specify the protocol. The next example shows the nonzero statistics for TCP:
bsd2# netstat -p tcp -s -s tcp: 147 packets sent 121 data packets (10513 bytes) 26 ack-only packets (25 delayed) 205 packets received 116 acks (for 10512 bytes) 122 packets (191 bytes) received in-sequence 1 connection accept 1 connection established (including accepts) 116 segments updated rtt (of 117 attempts) 2 correct ACK header predictions 88 correct data packet header predictions
This can take a bit of experience to interpret. Begin by looking for statistics showing a large number of errors. Next, identify the type of errors. Typically, input errors are caused by faulty hardware. Output errors are a problem on or at the local host. Data corruption, such as faulty checksums, frequently occurs at routers. And, as noted before, congestion is indicated by collisions. Of course, these are generalizations, so don’t read too much into them.
Most of the tools we have been discussing are available in one form or another for Windows platforms. Microsoft’s implementation of traceroute, known as tracert, has both superficial and fundamental differences from the original implementation. Like ping, tracert requires a DOS window to run. We have already seen an example of its output. tracert has fewer options, and there are some superficial differences in their flags. But most of traceroute’s options are rarely used anyway, so this isn’t much of a problem.
A more fundamental difference between Microsoft’s tracert and its Unix relative is that tracert uses ICMP packets rather than UDP packets. This isn’t necessarily bad, just different. In fact, if you have access to both traceroute and tracert, you may be able to use this to your advantage in some unusual circumstances. Its behavior may be surprising in some cases. One obvious implication is that routers that block ICMP messages will block tracert, while traceroute’s UDP packets will be passed.
As noted earlier in this chapter, Mentor’s Java implementation of ttcp runs under Windows if you can find it. Both netperf and iperf have also been ported to Windows. Another freely available program worth considering is Qcheck from Ganymede Software, Inc. This program requires that Ganymede’s Performance Endpoints software be installed on systems at each end of the link. This software is also provided at no cost and is available for a wide variety of systems ranging from Windows to MVS. In addition to supporting IP, the software supports SPX and IPX protocols. The software provides ping-like connectivity checks, as well as response time and throughput measurements.
As noted in Chapter 2, Microsoft also provides its own version of netstat. The options of interest here are -e and -s. The -e option gives a brief summary of activity on any Ethernet interface:
C:\>netstat -e Interface Statistics Received Sent Bytes 9840233 2475741 Unicast packets 15327 16414 Non-unicast packets 9268 174 Discards 0 0 Errors 0 0 Unknown protocols 969
The -s option gives the per-protocol statistics:
C:\>netstat -s IP Statistics Packets Received = 22070 Received Header Errors = 0 Received Address Errors = 6 Datagrams Forwarded = 0 Unknown Protocols Received = 0 Received Packets Discarded = 0 Received Packets Delivered = 22064 Output Requests = 16473 Routing Discards = 0 Discarded Output Packets = 0 Output Packet No Route = 0 Reassembly Required = 0 Reassembly Successful = 0 Reassembly Failures = 0 Datagrams Successfully Fragmented = 0 Datagrams Failing Fragmentation = 0 Fragments Created = 0 ICMP Statistics Received Sent Messages 20 8 Errors 0 0 Destination Unreachable 18 8 Time Exceeded 0 0 Parameter Problems 0 0 Source Quenchs 0 0 Redirects 0 0 Echos 0 0 Echo Replies 0 0 Timestamps 0 0 Timestamp Replies 0 0 Address Masks 0 0 Address Mask Replies 0 0 TCP Statistics Active Opens = 489 Passive Opens = 2 Failed Connection Attempts = 69 Reset Connections = 66 Current Connections = 4 Segments Received = 12548 Segments Sent = 13614 Segments Retransmitted = 134 UDP Statistics Datagrams Received = 8654 No Ports = 860 Receive Errors = 0 Datagrams Sent = 2717
Interpretation is basically the same as with the Unix version.
 tracert, a Windows variant of traceroute, uses ICMP rather than UDP. tracert is discussed later in this chapter.
 My apologies to any purist offended by my somewhat relaxed, pragmatic definition of bandwidth.
 The formula for bandwidth is BW = 16 x (Pl-Ps )/(t2 l-t2 s-t 1 l +t 1 s ). The larger and smaller packet sizes are Pl and Ps bytes, t 1 l and t 1 s are the ping times for the larger and smaller packets to the nearer interface in seconds, and t 2 l and t 2 s are the ping times for the larger and smaller packets to the distant interface in seconds. The result is in bits per second.
 The observant reader will notice that bing reported throughput, not bandwidth. Unfortunately, there is a lot of ambiguity and inconsistency surrounding these terms.
 In fact, ttcp can be
used to transfer files or directories between machines. At the
ttcp -r | tar xvpf
- and, at the source, use
| ttcp -t
 System Performance Tuning by Mike Loukides contains a script that can be run at regular intervals so that differences are more apparent.