The TCP/IP Suite

Having arrived here, you likely agree that Ethernet technology is the bee’s knees, so to speak, and is the be-all and end-all of all your local area networking needs. You have installed your shiny new EX switch, and you now have all those network ports just waiting for something to do.

And then it strikes you. All that revved-up LAN infrastructure is little more than a doorstop without upper-layer protocols to drive useful data over it. A LAN, with its Physical and Data Link layer services, provides you with a grand communications potential, but you can realize this potential only when there are applications written to operate over that technology.

There was a time when users were forced to choose one proprietary protocol suite over another; or as needs often dictated, when they ran multiple protocol suites to serve various user communities. NetWare and its IPX/SPX had great file and print sharing, while Banyan VINES had a really cool directory service. Unix workstations and servers typically supported engineering communities with their Network File System (NFS), remote r commands, and related TCP/IP-based networking support. And who can forget AppleTalk and its DDP, often used for the graphic artist, or IBM’s SNA/SAA, often found transporting the business’s accounting and financial applications. Oh, and then along came Microsoft with its NetBIOS/NetBEUI-enabled Windows for Workgroups solutions, which often found their way into ad hoc networks when they were not trying to compete with Novell. In the typical case, this litany of protocols would run on a multiprotocol backbone in a “ships in the night” fashion, passing each other by and never even being aware of the close encounters. Few hosts were bilingual, so most user communities shared a cable and little else. Protocol converters (remember those?) and application gateways were big business when users in different communities needed to interoperate.

Enter OSI

At one point, the grand saviors of this mess were, you guessed it, the ISO’s OSI model and related ITU/CCITT standards. The plan was to have the world switch to a standards-based open system, which would then promote wide-ranging connectivity and, just as important, would once and for all end all the multiprotocol babble the world of networking was mired in. Too bad OSI imploded.

What was left was pretty much what folks started with, which is to say some level of international standards that were actually in use and were intended to form the foundation of the larger OSI solution. This left people with circuit technologies such as dial-up modems, ISDN, and leased lines, and a few packet-based options such as LANs and X.25 for WAN use. There was one new thing, however, and that was the idea of what a widely adopted set of open standards could do. By replacing multiple stacks with one, costs could be reduced, support was simplified, and there was greater potential for interoperability.

Given that the OSI model was similar to the preexisting TCP/IP suite, that TCP/IP was known to work, and that it in fact did more every day than OSI accomplished in its whole life, a candidate came into focus. The fact that TCP/IP was well understood and widely deployed, was already supported on most operating systems, and was generally made available for free caused that candidate to stand out even more. Once the U.S. government exempted IP from its mandate to use only OSI protocols, until such time that equivalent functionality was available in OSI protocols, the writing was on the wall.

The world had its communications savior, and its name was Internet Protocol. Thus began the mantra of IP over everything, and everything over IP.

Exit OSI, Enter IP

As we ranted earlier, while all the OSI work was moving forward at its own snail-like pace, the fine folks at the IETF, the entity that generates Internet RFCs and drafts, kept on doing what was needed to solve actual issues and to create new functionality in the TCP/IP suite of protocols. A number of factors added up in IP’s favor, including the following:

The open nature of TCP/IP made the technology and its related specifications freely available. In contrast, OSI specifications cost money, and so did the resulting products, of which there were few.
TCP/IP was derived from early work on the DoD’s ARPANET, which went online in 1969! It was one of the first routed/packet-switched protocols ever implemented. TCP/IP’s original purpose was to avoid issues with the connection-oriented services of the day, which had a tendency to be disrupted every time some battle happened and bombs tore up phone lines. The TCP stack was intended to provide robust and resilient communications in battlefield conditions, and was known to work in this regard with a long field-proven history. I mean, who doesn’t want his Internet surfing at Amazon or his iTunes downloading to be robust and reliable, despite the near-battlefield conditions that comprise a modern distributed denial of service (DDoS) attack on large portions of the Internet’s infrastructure?
Because U.S. government contracts were historically based on TCP/IP, and later, when OSI was mandated, still permitted TCP/IP by virtue of the exception for lack of OSI functionality, all OSs and products of any consequence had TCP/IP protocol support. If it was not already built in and simply waiting to be activated, you could certainly find a TCP stack for a given OS far more easily, and for far less money, than some OSI stack that would at best let you use a subset of the functionality on a far more limited set of machines.
Sir Timothy John Berners-Lee invented the World Wide Web (WWW) application, and suddenly every normal (non-computer-geek) person, and quite literally his or her mother, could see a use for the Internet that they would be willing to pay for. Things always seem to get real popular real quick whenever there’s money to be made.
As much as this dates me, I can remember the pre-WWW Internet. There was FTP from a command line, and there were command-line search tools such as the FTP site indexer Archie, which in short time begat the updated (and Gopher-based) search capabilities of Jughead and, ultimately, Veronica. Yep, back in the day we would walk barefoot through the snow to our 2,400 bps dial-up modems, and we would be happy for the chance to use a command-line FTP client to download some plain-text copy of an RFC. Nope, we did not need images, fancy multimedia, a GUI point-and-click interface, or the ability to view the contents of our shopping carts. Nope, back in the day, when I went shopping I had an actual shopping cart in my hands, and I could look at its contents without any fancy click buttons (I say in my best cranky old-man voice).
After the web browser and HTTP/HTML, the Internet and its underlying TCP/IP enabler became “the WWW.” It still irks me to hear news reports of “The WWW is under attack due to <insert latest cause here>,” when in reality it’s the Internet that is being attacked and the stupid WWW application is but one of many that are affected. No one but us aged geeks seems to care that during these outages we’re unable to use the finger application to determine how many cans of soda are left in some science lab’s vending machine! The Internet is more than just the WWW.

When the preceding points were factored, the choice was clear. The world already had a multivendor interoperable protocol, it was known to work, and money could now be made from it. TCP/IP became the de facto set of open standards while the official ones faded into that great night.

Vive IP!

The IP Stack, in a Nutshell

Once again we find ourselves broaching a subject that is extremely well documented, and found in numerous places. The wheel is not re-created here, as trees are far too valuable. Once again we try to stick to the facts and distill the most important and widely used set of protocols down to a few paragraphs. Wish me luck....

Figure 1-6 depicts the TCP/IP stack, along with some selected applications.

Figure 1-6. The Internet Protocol stack

Figure 1-6 also shows the good old OSI model alongside the IP stack. After all, being able to compare non-OSI things to its well-delineated layers is about the only useful thing left of that grand effort.

The network that lies beneath

As is so often the case, things begin in the physical realm, where the bits meet the media, so to speak. In the IP stack, both the Physical and the Link layers are combined into a single underlying Network layer (a network interface layer, not to be confused with the OSI Network layer). IP, which lives at Layer 3, is well shielded from the incredibly long list of supported technologies. For our purposes, we can place Ethernet technology here, allowing the Physical layer to be 10Base-2, GE over optical fiber, or whatever the current flavor of Ethernet happens to be. Likewise, the Link layer becomes CSMA/CD and the Ethernet MAC frame structure. For some flavors of Ethernet the MAC layer can dispense with collision detection (CD) and operate in full duplex (FD) with no need for carrier sensing. In other modes both are still needed, as well as the binary exponential back-off algorithm used to recover from collisions.

To help drive home the significance of layering and the beauty of layer independence, consider that in all of these cases the same IP entity operates in the same manner, albeit sending less, or more, as the physical link speed dictates. All the details are handled by the related layers, such that IP simply sees a datagram transmission and reception service.

ARP me, Amadeus

Moving up, you next encounter Address Resolution Protocol (ARP). ARP is a critical component of IP’s independence from the underlying technology. Some technologies do not use link-level addressing, or are inherently point-to-point (P-to-P), and therefore do not require any dynamic binding of a destination IP address to a Link layer hardware address. In these cases, ARP is simply not used. Multipoint links, such as a LAN, represent a different story, as shown in Figure 1-7.

Figure 1-7. IP and ARP—they taste great together

Figure 1-7 shows two IP stations that share an Ethernet link. At step 1, the user or her application specified an IP address as the destination for some session.^[2]

The IP layer forms the resulting packet, but before it can be handed to the MAC layer for transmission, the correct destination MAC address must be specified. IP looks for a match in its ARP cache, where recent responses are stored. Here, no match is located, forcing it to invoke ARP’s services, shown as step 2 in the figure. The ARP request is broadcast at the MAC layer to get around the “cart before the horse” issue of needing to talk to a station to learn its MAC address before you actually know its MAC address.

In this example, the target is alive, and it sees the ARP request is intended for its local IP address. Its ARP reply contains its Ethernet MAC address, which completes the process. Usually this reply is sent as unicast back to the requester, but an unsolicited ARP reply can be broadcast to prepopulate ARP tables in what is known as Gratuitous ARP.

Layer 3 to Layer 2 address resolution is a must when there are multiple possible destinations on a single medium. “Close” does not count in the networking world.
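To make the exchange a bit more concrete, here is a minimal Python sketch of the logic just described: check the cache, and on a miss build a broadcast ARP request. The field layout follows the standard ARP format for Ethernet and IPv4. The function names, the sample addresses, and the dictionary standing in for the ARP cache are all assumptions made for illustration, and nothing is actually put on the wire.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request for target_ip."""
    sha = mac_to_bytes(src_mac)
    # Destination is the broadcast MAC, since the target's MAC is not yet known.
    eth_header = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # operation: 1 = request, 2 = reply
        sha,                          # sender MAC
        socket.inet_aton(src_ip),     # sender IP
        b"\x00" * 6,                  # target MAC: unknown, hence the request
        socket.inet_aton(target_ip),  # target IP
    )
    return eth_header + arp_body

def resolve(cache: dict, src_mac: str, src_ip: str, target_ip: str):
    """Return ('hit', mac) from the cache, or ('miss', request_frame)."""
    if target_ip in cache:
        return "hit", cache[target_ip]
    return "miss", build_arp_request(src_mac, src_ip, target_ip)

# Hypothetical station addresses, for illustration only.
arp_cache = {}
status, frame = resolve(arp_cache, "00:11:22:33:44:55", "10.0.0.1", "10.0.0.2")
print(status, len(frame), frame[:6].hex())
```

Once a reply arrives, the learned MAC would be stored in the cache so that later packets to the same destination skip the broadcast.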

IP, freely

Next up the chain is Internet Protocol (IP) itself. IP provides a datagram service to its upper layers. The term datagram implies a connectionless mode of operation and a resulting unreliable transport service. IP supports:

Identification of source and destination network addresses
A header checksum to detect corruption of its own header (not payload)
A ToS (Type of Service) indication
A precedence value to influence the probability of discard during congestion
Fields to support and control fragmentation (and later reassembly) of large datagrams to accommodate dissimilar maximum transmission units (MTUs)
A TTL field to limit effects of routing loops
A Protocol field to identify the owner (the upper layer protocol) that is responsible for the datagram’s payload
A Length field that lets the receiver strip any padding added by the Link layer, as is needed for Ethernet to meet its minimum frame size
Options, which when present alter packet handling (e.g., source routing and record route)

A lot of functions, to be sure, but again note that IP does not provide connection setup, flow control, or payload error detection, and it offers only a simple discard response (with no retransmission) in the event of IP header errors or the inability to process a datagram for any reason (e.g., lack of buffers due to congestion). IP leaves those functions to the users of its service: the upper layers. Some applications (broadcast media, for example) may not care much about error correction, as they are negatively impacted by any retransmission attempt, whereas other applications (say, e-commerce) will be quite concerned with error control, or at least one would hope. IP provides the same services to both and lets the upper layers sort things out.
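The header checksum in the list above is easy to sketch. The following Python fragment computes the one's-complement Internet checksum over a sample 20-byte IPv4 header and then shows how a receiver would detect a damaged header. The sample header bytes and the function name are illustrative assumptions, not taken from any particular implementation.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                      # fold any carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))

# A receiver recomputes over the header with the checksum in place; 0 means intact.
filled = header[:10] + struct.pack("!H", checksum) + header[12:]
print(internet_checksum(filled))

# Flip one bit in the TTL byte: the check no longer yields 0, so IP would discard.
damaged = bytearray(filled)
damaged[8] ^= 0x01
print(internet_checksum(bytes(damaged)) != 0)
```

Note that only the 20 header bytes are covered; the payload is left for the upper layers to protect.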

Note that some underlying network technologies (for example, X.25 or a Token Ring LAN running 802.2 LLC type 2) provide their own error detection and retransmission-based correction, which is another good reason to leave IP streamlined and built for routing. X.25 was one of the few Layer 2 protocols to also offer error correction. At least until DOCSIS came along!

Note

When links are error-prone, relying on end-to-end retransmissions, as is the case with a protocol such as Frame Relay, quickly results in extreme throughput degradation, even when only tens of such links are involved. For this reason, error-correcting Link layers are still used in support of IP when error-prone transmission links are used.

To carry the previous example of IP and underlying network technology independence a bit further, consider that an IP packet can easily cross tens of links between endpoints. Consider this traceroute from Juniper Networks headquarters in Sunnyvale, California, to the official government website for the Central Asian nation of Uzbekistan:

bash-3.2$ traceroute www.gov.uz
traceroute to www.gov.uz (195.158.5.137), 30 hops max, 40 byte packets
 1  mrc2-core1-3.jnpr.net (172.24.28.2)  0.541 ms  0.517 ms  0.735 ms
 2  172.24.19.33 (172.24.19.33)  0.447 ms  0.438 ms  0.421 ms
 3  172.24.230.90 (172.24.230.90)  1.287 ms  1.274 ms  1.497 ms
 4  ns-egress-fw-vrrp.jnpr.net (172.24.254.6)  1.213 ms  1.196 ms  1.403 ms
 5  66.129.224.34 (66.129.224.34)  1.926 ms  1.911 ms  1.893 ms
 6  POS2-1.GW5.SJC2.ALTER.NET (208.214.142.9)  1.873 ms  1.675 ms  1.659 ms
 7  161.ATM4-0.XR2.SJC2.ALTER.NET (152.63.48.82)  2.325 ms  2.311 ms  2.293 ms
 8  0.so-1-0-0.XL2.SJC2.ALTER.NET (152.63.56.141)  2.391 ms  2.376 ms  2.357 ms
 9  0.ge-3-0-0.XT2.SCL2.ALTER.NET (152.63.49.110)  3.363 ms  3.324 ms  3.555 ms
10  sl-crs2-sj-0-1-0-1.sprintlink.net (144.232.9.1)  4.744 ms  4.733 ms  4.710 ms
11  sl-crs1-rly-0-4-2-0.sprintlink.net (144.232.20.187)  74.735 ms  70.007 ms  73.500 ms
12  sl-crs1-dc-0-8-0-0.sprintlink.net (144.232.19.213)  69.516 ms sl-crs1-dc-0-12-2-0.sprintlink.net (144.232.19.223)  69.208 ms sl-crs1-dc-0-8-0-0.sprintlink.net (144.232.19.213)  69.481 ms
13  sl-bb20-par-1-0-0.sprintlink.net (144.232.19.147)  159.589 ms  158.893 ms  159.475 ms
14  sl-bb21-fra-13-0-0.sprintlink.net (213.206.129.66)  159.159 ms  159.999 ms  159.981 ms
15  sl-gw10-fra-15-0-0.sprintlink.net (217.147.96.42)  174.860 ms  174.846 ms  174.828 ms
16  sl-MTU-I-278357-0.sprintlink.net (217.151.254.134)  160.784 ms  161.635 ms  160.751 ms
17  bor-cr01-po3.spb.stream-internet.net (195.34.53.101)  220.050 ms  220.278 ms  219.633 ms
18  m9-cr01-po4.msk.stream-internet.net (195.34.53.125)  210.818 ms  211.339 ms  210.590 ms
19  m9-cr02-po1.msk.stream-internet.net (195.34.59.54)  213.698 ms  213.475 ms  214.951 ms
20  synterra-m9.msk.stream-internet.net (195.34.38.38)  213.879 ms  213.869 ms  214.622 ms
21  83.229.225.243 (83.229.225.243)  230.080 ms  229.762 ms  229.750 ms
22  83.229.243.98 (83.229.243.98)  259.745 ms  260.105 ms  259.709 ms
23  195.69.188.148 (195.69.188.148)  282.930 ms  282.916 ms  283.508 ms
24  195.69.188.2 (195.69.188.2)  276.545 ms  276.534 ms  276.887 ms
25  84.54.64.66 (84.54.64.66)  277.306 ms  278.145 ms  278.347 ms
26  firewall.uzpak.uz (195.158.0.155)  283.783 ms  283.323 ms  282.876 ms
27  ta144-p86.uzpak.uz (195.158.10.181)  278.743 ms  276.536 ms  277.146 ms
28  195.158.4.42 (195.158.4.42)  276.802 ms  277.392 ms  276.766 ms
29  195.158.5.137 (195.158.5.137)  284.062 ms !X  284.042 ms !X  283.711 ms !X

Here, the results show that a fair number of hops are needed to reach the target site. As an upside, the presence of .uz domains near the target indicates that at least the website resides in a faraway and exotic land, as opposed to being hosted by some Silicon Valley-based company.

A key aspect of IP internetworking, and of routers in general, is that on each such link a completely different type of network and transmission technology can be used. The packet’s first hop might be over a modern LAN in sunny California, whereas a subsequent link may involve a jaunt in a Frame Relay frame transported via a trans-Atlantic SONET link that runs Point-to-Point Protocol (PPP). And later yet, as the IP packet nears its final hop and, likely, middle age, given that its TTL field has been decremented at each hop, the packet could jump into an X.25 packet, where it’s well protected for its arduous journey over an error-prone analog leased line the recipient uses for Internet access.

In summary, IP is a lot like the U.S. postal service, or any non-registered mail system for that matter. Your envelope indicates who the letter is from and where it’s going, using hierarchical addressing that permits information hiding, whereby more and more of the address becomes significant as the letter winds its way to the destination. Here, all the stuff up to and including the street address is like the IP address, and the recipient’s name is like a protocol identifier. You can have multiple protocols at the same IP, just as multiple people can dwell at the same address. You can pay more for airmail, or save with bulk, which is a reasonable analogy to IP’s ToS indication. There is some weight/size limit, and if in excess of this limit the package may have to be split into smaller parts (fragmentation). Although no doubt diligent in the face of snow and all that, regular mail is a connectionless service that is only a best effort. You do not need permission to send someone a letter (no connection establishment), and simply dropping a letter in the mailbox is no guarantee that it will be delivered successfully, but in most cases it will. All of this sounds exactly like what IP does, except that IP deals with packets and not packages.

IP addressing

IP addressing is a topic that is well covered in numerous other places. Technically, this being an Ethernet-based LAN switching book and all, it could be argued that anything above Layer 2 is beyond the scope of this book.

Yes, this could be said. But to do so would ignore the truth that virtually all new networks are IP-based, and each day some multiprotocol LAN gets closer to pure IP nirvana as one more of its legacy protocols is decommissioned in favor of IP transport. As such, a thorough understanding of IP addressing is critical for anyone dealing with LAN switching in the modern context. So, the packet stops here, to use yet another poor pun, and we pause to take a very condensed tour of what matters in IP addressing. It’s only 32 bits. How bad can it be?

Hierarchical

This is nothing but a fancy way of saying an IP address has more than one part. Here, we mean it has both a network and a host portion. This is significant, and to a large degree it is what allows a router to scale to a worldwide Internet while a bridge would melt under the “load” of all those MACs.

The takeaway: routers route to networks, not hosts. The host portion of IP is of concern only to the last hop router when it attempts direct delivery. In fact, because routing is based on a longest match, it’s likely that remote routers are even ignoring parts of the network portion because of supernetting, which is also called route summarization. In the end, the router only needs to direct the packet out a sane interface, one that gets it one step closer to its destination on a path that is not a loop. You do not need to examine 30 bits of network address to do this; trust me.

In fact, some non-core routers may use a default route, which is the ultimate in information hiding. The 0/0 route by definition says the router should try to match zero bits. Matching against nothing always succeeds, and is always the least specific match possible, but it’s a match nonetheless. Using a default means that in effect, the packet is routed not by virtue of matching its destination address particulars, but quite the opposite: by virtue of matching none. Hence a low-end router can still forward to each of the 4 billion possible IPv4 addresses with a single default route entry, along with its directly connected network. If the router has two egress interfaces, two “half-default” routes in the form of a 0/1 and a 128/1, each pointing to a different interface, of course, provide pretty decent load balancing across all possible destination IPs. Not bad, huh? I’d like to see you do that with a bridge.
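A minimal Python sketch of longest-match forwarding, using the standard ipaddress module, shows how the 0/0, 0/1, and 128/1 entries behave. The interface names and the tiny route table are hypothetical, and a real router would use a far more efficient lookup structure than a linear scan.

```python
import ipaddress

def longest_match(routes, destination):
    """Return (prefix, next_hop) for the most specific route covering destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in routes if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)

# Hypothetical route table; interface names are for illustration only.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "ge-0/0/0"),      # default: matches zero bits
    (ipaddress.ip_network("0.0.0.0/1"), "ge-0/0/1"),      # lower half of the address space
    (ipaddress.ip_network("128.0.0.0/1"), "ge-0/0/2"),    # upper half of the address space
    (ipaddress.ip_network("192.168.1.0/24"), "direct"),   # directly connected network
]

for dest in ("10.1.1.1", "195.158.5.137", "192.168.1.20"):
    net, hop = longest_match(routes, dest)
    print(dest, "->", net, "via", hop)
```

With the two /1 entries present, the 0/0 route is never the best match; remove them and every non-local destination falls through to the default.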

Classless is the norm (or, how we learned to subnet)

When first envisioned, IP addresses were class-based. Figure 1-8 shows the original IP address class breakdown along with a binary-to-hex-to-dotted decimal conversion example.

Figure 1-8 has a lot of information, and all of it is important. The left side shows the original IP address class breakdown. The address class is determined by the setting of the high-order bits. For example, a Class B address always begins with a 10 pattern, as shown. Behind this plan was the perception that computers were special-purpose machines and that the Internet would remain an academic/military community, so the early Internet architects saw a need for a few very large networks (Class A), a good number of medium-size networks (Class B), and a larger number of small networks (Class C). The figure shows that for each class, there is some number of supported networks, and each such network in turn supports some number of host computers. The box on the upper right shows the net effect in the form of how many networks are available in each class (126 for Class A), along with how many hosts each such network class supports. The math is a function of 2 to the power of the number of bits in the field (7 and 24 for Class A networks and hosts, respectively), and then subtracting 2 for the combinations of all 0s and all 1s, which are generally reserved for indicating “this” and “all,” respectively.
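The class rules and the arithmetic above can be sketched in a few lines of Python. The function names are made up for this example; Classes D (multicast) and E (reserved) are included only to round out the original scheme.

```python
def address_class(first_octet: int) -> str:
    """Classify an address by the high-order bits of its first octet."""
    if first_octet >> 7 == 0b0:
        return "A"
    if first_octet >> 6 == 0b10:
        return "B"
    if first_octet >> 5 == 0b110:
        return "C"
    if first_octet >> 4 == 0b1110:
        return "D"   # multicast
    return "E"       # reserved

def usable(bits: int) -> int:
    """2**bits combinations, minus the all-0s and all-1s values."""
    return 2 ** bits - 2

print(address_class(10), address_class(172), address_class(192))
print(usable(7), usable(24))   # Class A networks, and hosts per Class A network
```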

Figure 1-8. “Classy” IP addressing

To be effective with IP addressing, you must understand binary, hexadecimal, and the more human-friendly dotted decimal format, as all are needed at various times when working with IP. Remember when working with hexadecimal that you break each byte into two nibbles of four bits each. The resulting values therefore range from 0 to 15, but in hex 10–15 are coded as the letters A–F, respectively. In contrast, when working on the decimal value, all eight bits of the byte are grouped to yield a value from 0 to 255.

The dashed box on the lower right of the figure provides an example of this conversion process. This busy little box also shows the IP network byte order, which has the most significant bit of the most significant byte (MSB) sent first, from left to right. Stated again, bit 0 is the most significant of the 32 bits that make up the IP address, and it’s the first bit sent. The low-order octet also shows a power-of-2 breakdown for each of the eight bits in the octet. In this example all are set to 0, hence the value shown in Figure 1-8. A setting of 11000000 codes (128 + 64), or 192, as it has the bits for both 128 and 64 set. In hex this would be C0, given that the high nibble 1100 codes (8 + 4) = 12, or C, and the low nibble 0000 codes 0.
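If you want to check your conversions, a short Python sketch can do the arithmetic for you. The helper names and the sample address are assumptions chosen for this example, not values from Figure 1-8.

```python
def to_binary(dotted: str) -> str:
    """Dotted decimal to four space-separated 8-bit groups."""
    return " ".join(format(int(octet), "08b") for octet in dotted.split("."))

def to_hex(dotted: str) -> str:
    """Dotted decimal to eight hex digits (two nibbles per byte)."""
    return "".join(format(int(octet), "02X") for octet in dotted.split("."))

def to_int(dotted: str) -> int:
    """Dotted decimal to a single 32-bit value, most significant byte first."""
    value = 0
    for octet in dotted.split("."):
        value = (value << 8) | int(octet)
    return value

address = "192.168.1.0"   # sample address for illustration
print(to_binary(address))
print(to_hex(address))
print(to_int(address))
print(int("11000000", 2), format(0b11000000, "X"))
```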

The class-based scheme was a fine plan, but as things worked out, classful addressing is far less than ideal. The updated IPv6 protocol has no such concept, and in all practicality, neither do modern IPv4 networks. The problem is basic inefficiency. It’s great that a single Class A network can support more than 16 million machines and all, but that many machines on one logical subnet is preposterous (from a performance and reliability design perspective). Heck, even a single Class C, with its support for 254 hosts, is generally wasted in a routed environment.

LAN-based routed networks tend to have tens, rather than hundreds, let alone thousands, of machines. In the end, what people wanted was more networks, each with fewer hosts. As noted, the issue is that a recipient of a single Class A network would be hard-pressed to go back to his regional numbering authority to ask for yet more network numbers, when it was shown that he was using only a small fraction of the host space available in the Class A allocation he already had.

Note

So, what does it mean that nearly the first command everyone enters on an IOS-based router is ip classless, which provides support for IP subnetting (and supernetting)? No such command is needed in JUNOS, so that’s one less command for you to type.

Classless IP routing simply means that for each IP address (prefix) there is an associated network mask. In contrast, with classful routing the address’s class is used to derive a presumed network mask. Having an explicit mask allows the user to define what portion of the 32-bit address identifies the network; once the network portion is known the remainder is considered to be host addresses.

Subnetting is the process of extending the mask, making it longer so as to extend network numbering into the host field. Thus, more networks are gained, at the cost of fewer hosts on each network. Supernetting is the opposite, and creates fewer networks by reducing network mask length. Supernetting is an important concept behind Classless Inter-Domain Routing (CIDR), which is an effort to summarize networks into fewer routing entries, wherever possible. This is done to try to keep the size of Internet routing tables from growing at a pace that outstrips computer processing power, a real threat that at one point genuinely jeopardized global Internet stability!

Figure 1-9 shows classless IP routing at work.

An often misunderstood aspect of CIDR is the fact that different network masks are used, at different places, to route the same packet! The network mask does not have to be the same length, except for the collection of hosts that attach to the same logical IP subnet/network. As a result, a core router may use a default (class-based) network mask to direct traffic to a customer’s network attachment point. In Figure 1-9, a Class B address is assigned, resulting in a /16 mask, which is also represented in the pre-CIDR notation format of 255.255.0.0. Within the customer’s network, the single Class B address has been subnetted, in this example to provide some 254 additional subnets through a /24 (or 255.255.255.0) network mask.
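The following Python sketch, again leaning on the standard ipaddress module, walks through the same idea with assumed sample prefixes: a /16 is subnetted into /24s, a run of contiguous /24s is supernetted into one shorter prefix, and a single host address is shown to match at both mask lengths. The library counts 256 subnets of the /16; the figure of 254 corresponds to the older practice of setting aside the all-0s and all-1s subnets.

```python
import ipaddress

major = ipaddress.ip_network("172.16.0.0/16")
print(major.netmask)                       # dotted-decimal form of the /16 mask

# Subnetting: extend the mask from /16 to /24.
subnets = list(major.subnets(new_prefix=24))
print(len(subnets), subnets[1], subnets[1].num_addresses - 2)

# Supernetting: shorten the mask to summarize contiguous networks.
blocks = [ipaddress.ip_network("192.168.%d.0/24" % i) for i in range(4)]
summary = list(ipaddress.collapse_addresses(blocks))
print(summary)

# The same destination matches at both mask lengths, so a core router can
# use the /16 while routers inside the customer network use the /24.
host = ipaddress.ip_address("172.16.1.20")
print(host in major, host in subnets[1])
```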

Figure 1-9. Subnetting and supernetting

Subnetting and supernetting are now old news in IP. Their widespread use is doing much to forestall the predicted demise of available IP addressing space, as well as the meltdown of core routers due to impractical routing table size, several times over.

VLSM and Discontiguous Subnets

Variable Length Subnet Masking (VLSM) refers to the ability to assign network masks of varying lengths to different portions of a network to maximize the proverbial bang for the IP addressing buck. For example, /30s, /31s, or even /32s might be used on point-to-point links, whereas /26 might be assigned to a LAN. There is no magic here, as IP routers always route to the most specific, or longest, match. The key is in having a routing protocol that supports the conveyance of a network mask along with the IP prefix. Older protocols such as Routing Information Protocol (RIP) v1, or Cisco’s IGRP, lack this capability, which forces you to use the same mask length each time a given (major/class-based) address is assigned. This is because the lack of network masks in routing updates forces the local router to assume that major/classful prefixes that match a local address assignment must use the same mask length as that assigned to the interface on which they were received. When VLSM is used, these assumptions are not valid and routing problems can surface.
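As a rough illustration of VLSM, the Python sketch below carves an assumed 10.1.1.0/24 block into one /26 for a LAN and a set of /30s for point-to-point links, and then confirms that longest-match lookups pick the right entry even though the masks differ. The block and the helper name are hypothetical.

```python
import ipaddress

block = ipaddress.ip_network("10.1.1.0/24")   # sample block for illustration

# Carve one /26 for a LAN, then split the next /26 into /30s for point-to-point links.
quarters = list(block.subnets(new_prefix=26))
lan = quarters[0]
p2p_links = list(quarters[1].subnets(new_prefix=30))

print(lan, lan.num_addresses - 2)
print(p2p_links[0], p2p_links[0].num_addresses - 2, len(p2p_links))

def longest_match(prefixes, destination):
    """Pick the covering prefix with the longest mask."""
    addr = ipaddress.ip_address(destination)
    return max((p for p in prefixes if addr in p), key=lambda p: p.prefixlen)

table = [block, lan, p2p_links[0]]
print(longest_match(table, "10.1.1.65"))   # falls in the first /30
print(longest_match(table, "10.1.1.10"))   # falls in the LAN /26
```

This only works when the mask travels with each prefix, which is exactly what the older protocols named above cannot do.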

Problems with discontiguous subnets are also related to routing protocols that do not convey a network mask. The issue is when some network address (say, a Class B 172.16) is subnetted on two routers that are separated by a link with a different major network (not a 172.16) address. In this case, protocols that do not support a mask perform auto-summarization to the classful network, resulting in both ends sending and receiving the same 172.16/16 update. The result is the loss of subnet routing for the discontiguous subnets. Using a routing protocol such as Open Shortest Path First (OSPF), IS-IS, or RIP v2 solves both issues through inclusion of an explicit network mask along with each network prefix.
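The auto-summarization problem can be mimicked in a few lines of Python. This is only a sketch of the classful summarization rule, with made-up subnet assignments on two hypothetical routers; it is not a model of any actual routing protocol implementation.

```python
import ipaddress

def classful_network(prefix: str) -> ipaddress.IPv4Network:
    """Summarize a prefix to its major (class-based) network; assumes Class A-C input."""
    net = ipaddress.ip_network(prefix)
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        length = 8      # Class A
    elif first_octet < 192:
        length = 16     # Class B
    else:
        length = 24     # Class C
    return ipaddress.ip_network("%s/%d" % (net.network_address, length), strict=False)

# Two routers hold different subnets of 172.16, separated by another major network.
router_a = ["172.16.1.0/24"]
router_b = ["172.16.2.0/24"]

adv_a = {classful_network(p) for p in router_a}
adv_b = {classful_network(p) for p in router_b}
print(adv_a, adv_b, adv_a == adv_b)
```

Both routers end up advertising the identical 172.16.0.0/16, so neither update tells the other side which subnets actually live where.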

ICMP, the bad news protocol

Moving up the stack in Figure 1-6 we next hit the Internet Control Message Protocol (ICMP). ICMP is classified as a sublayer. ICMP is an official part of IP, but is itself encapsulated inside IP and is therefore shown above it. ICMP is often used to report errors when handling IP datagrams, hence the not-so-funny title of this section. Common errors are Destination Unreachable, TTL Expired, Options Handling Issue, and Fragmentation Needed But Not Permitted. ICMP messages can also be used to provide information such as reporting a timestamp or the local link’s subnet mask. ICMP is the mechanism behind the echo request and response functionality affectionately referred to as ping.
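To show how these messages are told apart, here is a small Python sketch that reads the type and code bytes from the start of an ICMP message and maps them to names. The type and code numbers are the standard ICMP assignments; the function name, the partial lookup tables, and the 4-byte sample messages are assumptions for illustration.

```python
import struct

ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo Request",
    11: "Time Exceeded",
    12: "Parameter Problem",
    13: "Timestamp Request",
    14: "Timestamp Reply",
    17: "Address Mask Request",
    18: "Address Mask Reply",
}

# A partial list of codes for type 3.
DEST_UNREACHABLE_CODES = {
    0: "Network Unreachable",
    1: "Host Unreachable",
    3: "Port Unreachable",
    4: "Fragmentation Needed and DF Set",
}

def describe_icmp(message: bytes) -> str:
    """Name an ICMP message from its first four bytes (type, code, checksum)."""
    icmp_type, code, _checksum = struct.unpack("!BBH", message[:4])
    name = ICMP_TYPES.get(icmp_type, "Type %d" % icmp_type)
    if icmp_type == 3 and code in DEST_UNREACHABLE_CODES:
        name += ": " + DEST_UNREACHABLE_CODES[code]
    return name

print(describe_icmp(bytes([8, 0, 0, 0])))    # what ping sends
print(describe_icmp(bytes([11, 0, 0, 0])))   # what traceroute relies on
print(describe_icmp(bytes([3, 4, 0, 0])))
```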

UDP, multiplexing, and not much else

User Datagram Protocol (UDP) provides a best-effort connectionless Transport layer service. Recall that Layer 4 is the first end-to-end layer, and is therefore processed only by the destination machine. Given that IP is also a best-effort protocol, it can be said that UDP does not add much in the way of reliability. UDP offers no error correction or flow control; it does provide error detection (with silent discard) against the UDP header and payload.

UDP’s most important function is the notion of ports. The port abstraction is similar to a Unix socket, and provides multiplexing among multiple processes that each share the same IP address. Recall that the IP layer’s Protocol field identifies the owner of its payload, which may be UDP. IP then hands its payload to the UDP process, where the first step is error detection. If all is well, the UDP header is stripped and the destination port is used to direct the packet’s payload to the appropriate process. Port values from 0 through 1,023 are standardized for use by well-known (server) processes. Clients pick their port at random, selecting some unused value in the ephemeral range of 1,024 to 65,535. In most cases, services that can use multiple transport protocols, that is, either TCP or UDP, use the same port values; the Protocol field at the IP layer ensures there is no ambiguity in such cases.

The connectionless nature of UDP makes it well suited to point-to-multipoint applications (multicast) and short-lived transactional services such as DNS queries.
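Port-based multiplexing is simple enough to sketch in Python. The fragment below builds a UDP datagram by hand and then uses the destination port to pick a handler. The listener table and handler behavior are invented for the example, and the checksum step is skipped (the field is left at 0, which in IPv4 means the sender did not compute one).

```python
import struct

def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build a UDP datagram: 8-byte header (ports, length, checksum) plus payload."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def demultiplex(datagram: bytes, listeners: dict):
    """Strip the header and hand the payload to whoever owns the destination port."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    payload = datagram[8:length]
    handler = listeners.get(dst_port)
    if handler is None:
        return None   # no process bound to this port; the payload is dropped
    return handler(src_port, payload)

# Hypothetical processes bound to two well-known ports.
listeners = {
    53: lambda sport, data: "dns got %r from port %d" % (data, sport),
    69: lambda sport, data: "tftp got %r from port %d" % (data, sport),
}

print(demultiplex(build_udp(49152, 53, b"query"), listeners))
print(demultiplex(build_udp(49152, 9999, b"hello"), listeners))
```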

TCP, a transport for all seasons

Transmission Control Protocol (TCP) provides reliable, connection-oriented services. TCP supports ports for the same reasons as UDP, but in addition, TCP has:

Connection setup, maintenance, and teardown phases that ensure that both ends agree regarding connection state. Traffic can be sent only when connected.
Flow control, which prevents data loss due to lack of a buffer in the connection endpoints (not at the IP layer).
Error detection based on a header/payload checksum, as well as through sequenced exchanges. This provides detection for corrupted data, in addition to lost or duplicated data, the latter being conditions that often occur given the datagram operation of the underlying IP.
Retransmission-based error correction, paired with sophisticated congestion avoidance and recovery mechanisms that attempt to optimize communications among endpoints with greatly dissimilar processing capabilities, and that intelligently monitor and adapt to current end-to-end transmission delays.
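To make the checksum-based error detection concrete, here is a small sketch of the 16-bit one's-complement Internet checksum that TCP, UDP, and the IP header all rely on. It is simplified: a real TCP checksum also covers a pseudo-header built from the IP addresses, which is omitted here, and the `verify` helper is an illustrative name rather than part of any API.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def verify(data: bytes, received_checksum: int) -> bool:
    """Receiver side: recompute and compare; a mismatch means silent discard."""
    return internet_checksum(data) == received_checksum


segment = b"reliable bytes"
checksum = internet_checksum(segment)

print(verify(segment, checksum))             # intact data
print(verify(b"reliable bytez", checksum))   # one corrupted byte
```

Note that the checksum only detects corruption. Catching lost or duplicated segments is the job of the sequence numbers, and fixing any of these conditions falls to retransmission.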

Given that everything from Layer 3 down is often switched in datagram fashion (connectionless), a method of operation that’s officially classified as unreliable, it’s obvious how important TCP is to the world. When data integrity matters, and when you need to move a lot of information, TCP is likely your protocol of choice. The connection-oriented nature of TCP means that in some cases, more traffic is sent to set up and tear down a connection than is actually sent over the connection, and that a given TCP connection can connect only two endpoints.

What’s this Internet thing for again, eh, sonny?

In the IP suite, applications are found directly above the Transport layer; there is no discrete Session or Presentation layer, but some IP applications provide these types of services. Some applications, such as ICMP or OSPF routing, make direct use of IP. Other routing options, such as RIP and Border Gateway Protocol (BGP), make use of UDP and TCP, respectively. More end-user-focused applications include Telnet for terminal emulation and FTP for file transfer.
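As a quick reference, the applications named above can be tabulated by what carries them. The sketch below uses the standard assigned protocol and port numbers; the `Binding` type and `describe` helper are illustrative names only.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    carrier: str   # "IP", "UDP", or "TCP"
    number: int    # IP protocol number, or transport port


APPLICATIONS = {
    "ICMP":   Binding("IP", 1),      # rides directly on IP
    "OSPF":   Binding("IP", 89),     # rides directly on IP
    "RIP":    Binding("UDP", 520),
    "BGP":    Binding("TCP", 179),
    "Telnet": Binding("TCP", 23),
    "FTP":    Binding("TCP", 21),    # control connection
}


def describe(app: str) -> str:
    """Say how a given application reaches the wire."""
    binding = APPLICATIONS[app]
    if binding.carrier == "IP":
        return f"{app} rides directly on IP (protocol {binding.number})"
    return f"{app} uses {binding.carrier} port {binding.number}"


for name in APPLICATIONS:
    print(describe(name))
```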

And then there is HTTP, the grand enabler of the modern Internet. Combined with the HTML specification, this is the only application that most people will ever use (to the extent that the Internet has become synonymous with the WWW, much to the irritation of this author). The Internet existed long before the WWW, and was quite useful to those with some level of .clue. The WWW allowed the great unwashed masses to rush in and make immediate productive and commercial use of the Internet. Although this killer app single-handedly ended the old geeks’ club that was the academic- and research-focused Internet, it also did a lot to boost router sales, which for me is reason enough to welcome HTTP into the IP suite.

From email to telemedicine, e-commerce to games, you can bet your last packet there’s an IP application written to support it.

IP encapsulation example

Figure 1-10 shows the TCP/IP stack at work, with an example of IP encapsulation within an Ethernet frame.

Figure 1-10. TCP/IP-over-Ethernet encapsulation example

This example begins with a TCP acknowledgment segment that needs to be sent. Although user data can be piggybacked onto such an ACK segment, this assumes that some user data is pending, and that’s obviously not always the case; the lack of user data does not exempt the TCP entity from having to ACK traffic received from the remote end. If we assume no TCP options (in many cases, options such as a maximum segment size or a timestamp are present), then a 20-byte TCP segment, the minimum size of its header, is passed to the IP layer. Along with the data are internal semantics (primitives in OSI-speak) that convey variables such as the destination IP address, and special ToS values, and so forth.

The IP layer accepts its duty and builds the needed header. Again, assuming no options, that’s another 20 bytes, for a total of 40 when the TCP header is also factored. Ethernet has maintained the need for a minimum frame size, which relates to ensuring reliable collision detection as a function of a frame’s minimum transmission time versus the maximum allowed propagation delay; basically, the station should still be sending by the time its signal has propagated to the far end and any resulting collision has had time to make it back. The result is a need for six bytes of padding to bring the 40-byte datagram up to Ethernet’s 46-byte minimum payload (a 64-byte frame less the 14-byte header and 4-byte FCS). The padding is not counted in IP’s Total Length field, so any data outside the datagram’s total length is assumed to be padding and is discarded by the far end. As is so often the case, the PDU rolls downhill only to darken Ethernet’s door. The service request also tells Ethernet to set the Type field to 0x0800, IP’s EtherType, and in our case, we can assume a successful ARP cache hit so that the next hop’s MAC address is also passed along.

Direct Versus Indirect Delivery

IP is all about routing. The most basic routing decision in IP datagram forwarding is whether the destination address is on the sending machine’s local subnet. If it is, direct delivery is performed: the ARP request and the subsequent packet are sent out over the interface with that direct route.

When the target subnet does not match a local subnet, indirect delivery is needed. This simply means that one or more intermediate stations will need to forward the packet on the local station’s behalf. Forwarding other people’s traffic is what routers are all about, so here the next hop would be to a device with at least two network connections, and with a willingness to forward traffic between those interfaces—in other words, a router. Usually an end station uses a default route to direct all non-local traffic to its default gateway (router). From there the packet typically picks up more intelligent forwarding that is based on least-cost routes that are dynamically learned via routing protocols that operate between the routers.
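The direct-versus-indirect decision boils down to a subnet membership test. The following sketch uses Python's standard `ipaddress` module; the addresses come from the documentation ranges and the `next_hop` function is a hypothetical helper, assuming a host with a single interface and a single default gateway.

```python
import ipaddress


def next_hop(interface: str, default_gateway: str, destination: str) -> str:
    """Return the IP address the sender must ARP for.

    interface is the host's own address with its prefix length,
    for example "192.0.2.10/24".
    """
    local = ipaddress.ip_interface(interface)
    target = ipaddress.ip_address(destination)
    if target in local.network:
        return str(target)        # direct delivery: ARP for the destination
    return default_gateway        # indirect delivery: ARP for the router


# Same subnet, so the frame goes straight to the destination's MAC address.
print(next_hop("192.0.2.10/24", "192.0.2.1", "192.0.2.77"))

# Different subnet, so the frame goes to the default gateway's MAC address.
print(next_hop("192.0.2.10/24", "192.0.2.1", "198.51.100.5"))
```

In both cases the destination IP address in the packet itself is unchanged; only the MAC address used in the frame differs.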

An important point about indirect delivery is that a packet that’s intended for a remote machine is sent to the MAC address of the local subnet’s default router. The router takes notice, strips the frame, and performs a longest-match lookup against the destination address, only to find that, alas, once again it’s not the intended recipient (routers get lonely, too). After stiffening its lip and decrementing the TTL, the same IP packet is then reframed and sent out of a different interface to the next forwarding hop, where the process repeats until either the packet arrives at the target host (in which case you have a /32 match, which is as long as it gets, baby), or the packet’s TTL expires and it’s ignominiously discarded, with nary but an ICMP (TTL expired) error message that suffices as its death knell.
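A router's part in this can be sketched as a longest-match lookup followed by a TTL decrement. The route table, next-hop names, and return strings below are all made up for illustration; real routers use far more efficient lookup structures than a linear scan.

```python
import ipaddress

# Hypothetical route table: (prefix, next hop). "local" marks the router itself.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream"),
    (ipaddress.ip_network("198.51.100.0/24"), "router-b"),
    (ipaddress.ip_network("198.51.100.128/25"), "router-c"),
    (ipaddress.ip_network("203.0.113.7/32"), "local"),
]


def longest_match(destination: str):
    """Return the (prefix, next hop) entry with the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in ROUTES if address in net]
    # The default route matches everything, so candidates is never empty.
    return max(candidates, key=lambda entry: entry[0].prefixlen)


def forward(destination: str, ttl: int):
    """Decide what happens to a packet arriving with the given TTL."""
    _, hop = longest_match(destination)
    if hop == "local":
        return ("deliver", ttl)               # /32 match: we are the target
    ttl -= 1
    if ttl <= 0:
        return ("discard, send ICMP time exceeded", 0)
    return (f"forward to {hop}", ttl)


print(longest_match("198.51.100.200"))   # the /25 beats the /24 and the default
print(forward("198.51.100.5", 64))
print(forward("198.51.100.5", 1))
```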

Ethernet constructs its frame, populates the destination MAC and the Type field with the value provided by IP along with the service request, and goes about the dirty work of successfully placing the frame upon the wire. At the remote end, a reversal of this process occurs, ending with the remote TCP receiving its ACK and flushing its retransmit buffer of the related data, resting in what it knows is a job well done.
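The encapsulation steps in this example can be approximated with a short Python sketch. It assumes an untagged Ethernet frame with the standard 64-byte minimum (14-byte header and 4-byte FCS included), uses all-zero placeholder TCP and IP headers apart from the fields it needs, and leaves the FCS to the hardware. The MAC addresses and function names are invented for illustration.

```python
import struct

ETH_HEADER_LEN = 14                 # destination MAC, source MAC, Type
ETH_FCS_LEN = 4                     # appended by the hardware, not built here
ETH_MIN_FRAME = 64
ETH_MIN_PAYLOAD = ETH_MIN_FRAME - ETH_HEADER_LEN - ETH_FCS_LEN
ETHERTYPE_IP = 0x0800


def build_frame(dst_mac: bytes, src_mac: bytes, datagram: bytes):
    """Frame an IP datagram, padding with zeros up to the minimum payload."""
    pad_len = max(0, ETH_MIN_PAYLOAD - len(datagram))
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IP)
    return header + datagram + bytes(pad_len), pad_len


def strip_padding(frame: bytes) -> bytes:
    """Receiver side: trust IP's Total Length field and drop anything beyond it."""
    payload = frame[ETH_HEADER_LEN:]
    (total_length,) = struct.unpack("!H", payload[2:4])
    return payload[:total_length]


tcp_header = bytes(20)                                    # bare ACK, no options
ip_header = struct.pack("!BBH", 0x45, 0, 40) + bytes(16)  # version/IHL, ToS, Total Length
datagram = ip_header + tcp_header

frame, pad_len = build_frame(b"\x02\x00\x00\x00\x00\x01",
                             b"\x02\x00\x00\x00\x00\x02",
                             datagram)
print(len(datagram), pad_len, len(frame) + ETH_FCS_LEN)
```

The receiver never needs to be told how much padding was added; the Total Length field in the IP header is enough to recover the original 40-byte datagram.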

Internet Protocol Summary

IPv4: it made the Internet what it is today, and what it will be tomorrow. It’s the OSI that worked, and it’s here to stay, so we deal with it. Each time the protocol is predicted to have met its natural limit, due to a lack of addresses, a need for class of service (CoS), VPNs, encryption, or something else, some bright engineers find a good workaround. For example, Network Address Translation/Port Address Translation (NAT/PAT) has done much to extend IP’s useful life by allowing use of a private addressing space within a private network, which is then translated and effectively hidden behind a smaller number of real IP addresses. As another example, IP Security (IPSec) was originally planned to be inherent to IPv6 but was backported to its predecessor, providing one less compelling reason to change what is still working.

With all of that said, IPv6 is making headway into today’s networks. Many mobile devices are IPv6-enabled, and believe it or not, we are heading into a world where we will be surrounded by IP-addressable entities, be it your refrigerator, washing machine, or cable TV box. IPv6, with its 128-bit addressing space, combined with what we learned from IPv4 address allocation mistakes, promises that future generations will be free from having to worry about the Internet running out of addresses every few years. This is good, as it seems they will have plenty more to worry about, but that is another story and one best not told here.

IP is the convergence technology of choice in today’s networks. It rides over every type of transport, and if it can be digitized, it likely rides inside IP. IP over everything and everything over IP. Learn it. Live it. Love it.

^[2]In most cases, a Domain Name System (DNS) name is specified, but the result is an IP address, so we can skip the IP-to-domain name binding complexity for now.

JUNOS Enterprise Switching by Harry Reynolds, Doug Marschke
