Chapter 1. Background
The White Rabbit put on his spectacles. “Where shall I begin, please your Majesty?” he asked.
“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”
It’s important to know a little ARPAnet history to understand the Domain Name System (DNS). DNS was developed to address particular problems on the ARPAnet, and the Internet—a descendant of the ARPAnet—is still its main user.
If you’ve been using the Internet for years, you can probably skip this chapter. If you haven’t, we hope it’ll give you enough background to understand what motivated the development of DNS.
A (Very) Brief History of the Internet
In the late 1960s, the U.S. Department of Defense’s Advanced Research Projects Agency, ARPA (later DARPA), began funding the ARPAnet, an experimental wide area computer network that connected important research organizations in the United States. The original goal of the ARPAnet was to allow government contractors to share expensive or scarce computing resources. From the beginning, however, users of the ARPAnet also used the network for collaboration. This collaboration ranged from sharing files and software and exchanging electronic mail—now commonplace—to joint development and research using shared remote computers.
The Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite was developed in the early 1980s and quickly became the standard host-networking protocol on the ARPAnet. The inclusion of the protocol suite in the University of California at Berkeley’s popular BSD Unix operating system was instrumental in democratizing internetworking. BSD Unix was virtually free to universities. This meant that internetworking—and ARPAnet connectivity—were suddenly available cheaply to many more organizations than were previously attached to the ARPAnet. Many of the computers being connected to the ARPAnet were being connected to local area networks (LANs), too, and very shortly the other computers on the LANs were communicating via the ARPAnet as well.
The network grew from a handful of hosts to tens of thousands of hosts. The original ARPAnet became the backbone of a confederation of local and regional networks based on TCP/IP, called the Internet.
In 1988, however, DARPA decided the experiment was over. The Department of Defense began dismantling the ARPAnet. Another network, the NSFNET, funded by the National Science Foundation, replaced the ARPAnet as the backbone of the Internet.
In the spring of 1995, the Internet made a transition from using the publicly funded NSFNET as a backbone to using multiple commercial backbones, run by telecommunications companies such as SBC and Sprint, and long-time commercial internetworking players such as MFS and UUNET.
Today, the Internet connects millions of hosts around the world. In fact, a significant proportion of the non-PC computers in the world are connected to the Internet. Some commercial backbones carry a volume of several gigabits per second, tens of thousands of times the bandwidth of the original ARPAnet. Tens of millions of people use the network for communication and collaboration daily.
On the Internet and Internets
A word on “the Internet,” and on “internets” in general, is in order. In print, the difference between the two seems slight: one is always capitalized, one isn’t. The distinction between their meanings, however, is significant. The Internet, with a capital “I,” refers to the network that began its life as the ARPAnet and continues today as, roughly, the confederation of all TCP/IP networks directly or indirectly connected to commercial U.S. backbones. Seen up close, it’s actually quite a few different networks—commercial TCP/IP backbones, corporate and U.S. government TCP/IP networks, and TCP/IP networks in other countries—interconnected by high-speed digital circuits. A lowercase internet, on the other hand, is simply any network made up of multiple smaller networks using the same internetworking protocols. An internet (little “i”) isn’t necessarily connected to the Internet (big “I”), nor does it necessarily use TCP/IP as its internetworking protocol. There are isolated corporate internets, for example.
An intranet, with a little i, is really just a TCP/IP-based internet, used to emphasize the use of technologies developed and introduced on the Internet on a company’s internal corporate network. An extranet, on the other hand, is a TCP/IP-based internet that connects partner companies, or a company to its distributors, suppliers, and customers.
The History of the Domain Name System
Through the 1970s, the ARPAnet was a small, friendly community of a few hundred hosts. A single file, HOSTS.TXT, contained a name-to-address mapping for every host connected to the ARPAnet. The familiar Unix host table, /etc/hosts, was compiled from HOSTS.TXT (mostly by deleting fields Unix didn’t use).
HOSTS.TXT was maintained by SRI’s Network Information Center (dubbed “the NIC”) and distributed from a single host, SRI-NIC.[*] ARPAnet administrators typically emailed their changes to the NIC, and periodically FTP’ed to SRI-NIC and grabbed the current HOSTS.TXT file. Their changes were compiled into a new HOSTS.TXT file once or twice a week. As the ARPAnet grew, however, this scheme became unworkable. The size of HOSTS.TXT grew in proportion to the growth in the number of ARPAnet hosts. Moreover, the traffic generated by the update process increased even faster: every additional host meant not only another line in HOSTS.TXT, but potentially another host updating from SRI-NIC.
When the ARPAnet moved to TCP/IP, the population of the network exploded. Now there was a host of problems with HOSTS.TXT (no pun intended):
- Traffic and load
The toll on SRI-NIC, in terms of the network traffic and processor load involved in distributing the file, was becoming unbearable.
- Name collisions
No two hosts in HOSTS.TXT could have the same name. However, while the NIC could assign addresses in a way that guaranteed uniqueness, it had no authority over hostnames. There was nothing to prevent someone from adding a host with a conflicting name and breaking the whole scheme. Adding a host with the same name as a major mail hub, for example, could disrupt mail service to much of the ARPAnet.
Maintaining consistency of the file across an expanding network became harder and harder. By the time a new HOSTS.TXT file could reach the farthest shores of the enlarged ARPAnet, a host across the network may have changed addresses or a new host may have sprung up.
The essential problem was that the HOSTS.TXT mechanism didn’t scale well. Ironically, the success of the ARPAnet as an experiment led to the failure and obsolescence of HOSTS.TXT.
The ARPAnet’s governing bodies chartered an investigation to develop a successor for HOSTS.TXT. Their goal was to create a system that solved the problems inherent in a unified host-table system. The new system should allow local administration of data yet make that data globally available. The decentralization of administration would eliminate the single-host bottleneck and relieve the traffic problem. And local management would make the task of keeping data up-to-date much easier. The new system should use a hierarchical namespace to name hosts. This would ensure the uniqueness of names.
Paul Mockapetris, then of USC’s Information Sciences Institute, was responsible for designing the architecture of the new system. In 1984, he released RFCs 882 and 883, which described the Domain Name System. These RFCs were superseded by RFCs 1034 and 1035, the current specifications of the Domain Name System.[*] RFCs 1034 and 1035 have since been augmented by many other RFCs, which describe potential DNS security problems, implementation problems, administrative gotchas, mechanisms for dynamically updating nameservers and for securing zone data, and more.
The Domain Name System, in a Nutshell
The Domain Name System is a distributed database. This structure allows local control of the segments of the overall database, yet data in each segment is available across the entire network through a client/server scheme. Robustness and adequate performance are achieved through replication and caching.
Programs called nameservers constitute the server half of DNS’s client/server mechanism. Nameservers contain information about some segments of the database and make that information available to clients, called resolvers . Resolvers are often just library routines that create queries and send them across a network to a nameserver.
The structure of the DNS database, shown in Figure 1-1, is similar to the structure of the Unix filesystem. The whole database (or filesystem) is pictured as an inverted tree, with the root node at the top. Each node in the tree has a text label, which identifies the node relative to its parent. This is roughly analogous to a “relative pathname” in a filesystem, like bin. One label—the null label, or " “—is reserved for the root node. In text, the root node is written as a single dot (.). In the Unix filesystem, the root is written as a slash (/).
Each node is also the root of a new subtree of the overall tree. Each of these subtrees represents a partition of the overall database—a directory in the Unix filesystem, or a domain in the Domain Name System. Each domain or directory can be further divided into additional partitions, called subdomains in DNS, like a filesystem’s subdirectories. Subdomains, like subdirectories, are drawn as children of their parent domains.
Every domain has a unique name, like every directory. A domain’s domain name identifies its position in the database, much as a directory’s absolute pathname specifies its place in the filesystem. In DNS, the domain name is the sequence of labels from the node at the root of the domain to the root of the whole tree, with dots (.) separating the labels. In the Unix filesystem, a directory’s absolute pathname is the list of relative names read from root to leaf (the opposite direction from DNS, as shown in Figure 1-2), using a slash to separate the names.
In DNS, each domain can be broken into a number of subdomains, and responsibility for those subdomains can be doled out to different organizations. For example, an organization called EDUCAUSE manages the edu (educational) domain but delegates responsibility for the berkeley.edu subdomain to U.C. Berkeley (Figure 1-3). This is similar to remotely mounting a filesystem: certain directories in a filesystem may actually be filesystems on other hosts, mounted from remote hosts. The administrator on host winken, for example (again, Figure 1-3), is responsible for the filesystem that appears on the local host as the directory /usr/nfs/winken.
Delegating authority for berkeley.edu to U.C. Berkeley creates a new zone, an autonomously administered piece of the namespace. The zone berkeley.edu is now independent from edu and contains all domain names that end in berkeley.edu. The zone edu, on the other hand, contains only domain names that end in edu but aren’t in delegated zones such as berkeley.edu”. berkeley.edu may be further divided into subdomains, such as cs.berkeley.edu, and some of these subdomains may themselves be separate zones, if the berkeley.edu administrators delegate responsibility for them to other organizations. If cs.berkeley.edu is a separate zone, the berkeley.edu zone doesn’t contain domain names that end in cs.berkeley.edu (Figure 1-4).
Domain names are used as indexes into the DNS database. You might think of data in DNS as “attached” to a domain name. In a filesystem, directories contain files and subdirectories. Likewise, domains can contain both hosts and subdomains. A domain contains those hosts and subdomains whose domain names are within the domain’s subtree of the namespace.
Each host on a network has a domain name, which points to information about the host (see Figure 1-5). This information may include IP addresses, information about mail routing, etc. Hosts may also have one or more domain name aliases, which are simply pointers from one domain name (the alias) to another (the official, or canonical, domain name). In Figure 1-5, mailhub.nv . . . is an alias for the canonical name rincon.ba.ca . . . .
Why all the complicated structure? To solve the problems that HOSTS.TXT had. For example, making domain names hierarchical eliminates the pitfall of name collisions. Each domain has a unique domain name, so the organization that runs the domain is free to name hosts and subdomains within its domain. Whatever name they choose for a host or subdomain won’t conflict with other organizations’ domain names because it will end in their unique domain name. For example, the organization that runs hic.com can name a host puella (as shown in Figure 1-6) because it knows that the host’s domain name will end in hic.com, a unique domain name.
The History of BIND
The first implementation of the Domain Name System was called JEEVES, written by Paul Mockapetris himself. A later implementation was BIND, an acronym for Berkeley Internet Name Domain, written by Kevin Dunlap for Berkeley’s 4.3 BSD Unix. BIND is now maintained by the Internet Systems Consortium. [*]
BIND is the implementation we’ll concentrate on in this book and is by far the most popular implementation of DNS today. It has been ported to most flavors of Unix and is shipped as a standard part of most vendors’ Unix offerings. BIND has even been ported to Microsoft’s Windows NT, Windows 2000, and Windows Server 2003.
Must I Use DNS?
Despite the usefulness of the Domain Name System, there are some situations in which it doesn’t pay to use it. There are other name-resolution mechanisms besides DNS, some of which may be a standard part of your operating system. Sometimes the overhead involved in managing zones and their nameservers outweighs the benefits. On the other hand, there are circumstances in which you have no other choice but to set up and manage nameservers. Here are some guidelines to help you make that decision:
- If you’re connected to the Internet . . .
. . . DNS is a must. Think of DNS as the lingua franca of the Internet: nearly all of the Internet’s network services use DNS. That includes the Web, electronic mail, remote terminal access, and file transfer.
On the other hand, this doesn’t necessarily mean that you have to set up and run zones by yourself for yourself. If you’ve only got a handful of hosts, you may be able to join an existing zone (see Chapter 3) or find someone else to host your zones for you. If you pay an Internet service provider for your Internet connectivity, ask if it’ll host your zone for you, too. Even if you aren’t already a customer, there are companies that will help out, for a price.
If you have a little more than a handful of hosts, or a lot more, you’ll probably want your own zone. And if you want direct control over your zone and your nameservers, you’ll want to manage it yourself. Read on!
- If you have your own TCP/IP-based internet . . .
. . . you probably want DNS. By an internet, we don’t mean just a single Ethernet of workstations using TCP/IP (see the next section if you thought that was what we meant); we mean a fairly complex “network of networks.” Maybe you have several dozen Ethernet segments connected via routers, for example.
If your internet is basically homogeneous and your hosts don’t need DNS (say they don’t run TCP/IP at all), you may be able to do without it. But if you’ve got a variety of hosts, especially if some of those run some variety of Unix, you’ll want DNS. It’ll simplify the distribution of host information and rid you of any kludgy host-table distribution schemes you may have cooked up.
- If you have your own local area network or site network . . .
. . . and that network isn’t connected to a larger network, you can probably get away without using DNS. You might consider using Microsoft’s Windows Internet Name Service (WINS), host tables, or Sun’s Network Information Service (NIS) product.
But if you need distributed administration or have trouble maintaining the consistency of data on your network, DNS may be for you. And if your network is likely to soon be connected to another network, such as your corporate internet or the Internet, it’d be wise to set up your zones now.
[*] SRI is the former Stanford Research Institute in Menlo Park, California. SRI conducts research into many different areas, including computer networking.
[*] RFCs are Request for Comments documents, part of the relatively informal procedure for introducing new technology on the Internet. RFCs are usually freely distributed and contain fairly technical descriptions of the technology, often intended for implementors.