HTML & XHTML: The Definitive Guide, 6th Edition by Bill Kennedy, Chuck Musciano

Talking the Internet Talk

Every computer connected to the Internet (even a beat-up old Apple II) has a unique address: a number whose format is defined by the Internet Protocol (IP), the standard that defines how messages are passed from one machine to another on the Net. An IP address is made up of four numbers, each less than 256, joined together by periods.
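To make the four-number format concrete, here is a minimal Python sketch that checks an address and splits it into its four fields. The address 192.0.2.1 is an assumption chosen from a range reserved for documentation; it is not an address from the original text, and the function name is hypothetical.

```python
import ipaddress


def parse_dotted_quad(text):
    """Return the four numeric fields of a dotted IPv4 address as a list."""
    # IPv4Address raises ValueError when a field is missing, non-numeric,
    # or 256 and above.
    address = ipaddress.IPv4Address(text)
    # The packed form is exactly four bytes, one per field.
    return list(address.packed)


# Illustrative address only (reserved for documentation use).
print(parse_dotted_quad("192.0.2.1"))
```

Passing a string such as "256.1.1.1" raises ValueError, since every field must be less than 256.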

While computers deal only with numbers, people prefer names. For this reason, most computers also have names bestowed upon them. By current estimates, there are hundreds of millions, if not billions, of devices on the Net, so it would be very difficult to come up with that many unique names, let alone keep track of them all. Instead, the Internet is a network of networks, and is divided into groups known as domains, which are further divided into one or more subdomains. So, while you might choose a very common name for your computer, it becomes unique when you append, like surnames, all of the machine’s domain names as a period-separated suffix, creating a fully qualified domain name.

This naming stuff is easier than it sounds. For example, the fully qualified domain name www.oreilly.com translates to a machine named “www” that’s part of the domain known as “oreilly,” which, in turn, is part of the commercial (com) branch of the Internet. Other branches of the Internet include educational institutions (edu), nonprofit organizations (org), the U.S. government (gov), and Internet service providers (net). Computers and networks outside the United States may have two-letter abbreviations at the end of their names: for example, “ca” for Canada, “jp” for Japan, and “uk” for the United Kingdom.
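The breakdown of www.oreilly.com described above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, assuming the simple case in which the first label is the machine name; the function name is hypothetical.

```python
def split_fqdn(name):
    """Split a fully qualified domain name into a machine name and its domains."""
    # A trailing period is legal in a fully qualified name, so drop it first.
    labels = name.rstrip(".").split(".")
    # The first label names the machine; the rest are its enclosing domains,
    # ordered from the most specific to the most general.
    return {"machine": labels[0], "domains": labels[1:]}


print(split_fqdn("www.oreilly.com"))
```

For www.oreilly.com, the machine is "www" and the domains are "oreilly" followed by "com".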

Special computers, known as nameservers, keep tables of machine names and associated IP addresses and translate one into the other for us and for our machines. Domain names must be registered and paid for through any one of the now many for-profit registrars.[*] Once a unique domain name is registered, its owner makes it and its address available to other domain nameservers around the world.
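The idea of a table that translates names to addresses and back can be modeled with a toy class like the one below. It is a sketch of the concept only, not of how real nameservers are implemented; the class, the example.com names, and the addresses are all hypothetical. A real program would normally ask the system resolver instead, for example through Python's socket.getaddrinfo.

```python
class NameTable:
    """A toy stand-in for a nameserver's table of names and addresses."""

    def __init__(self, records):
        # records maps fully qualified names to dotted IP addresses.
        # Names are stored in lowercase because domain names ignore case.
        self._by_name = {name.lower(): address for name, address in records.items()}
        # Build the reverse table so addresses can be translated back to names.
        self._by_address = {address: name for name, address in self._by_name.items()}

    def address_of(self, name):
        """Translate a machine name into its address, or None if unknown."""
        return self._by_name.get(name.lower().rstrip("."))

    def name_of(self, address):
        """Translate an address back into its machine name, or None if unknown."""
        return self._by_address.get(address)


# Hypothetical records for illustration.
table = NameTable({"www.example.com": "192.0.2.10", "mail.example.com": "192.0.2.25"})
print(table.address_of("WWW.example.com"))
print(table.name_of("192.0.2.25"))
```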

Clients, Servers, and Browsers

The Internet connects two kinds of computers: servers, which serve up documents, and clients, which retrieve and display documents for us humans. Things that happen on the server machine are said to be on the server side, and activities on the client machine occur on the client side.

To access and display HTML documents, we run programs called browsers on our client computers. These browser clients talk to special web servers over the Internet to access, retrieve, and display electronic documents.

A variety of browsers are available today. Internet Explorer comes with Microsoft’s operating system software, for example, while most other browsers are free for download on the Web. Most browsers run on client devices that have high-resolution, high-color graphical viewing screens. In fact, today’s browsers share common HTML-rendering software under the hood, so to speak, and differ only in their extra, albeit sometimes very useful, features. For instance, when you install Netscape Navigator version 8, you decide whether to use the NCSA Mosaic rendering software, portions of which also lie under Microsoft’s Internet Explorer, or Mozilla’s software, which also sits under the hood of another popular browser, Firefox.

This is very different from around the turn of the century, when Internet Explorer savagely competed with Netscape Navigator through unique extensions to the HTML language. Internet Explorer won. Many of its extensions even became HTML standards, and others such as Netscape’s layout extensions disappeared and so got relegated to appendices in this book.

The Flow of Information

All web activity begins on the client side, when a user starts his browser. The browser begins by loading a home page document, either from local storage or from a server over some network, such as the Internet, a corporate intranet, or a town extranet. When starting up on the network, the client browser first consults a domain name system (DNS) server to translate the home page server’s name, such as www.oreilly.com, into an IP address, before sending a request to that server over the Internet. This request (and the server’s reply) is formatted according to the dictates of the Hypertext Transfer Protocol (HTTP) standard.
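To give a sense of what a request "formatted according to HTTP" looks like, the following Python sketch builds the text of a simple request for a document. It only formats the message; it does not perform the DNS lookup or send anything over the network. The function name, the choice of HTTP/1.1, and the Connection header are assumptions made for illustration.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP GET request for one document."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, document path, version
        f"Host: {host}",         # the server name the browser looked up
        "Connection: close",     # ask the server to close after replying
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("www.oreilly.com").decode("ascii"))
```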

A server spends most of its time listening to the network, waiting for document requests with the server’s unique address stamped on them. Upon receipt of a request, the server verifies that the requesting browser is allowed to retrieve documents from the server and, if so, checks for the requested document. If it finds the document, the server sends it to the browser. The server usually logs the request, typically including the client computer’s IP address, the document requested, and the time. The server might also issue special attachments known as cookies that contain additional information about the requesting browser and its owner.
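The server-side steps just described (check permission, look for the document, send it, log the request, and possibly attach a cookie) can be modeled as a single function. This is a simplified sketch rather than a working network server: the function, its parameters, the status codes chosen for each case, and the cookie value are all assumptions for illustration.

```python
from datetime import datetime, timezone


def handle_request(client_ip, path, documents, blocked_clients, log):
    """Model the server's handling of one document request.

    Returns a (status, headers, body) tuple and appends one entry to log.
    """
    if client_ip in blocked_clients:
        # The requesting client is not allowed to retrieve documents.
        status, headers, body = 403, {}, b""
    elif path not in documents:
        # The client is allowed, but the document does not exist.
        status, headers, body = 404, {}, b""
    else:
        body = documents[path]
        status = 200
        headers = {
            "Content-Length": str(len(body)),
            # A hypothetical cookie issued along with the document.
            "Set-Cookie": "visitor=example",
        }
    # Log the client's address, the document requested, the time, and the outcome.
    log.append((client_ip, path, datetime.now(timezone.utc).isoformat(), status))
    return status, headers, body


# Hypothetical document store and clients for illustration.
documents = {"/index.html": b"<html><body>Hello</body></html>"}
request_log = []
print(handle_request("192.0.2.7", "/index.html", documents, set(), request_log)[0])
print(handle_request("192.0.2.7", "/missing.html", documents, set(), request_log)[0])
```

A real web server performs the same steps for each request it receives while listening on the network; the sketch leaves out the listening loop so that the decision logic stays visible.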

Back on the browser, the document arrives. If it’s a plain-vanilla text file, most browsers display it in a common, plain-vanilla way. Document directories, too, are treated like plain documents, which most graphical browsers display as folder icons that the user may select, thereby requesting to view the contents of the subdirectory.

Browsers can retrieve many different types of files from a server. Unless assisted by a helper program or specially enabled by plug-in software or applets, which display an image or video file or play an audio file, the browser usually stores the file directly on a local disk for later use.
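The choice between displaying a file and storing it can be sketched as a lookup on the file's type. The example below guesses the type from the filename with Python's mimetypes module; the set of displayable types and the function name are assumptions, and a real browser would usually rely on the type reported by the server rather than on the filename.

```python
import mimetypes

# Hypothetical set of types this imaginary browser can display by itself.
DISPLAYABLE_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg", "image/png"}


def choose_action(filename):
    """Return "display" if the file type can be shown directly, else "save"."""
    # guess_type returns (type, encoding); the type is None when unknown.
    guessed_type, _encoding = mimetypes.guess_type(filename)
    if guessed_type in DISPLAYABLE_TYPES:
        return "display"
    # Without a helper program or plug-in, fall back to storing the file.
    return "save"


print(choose_action("index.html"))
print(choose_action("notes.unknownext123"))
```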

For the most part, however, the browser retrieves a special document that appears to be a plain text file but that contains both text and special markup codes called tags. The browser processes these HTML or XHTML documents, formatting the text based on the tags and downloading special accessory files, such as images.
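As a rough illustration of how a browser might find the accessory files and hyperlinks in such a document, the sketch below scans markup for image and anchor tags using Python's built-in html.parser module. It collects references only and does not download or render anything; the class name, function name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect image sources and hyperlink targets from HTML or XHTML markup."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # An accessory file the browser would need to download.
            self.images.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            # A hyperlink the user could select next.
            self.links.append(attributes["href"])


def collect_resources(markup):
    """Return (images, links) found in the given markup string."""
    collector = ResourceCollector()
    collector.feed(markup)
    collector.close()
    return collector.images, collector.links


# Hypothetical sample document.
sample = '<p>See <a href="next.html">the next page</a>.</p><img src="logo.gif" />'
print(collect_resources(sample))
```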

The user reads the document, selects a hyperlink to another document, and the entire process starts over.

Beneath the Web

We should point out again that browsers and HTTP servers need not be part of the Web to function. In fact, you never need to be connected to the Internet or to any network, for that matter, to write HTML/XHTML documents and operate a browser. You can load and display locally stored documents and accessory files directly on your browser. Many organizations take advantage of this capability by distributing catalogs and product manuals, for instance, on a much less expensive, but much more interactively useful, CD-ROM, rather than via traditional print on paper. Many graphical-user applications even document their features through HTML/XHTML-based Help menus.

Isolating web documents is good for the author, too, since it gives you the opportunity to finish, in the editorial sense of the word, a document collection for later distribution. Diligent authors work locally to write and proof their documents before releasing them for general distribution, thereby sparing readers the agonies of broken image files and bogus hyperlinks.[*]

Organizations, too, can be connected to the Internet but also maintain private web sites and document collections for distribution to clients on their local networks, or intranets. In fact, private web sites are fast becoming the technology of choice for the paperless offices we’ve heard so much about during these last few years. With HTML and XHTML document collections, businesses can maintain personnel databases complete with employee photographs and online handbooks, collections of blueprints, parts, assembly manuals, and so on—all readily and easily accessed electronically by authorized users and displayed on a local computer.

Standards Organizations

Like many popular technologies, HTML started out as an informal specification used by only a few people. As more and more authors began to use the language, it became obvious that more formal means were needed to define and manage—i.e., to standardize—the language’s features, making it easier for everyone to create and share documents.

The World Wide Web Consortium

The World Wide Web Consortium (W3C) was formed with the charter to define the standards for HTML and, later, XHTML. Members are responsible for drafting, circulating for review, and modifying the standard based on cross-Internet feedback to best meet the needs of many.

Beyond HTML and XHTML, the W3C has the broader responsibility of standardizing any technology related to the Web; they manage the HTTP, Cascading Style Sheets (CSS), and Extensible Markup Language (XML) standards, as well as related standards for document addressing on the Web. They also solicit draft standards for extensions to existing web technologies.

If you want to track HTML, XML, XHTML, CSS, and other exciting web development and related technologies, contact the W3C at http://www.w3.org.

Also, several Internet newsgroups are devoted to the Web, each a part of the comp.infosystems.www hierarchy. These include comp.infosystems.www.authoring.html and comp.infosystems.www.authoring.images.

The Internet Engineering Task Force

Even broader in reach than W3C, the Internet Engineering Task Force (IETF) is responsible for defining and managing every aspect of Internet technology. The Web is just one small area under the purview of the IETF.

The IETF defines all of the technology of the Internet via official documents known as Requests for Comments, or RFCs. Individually numbered for easy reference, each RFC addresses a specific Internet technology—everything from the syntax of domain names and the allocation of IP addresses to the format of electronic mail messages.

To learn more about the IETF and follow the progress of various RFCs as they are circulated for review and revision, visit the IETF home page, http://www.ietf.org.

[*] At one time, a single nonprofit organization known as InterNIC handled that function. Now ICANN.org coordinates U.S. government-related nameservers, but other organizations or individuals must work through a for-profit company to register their unique domain names.

[*] Vigorous testing of HTML documents once they are made available on the Web is, of course, also highly recommended and necessary to rid them of various linking bugs.
