HTML & XHTML: The Definitive Guide, Fifth Edition

By Chuck Musciano, Bill Kennedy



Table of Contents

Chapter 1: HTML, XHTML, and the World Wide Web
Though it began as a military experiment and spent its adolescence as a sandbox for academics and eccentrics, in less than a decade the worldwide network of computer networks, also known as the Internet, has matured into a highly diversified, financially important community of computer users and information vendors. From the boardroom to your living room, you can bump into Internet users of nearly any and all nationalities, of any and all persuasions, from serious to frivolous individuals, from businesses to nonprofit organizations, and from born-again Christian evangelists to pornographers.
In many ways, the Web — the open community of hypertext-enabled document servers and readers on the Internet — is responsible for the meteoric rise in the network's popularity. You, too, can become a valued member by contributing: writing HTML and XHTML documents and then making them available to web surfers worldwide.
Let's climb up the Internet family tree to gain some deeper insight into its magnificence, not only as an exercise of curiosity, but to help us better understand just who and what it is we are dealing with when we go online.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Internet
Although popular media accounts are often confused and confusing, the concept of the Internet really is rather simple: it's a worldwide collection of computer networks — a network of networks — sharing digital information via a common set of networking and software protocols.
Networks are not new to computers. What makes the Internet unique is its worldwide collection of digital telecommunication links that share a common set of computer-network technologies, protocols, and applications. Whether you run Microsoft Windows XP, Linux, Mac OS X, or even the now ancient Windows 3.1, when connected to the Internet, computers all speak the same networking language and use functionally identical programs, so you can exchange information — even multimedia pictures and sound — with someone next door or across the planet.
The common and now quite familiar programs people use to communicate and distribute their work over the Internet have also found their way into private and semi-private networks. These so-called intranets and extranets use the same software, applications, and networking protocols as the Internet. But unlike the Internet, intranets are private networks, with access restricted to members of the institution. Likewise, extranets restrict access but use the Internet to provide services to members.
The Internet, on the other hand, seemingly has no restrictions. Anyone with a computer and the right networking software and connection can "get on the Net" and begin exchanging words, sounds, and pictures with others around the world, day or night: no membership required. And that's precisely what is confusing about the Internet.
Like an oriental bazaar, the Internet is not well organized, there are few content guides, and it can take a lot of time and technical expertise to tap its full potential. That's because . . .
The Internet began in the late 1960s as an experiment in the design of robust computer networks. The goal was to construct a network of computers that could withstand the loss of several machines without compromising the ability of the remaining ones to communicate. Funding came from the U.S. Department of Defense, which had a vested interest in building information networks that could withstand nuclear attack.
Talking the Internet Talk
Every computer connected to the Internet (even a beat-up old Apple II) has a unique address: a number whose format is defined by the Internet protocol (IP), the standard that defines how messages are passed from one machine to another on the Net. An IP address is made up of four numbers, each less than 256, joined together by periods, such as 192.12.248.73 or 131.58.97.254.
While computers deal only with numbers, people prefer names. For this reason, each computer on the Internet also has a name bestowed upon it by its owner. There are several million machines on the Net, so it would be very difficult to come up with that many unique names, let alone keep track of them all. Recall, though, that the Internet is a network of networks. It is divided into groups known as domains, which are further divided into one or more subdomains. So, while you might choose a very common name for your computer, it becomes unique when you append, like surnames, all of the machine's domain names as a period-separated suffix, creating a fully qualified domain name.
This naming stuff is easier than it sounds. For example, the fully qualified domain name www.oreilly.com translates to a machine named "www" that's part of the domain known as "oreilly," which, in turn, is part of the commercial (com) branch of the Internet. Other branches of the Internet include educational institutions (edu), nonprofit organizations (org), the U.S. government (gov), and Internet service providers (net). Computers and networks outside the United States may have two-letter abbreviations at the end of their names: for example, "ca" for Canada, "jp" for Japan, and "uk" for the United Kingdom.
Special computers, known as name servers, keep tables of machine names and their associated unique numerical IP addresses and translate one into the other for us and for our machines. Domain names must be registered and paid for through any one of the now many for-profit registrars. Once it is registered, the owner of the unique domain name broadcasts it and its address to other domain name servers around the world. Each domain and subdomain has an associated name server, so ultimately every machine is known uniquely by both a name and an IP address.
HTML and XHTML: What They Are
HTML and XHTML are document-layout and hyperlink-specification languages. They define the syntax and placement of special, embedded directions that aren't displayed by the browser but tell it how to display the contents of the document, including text, images, and other support media. The languages also tell you how to make a document interactive through special hypertext links, which connect your document with other documents — on either your computer or someone else's — as well as with other Internet resources.
You've certainly heard of HTML, and perhaps XHTML too, but did you know that they are just two of many other markup languages? Indeed, HTML is the black sheep in the family of document markup languages. HTML was based on SGML, the Standard Generalized Markup Language. The powers-that-be created SGML with the intent that it be the one and only markup metalanguage from which all other document markup elements would be created. Everything from hieroglyphics to HTML can be defined using SGML, negating any need for any other markup language.
The problem with SGML is that it is so broad and all-encompassing that mere mortals cannot use it. Using SGML effectively requires very expensive and complex tools that are completely beyond the scope of regular people who just want to bang out an HTML document in their spare time. As a result, HTML adheres to some, but not all, SGML standards, eliminating many of the more esoteric features so that it is readily usable and used.
Besides the fact that SGML is unwieldy and not well suited to describing the very popular HTML in a useful way, there was also a growing need to define other HTML-like markup languages to handle different network documents. Accordingly, the W3C defined the Extensible Markup Language (XML). Like SGML, XML is a separate formal markup metalanguage that uses select features of SGML to define markup languages. It eliminates many features of SGML that aren't applicable to languages like HTML and simplifies other SGML elements in order to make them easier to use and understand.
HTML and XHTML: What They Aren't
Despite all their new, multimedia-enabling page-layout features, and the hot technologies that give life to HTML/XHTML documents over the Internet, it is also important to understand the languages' limitations. They are not word-processing tools, desktop-publishing solutions, or even programming languages. Their fundamental purpose is to define the structure and appearance of documents and document families so that they may be delivered quickly and easily to a user over a network for rendering on a variety of display devices. Jack of all trades, but master of none, so to speak.
HTML and its progeny, XHTML, provide many different ways to let you define the appearance of your documents: font specifications, line breaks, and multicolumn text are all features of the language. Of course, appearance is important, since it can have either detrimental or beneficial effects on how users access and use the information in your documents.
Nonetheless, we believe that content is paramount; appearance is secondary, particularly since it is less predictable, given the variety of browser graphics and text-formatting capabilities. In fact, HTML and XHTML contain many ways for structuring your document content without regard to the final appearance: section headers, structured lists, paragraphs, rules, titles, and embedded images are all defined by the standard languages without regard for how these elements might be rendered by a browser. Consider, for example, a browser for the blind, wherein graphics on the page come with audio descriptions and alternative rules for navigation. The HTML/XHTML standards define such a thing: content over visual presentation.
If you treat HTML or XHTML as a document-generation tool, you will be sorely disappointed in your ability to format your document in a specific way. There is simply not enough capability built into the languages to allow you to create the kinds of documents you might whip up with tools like FrameMaker or Microsoft Word. Attempts to subvert the supplied structuring elements to achieve specific formatting tricks seldom work across all browsers. In short, don't waste your time trying to force HTML and XHTML to do things they were never designed to do.
Standards and Extensions
The basic syntax and semantics of HTML are defined in the HTML standard, now in its final version, 4.01. HTML matured quickly, in barely a decade. At one time, a new version would appear before you had a chance to finish reading an earlier edition of this book. Today, HTML has stopped evolving. As far as the W3C is concerned, XHTML has taken over. Now the wait is for browser manufacturers to implement the standards.
The XHTML standard currently is Version 1.0. Fortunately, XHTML Version 1.0 is, for the most part, a reconstitution of HTML Version 4.01. There are some differences, which we explore in Chapter 16. The popular browsers continue to support HTML documents, so there is no cause to stampede to XHTML. Do, however, start walking in that direction: a newer XHTML version, 1.1, is under consideration at the W3C, and browser developers are slowly but surely dropping nonstandard HTML features from their products.
Obviously, browser developers rely upon standards to have their software properly format and display common HTML and XHTML documents. Authors use the standards to make sure they are writing effective, correct documents that get displayed properly by the browsers.
However, standards are not always explicit; manufacturers have some leeway in how their browsers might display an element. And to complicate matters, commercial forces have pushed developers to add into their browsers nonstandard extensions meant to improve the language.
Confused? Don't be: in this book, we explore in detail the syntax, semantics, and idioms of the HTML Version 4.01 and XHTML Version 1.0 languages, along with the many important extensions that are supported in the latest versions of the most popular browsers.
It doesn't take an advanced degree in The Obvious to know that distinction draws attention. So, too, with browsers. Extra whizbang features can give the edge in the otherwise standardized browser market. That can be a nightmare for authors. A lot of people want you to use the latest and greatest gimmick or even useful HTML/XHTML extension. But it's not part of the standard, and not all browsers support it. In fact, on occasion, the popular browsers support different ways of doing the same thing.
Tools for the Web Designer
While you can use the barest of barebones text editors to create HTML and XHTML documents, most authors keep a more elaborate toolbox of software utilities than a simple word processor. You also need a browser, so you can test and refine your work. Beyond the essentials are some specialized software tools for developing and preparing HTML documents and accessory multimedia files.
At the very least, you'll need an editor, a browser to check your work, and, ideally, a connection to the Internet.

Section 1.6.1.1: Word processor or WYSIWYG editor?

Some authors use the word-processing capabilities of their specialized HTML/XHTML editing software. Some use the WYSIWYG (what-you-see-is-what-you-get) composition tools that come with their browsers or the latest versions of the popular word processors. Others, such as ourselves, prefer to compose their work on a general word processor and later insert the markup tags and their attributes. Still others include markup as they compose.
We think the stepwise approach — compose, then mark up — is the better way. We find that once we've defined and written the document's content, it's much easier to make a second pass to judiciously and effectively add the HTML/XHTML tags to format the text. Otherwise, the markup can obscure the content. Note, too, that unless specially trained (if they can be), spell-checkers and thesauruses typically choke on markup tags and their various parameters. You can spend what seems to be a lifetime clicking the Ignore button on all those otherwise valid markup tags when syntax- or spell-checking a document.
When and how you embed markup tags into your document dictates the tools you need. We recommend that you use a good word processor, which comes with more and better writing tools than simple text editors or the browser-based markup-language editors. You'll find, for instance, that an outliner, spell-checker, and thesaurus will best help you craft the document's flow and content, disregarding for the moment its look. The latest word processors encode your documents with HTML, too, but don't expect miracles. Except for boilerplate documents, you will probably need to nurse those automated HTML documents to full health. (Not to mention put them on a diet when you see how long the generated HTML is.) And it'll be a while before you'll see XHTML-specific markup tools in the popular word processors.
Chapter 2: Quick Start
We didn't spend hours studiously poring over some reference book before we wrote our first HTML document. You probably shouldn't, either. HTML is simple to read and understand, and it's simple to write. And once you've written an HTML document, you've nearly completed your first XHTML one, too. So let's get started without first learning a lot of arcane rules.
To help you get that quick, satisfying start, we've included this chapter as a brief summary of the many elements of HTML and its progeny, XHTML. Of course, we've left out a lot of details and some tricks that you should know. Read the upcoming chapters to get the essentials for becoming fluent in HTML and XHTML.
Even if you are familiar with the languages, we recommend that you work your way through this chapter before tackling the rest of the book. It not only gives you a working grasp of basic HTML/XHTML and their jargon, but you'll also be more productive later, flush with the confidence that comes from creating attractive documents in such a short time.
Writing Tools
Use any text editor to create an HTML or XHTML document, as long as it can save your work on disk in ASCII text file format. That's because even though documents include elaborate text layout and pictures, they're all just plain old ASCII text documents themselves. A fancier WYSIWYG editor or a translator for your favorite word processor is fine, too, although it may not support the many nonstandard features we discuss later in this book. You'll probably end up touching up the source text it produces, in any case.
While it's not needed to compose documents, you should have at least one version of a popular browser installed on your computer to view your work, preferably Netscape Navigator or Microsoft Internet Explorer. That's because, unless you use a special editor, the source document you compose won't look anything like what gets displayed by a browser, even though it's the same document. Make sure what your readers actually see is what you intended by viewing the document yourself with a browser. Besides, the popular ones are free over the Internet.
Also note that you don't need a connection to the Internet or the Web to write and view your HTML or XHTML documents. You can compose and view your documents stored on a hard drive or floppy disk that's attached to your computer. You can even navigate among your local documents with the languages' hyperlinking capabilities without ever being connected to the Internet, or any other network, for that matter. In fact, we recommend that you work locally to develop and thoroughly test your documents before you share them with others.
We strongly recommend, however, that you do get a connection to the Internet if you are serious about composing your own documents. You can download and view others' interesting web pages and see how they accomplished some interesting feature — good or bad. Learning by example is fun, too. (Reusing others' work, on the other hand, is often questionable, if not downright illegal.) An Internet connection is essential if you include in your work hyperlinks to other documents on the Internet.
A First HTML Document
It seems every programming language book ever written starts off with a simple example on how to display the message, "Hello, World!" Well, you won't see a "Hello, World!" example in this book. After all, this is a style guide for the new millennium. Instead, ours sends greetings to the World Wide Web:
<html>
<head>
<title>My first HTML document</title>
</head>
<body>
<h2>My first HTML document</h2>
Hello, <i>World Wide Web!</i>
 <!-- No "Hello, World" for us -->
<p>
Greetings from<br>
<a href="http://www.ora.com">O'Reilly &amp; Associates</a>
<p>
Composed with care by: 
<cite>(insert your name here)</cite>
<br>&copy;2000 and beyond
</body>
</html>
Go ahead: type in the example HTML source on a fresh word-processing page and save it on your local disk as myfirst.html. Make sure you select to save it in ASCII format; word processor-specific file formats like Microsoft Word's .doc files save hidden characters that can confuse the browser software and disrupt your HTML document's display.
After saving myfirst.html (or myfirst.htm, if you are using archaic DOS- or Windows 3.11-based file-naming conventions) onto disk, start up your browser and locate and open the file from the program's File menu. Your screen should look like Figure 2-1.
Figure 2-1: A very simple HTML document
Embedded Tags
You probably noticed right away, perhaps in surprise, that the browser displays less than half of the example source text. Closer inspection of the source reveals that what's missing is everything that's bracketed inside a pair of less-than (<) and greater-than (>) characters. [Section 3.3.1]
HTML and XHTML are embedded languages: you insert their directions, or tags, into the same document that you and your readers load into a browser to view. The browser uses the information inside those tags to decide how to display or otherwise treat the subsequent contents of your document.
For instance, the <i> tag that follows the word "Hello" in the simple example tells the browser to display the following text in italics. [Section 4.5]
The first word in a tag is its formal name, which usually is fairly descriptive of its function, too. Any additional words in a tag are special attributes, sometimes with an associated value after an equals sign (=), which further define or modify the tag's actions.
Most tags define and affect a discrete region of your document. The region begins where the tag and its attributes first appear in the source document (a.k.a. the start tag) and continues until a corresponding end tag. An end tag is the tag's name preceded by a forward slash (/). For example, the end tag that matches the "start italicizing" <i> tag is </i>.
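As a brief illustration, two lines borrowed from our first document can be annotated as follows. The comments are ours, added only to point out the parts; nothing else is changed apart from writing the ampersand as an entity.

```html
<!-- Start tag <i>, the affected text, then the matching end tag </i> -->
Hello, <i>World Wide Web!</i>

<!-- Tag name "a" with one attribute, href; its value follows the equals sign -->
<a href="http://www.ora.com">O'Reilly &amp; Associates</a>
```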
End tags never include attributes. In HTML, most tags, but not all, have an end tag. And, to make life a bit easier for HTML authors, the browser software often infers an end tag from surrounding and obvious context, so you needn't explicitly include some end tags in your source HTML document. (We tell you which are optional and which are never omitted when we describe each tag in later chapters.) Our simple example is missing an end tag that is so commonly inferred and hence not included in the source that some veteran HTML authors don't even know that it exists. Which one?
The XHTML standard is much more rigid, insisting that all tags have corresponding end tags. [Section 16.3.2] [Section 16.3.3]
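To make the contrast concrete, here is a small sketch of one list fragment written first in relaxed HTML and then in XHTML. The list contents are made up for illustration, and the <ul>, <li>, and <br> tags it uses are described in later chapters.

```html
<!-- HTML: the browser infers each missing </li> end tag -->
<ul>
<li>Kumquats
<li>Quinces<br>(in season)
</ul>

<!-- XHTML: every start tag is closed, and the empty br tag is written <br /> -->
<ul>
<li>Kumquats</li>
<li>Quinces<br />(in season)</li>
</ul>
```

Both versions should display the same way in the popular browsers; only the second meets XHTML's stricter rules.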
HTML Skeleton
Notice, too, that our simple example HTML document starts and ends with <html> and </html> tags. These tags tell the browser that the entire document is composed in HTML. The HTML and XHTML standards require an <html> tag for compliant documents, but most browsers can detect and properly display HTML encoding in a text document that's missing this outermost structural tag. [<html>]
Like our example, all HTML and XHTML documents have two main structures: a head and a body, each bounded in the source by respectively named start and end tags. You put information about the document in the head and the contents you want displayed in the browser's window inside the body. Except in rare cases, you'll spend most of your time working on your document's body content. [<head>] [<body>]
There are several different document header tags that you can use to define how a particular document fits into a document collection and into the larger scheme of the Web. Some nonstandard header tags even animate your document.
For most documents, however, the important header element is the title. Standards require that every HTML and XHTML document have a title, even though the currently popular browsers don't enforce that rule. Choose a meaningful title, one that instantly tells the reader what the document is about. Enclose yours, as we do for the title of our example, between the <title> and </title> tags in your document's header. The popular browsers typically display the title at the top of the document's window. [<title>]
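Stripped of its content, the structure just described reduces to the following sketch. The title text and the comment are placeholders of our own.

```html
<html>
<head>
<title>A meaningful title goes here</title>
</head>
<body>
<!-- The contents you want displayed go here -->
</body>
</html>
```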
The Flesh on an HTML or XHTML Document
Except for the <html>, <head>, <body>, and <title> tags, the HTML and XHTML standards have few other required structural elements. You're free to include pretty much anything else in the contents of your document. (The web surfers among you know that authors have taken full advantage of that freedom, too.) Perhaps surprisingly, though, there are only three main types of HTML/XHTML content: tags (which we described previously), comments, and text.
A raw document with all its embedded tags can quickly become nearly unreadable, like computer-programming source code. We strongly recommend that you use comments to guide your composing eye.
Although it's part of your document, nothing in a comment, which goes between the special starting <!-- and ending --> comment delimiters, gets included in the browser display of your document. You see a comment in the source, as in our simple HTML example, but you don't see it on the display, as evidenced by our comment's absence in Figure 2-1. Anyone can download the source text of your documents and read the comments, though, so be careful what you write. [Section 3.5.3]
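A short sketch of comments at work follows. The wording of the comments and the sentence is ours, chosen only for illustration.

```html
<!-- This note is for authors only; the browser does not display it -->
<p>
This sentence appears in the browser window.
<!-- A comment may also
     span several lines -->
```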
If it isn't a tag or a comment, it's text. The bulk of content in most of your HTML/XHTML documents — the part readers see on their browser displays — is text. Special tags give the text structure, such as headings, lists, and tables. Others advise the browser how the content should be formatted and displayed.
What about images and other multimedia elements we see and hear as part of our web browser displays? Aren't they part of the HTML document? No. The data that make up digital images, movies, sounds, and other multimedia elements that may be included in the browser display are stored in documents separate from the main HTML/XHTML document. You include references to those multimedia elements via special tags. The browser uses the references to load and integrate other types of documents with your text.
We didn't include any special multimedia references in the previous example simply because they are separate, nontext documents that you can't just type into a text processor. We do, however, talk about and give examples of how to integrate images and other multimedia in your documents later in this chapter, as well as in extensive detail in subsequent chapters.
Text
Text-related HTML/XHTML markup tags comprise the richest set of all in the standard languages. That's because the original language — HTML — emerged as a way to enrich the structure and organization of text.
HTML came out of academia. What was and still is important to those early developers was the ability of their mostly academic, text-oriented documents to be scanned and read without sacrificing their ability to distribute documents over the Internet to a wide diversity of computer display platforms. (ASCII text is the only universal format on the global Internet.) Multimedia integration is something of an appendage to HTML and XHTML, albeit an important one.
Also, page layout is secondary to structure. We humans visually scan and decide textual relationships and structure based on how it looks; machines can only read encoded markings. Because documents have encoded tags that relate meaning, they lend themselves very well to computer-automated searches and also to the recompilation of content — features very important to researchers. It's not so much how something is said as what is being said.
Accordingly, neither HTML nor XHTML is a page-layout language. In fact, given the diversity of user-customizable browsers, as well as the diversity of computer platforms for retrieval and display of electronic documents, all these markup languages strive to accomplish is to advise, not dictate, how the document might look when rendered by the browser. You cannot force the browser to display your document in any certain way. You'll hurt your brain if you insist otherwise.
For instance, you cannot predict what font and what absolute size — 8- or 40-point Helvetica, Geneva, Subway, or whatever — will be used for a particular user's text display. Okay, so the latest browsers now support standard Cascading Style Sheets and other desktop publishing-like features that let you control the layout and appearance of your documents. But users may change their browser's display characteristics and override your carefully laid plans at will, quite a few of the older browsers out there don't support these new layout features, and some browsers are text-only with no nice fonts at all. What to do? Concentrate on content. Cool pages are a flash in the pan. Deep content will bring people back for more and more.
Hyperlinks
While text may be the meat and bones of an HTML or XHTML document, the heart is hypertext. Hypertext gives users the ability to retrieve and display a different document in their own or someone else's collection simply by a click of the keyboard or mouse on an associated word or phrase (hyperlink) in the document. Use these interactive hyperlinks to help readers easily navigate and find information in your own or others' collections of otherwise separate documents in a variety of formats, including multimedia, HTML, XHTML, other XML, and plain ASCII text. Hyperlinks literally bring the wealth of knowledge on the whole Internet to the tip of the mouse pointer.
To include a hyperlink to some other document in your own collection or on a server in Timbuktu, all you need to know is the document's unique address and how to drop an anchor into your document.
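As a minimal sketch of what such an anchor might look like, assuming a sample address on the fictional kumquat.com server (the link text is our own invention):

```html
<!-- An anchor: the href attribute holds the target document's address,
     and the text between <a> and </a> becomes the clickable hyperlink. -->
<p>
  See our
  <a href="http://www.kumquat.com/docs/catalog/price_list.html">price list</a>
  for details.
</p>
```

The browser typically highlights the words "price list" in some way and retrieves the named document when the user selects them.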
While it is hard to believe, given the millions, perhaps billions, of them out there, every document and resource on the Internet has a unique address, known as its uniform resource locator (URL; commonly pronounced "you-are-ell"). A URL consists of the document's name preceded by the hierarchy of directory names in which the file is stored (pathname), the Internet domain name of the server that hosts the file, and the software and manner by which the browser and the document's host server communicate to exchange the document (protocol):
protocol://server_domain_name/pathname
Here are some sample URLs:
http://www.kumquat.com/docs/catalog/price_list.html
price_list.html
http://www.kumquat.com/
ftp://ftp.netcom.com/pub/
The first example is an absolute or complete URL. It includes every part of the URL format: protocol, server, and the pathname of the document. While absolute URLs leave nothing to the imagination, they can lead to big headaches when you move documents to another directory or server. Fortunately, browsers also let you use
Additional content appearing in this section has been removed.
Images Are Special
Image files are multimedia elements that you can reference with anchors in your document for separate download and display by the browser. But, unlike other multimedia, standard HTML and XHTML have an explicit provision for image display "inline" with the text, and images can serve as intricate maps of hyperlinks. That's because there is some consensus in the industry concerning image file formats — specifically, GIF and JPEG — and the graphical browsers have built-in decoders that integrate those image types into your document.
The HTML/XHTML tag for inline images is <img>; its required src attribute is the URL of the GIF or JPEG image you want to insert in the document. [<img>]
The browser separately loads images and places them into the text flow as if the image were some special, albeit sometimes very large, character. Normally, that means the browser aligns the bottom of the image to the bottom of the current line of text. You can change that with the special <img> align attribute, whose value you set to put the image at the top, middle, or bottom of adjacent text. Examine Figure 2-2 through Figure 2-4 for the image alignment you prefer. Of course, wide images may take up the whole line and hence break the text flow. You can also place an image by itself, by including preceding and following division, paragraph, or line-break tags.
Figure 2-2: An inline image aligned with the bottom of the text (default)
Figure 2-3: An inline image specially aligned with the middle of the text
Figure 2-4: An inline image specially aligned with the top of the text
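The three alignments might be written roughly as follows. This is only a sketch: the image file pics/kumquat.gif and the alt text are hypothetical, and we assume a browser that honors the align attribute.

```html
<p>
  <!-- Default: the bottom of the image lines up with the bottom of the text. -->
  <img src="pics/kumquat.gif" alt="A kumquat"> Text beside a bottom-aligned image.
</p>
<p>
  <!-- align="middle" puts the image at the middle of the adjacent text. -->
  <img src="pics/kumquat.gif" alt="A kumquat" align="middle"> Text beside a middle-aligned image.
</p>
<p>
  <!-- align="top" puts the image at the top of the adjacent text. -->
  <img src="pics/kumquat.gif" alt="A kumquat" align="top"> Text beside a top-aligned image.
</p>
```

Wrapping each image and its text in its own paragraph keeps the three cases from running together on one line.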
Experienced HTML authors use images not only as supporting illustrations, but also as quite small inline characters or glyphs, added to aid browsing readers' eyes and to highlight sections of the documents. Veteran HTML authors commonly add custom list bullets or more distinctive section dividers than the conventional horizontal rules. Images, too, may be included in a hyperlink, so that users may select an inline thumbnail sketch to download a full-screen image. The possibilities with inline images are endless.
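A thumbnail that links to a larger picture could be sketched like this; both filenames are hypothetical stand-ins for your own image files:

```html
<!-- The inline thumbnail is the content of the anchor, so selecting
     the small image retrieves the full-size one named in href. -->
<a href="pics/kumquat_full.jpg">
  <img src="pics/kumquat_thumb.gif" alt="Kumquat thumbnail">
</a>
```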
Lists, Searchable Documents, and Forms
Thought we'd exhausted text elements? Headers, paragraphs, and line breaks are just the rudimentary text-organizational elements of a document. The languages also provide several advanced text-based structures, including three types of lists, "searchable" documents, and forms. Searchable documents and forms go beyond text formatting, too; they are a way to interact with your readers. Forms let users enter text and click checkboxes and radio buttons to select particular items and then send that information back to the server. Once received, a special server application processes the form's information and responds accordingly; e.g., filling a product order or collecting data for a user survey.
The syntax for these special features and their various attributes can get rather complicated; they're not quick-start grist. We'll mention them here, but we urge you to read on for details in later chapters.
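Just to give a flavor of what a form looks like, here is a rough sketch. The action URL, the field names, and the choices offered are all hypothetical; a real form needs a server application at that address to process what the user sends.

```html
<form method="post" action="http://www.kumquat.com/cgi-bin/survey">
  <p>
    <!-- A one-line text-entry field -->
    Name: <input type="text" name="name">
  </p>
  <p>
    <!-- A checkbox: sent to the server only if the user checks it -->
    <input type="checkbox" name="newsletter" value="yes"> Send me the newsletter
  </p>
  <p>
    <!-- Radio buttons sharing one name let the user pick a single item -->
    <input type="radio" name="size" value="small"> Small
    <input type="radio" name="size" value="large"> Large
  </p>
  <p>
    <!-- The submit button sends the form's contents back to the server -->
    <input type="submit" value="Send">
  </p>
</form>
```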
The three types of lists match those we are most familiar with: unordered, ordered, and definition lists. An unordered list — one in which the order of items is not important, such as a laundry or grocery list — gets bounded by <ul> and </ul> tags. Each item in the list, usually a word or short phrase, is marked by the <li> (list-item) tag and, particularly with XHTML, the </li> end tag. When rendered, the list item typically appears indented from the left margin and preceded by a bullet symbol. [<ul>] [<li>]
Ordered lists, bounded by the <ol> and </ol> tags, are identical in format to unordered ones, including the <li> tag (and </li> end tag with XHTML) for marking list items. However, the order of items is important — equipment assembly steps, for instance. The browser accordingly displays each item in the list preceded by an ascending number. [<ol>]
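The two list types described so far might be marked up as in this small sketch (the list items are our own made-up examples). We include the </li> end tags so that the fragment also suits XHTML:

```html
<!-- Unordered list: typically rendered with bullets -->
<ul>
  <li>Socks</li>
  <li>Shirts</li>
  <li>Towels</li>
</ul>

<!-- Ordered list: typically rendered with ascending numbers -->
<ol>
  <li>Unpack the parts.</li>
  <li>Attach the legs to the tabletop.</li>
  <li>Tighten all the bolts.</li>
</ol>
```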
Definition lists are slightly more complicated than unordered and ordered lists. Within a definition list's enclosing <dl> and </dl> tags, each list item has two parts, each with a special tag: a short name or title, contained within a
Additional content appearing in this section has been removed.
Tables
For a language that emerged from academia — a world steeped in data — it's not surprising to find that HTML (and now its progeny, XHTML) supports a set of tags for data tables that not only align your numbers but can specially format your text, too.
Five tags enable tables, including the <table> tag itself and a <caption> tag for including a description of the table. Special tag attributes let you change the look and dimensions of the table. You create a table row by row, putting between the table row (<tr>) tag and its end tag (</tr>) either table header (<th>) or table data (<td>) tags and their respective contents for each cell in the table (end tags, too, with XHTML). Headers and data may contain nearly any regular content, including text, images, forms, and even another table. As a result, you can also use tables for advanced text formatting, such as for multicolumn text and sidebar headers (see Figure 2-5). For more information, see Chapter 10.
Figure 2-5: Tables let you perform page layout tricks, too
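A small data table built row by row might look like the following sketch. The caption and the cell contents are invented sample data, and the border attribute is just one of the look-changing attributes you might choose:

```html
<table border="1">
  <caption>Kumquat prices (sample data)</caption>
  <!-- First row: header cells -->
  <tr>
    <th>Size</th>
    <th>Price</th>
  </tr>
  <!-- Remaining rows: data cells -->
  <tr>
    <td>Small</td>
    <td>$1.00</td>
  </tr>
  <tr>
    <td>Large</td>
    <td>$1.50</td>
  </tr>
</table>
```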
Frames
Anyone who has had more than one application window open on her graphical desktop at a time can immediately appreciate the benefits of frames. Frames let you divide the browser window into multiple display areas, each containing a different document.
Figure 2-6 is an example of a frame display. It shows how the document window may be divided into independent windows separated by rule lines and scrollbars. What is not immediately apparent in the example, though, is that each frame displays an independent document, and not necessarily HTML or XHTML ones, either. A frame may contain any valid content that the browser is capable of displaying, including multimedia. If the frame's contents include a hypertext link that the user selects, the new document's contents, even another frame document, may replace that same frame, another frame's content, or the entire browser window.
Figure 2-6: Frames divide the browser's window into two or more independent document displays
Frames are defined in a special document, in which you replace the <body> tag with one or more <frameset> tags that tell the browser how to divide its main window into discrete frames. Special <frame> tags go inside the <frameset> tag and point to the documents that go inside the frames.
The individual documents referenced and displayed in the frame document window act independently, to a degree; the frame document controls the entire window. You can, however, direct one frame's document to load new content into another frame. In Figure 2-6, for example, selecting a Chapter hyperlink in the Table of Contents frame has the browser load and display that Chapter's contents in the frame on the right. That way, the Table of Contents is always available to the user as he browses the collection. For more information on frames, see Chapter 11.
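A two-frame layout along these lines could be sketched as follows. The filenames (toc.html, chapter1.html, chapter2.html), the frame names, and the 25 percent column width are all assumptions made for illustration:

```html
<!-- The frame document: <frameset> takes the place of <body> -->
<html>
<head>
<title>Frames Sketch</title>
</head>
<frameset cols="25%,*">
  <!-- Left frame holds the table of contents; right frame holds a chapter -->
  <frame src="toc.html" name="toc">
  <frame src="chapter1.html" name="content">
</frameset>
</html>

<!-- A hyperlink inside toc.html: the target attribute names the frame
     that should receive the new document -->
<a href="chapter2.html" target="content">Chapter 2</a>
```

Because the link's target names the right-hand frame, the table of contents stays put while the chapter changes.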
Style Sheets and JavaScript
Browsers also have support for two powerful innovations to HTML: style sheets and JavaScript. Like their desktop-publishing cousins, style sheets let you control how your web pages look — text font styles and sizes, colors, backgrounds, alignments, and so on. More importantly, style sheets give you a way to impose display characteristics uniformly over the entire document and over an entire collection of documents.
JavaScript is a programming language with functions and commands that let you control how the browser behaves for the user. Now, this is not a JavaScript programming book, but we do cover the language in fair detail in later chapters to show you how to embed JavaScript programs into your documents and achieve some very powerful and fun effects.
The W3C, the putative standards organization, prefers that you use the Cascading Style Sheets (CSS) model for HTML/XHTML document design. Since Version 4, both Netscape and Internet Explorer have supported CSS and JavaScript. Netscape 4 alone also supports a JavaScript-based Style Sheet (JSS) model, which we describe in Chapter 12, but we do not recommend that you use it. CSS is the universally approved, universally supported way to control how your documents might (not will) usually be displayed on users' browsers.
To illustrate CSS, here's a way to make all the top-level (H1) header text in your HTML document appear in the color red:
<html>
<head>
<title>CSS Example</title>
<!-- Hide CSS properties within comments so old browsers
don't choke on or display the unfamiliar contents. -->
  <style type="text/css">
    <!--
    H1 {color: red}
    -->
  </style>
</head>
<body>
<H1>I'll be red if your browser supports CSS</H1>
Something in between.
<H1>I should be red, too!</H1>
</body>
</html>
Of course, you can't see red in this black and white book, so we won't show the result in a figure. Believe us, or prove it to yourself by typing in and loading the example in your browser: the <H1>-enclosed text appears red on a color screen.
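To apply the same rule uniformly across a whole collection of documents, one common approach is to keep the style rules in a separate file and refer to it from each document's head. The following is only a sketch; the path styles/site.css is a hypothetical name for a file you would create yourself:

```html
<html>
<head>
<title>Shared Style Sheet Sketch</title>
<!-- styles/site.css would hold the rule: H1 {color: red} -->
<link rel="stylesheet" type="text/css" href="styles/site.css">
</head>
<body>
<H1>I'll be red in every document that links to the style sheet</H1>
</body>
</html>
```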
Forging Ahead
Clearly, this chapter represents the tip of the iceberg. If you've read this far, hopefully your appetite has been whetted for more. By now you've got a basic understanding of the scope and features of HTML and XHTML; proceed through subsequent chapters to expand your knowledge and learn more about each feature.
Chapter 3: Anatomy of an HTML Document
Most HTML and XHTML documents are very simple, and writing one shouldn't intimidate even the most timid of computer users. First, although you might use a fancy WYSIWYG editor to help you compose it, a document is ultimately stored, distributed, and read by a browser as a simple ASCII text file. That's why even the poorest user with a barebones text editor can compose the most elaborate of web pages. (Accomplished webmasters often elicit the admiration of "newbies" by composing astonishingly cool pages using the crudest text editor on a cheap laptop computer and performing in odd places, such as on a bus or in the bathroom.) Authors should, however, keep several of the popular browsers on hand, including recent versions of each, and alternate among them to view new documents under construction. Remember, browsers differ in how they display a page, not all browsers implement all of the language standards, and some have their own special extensions.
Appearances Can Deceive
Documents never look alike when displayed by a text editor and when displayed by a browser. Take a look at any source document on the Web. At the very least, return characters, tabs, and leading spaces, although important for readability of the source text document, are ignored for the most part. There also is a lot of extra text in a source document, mostly from the display tags and interactivity markers and their parameters that affect portions of the document but don't themselves appear in the display.
Accordingly, new authors are confronted with having to develop not only a presentation style for their web pages, but a different style for their source text. The source document's layout should highlight the programming-like markup aspects of HTML and XHTML, not their display aspects. And it should be readable not only by you, the author, but by others as well.
Experienced document writers typically adopt a programming-like style, albeit very relaxed, for their source text. We do the same throughout this book, and that style will become apparent as you compare our source examples with the actual display of the document by a browser.
Our formatting style is simple, but it serves to create readable, easily maintained documents:
  • Except for the structural tags like <html>, <head>, and <body>, any element we use to structure the content of a document is placed on a separate line and indented to show its nesting level within the document. Such elements include lists, forms, tables, and similar tags.
  • Any element used to control the appearance or style of text is inserted in the current line of text. This includes basic font style tags like <b> (bold text) and document linkages like <a> (hypertext anchor).
  • Avoid, where possible, the breaking of a URL onto two lines.
  • Add extra newline characters to set apart special sections of the source document — for instance, around paragraphs or tables.
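Applied to a short document, the conventions above might produce source text like this sketch (the content itself is made up for illustration):

```html
<html>
<head>
<title>Source Layout Sketch</title>
</head>
<body>

<p>
Style tags such as <b>bold</b> and linkages such as
<a href="http://www.kumquat.com/">anchors</a> stay inline with the text,
and the URL is kept whole on a single line.
</p>

<ul>
  <li>Structural elements start on their own lines</li>
  <li>and are indented to show their nesting level.</li>
</ul>

</body>
</html>
```

The blank lines around the paragraph and the list have no effect on the display; they are there only for whoever reads the source.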
The task of maintaining the indentation of your source file ranges from trivial to onerous. Some text editors, like Emacs, manage the indentation automatically; others, like common word processors, couldn't care less about indentation and leave the task completely up to you. If your editor makes your life difficult, you might consider striking a compromise, perhaps by indenting the tags to show structure, but leaving the actual text without indentation to make modifications easier.
Structure of an HTML Document
HTML and XHTML documents consist of text, which defines the content of the document, and tags, which define the structure and appearance of the document. The structure of an HTML document is simple, consisting of an outer <html> tag enclosing the document head and body:
<html>
<head>
<title>Barebones HTML Document</title>
</head>
<body>
This illustrates, in a very <i>simp</i>le way,
the basic structure of an HTML document.
</body>
</html>
Each document has a head and a body, delimited by the <head> and <body> tags. The head is where you give your document a title and where you indicate other parameters the browser may use when displaying the document. The body is where you put the actual contents of the document. This includes the text for display and document-control markers (tags) that advise the browser how to display the text. Tags also reference special-effects files, including graphics and sound, and indicate the hot spots (hyperlinks and anchors) that link your document to other documents.
Tags and Attributes
For the most part, tags — the markup elements of HTML and XHTML — are simple to understand and use, since they are made up of common words, abbreviations, and notations. For instance, the <i> and </i> tags respectively tell the browser to start and stop italicizing the text characters that come between them. Accordingly, the syllable "simp" in our barebones example above would appear italicized on a browser display.
The HTML and XHTML standards and their various extensions define how and where you place tags within a document. Let's take a closer look at that syntactic sugar that holds together all documents.
Every tag consists of a tag name, sometimes followed by an optional list of tag attributes, all placed between opening and closing brackets (< and >). The simplest tag is nothing more than a name appropriately enclosed in brackets, such as <head> and <i>. More complicated tags contain one or more attributes, which specify or modify the behavior of the tag.
According to the HTML standard, tag and attribute names are not case-sensitive. There's no difference in effect between <head>, <Head>, <HEAD>, or even <HeaD>; they are all equivalent. With XHTML, case is important: all current standard tag and attribute names are in lowercase.
For both HTML and XHTML, the values that you assign to a particular attribute may be case-sensitive, depending on your browser and server. In particular, file location and name references — or uniform resource locators (URLs) — are case-sensitive. [Section 6.2]
Tag attributes, if any, belong after the tag name, each separated by one or more tab, space, or return characters. Their order of appearance is not important.
A tag attribute's value, if any, follows an equals sign (=) after the attribute name. You may include spaces around the equals sign, so that width=6, width = 6, width =6, and width= 6 all mean the same. For readability, however, we prefer not to include spaces. That way, it's easier to pick out an attribute/value pair from a crowd of pairs in a lengthy tag.
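The following sketch pulls these rules together. The image file logo.gif and the attribute values are hypothetical; by the HTML rules described above, all four tags should mean the same thing to the browser:

```html
<!-- Attribute order is not important -->
<img src="logo.gif" alt="Logo" width="6">
<img width="6" alt="Logo" src="logo.gif">

<!-- Spaces around the equals sign are allowed, but the pairs are harder to pick out -->
<img src = "logo.gif" alt = "Logo" width = "6">

<!-- HTML tag and attribute names are not case-sensitive (XHTML requires lowercase) -->
<IMG SRC="logo.gif" ALT="Logo" WIDTH="6">
```

Note that only the names are case-insensitive here: a value such as the filename may still be case-sensitive on the server.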
Well-Formed Documents and XHTML
XHTML is HTML's prissy cousin. What would pass most beauty contests as a very proper and complete HTML document, done according to the book and including end-paragraph tags, might well be rejected by the XML judges as a malformed file.
To conform with XML, XHTML insists that documents be "well formed." Among other things, that means that every tag must have an ending tag — even the ones like <br> and <hr> for which the HTML standard forbids the use of an end tag. With XHTML, the ending is placed inside the start tag: <br />, for example. [Section 16.3.3]
It also means that tag and attribute names are case-sensitive and, according to the current XHTML standard, must be in lowercase. Hence, only <head> is acceptable, and it is not the same as <HEAD> or <HeAd>, as it is with the HTML standard. [Section 16.3.4]
Well-formed XHTML documents, like HTML standard ones, must also conform to proper nesting. No argument there. [Section 16.3.1]
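To make the contrast concrete, here is a sketch of a fragment that passes as HTML alongside a well-formed XHTML rewrite of it. The text content is made up:

```html
<!-- Acceptable HTML, but not well-formed XHTML: uppercase names,
     no </p> end tag, and <BR> and <HR> without endings -->
<P>First line<BR>
second line, with <B><I>nested</I></B> styles
<HR>

<!-- A well-formed XHTML version: lowercase names, every tag ended,
     and nested tags closed in the reverse of the order they were opened -->
<p>First line<br />
second line, with <b><i>nested</i></b> styles</p>
<hr />
```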
In their defense, the XML standard and its offspring, XHTML, emphasize extensibility. That way, <p> can mean the beginning of a paragraph in HTML, whereas another variant of the language may define the contents of the <P> tag to be election-poll results that display quite differently — perhaps in tabular form, with red, white, and blue stripes and accompanying patriotic music.
We will discuss this further in Chapter 15 and Chapter 16, in which we detail the XML and XHTML standards (and the Forces of Conformity).
Document Content
Nearly everything else you put into your HTML or XHTML document that isn't a tag is by definition content, and the majority of that is text. Like tags, document content is encoded using a specific character set — by default, the ISO-8859-1 Latin character set. This character set is a superset of conventional ASCII, adding the necessary characters to support the Western European languages. If your keyboard does not allow you to directly enter the characters you need, you can use character entities to insert the desired characters.
Perhaps the hardest rule to remember when marking up an HTML or XHTML document is that all the tags you insert regarding text display and formatting are only advice for the browser: they do not explicitly control how the browser will display the document. In fact, the browser can choose to ignore all of your tags and do what it pleases with the document content. What's worse, the user (of all people!) has control over the text-display characteristics of his or her own browser.
Get used to this lack of control. The best way to use markup to control the appearance of your documents is to concentrate on the content of the document, not on its final appearance. If you find yourself worrying excessively about spacing, alignment, text breaks, and character positioning, you'll surely end up with ulcers. You will have gone beyond the intent of HTML. If you focus on delivering information to users in an attractive manner, using the tags to advise the browser as to how best to display that information, you are using HTML or XHTML effectively, and your documents will render well on a wide range of browsers.
Besides common text, HTML and XHTML give you a way to display special text characters that you might not normally be able to include in your source document or that have other purposes. A good example is the less-than or opening bracket symbol (<). In HTML, it normally signifies the start of a tag, so if you insert it simply as part of your text, the browser will get confused and probably misinterpret your document.
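Character entities are the usual way around this problem. In the sketch below, &lt; stands in for the less-than symbol and &eacute; for an accented e; the sentences themselves are made up:

```html
<p>
  <!-- &lt; displays as a less-than symbol without starting a tag -->
  The expression a &lt; b is true when a is smaller than b.
</p>
<p>
  <!-- &eacute; displays as an e with an acute accent -->
  The caf&eacute; serves kumquat tea.
</p>
```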
HTML/XHTML Document Elements
Every HTML document should conform to the HTML SGML DTD, the formal Document Type Definition that defines the HTML standard. The DTD defines the tags and syntax that are used to create an HTML document. You can inform the browser which DTD your document complies with by placing a special SGML (Standard Generalized Markup Language) command in the first line of the document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
This cryptic message indicates that your document is intended to be compliant with the HTML 4.01 final DTD defined by the World Wide Web Consortium (W3C). Other versions of the DTD define more restricted versions of the HTML standard, and not all browsers support all versions of the HTML DTD. In fact, specifying any other doctype may cause the browser to misinterpret your document when displaying it for the user. It's also unclear what doctype to use when including in the HTML document the various tags that are not standards but are very popular features of a popular browser — the Netscape extensions, for instance, or even the deprecated HTML 3.0 standard, for which a DTD was never released.
Almost no one precedes their HTML documents with the SGML doctype command. Because of the confusion of versions and standards, we don't recommend that you include the prefix with your HTML documents either.
On the other hand, we do strongly recommend that you include the proper doctype statement in your XHTML documents, in conformance with XML standards. Read Chapter 15 and Chapter 16 for more about DTDs and the XML and XHTML standards.
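For instance, an XHTML document might begin as in the following sketch. We assume the XHTML 1.0 Transitional DTD here purely for illustration; the text above does not say which XHTML DTD to choose:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XHTML Doctype Sketch</title>
</head>
<body>
<p>A minimal XHTML document.</p>
</body>
</html>
```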
As we saw earlier, the <html> and </html> tags serve to delimit the beginning and end of a document. Since the typical browser can easily infer from the enclosed source that it is an HTML or XHTML document, you don't really need to include the tag in your source HTML document.