HTML Versus XHTML

It’s not Latin, but HTML has reached old age in standard version 4.01. The W3C has no plans to develop another version and has officially said so. Rather, HTML is being subsumed and modularized as an application of the Extensible Markup Language (XML). Its new name is XHTML, the Extensible Hypertext Markup Language.

The emergence of XHTML is just another chapter in the often tumultuous history of HTML and the Web, where confusion for authors is the norm, not the exception. At its nadir, the elders of the W3C, who are responsible for the accepted and acceptable uses of the language (the standards), lost control of HTML in the browser “wars” between Netscape and Microsoft. The HTML+ proposal never got off the ground, and HTML 3.0 became so bogged down in debate that the W3C simply shelved the entire draft. HTML 3.0 never happened, despite what some opportunistic marketers claimed in their literature. Instead, by late 1996, the browser manufacturers had convinced the W3C to release HTML standard version 3.2, which for all intents and purposes simply standardized most of Netscape’s HTML extensions.

Netscape’s dominance as the leading browser, and as a leader in web technologies, faded dramatically toward the end of the millennium. By then, Microsoft had effectively bundled Internet Explorer into the Windows operating system, not only as an installed application, but also as a dominant feature of the GUI desktop. In addition, Internet Explorer introduced several features (albeit nonstandard at the time) that appealed principally to the growing Internet business and marketing community.

Fortunately for those of us who appreciate and strongly support standards, the W3C reclaimed its primary role with HTML 4.0, which stands today as HTML version 4.01, released in December 1999. The standard absorbs many of the Netscape and Internet Explorer innovations. It is clearer and cleaner than any previous version and establishes solid implementation models for consistency across browsers and platforms. It also provides strong support and incentives for the companion Cascading Style Sheets (CSS) standard for HTML-based displays, and it makes provisions for alternative (nonvisual) user agents and for broader support of the world’s languages.

Cleaner and clearer aside, the W3C realized that HTML could never keep up with the web community’s demands for more ways to distribute, process, and display documents. HTML offers only a limited set of document-creation primitives and is hopelessly incapable of handling nontraditional content like chemical formulae, musical notation, and mathematical expressions. Nor can it adequately support alternative display media, such as handheld computers and intelligent cellular phones.

To address these demands, the W3C developed the XML standard. XML provides a way to create new, standards-based markup languages that don’t take an act of the W3C to implement. XML-compliant languages deliver information that can be parsed, processed, displayed, sliced, and diced by the many different communication technologies that have emerged since the Web sparked the digital communication revolution a decade ago. XHTML is HTML reformulated to adhere to the XML standard. It is the foundation language for the future of the Web.
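One practical consequence of that reformulation is that an XHTML document must be well-formed XML. Every element is closed or self-closed, tags nest properly, and element names are case-sensitive. Markup that is perfectly legal HTML 4.01 can therefore fail under XHTML. The following is a minimal sketch that uses Python’s standard-library XML parser as a stand-in for an XML-aware user agent. The helper name `is_well_formed` is our own, and the check covers only well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Legal HTML 4.01: the <p> and <br> tags are never closed.
html_fragment = "<p>First line<br>Second line"

# The same content as XHTML: every element is closed (<br /> is self-closing).
xhtml_fragment = "<p>First line<br />Second line</p>"

# Minimized attributes like "checked" are fine in HTML but not in XML.
html_checkbox = "<input type='checkbox' checked>"
xhtml_checkbox = "<input type='checkbox' checked='checked' />"


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for sample in (html_fragment, xhtml_fragment, html_checkbox, xhtml_checkbox):
    print(is_well_formed(sample), sample)
```

Running it reports the two HTML-style samples as not well-formed and the two XHTML-style samples as well-formed.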

Why not just drop HTML for XHTML? For many reasons. First and foremost, XHTML has not exactly taken the Web by storm. There’s just too much current investment in HTML-based documentation and expertise for that to happen anytime soon. Besides, XHTML is HTML 4.01 reformulated as an application of XML. Know HTML 4 and you’re all ready for the future.[*]

Deprecated Features

One of the unpopular things standards bearers have to do is make choices between popular and proper. The authors of the HTML and XHTML standards exercise that responsibility by “deprecating” those features of the language that interfere in the grand scheme of things.

For instance, the <center> tag tells the browser to display the enclosed text centered in the display window. But the CSS standard provides ways to center text, too. The W3C chooses to support the CSS way and discourages the use of <center> by deprecating the tag. The plan is to drop <center> and the language’s other deprecated elements and attributes from some later version of the standard.
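To make the difference concrete, here is a small sketch that rewrites deprecated <center> elements into the usual CSS replacement, a <div> styled with `text-align: center`. It assumes well-formed XHTML input with no XML namespace declaration. In a real XHTML document that declares the XHTML namespace, ElementTree would report tag names like `{http://www.w3.org/1999/xhtml}center`, so the tag match would need adjusting. The function name `replace_center` is hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_center(xhtml: str) -> str:
    """Rewrite each deprecated <center> element as a CSS-centered <div>.

    Assumes well-formed XHTML with no namespace (hypothetical helper).
    """
    root = ET.fromstring(xhtml)
    # Materialize the matches first so renaming tags cannot disturb the iteration.
    for element in list(root.iter("center")):
        element.tag = "div"
        element.set("style", "text-align: center")
    return ET.tostring(root, encoding="unicode")


page = "<body><center>Welcome!</center><p>Body text</p></body>"
print(replace_center(page))
```

An inline `style` attribute keeps the example short. In practice you would more likely move the rule into a style sheet and attach it with a `class` attribute.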

Throughout the book, we specially note and continuously remind you when an HTML tag or other component is deprecated in the current standards. Should you stop using them now? Yes and no.

Yes, because there is a preferred and perhaps better way to accomplish the same thing. By exercising that alternative, you ensure that your documents will survive for many years to come on the Web. And, yes, because the tools you may use to prepare HTML/XHTML documents probably adhere to the preferred standard. You may not have a choice, unless you disable your tools. In any event, unless you hand-compose all your documents, you’ll need to know how the preferred way works so that you can identify the code and modify it.

However compelling the reasons for not using deprecated elements and attributes are, they are still part of the standards. They remain well supported by most browsers and aren’t expected to disappear anytime soon. In fact, since there is no plan to revise the HTML standard, the “deprecated” stamp is very misleading.

So, no, you don’t have to worry about deprecated HTML features. There is no reason to panic, certainly. We do, however, encourage you to make a move toward the standards soon.

A Definitive Guide

The paradox in all this is that even the HTML 4.01 standard is not the definitive resource. There are many more features of HTML in popular use and supported by the popular browsers than are included in the latest language standard. And there are many parts of the standards that are ignored. We promise you, things can get downright confusing.

We’ve managed to sort things out for you, though, so you don’t have to sweat over what works and doesn’t work with what browser. This book, therefore, is the definitive guide to HTML and XHTML. We give details for all the elements of the HTML 4.01 and XHTML 1.0 standards, plus the variety of interesting and useful extensions to the language. We also include detailed discussions of the CSS standard, since it is so intricately related to web page development.

In addition, there are a few things that are closely related to HTML but not directly part of it. For example, we touch on JavaScript, Common Gateway Interface (CGI), and Java programming, but we do not cover them in depth. They all work closely with HTML documents and run with or alongside browsers, but they are not part of the language itself, so we don’t delve into them. Besides, they are comprehensive topics that deserve their own books, such as JavaScript: The Definitive Guide, by David Flanagan; CGI Programming with Perl, by Scott Guelich, Shishir Gundavaram, and Gunther Birznieks; Cascading Style Sheets: The Definitive Guide, by Eric Meyer; and Learning Java, by Pat Niemeyer and Jonathan Knudsen (all published by O’Reilly).

This is your definitive guide to HTML and XHTML as they are and should be used, including every extension we could find. Some extensions aren’t documented anywhere, even in the plethora of online guides. But, if we’ve missed anything, certainly let us know and we’ll put it in the next edition.



[*] We plumb the depths of XML and XHTML in Chapters 15 and 16.
