Chapter 15. XML

HTML is a maverick. It follows the rules of formal electronic document-markup design and implementation only loosely. HTML was born out of the need to assemble text, graphics, and other digital content into electronic documents that could be sent over the global Internet. In the early days of the Web’s boom, the demand for better browsers and document servers — driven by hordes of new users with insatiable appetites for more and cooler web pages — left little time for worrying about things like standards and practices.

Of course, without guiding standards, HTML would eventually have devolved into Babel. That almost happened during the browser wars in the mid- to late-1990s. Chaos is not an acceptable foundation for an industry whose value is already measured in the trillions of dollars. Although the standards people at the W3C managed to rein in the maverick HTML with standard Version 4, it is still too wild for the royal herd of markup languages.

The HTML 4.01 standard is defined using the Standardized Generalized Markup Language (SGML). While more than adequate for formalizing HTML, SGML is far too complex to use as a general tool for extending and enhancing HTML. Instead, the W3C has devised a standard known as the Extensible Markup Language, or XML. Based on the simpler features of SGML, XML is kinder, gentler, and more flexible, well suited to guiding the birth and orderly development of new markup languages. With XML, HTML itself is being reborn as XHTML.

In this ...

Get HTML & XHTML: The Definitive Guide, 5th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.