Transforming an XML Repository into Reviewable Web Pages

I began writing this book in HTML, then switched midstream to XML. Using terms from Chapter 5, I converted a docbase whose repository format and delivery format were both the same stream of HTML into a docbase whose repository format is XML and whose delivery format is HTML. Here are the lessons I learned when I did that.

You Can Easily Convert HTML to Equivalent XML

XML doesn’t have to involve complex document type definitions (DTDs) written in weird syntax that’s hard to understand and use. Of course, there are good reasons to use DTDs, but the inventors of XML wisely chose to make them optional. As a result, the initial conversion of my HTML manuscript to XML was a trivial exercise that took just a few hours. There were just three rules I had to apply:

  • Close all tags.

  • Quote all attributes.

  • Escape ampersands.
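One way to see what the three rules buy you is to hand a fragment to any XML parser, with no DTD in sight. The following is a minimal sketch using Python's standard library, not the tooling I actually used. The helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML. No DTD is consulted."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the same paragraph before and after applying the rules.
html_style = '<p>Fish & chips<img src=fig2.gif></p>'
xml_style = '<p>Fish &amp; chips<img src="fig2.gif"/></p>'

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

Breaking any one of the three rules is enough to make the parser reject the fragment, which makes a check like this a handy guard while converting.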

I used keystroke macros in my text editor to add end tags to <p>, <li>, and <img> elements. To close an empty tag such as <img>—that is, a tag that has no content other than its attributes—you need only precede the trailing angle bracket with a forward slash, like this:

<img src="fig2.gif"/>
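The same edit can be scripted rather than recorded as an editor macro. Here is a rough sketch in Python, assuming that no attribute value contains a literal < or > character. The names `IMG_TAG` and `close_img_tags` are invented for this example.

```python
import re

# Hypothetical pattern: an <img ...> tag, whether or not it is already closed.
# Assumes attribute values never contain a literal '<' or '>'.
IMG_TAG = re.compile(r'<img\b([^<>]*?)\s*/?>', re.IGNORECASE)


def close_img_tags(text: str) -> str:
    """Rewrite every <img ...> tag in the XML empty-element form <img .../>."""
    return IMG_TAG.sub(lambda m: '<img' + m.group(1) + '/>', text)


print(close_img_tags('<p>See <img src="fig2.gif"> for details.</p>'))
```

Because the pattern also matches tags that already end in a slash, running the function twice leaves the text unchanged.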

I also used search-and-replace to escape the ampersand (&), which in XML is written as &amp;. This applies to individual ampersands as well as those that introduce HTML entities such as &lt; and &gt;, which represent < and >.
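A blanket replacement like this can be sketched in a few lines of Python. The function name `escape_ampersands` and the sample paragraph are hypothetical, and the sketch assumes the input is HTML that has not been converted yet, since a second pass would escape the ampersands again.

```python
import xml.etree.ElementTree as ET


def escape_ampersands(text: str) -> str:
    """Escape every ampersand, including those that begin HTML entities."""
    return text.replace('&', '&amp;')


# Hypothetical sample paragraph from an HTML manuscript.
source = '<p>AT&T writes &lt;p&gt; tags</p>'
converted = escape_ampersands(source)
print(converted)  # <p>AT&amp;T writes &amp;lt;p&amp;gt; tags</p>

# An XML parser hands back the original HTML text, entities intact.
print(ET.fromstring(converted).text)  # AT&T writes &lt;p&gt; tags
```

The second print shows why escaping the entities is harmless: when the XML is parsed, the text comes back exactly as it stood in the HTML source, ready to be written out to an HTML page.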

This XML-ized flavor of HTML now has its own acronym: XHTML (Extensible HyperText Markup Language, ...
