XML Is Picky

Despite XML’s flexibility, it is pickier in places than HTML. There are syntax and grammar rules that your data must follow. These rules are set down rather tersely in the XML specification found at http://www.w3.org/TR/1998/REC-xml-19980210. Rather than poring through the official spec, I recommend you seek out one of the annotated versions, like Tim Bray’s version at http://www.xml.com, or Robert Ducharme’s book XML: The Annotated Specification (Prentice Hall). The former is online and free; the latter has many good examples of actual XML code.

Here are two of the XML rules that tend to trip up people who know HTML:

  1. If you begin something, you must end it. In the above example we started a machine listing with <machine> and finished it with </machine>. Leaving off the ending tag would not have been acceptable XML.

    In HTML, tags like <img src="picture.jpg" > are allowed to stand by themselves. Not so in XML; this would have to be written either as:

<img src="picture.jpg" > </img>

or:

<img src="picture.jpg" />

The extra slash at the end of this last tag tells the XML parser that this single tag serves as both its own start and end tag. Data together with its surrounding start and end tags is called an element.

  2. Start tags and end tags must mirror each other exactly. Mixing case is not allowed. If your start tag is <MaChINe>, your end tag must be </MaChINe>; it cannot be </MACHine> or any other case combination. HTML is much more forgiving in this regard.
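Both rules are easy to check by handing small fragments to any conforming XML parser and seeing which ones it rejects. The sketch below is only an illustration: it uses Python's standard xml.etree.ElementTree module rather than anything from this chapter, and the helper name is_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as XML, False otherwise.

    Hypothetical helper for illustration; it only checks well-formedness,
    not validity against any DTD or schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Fragments taken from the two rules above.
samples = [
    '<img src="picture.jpg" >',          # started but never ended
    '<img src="picture.jpg" > </img>',   # explicit end tag
    '<img src="picture.jpg" />',         # empty-element form
    '<MaChINe></MaChINe>',               # end tag matches start tag exactly
    '<MaChINe></MACHine>',               # case differs between the tags
]

for sample in samples:
    verdict = "accepted" if is_well_formed(sample) else "rejected"
    print(f"{verdict}: {sample}")
```

The parser should reject the first and last fragments and accept the other three, which matches the rules as stated: every element must be closed, and the end tag must repeat the start tag's name character for character.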

These are two of ...
