Validating Documents
Words, words, mere words, no matter from the heart.
In this section, we talk about DTDs and XML Schema, two ways to enforce rules in an XML document. A DTD is a simple grammar for an XML document, defining which tags may appear where, in what order, and with what attributes. XML Schema is the next generation of DTD. With XML Schema, you can describe the data content of the document as well as its structure. XML Schemas are written in terms of primitives, such as numbers, dates, and simple regular expressions, and also allow the user to define complex types in a grammar-like fashion. The word schema means a blueprint or plan for structure, so we’ll refer to DTDs and XML Schemas collectively as schemas where either applies.
DTDs, although much more limited in capability, are still widely used. This may be partly due to the complexity involved in writing XML Schemas by hand. The W3C XML Schema standard is verbose and cumbersome, which may explain why several alternative syntaxes have sprung up. The javax.xml.validation API performs XML validation in a pluggable way. Out of the box, it supports only W3C XML Schema, but new schema languages can be added in the future. Validating with a DTD is supported as an older feature directly in the SAX parser. We’ll use both in this section.
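To make the two approaches concrete, here is a minimal sketch rather than a definitive implementation. It assumes a hypothetical document type (an item element containing a single quantity element) and keeps the schema, the DTD, and the documents in in-memory strings so that it is self-contained. The class name ValidationSketch and its two helper methods are illustrative names, not part of any API. The first helper uses javax.xml.validation with a W3C XML Schema; the second turns on DTD validation in the SAX parser.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationSketch {
    // Hypothetical schema: <item> holds exactly one <quantity>, which must be an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='item'><xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // The same structure as an internal DTD subset. A DTD can only say
    // that <quantity> contains text; it cannot require an integer.
    static final String DTD =
        "<!DOCTYPE item [<!ELEMENT item (quantity)><!ELEMENT quantity (#PCDATA)>]>";

    // Validate against the XML Schema using the javax.xml.validation API.
    static boolean validWithSchema(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Validate against the DTD by switching on validation in the SAX parser.
    static boolean validWithDtd(String xmlWithoutDoctype) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        // DefaultHandler ignores validation errors unless error() is overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };
        try {
            InputSource in = new InputSource(new StringReader(DTD + xmlWithoutDoctype));
            factory.newSAXParser().parse(in, handler);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] docs = {
            "<item><quantity>3</quantity></item>",
            "<item><quantity>three</quantity></item>",
            "<item><price>3</price></item>"
        };
        for (String doc : docs) {
            System.out.println(doc + " schema=" + validWithSchema(doc)
                + " dtd=" + validWithDtd(doc));
        }
    }
}
```

In this sketch, the document whose quantity is the word three should pass the DTD check but fail the schema check, because only the schema constrains the data content. The document with an undeclared price element should fail both, because both describe the structure.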
Using Document Validation
XML’s validation of documents is a key piece of what makes it useful as a data format. Using a schema is somewhat analogous ...