Document Type Definitions

As we have just discussed, an XML document is not very usable without an accompanying DTD. Just as XML can effectively describe data, the DTD makes this data usable in a variety of ways by many different programs by defining the structure of the data. In this section, we will look at the constructs for a DTD. We will again use as an example the XML representation of a portion of the table of contents for this book, and we will go through the process of constructing a DTD for the XML table of contents document.

The DTD’s job is to define how data must be formatted. It must define each allowed element in an XML document, the allowed attributes, and possibly the acceptable attribute values for each element, the nesting and occurrences of each element, and any external entities. In fact, DTDs can specify quite a few other things about an XML document, but these basics are what we will focus on. We will learn the constructs that a DTD offers by applying them to and constraining our example XML file from Chapter 2. Because we will be referring to that file often throughout this chapter, it is reprinted here in Example 4.3.

Example 4-3. Table of Contents XML File

<?xml version="1.0"?>
<?xml-stylesheet href="XSL\JavaXML.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\JavaXML.wml.xsl" type="text/xsl" 
                 media="wap"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE JavaXML:Book SYSTEM "DTD\JavaXML.dtd">

<!-- Java and XML -->
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
 <JavaXML:Title>Java and XML</JavaXML:Title>
 <JavaXML:Contents>

  <JavaXML:Chapter focus="XML">
   <JavaXML:Heading>Introduction</JavaXML:Heading>
   <JavaXML:Topic subSections="7">What Is It?</JavaXML:Topic>
   <JavaXML:Topic subSections="3">How Do I Use It?</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Why Should I Use It?</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:Chapter focus="XML">
   <JavaXML:Heading>Creating XML</JavaXML:Heading>
   <JavaXML:Topic subSections="0">An XML Document</JavaXML:Topic>
   <JavaXML:Topic subSections="2">The Header</JavaXML:Topic>
   <JavaXML:Topic subSections="6">The Content</JavaXML:Topic>
   <JavaXML:Topic subSections="1">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:Chapter focus="Java">
   <JavaXML:Heading>Parsing XML</JavaXML:Heading>
   <JavaXML:Topic subSections="3">Getting Prepared</JavaXML:Topic>
   <JavaXML:Topic subSections="3">SAX Readers</JavaXML:Topic>
   <JavaXML:Topic subSections="9">Content Handlers</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Error Handlers</JavaXML:Topic>
   <JavaXML:Topic subSections="0">
     A Better Way to Load a Parser
   </JavaXML:Topic>
   <JavaXML:Topic subSections="4">"Gotcha!"</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

  <JavaXML:SectionBreak/>

  <JavaXML:Chapter focus="Java">
   <JavaXML:Heading>Web Publishing Frameworks</JavaXML:Heading>
   <JavaXML:Topic subSections="4">Selecting a Framework</JavaXML:Topic>
   <JavaXML:Topic subSections="4">Installation</JavaXML:Topic>
   <JavaXML:Topic subSections="3">
     Using a Publishing Framework
   </JavaXML:Topic>
   <JavaXML:Topic subSections="2">XSP</JavaXML:Topic>
   <JavaXML:Topic subSections="3">Cocoon 2.0 and Beyond</JavaXML:Topic>
   <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
  </JavaXML:Chapter>

 </JavaXML:Contents>

 <JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>

</JavaXML:Book>

Specifying Elements

Our first concern is specifying which elements are allowed within the document. We want content authors using this DTD to be able to use elements such as JavaXML:Book and JavaXML:Contents, but not to be able to use elements like JavaXML:foo and JavaXML:bar. When we decide on a set of allowed elements, we begin to give a semantic meaning to our XML document; in other words, we give it a context in which it is useful. First, then, we want to make a list of all allowed elements. The easiest way to make this list is to scan our XML document and make a note of each element being used. It also is a good idea to define the purpose of each tag. Although this is not something defined in the DTD unless by a comment (not a bad idea!), it helps you, the DTD author, keep things straight. Table 4.1 has a complete listing of the elements in the contents.xml document.

Table 4-1. Elements Allowed for Our XML Document

Element Name            Purpose
JavaXML:Book            Overall root element
JavaXML:Title           Title of the book being documented
JavaXML:Contents        Denotes the table of contents
JavaXML:Chapter         A chapter within the book
JavaXML:Heading         The heading (title) of a chapter
JavaXML:Topic           The main focus of a section within a chapter
JavaXML:SectionBreak    A break between chapters denoting a new section of the book
JavaXML:Copyright       The copyright for the book

With these elements defined, we can now specify each one in our DTD. This is done with the following notation:

<!ELEMENT [Element Name] [Element Definition/Type]>

The [Element Name] is the actual element from our table. This name, as in the table, should include the namespace prefix. DTDs are not namespace-aware: a DTD has no notion of a prefix that is mapped to a namespace URI. Within a DTD, the element name is simply the literal name as it appears in the document: either the name itself, when no namespace is used, or the namespace prefix and element name separated by a colon.

The [Element Definition/Type] is the most useful portion of the DTD. It allows the data within the element to be defined, giving a “type” to the element, whether it is pure data or a compound type consisting of data and other elements. The least restrictive element type is the keyword ANY. Using this keyword allows the element to contain textual data, nested elements, or any legal XML combination of the two. Thus, we can now define all the elements in our XML document within our DTD, albeit not in a very useful way. Example 4.4 shows the beginning of a DTD for our XML document.

Example 4-4. A “Bare-Bones” DTD with Element Definitions

<!ELEMENT JavaXML:Book ANY>
<!ELEMENT JavaXML:Title ANY>
<!ELEMENT JavaXML:Contents ANY>
<!ELEMENT JavaXML:Chapter ANY>
<!ELEMENT JavaXML:Heading ANY>
<!ELEMENT JavaXML:Topic ANY>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>
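
For quick experiments, declarations like these can also be placed inside the document itself, as an internal DTD subset, instead of in a separate file. The following is a minimal sketch of that idea and is not one of the chapter's example files. The one-line book content is invented for illustration, and the ATTLIST line (attributes are covered later in this chapter) is included only so that the namespace declaration is a declared attribute.

```xml
<?xml version="1.0"?>
<!-- Internal subset: the element declarations sit between [ and ] -->
<!DOCTYPE JavaXML:Book [
  <!ELEMENT JavaXML:Book ANY>
  <!ELEMENT JavaXML:Title ANY>
  <!-- Needed so the xmlns:JavaXML attribute below is a declared attribute -->
  <!ATTLIST JavaXML:Book xmlns:JavaXML CDATA #REQUIRED>
]>
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
  <!-- The DTD matches the literal name "JavaXML:Title"; a different prefix
       bound to the same URI would not match this declaration -->
  <JavaXML:Title>Java and XML</JavaXML:Title>
</JavaXML:Book>
```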

Of course, this simple DTD, in addition to not handling either attributes or entity references, doesn’t help us much. Although it defines each allowed element, it says nothing about the types of those elements, or the nesting allowed. It would still be simple to create a nonsensical XML document that conformed to this DTD, as in Example 4.5.

Example 4-5. A Conformant XML Document That Is Useless

<?xml version="1.0"?>
<?xml-stylesheet href="XSL\JavaXML.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\JavaXML.wml.xsl" type="text/xsl" 
                 media="wap"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE JavaXML:Topic SYSTEM "DTD\JavaXML.dtd">

<JavaXML:Topic>
  <JavaXML:Book>Here's my Book</JavaXML:Book>
  <JavaXML:Copyright>
    <JavaXML:Chapter>Chapter One</JavaXML:Chapter>
  </JavaXML:Copyright>
  <JavaXML:SectionBreak>Here's a Section</JavaXML:SectionBreak>
</JavaXML:Topic>

Although this document fragment uses only elements allowed by the DTD, its structure is incorrect. This is because the DTD gives no information about how elements are nested and which elements can contain textual data.

Nesting elements

One of the keys to XML document structure is the nesting of tags. We can expand on our original table of elements by adding the elements that can be nested within each structure. This will create our element hierarchy for us, which we can then define within our DTD. Table 4.2 summarizes the element hierarchy.

Table 4-2. Element Hierarchy

Element Name            Allowed Nested Elements                                 Purpose
JavaXML:Book            JavaXML:Title, JavaXML:Contents, JavaXML:Copyright      Overall root element
JavaXML:Title           None                                                    Title of the book being documented
JavaXML:Contents        JavaXML:Chapter, JavaXML:SectionBreak                   Denotes the table of contents
JavaXML:Chapter         JavaXML:Heading, JavaXML:Topic                          A chapter within the book
JavaXML:Heading         None                                                    The heading (title) of a chapter
JavaXML:Topic           None                                                    The main focus of a section within a chapter
JavaXML:SectionBreak    None                                                    A break between chapters denoting a new section of the book
JavaXML:Copyright       None                                                    The copyright for the book

With this table complete, we are now ready to define the allowed element nestings within our DTD. The way to perform this is:

<!ELEMENT [Element Name] ([Nested Element][,Nested Element]...)>

In this case, a list of comma-separated elements within parentheses becomes the element type. The order of the elements is also important; this ordering is enforced as a validity constraint within the XML document. This adds additional constraints to our document, ensuring that a copyright element always comes at the end of a book, or that a title element appears before content elements. With this new notation, we can update our DTD to add the allowed nestings of elements, shown in Example 4.6.

Example 4-6. DTD with Element Hierarchy

<!ELEMENT JavaXML:Book (JavaXML:Title, 
                        JavaXML:Contents, 
                        JavaXML:Copyright)>
<!ELEMENT JavaXML:Title ANY>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading ANY>
<!ELEMENT JavaXML:Topic ANY>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>

Although some elements, those that contain parsed data, are not changed, we have a hierarchy of elements that adds a lot of meaning to our XML document constraints. The earlier example that made no sense because of element ordering and nesting would now be invalid. However, there are still a lot of problems with allowing any type of data within the remaining elements.
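
To see the ordering constraint in isolation, here is a small self-contained sketch that uses an internal DTD subset. As a simplifying assumption, the namespace prefix is dropped, so the names Chapter, Heading, and Topic stand in for the prefixed elements of our example.

```xml
<?xml version="1.0"?>
<!DOCTYPE Chapter [
  <!-- Sequence: exactly one Heading, then exactly one Topic -->
  <!ELEMENT Chapter (Heading, Topic)>
  <!ELEMENT Heading ANY>
  <!ELEMENT Topic ANY>
]>
<!-- Valid as written. Swapping the two children, dropping one of them,
     or adding a second Topic would each make the document invalid. -->
<Chapter>
  <Heading>Introduction</Heading>
  <Topic>What Is It?</Topic>
</Chapter>
```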

Parsed data

The element type to use for textual data is #PCDATA. This keyword represents Parsed Character Data, and can be used for elements that contain character data that we want our XML parser to handle normally. Using the #PCDATA keyword by itself limits the element to character data, though; nested elements are not allowed. We will look at mixing text and nested elements a little later. For now, we can modify our title, heading, and topic elements to reflect that textual data should be used within these elements, as in Example 4.7.

Example 4-7. DTD with Element Hierarchy and Character Data Elements

<!ELEMENT JavaXML:Book (JavaXML:Title, 
                        JavaXML:Contents, 
                        JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>
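
The word “parsed” matters here: entity references inside a #PCDATA element are still recognized and expanded. The sketch below illustrates this with a single unprefixed Heading element, a simplification made only for this illustration; the heading text is invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE Heading [
  <!ELEMENT Heading (#PCDATA)>
]>
<!-- Valid: only character data. The reference to the amp entity is parsed
     and reaches the application as a single ampersand character. -->
<Heading>Parsing &amp; Validating XML</Heading>
<!-- Invalid variant: <Heading>Parsing <b>XML</b></Heading>,
     because no child elements are allowed -->
```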

Empty elements

We are moving right along in our element definitions within DTDs. In addition to elements that contain textual data and elements that contain other elements, we have one element, JavaXML:SectionBreak, which should contain no data. In other words, the element should always be empty. Although it would be legal to specify that this element contained parsed character data and simply never insert any, this isn’t a good use of our constraints. It is better to actually require that the element always be empty, preventing accidental misuse. The keyword EMPTY allows this constraint. This keyword does not need to appear within parentheses, as it denotes a type and cannot be grouped with any other elements, which, as we will soon see, the parentheses allow. We can update our section break element in our DTD now in Example 4.8.

Example 4-8. DTD with EMPTY Element Defined

<!ELEMENT JavaXML:Book (JavaXML:Title, 
                        JavaXML:Contents, 
                        JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright ANY>
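
As a rough illustration of what EMPTY does and does not permit, consider the following self-contained sketch. The unprefixed names and the two-break content model are assumptions chosen only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE Contents [
  <!ELEMENT Contents (SectionBreak, SectionBreak)>
  <!ELEMENT SectionBreak EMPTY>
]>
<Contents>
  <!-- Both spellings are valid for an EMPTY element -->
  <SectionBreak/>
  <SectionBreak></SectionBreak>
</Contents>
<!-- Invalid variants: <SectionBreak> </SectionBreak> (white space counts
     as content) and <SectionBreak>text</SectionBreak> -->
```

The empty-element tag form is the one generally preferred for elements declared EMPTY.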

Entity references

The last element we have to define more rigidly is the JavaXML:Copyright element. As you recall, this is actually a container for an entity reference to another file that should be included. When an XML parser sees &OReillyCopyright;, it will attempt to look up the OReillyCopyright entity within the DTD, which in our case should reference an external file. This external file should have a shared copyright for all books being documented in XML. The DTD has the job of specifying where the external file is located, whether on the local filesystem or on a network, and how it should be accessed. Entity references are specified in DTDs with the notation:

<!ENTITY [Entity Name] "[Replacement Characters/Identifier]">

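You will notice that the notation indicates that a set of replacement characters can be specified, allowing substitution similar to using an external file. The entity name is written without the leading ampersand and trailing semicolon; those appear only where the entity is referenced. In fact, this is how the “escape” characters within XML are defined. Because the replacement text is itself parsed again, the markup characters & and < are written as character references:

<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
...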

So if our copyright was a very short piece of text, we could use something like:

<!ENTITY OReillyCopyright 
         "Copyright O&apos;Reilly and Associates, 2000">

However, the copyright we expect to use is a longer piece of text, more appropriately stored in an external file for easy modification. This also allows it to be used in multiple XML documents without duplication of the data within each document’s DTD. This requires us to specify a system-level resource as the resolution for the entity reference. The notation for this type of reference is:

<!ENTITY [Entity Name] SYSTEM "[URI]">

As in the case of parsing our XML document and our discussion on namespaces, the URI specified can be either a local resource or a network-accessible resource. In our case, we want to use a file located on an external server, so the entity would reference that file through a URL:

<!ENTITY OReillyCopyright SYSTEM 
         "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">

With this reference set up, an XML parser could now handle the OReillyCopyright reference within an XML document and properly resolve it within the parsing process. This section of the XML had to be commented out in Chapter 3, for this very reason, and in the next chapter, we will uncomment the reference and see how a validating parser handles the entity and uses a DTD to resolve it.

Finally, we need to let our containing element know it should expect parsed character data:

<!ELEMENT JavaXML:Copyright (#PCDATA)>
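
The pieces of this section can be tried out without any external file by using the short, inline form of the entity. This is only a sketch: it swaps the external copyright file for replacement text declared in an internal DTD subset, and it drops the namespace prefix for brevity.

```xml
<?xml version="1.0"?>
<!DOCTYPE Copyright [
  <!ELEMENT Copyright (#PCDATA)>
  <!-- Internal entity: the name carries no ampersand or semicolon here -->
  <!ENTITY OReillyCopyright "Copyright O&apos;Reilly and Associates, 2000">
]>
<!-- The parser replaces the reference with the entity's text -->
<Copyright>&OReillyCopyright;</Copyright>
```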

Say It Again One More Time

The last major construct in DTD element specifications we will look at is the variety of combinations of grouping, multiple occurrences, and choices within an element. In other words, the case where element X can appear once, or element Y can occur, followed by element Z. These structures are critical to DTDs; by default, an element can appear exactly once when specified without any modifiers in the DTD:

<!ELEMENT MyElement (NestedElement, AnotherElement)>

Here NestedElement must appear exactly once, and must always be followed by exactly one AnotherElement. If this were not the structure of the corresponding XML document, the document would be invalid. A special set of modifiers must be applied to elements to change this default constraining behavior.

Zero, one, or more

The most common modifier applied to an element is a recurrence operator. These operators allow an element to appear zero or more times, one or more times, or optionally not at all, in addition to the default, which requires an element to appear exactly one time. Table 4.3 lists each of the recurrence operators and what recurrence they indicate.

Table 4-3. Recurrence Operators

Operator     Description
[Default]    Must appear exactly one time
?            Must appear once or not at all
+            Must appear at least once (1 ... N times)
*            May appear any number of times, or not at all (0 ... N times)

Each operator can be appended to the end of an element name. In our previous example, to allow NestedElement to appear one or more times, and then require that AnotherElement appear either once or not at all, we would use the following within the DTD:

<!ELEMENT MyElement (NestedElement+, AnotherElement?)>

This would make the following XML perfectly valid:

<MyElement>
  <NestedElement>One</NestedElement>
  <NestedElement>Two</NestedElement>
</MyElement>
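
A complete document that can be fed to a validating parser might look like the following sketch. The (#PCDATA) declarations for the two nested elements are an assumption, since the text does not say what they contain.

```xml
<?xml version="1.0"?>
<!DOCTYPE MyElement [
  <!ELEMENT MyElement (NestedElement+, AnotherElement?)>
  <!ELEMENT NestedElement (#PCDATA)>
  <!ELEMENT AnotherElement (#PCDATA)>
]>
<!-- Valid: three NestedElement children, then the optional AnotherElement -->
<MyElement>
  <NestedElement>One</NestedElement>
  <NestedElement>Two</NestedElement>
  <NestedElement>Three</NestedElement>
  <AnotherElement>Last</AnotherElement>
</MyElement>
<!-- Invalid variants: no NestedElement at all, two AnotherElement
     children, or AnotherElement placed before a NestedElement -->
```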

In the DTD we have been building, we have a similar situation within the JavaXML:Chapter element. We would like to allow a chapter heading (JavaXML:Heading) to either appear once, or optionally be omitted, and to allow one or more JavaXML:Topic elements to appear. We can now make this change using our recurrence operators:

<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>

This easy change makes our XML chapter representation much more realistic. We also need to make a change to the JavaXML:Contents element definition. A chapter or set of chapters should appear, and then possibly a section break. The section break must be optional, as a book may only contain chapters. We can define the recurrence of chapters and the section break elements like this:

<!ELEMENT JavaXML:Contents (JavaXML:Chapter+,JavaXML:SectionBreak?)>

However, we still have not let the DTD know that more chapters can appear after the JavaXML:SectionBreak element. In fact, if we look at the structure of the XML we would like to allow this structure to occur multiple times. Chapters followed by a section break can be followed by more chapters followed by another section break! We need a concept of grouping within our element.

Grouping

Grouping allows us to solve problems like the element nesting within JavaXML:Contents. Often, recurrence occurs for a block or group of elements, rather than a single element. For this reason, any of the recurrence operators can be applied to a group of elements. Enclosing a set of elements within parentheses signifies the group. If you are starting to remember your old LISP classes in college, don’t worry; it stays fairly simple in our examples, and the parentheses don’t get out of hand. Nested parentheses are, of course, acceptable. So to group a set of elements the following notation would be used:

<!ELEMENT GroupingExample ((Group1El1, Group1El2),
                           (Group2El1, Group2El2))>

An operator can then be applied to the group, rather than to a single element. In the scenario we are currently looking at, we need to apply the operator allowing multiple occurrences to the group containing our chapter and section break element. This would then allow repetition of the entire construct:

<!ELEMENT JavaXML:Contents (JavaXML:Chapter+,JavaXML:SectionBreak?)+>

This now accurately allows the various combinations: a set of chapters followed by one section break, and then the structure repeating multiple times or optionally not repeating at all. It also allows the case where only chapters are included, without any section breaks. However, this is not particularly clear from the DTD. What would be better is to specify that one or more chapters could occur, or this structure could occur. Although this is not going to result in different behavior, it certainly would make more sense to readers other than the DTD author. To accomplish this, though, we need to introduce an “or” function.
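
To make the repetition concrete, the sketch below applies the grouped content model to a small document. The names are unprefixed and Chapter is reduced to plain text; both are simplifications for this illustration only.

```xml
<?xml version="1.0"?>
<!DOCTYPE Contents [
  <!-- The group (chapters, then an optional break) may repeat -->
  <!ELEMENT Contents (Chapter+, SectionBreak?)+>
  <!ELEMENT Chapter (#PCDATA)>
  <!ELEMENT SectionBreak EMPTY>
]>
<Contents>
  <!-- First pass through the group: two chapters and a break -->
  <Chapter>Introduction</Chapter>
  <Chapter>Creating XML</Chapter>
  <SectionBreak/>
  <!-- Second pass: one chapter, with the optional break omitted -->
  <Chapter>Web Publishing Frameworks</Chapter>
</Contents>
<!-- Invalid variant: a SectionBreak as the first child, since every
     pass through the group must start with at least one Chapter -->
```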

Either or

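DTDs do conveniently offer an “or” function, signified by the pipe operator. This allows one thing or the other to occur, and the pipe is often used in conjunction with groupings. One common, although not necessarily good, use of the “or” operator is to allow a certain element or elements to appear within an enclosing element, or for textual data to appear. XML permits this mixed content in only one form: #PCDATA comes first, the alternatives are plain element names rather than nested groups, and the whole group is followed by *, so neither the order nor the number of the nested elements can be constrained:

<!ELEMENT AggregateElement (#PCDATA|Element1|Element2)*>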

For this DTD, both of the following XML document fragments would be valid:

<AggregateElement>
  <Element1>One</Element1>
  <Element2>Two</Element2>
</AggregateElement>

<AggregateElement>
  Textual Data
</AggregateElement>

Using this type of constraint is discouraged, though, as the meaning of the enclosing element becomes obscure. An element should typically include textual, parsed data, or other elements; it should not allow both.
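
Part of the reason mixed content is hard to reason about is that XML 1.0 only allows it in the form (#PCDATA | Name | ...)*, which cannot constrain order or count. The following sketch shows how permissive that is; the (#PCDATA) declarations for Element1 and Element2 are assumptions made for the illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE AggregateElement [
  <!-- Mixed content: #PCDATA first, element names only, trailing * -->
  <!ELEMENT AggregateElement (#PCDATA | Element1 | Element2)*>
  <!ELEMENT Element1 (#PCDATA)>
  <!ELEMENT Element2 (#PCDATA)>
]>
<!-- Valid, even though Element2 comes first and Element1 repeats:
     a mixed model cannot constrain order or count -->
<AggregateElement>
  Some text
  <Element2>Two</Element2>
  more text
  <Element1>One</Element1>
  <Element1>One again</Element1>
</AggregateElement>
```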

In our document, we want to show a clearer representation of our JavaXML:Contents element. We can now do that:

<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) |
                            (JavaXML:Chapter+,JavaXML:SectionBreak?)+)>

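It is now clear that either multiple chapters may appear, or that chapters followed by a section break may appear. This greatly adds to the documentation that our DTD provides, as well as maintaining the proper constraints. One caution: both alternatives begin with JavaXML:Chapter, so a parser cannot tell which branch it is in after reading the first chapter. XML 1.0 calls such content models nondeterministic and treats them as an error for compatibility with SGML. Many parsers accept them anyway, but if yours complains, the single-group form (JavaXML:Chapter+,JavaXML:SectionBreak?)+ shown earlier accepts the same documents.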

We have now completely specified and constrained our XML elements. The DTD shown in Example 4.9 should function in regard to our elements, and only attribute definitions are left, which we will look at next.

Example 4-9. DTD with Elements Specified

<!ELEMENT JavaXML:Book (JavaXML:Title,
                        JavaXML:Contents,
                        JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+)|
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM 
         "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">

Defining Attributes

With element specifications thoroughly covered, we can move on to defining attributes. Because there are not complicated nesting scenarios with attributes, defining them is somewhat simpler than dealing with element specifications. In addition, whether the presence of an attribute is required is specified by a keyword, so no recurrence operators are needed. Attribute definitions are in the following form:

<!ATTLIST [Enclosing Element] 
          [Attribute Name] [type] [Modifier]
          ...
>

The first two parameters, the element name and the attribute name, are simple to define. For any element, the ATTLIST construct allows multiple attributes to be defined within the same structure. We can add this portion of the attribute definition for the attributes we are using within our XML document, creating placeholders for the rest of the definition. Best practice is to include the attribute definitions right after the element specification, again in the spirit of a DTD being as self-documenting as possible (see Example 4.10).

Example 4-10. DTD with Elements and Attribute Placeholders

<!ELEMENT JavaXML:Book (JavaXML:Title,
                        JavaXML:Contents,
                        JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
      xmlns:JavaXML [type] [Modifier]
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+)|
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus [type] [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
      subSections [type] [Modifier]
>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM 
         "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">

We now need to define the types allowed for each attribute.

Attribute types

For many attributes, the value can be any textual data. This is the simplest type of attribute value, but also the least constrained. This type is signified by the keyword CDATA, representing Character Data. The keyword is shared with the CDATA sections used within XML documents to hold “escaped” character data, but the two are not the same thing: entity references in a CDATA attribute value are still expanded by the parser. This is the type generally used when an attribute can take on any value and may represent a comment or additional information about an element. We will soon see that, where possible, a better solution is to define the set of values an attribute is allowed to take on. In our document, the xmlns:JavaXML attribute should be character data. You may wonder why we need to define this as an allowed attribute. Although xmlns is an XML keyword that signifies a namespace declaration, to a DTD it is still an attribute that must be validated. Therefore, we include it to ensure our document’s validity. The subSections attribute of JavaXML:Topic should be character data as well, since a DTD has no numeric attribute type:

<!ATTLIST JavaXML:Book
      xmlns:JavaXML CDATA [Modifier]
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+)|
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus [type] [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
      subSections CDATA [Modifier]
>

The next type of attribute, and one of the most commonly used, is an enumeration. This type allows any of the specified values to be used, but any other value for the attribute results in an invalid document. This is useful when the set of values for an attribute can be determined at authoring time, as it tightly constrains the XML document. This is the type our focus attribute should take on, as the only allowed foci for the book are “Java” and “XML.” The allowed values are set within parenthetical notation, separated by the “or” operator, similar to the way element nestings can be specified:

<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus (XML|Java) [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>

To be or not to be

The final question that should be answered in defining an attribute is whether the attribute is required within an element. This is specified with one of three possible keywords: #IMPLIED, #REQUIRED, or #FIXED. An implied attribute is optional: the document is valid whether or not it appears, and no value is supplied when it is omitted. We can make this modification to the subSections attribute, as it is not required for the document to remain valid:

<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
      subSections CDATA #IMPLIED
>

For our xmlns attribute, we want to ensure that a content author always specifies the namespace for the book. Otherwise, our namespace prefixes become useless. In this case, we want to use the #REQUIRED keyword. If this attribute were not included within the JavaXML:Book element, the document would be invalid, as it doesn’t specify a required attribute:

<!ELEMENT JavaXML:Book (JavaXML:Title,
                        JavaXML:Contents,
                        JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
      xmlns:JavaXML CDATA #REQUIRED
>
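
The difference between the two keywords is easiest to see side by side. In the sketch below, the names are unprefixed and the title attribute is hypothetical; it exists only to show #REQUIRED on something other than a namespace declaration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Chapter [
  <!ELEMENT Chapter (Topic+)>
  <!-- title is a hypothetical attribute used only for this sketch -->
  <!ATTLIST Chapter
        title CDATA #REQUIRED
  >
  <!ELEMENT Topic (#PCDATA)>
  <!ATTLIST Topic
        subSections CDATA #IMPLIED
  >
]>
<Chapter title="Introduction">
  <!-- subSections given -->
  <Topic subSections="7">What Is It?</Topic>
  <!-- subSections omitted: still valid, and no value is supplied -->
  <Topic>What's Next?</Topic>
</Chapter>
<!-- Removing title from Chapter would make the document invalid -->
```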

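The final keyword, #FIXED, is not frequently used for applications. Most common in backend systems, this keyword states that the attribute’s value can never differ from the one given in the DTD: a content author may omit the attribute, in which case the parser supplies the fixed value, but may not give it any other value. The format of this type of notation is:

<!ATTLIST [Element Name] 
          [Attribute Name] [type] #FIXED "[Fixed Value]"
>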

Because of its irrelevance in highly dynamic applications (an attribute whose value cannot change does not help us much in representing dynamic data!), we will not spend more time on it.
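
For completeness, here is a brief sketch of a #FIXED declaration, which names the attribute type and then the quoted value. The Book element is unprefixed and the formatVersion attribute is hypothetical; neither is part of our running example.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (#PCDATA)>
  <!-- formatVersion is a hypothetical attribute used only for this sketch -->
  <!ATTLIST Book
        formatVersion CDATA #FIXED "1.0"
  >
]>
<!-- Valid: the attribute is omitted, so the parser supplies "1.0".
     Writing formatVersion="1.0" is also valid; any other value is not. -->
<Book>Java and XML</Book>
```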

We have still not addressed the focus attribute. We have enumerated the possible values it can take on, but because the book is primarily focused on Java, we would like to spare the content author from explicitly defining the attribute as “Java” in every chapter where that is the focus. In a book with twenty or thirty chapters, this becomes tedious. Imagine a listing of a science library’s books where each book had to notate that its primary subject was “science”! This data duplication is not very efficient, so requiring the attribute is not a great solution. Using the #IMPLIED keyword does not help either: when an implied attribute is omitted, no value is assigned to it, and a value is precisely what we want when none is specified. What we want is to provide a default value; if no attribute value is given, the default should be passed on by the XML parser. Fortunately, this is an allowed construct within XML DTDs. Instead of one of the keyword modifiers, a default value can be given. This value should be in quotes, and if an enumeration is the type for the attribute, the default must be one of the enumerated values. We can now use this to define our focus attribute:

<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus (XML|Java) "Java"
>
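
The effect of the enumeration and its default can be checked with a small document such as the sketch below. The unprefixed names and the text-only Chapter are simplifications for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Contents [
  <!ELEMENT Contents (Chapter+)>
  <!ELEMENT Chapter (#PCDATA)>
  <!ATTLIST Chapter
        focus (XML|Java) "Java"
  >
]>
<Contents>
  <!-- Explicit value taken from the enumeration -->
  <Chapter focus="XML">Creating XML</Chapter>
  <!-- No focus given: a parser that reads the DTD reports focus="Java" -->
  <Chapter>Parsing XML</Chapter>
</Contents>
<!-- focus="Perl" would be invalid: it is not one of the enumerated values -->
```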

With this attribute definition, we have completed our DTD! Although the syntax may have seemed awkward and a bit clumsy, hopefully you were able to easily follow along and understand how elements and attributes, as well as entities, are defined within DTDs. We certainly have not thoroughly covered DTDs, as this is primarily a book on Java and XML, not just XML; however, you should feel comfortable with our sample DTD and be able to create simple DTDs for your own XML documents. Before we move on to schemas, let’s take a final look at our complete DTD in Example 4.11.

Example 4-11. Completed DTD

<!ELEMENT JavaXML:Book (JavaXML:Title,
                        JavaXML:Contents,
                        JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
      xmlns:JavaXML CDATA #REQUIRED
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+)|
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
      focus (XML|Java) "Java"
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
      subSections CDATA #IMPLIED
>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM 
         "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">

In comparing this XML document to its DTD, you should start to notice some unnecessary complexities in the DTD’s structure. The DTD that defines the organization of this XML file (and other XML files like it) has a structure completely unlike the XML file itself. You will also see that the DTD’s structure is different from a schema, an XSL stylesheet, and nearly every other XML-related document. Unfortunately, XML DTDs were developed as part of the XML 1.0 specification, and some design decisions made in that specification still cause XML users and developers grief. Much of the basis for XML DTDs came from the way DTDs are used in SGML, a much older specification. However, the structure of an SGML DTD is not necessarily appropriate or in the spirit of the XML specification. The result is that DTDs are not one of the best design decisions made in the formation of the XML specification. Fortunately, XML Schema looks to correct these structural differences, making constraining XML more of an XML-centric process, rather than a break from XML format. We will discuss XML Schema next. Although XML Schema is likely to replace DTDs, the process will be a slow and cautious one, as many applications have already embraced XML in production systems, and those systems use documents constrained by DTDs. For this reason, understanding DTDs is important, even if they will be phased out of heavy use.
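
For readers who want to make that comparison with a validating parser at hand, the following sketch folds the completed DTD into an internal subset of a shortened table of contents. Two things differ from Example 4-11 and are assumptions of this sketch: the copyright entity is declared inline rather than pointing at the external file, so no network access is needed, and JavaXML:Contents uses the single-group form from the Grouping discussion, which accepts the same documents.

```xml
<?xml version="1.0"?>
<!DOCTYPE JavaXML:Book [
  <!ELEMENT JavaXML:Book (JavaXML:Title,
                          JavaXML:Contents,
                          JavaXML:Copyright)>
  <!ATTLIST JavaXML:Book
        xmlns:JavaXML CDATA #REQUIRED
  >
  <!ELEMENT JavaXML:Title (#PCDATA)>
  <!-- Single-group form of the content model from the Grouping discussion -->
  <!ELEMENT JavaXML:Contents (JavaXML:Chapter+, JavaXML:SectionBreak?)+>
  <!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
  <!ATTLIST JavaXML:Chapter
        focus (XML|Java) "Java"
  >
  <!ELEMENT JavaXML:Heading (#PCDATA)>
  <!ELEMENT JavaXML:Topic (#PCDATA)>
  <!ATTLIST JavaXML:Topic
        subSections CDATA #IMPLIED
  >
  <!ELEMENT JavaXML:SectionBreak EMPTY>
  <!ELEMENT JavaXML:Copyright (#PCDATA)>
  <!-- Assumption: an inline entity replaces the external copyright file -->
  <!ENTITY OReillyCopyright "Copyright O&apos;Reilly and Associates, 2000">
]>
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
  <JavaXML:Title>Java and XML</JavaXML:Title>
  <JavaXML:Contents>
    <JavaXML:Chapter focus="XML">
      <JavaXML:Heading>Introduction</JavaXML:Heading>
      <JavaXML:Topic subSections="7">What Is It?</JavaXML:Topic>
    </JavaXML:Chapter>
    <JavaXML:SectionBreak/>
    <!-- focus omitted: the default "Java" applies -->
    <JavaXML:Chapter>
      <JavaXML:Heading>Web Publishing Frameworks</JavaXML:Heading>
      <JavaXML:Topic subSections="4">Installation</JavaXML:Topic>
    </JavaXML:Chapter>
  </JavaXML:Contents>
  <JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>
</JavaXML:Book>
```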

Things Left Out

Strangely enough, there is a need for a section on things left out of a DTD. Although all of the elements within an XML document must be specified, and their attributes defined, processing instructions do not have to be part of a DTD. In fact, there is no way to specify the PIs and XML declaration found at the top of XML files. The constraints in a DTD begin with the root element of an XML file. This probably seems quite natural to you; why specify that an XML document may have this processing instruction, but not that one? The rationale behind this decision is portability.

There are some good arguments for allowing the specification of PIs within a DTD. For example, it is plausible that a content author might want to make sure his XML document is always transformed, and require an xml-stylesheet PI. But which type of stylesheet is required? Well, this can be defined too. And what type of engine should be used for transformations? Cocoon? James Clark’s Servlet? Another framework? Again, these items can be defined. However, by the time all of these details have been specified and constrained, the document has lost all its portability! It can only be used for one specific purpose on one specific framework, and can no longer be transformed iteratively and easily moved from one platform or framework or application to another. For this reason, PIs and initial XML declarations are left unconstrained within DTDs. We only have to consider the elements and attributes within the document, beginning with the root element.
