As we have just discussed, an XML document is not very usable without an accompanying DTD. Just as XML can effectively describe data, the DTD makes this data usable in a variety of ways by many different programs by defining the structure of the data. In this section, we will look at the constructs for a DTD. We will again use as an example the XML representation of a portion of the table of contents for this book, and we will go through the process of constructing a DTD for the XML table of contents document.
The DTD’s job is to define how data must be formatted. It must define each allowed element in an XML document, the allowed attributes, and possibly the acceptable attribute values for each element, the nesting and occurrences of each element, and any external entities. In fact, DTDs can specify quite a few other things about an XML document, but these basics are what we will focus on. We will learn the constructs that a DTD offers by applying them to and constraining our example XML file from Chapter 2. Because we will be referring to that file often throughout this chapter, it is reprinted here in Example 4.3.
Example 4-3. Table of Contents XML File
<?xml version="1.0"?>
<?xml-stylesheet href="XSL\JavaXML.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\JavaXML.wml.xsl" type="text/xsl" media="wap"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE JavaXML:Book SYSTEM "DTD\JavaXML.dtd">

<!-- Java and XML -->
<JavaXML:Book xmlns:JavaXML="http://www.oreilly.com/catalog/javaxml/">
  <JavaXML:Title>Java and XML</JavaXML:Title>
  <JavaXML:Contents>
    <JavaXML:Chapter focus="XML">
      <JavaXML:Heading>Introduction</JavaXML:Heading>
      <JavaXML:Topic subSections="7">What Is It?</JavaXML:Topic>
      <JavaXML:Topic subSections="3">How Do I Use It?</JavaXML:Topic>
      <JavaXML:Topic subSections="4">Why Should I Use It?</JavaXML:Topic>
      <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
    </JavaXML:Chapter>
    <JavaXML:Chapter focus="XML">
      <JavaXML:Heading>Creating XML</JavaXML:Heading>
      <JavaXML:Topic subSections="0">An XML Document</JavaXML:Topic>
      <JavaXML:Topic subSections="2">The Header</JavaXML:Topic>
      <JavaXML:Topic subSections="6">The Content</JavaXML:Topic>
      <JavaXML:Topic subSections="1">What's Next?</JavaXML:Topic>
    </JavaXML:Chapter>
    <JavaXML:Chapter focus="Java">
      <JavaXML:Heading>Parsing XML</JavaXML:Heading>
      <JavaXML:Topic subSections="3">Getting Prepared</JavaXML:Topic>
      <JavaXML:Topic subSections="3">SAX Readers</JavaXML:Topic>
      <JavaXML:Topic subSections="9">Content Handlers</JavaXML:Topic>
      <JavaXML:Topic subSections="4">Error Handlers</JavaXML:Topic>
      <JavaXML:Topic subSections="0">
        A Better Way to Load a Parser
      </JavaXML:Topic>
      <JavaXML:Topic subSections="4">"Gotcha!"</JavaXML:Topic>
      <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
    </JavaXML:Chapter>
    <JavaXML:SectionBreak/>
    <JavaXML:Chapter focus="Java">
      <JavaXML:Heading>Web Publishing Frameworks</JavaXML:Heading>
      <JavaXML:Topic subSections="4">Selecting a Framework</JavaXML:Topic>
      <JavaXML:Topic subSections="4">Installation</JavaXML:Topic>
      <JavaXML:Topic subSections="3">
        Using a Publishing Framework
      </JavaXML:Topic>
      <JavaXML:Topic subSections="2">XSP</JavaXML:Topic>
      <JavaXML:Topic subSections="3">Cocoon 2.0 and Beyond</JavaXML:Topic>
      <JavaXML:Topic subSections="0">What's Next?</JavaXML:Topic>
    </JavaXML:Chapter>
  </JavaXML:Contents>
  <JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>
</JavaXML:Book>
Our first concern is specifying which elements are allowed within the document. We want content authors using this DTD to be able to use elements such as JavaXML:Book and JavaXML:Contents, but not elements like JavaXML:foo and JavaXML:bar. When we decide on a set of allowed elements, we begin to give a semantic meaning to our XML document; in other words, we give it a context in which it is useful. First, then, we want to make a list of all allowed elements. The easiest way to make this list is to scan our XML document and make a note of each element being used. It is also a good idea to define the purpose of each tag. Although this is not something defined in the DTD unless by a comment (not a bad idea!), it helps you, the DTD author, keep things straight. Table 4.1 has a complete listing of the elements in the contents.xml document.
Table 4-1. Elements Allowed for Our XML Document
With these elements defined, we can now specify each one in our DTD. This is done with the following notation:
<!ELEMENT [Element Name] [Element Definition/Type]>
The [Element Name] is the actual element from our table. This name, as in the table, should include the namespace prefix. DTDs are not namespace-aware: they have no notion of a prefix that is mapped to a namespace URI. Within a DTD, the element name is either the name itself, when no namespace is used, or the namespace prefix and element name separated by a colon, treated as one literal name.
The [Element Definition/Type] is the most useful portion of the DTD. It allows the data within the element to be defined, giving a “type” to the element, whether it is pure data or a compound type consisting of data and other elements. The least restrictive element type is the keyword ANY. Using this keyword allows the element to contain textual data, nested elements (as long as they are declared), or any legal XML combination of the two. Thus, we can now define all the elements in our XML document within our DTD, albeit not in a very useful way. Example 4.4 shows the beginning of a DTD for our XML document.
Example 4-4. A “Bare-Bones” DTD with Element Definitions
<!ELEMENT JavaXML:Book ANY>
<!ELEMENT JavaXML:Title ANY>
<!ELEMENT JavaXML:Contents ANY>
<!ELEMENT JavaXML:Chapter ANY>
<!ELEMENT JavaXML:Heading ANY>
<!ELEMENT JavaXML:Topic ANY>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>
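As a minimal sketch of how such declarations attach to a document, the same kind of ANY declarations can be placed in an internal DTD subset, inside the DOCTYPE declaration, rather than in an external file. The internal subset is not used elsewhere in this chapter and is assumed here only to keep the example self-contained. The namespace declaration is left off because no attributes have been declared yet, so a namespace-aware parser would object to the unbound prefix.

```xml
<?xml version="1.0"?>
<!-- Sketch only: declarations inlined instead of living in DTD\JavaXML.dtd -->
<!DOCTYPE JavaXML:Book [
  <!-- To the DTD, "JavaXML:Title" is one literal name; the colon has no special meaning -->
  <!ELEMENT JavaXML:Book ANY>
  <!ELEMENT JavaXML:Title ANY>
]>
<JavaXML:Book>
  <JavaXML:Title>Java and XML</JavaXML:Title>
</JavaXML:Book>
```

Because the match is purely on the literal name, a document that used a different prefix for the same namespace URI, say jx:Title (a hypothetical prefix), would not match these declarations.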
Of course, this simple DTD, in addition to not handling either attributes or entity references, doesn’t help us much. Although it defines each allowed element, it says nothing about the types of those elements, or the nesting allowed. It would still be simple to create a nonsensical XML document that conformed to this DTD, as in Example 4.5.
Example 4-5. A Conformant XML Document That Is Useless
<?xml version="1.0"?>
<?xml-stylesheet href="XSL\JavaXML.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\JavaXML.wml.xsl" type="text/xsl" media="wap"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE JavaXML:Topic SYSTEM "DTD\JavaXML.dtd">

<JavaXML:Topic>
  <JavaXML:Book>Here's my Book</JavaXML:Book>
  <JavaXML:Copyright>
    <JavaXML:Chapter>Chapter One</JavaXML:Chapter>
  </JavaXML:Copyright>
  <JavaXML:SectionBreak>Here's a Section</JavaXML:SectionBreak>
</JavaXML:Topic>
Although this document fragment uses only elements allowed by the DTD, its structure is incorrect. This is because the DTD gives no information about how elements are nested and which elements can contain textual data.
One of the keys to XML document structure is the nesting of tags. We can expand on our original table of elements by adding the elements that can be nested within each structure. This will create our element hierarchy for us, which we can then define within our DTD. Table 4.2 summarizes the element hierarchy.
Table 4-2. Element Hierarchy
With this table complete, we are now ready to define the allowed element nestings within our DTD. The way to perform this is:
<!ELEMENT [Element Name] ([Nested Element][,Nested Element]...)>
In this case, a list of comma-separated elements within parentheses becomes the element type. The order of the elements is also important; this ordering is enforced as a validity constraint within the XML document. This adds additional constraints to our document, ensuring that a copyright element always comes at the end of a book, or that a title element appears before content elements. With this new notation, we can update our DTD to add the allowed nestings of elements, shown in Example 4.6.
Example 4-6. DTD with Element Hierarchy
<!ELEMENT JavaXML:Book (JavaXML:Title, JavaXML:Contents, JavaXML:Copyright)>
<!ELEMENT JavaXML:Title ANY>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading ANY>
<!ELEMENT JavaXML:Topic ANY>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>
Although some elements, those that contain parsed data, are not changed, we have a hierarchy of elements that adds a lot of meaning to our XML document constraints. The earlier example that made no sense because of element ordering and nesting would now be invalid. However, there are still a lot of problems with allowing any type of data within the remaining elements.
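To make the effect of ordering concrete, here is a hedged sketch of a fragment that uses only declared elements, in legal nestings, yet would still be rejected against Example 4-6 because the title follows the contents instead of preceding it. The text content is illustrative.

```xml
<!-- Invalid against Example 4-6: JavaXML:Title must come before JavaXML:Contents -->
<JavaXML:Book>
  <JavaXML:Contents>
    <JavaXML:Chapter>
      <JavaXML:Heading>Introduction</JavaXML:Heading>
      <JavaXML:Topic>What Is It?</JavaXML:Topic>
    </JavaXML:Chapter>
    <JavaXML:SectionBreak/>
  </JavaXML:Contents>
  <JavaXML:Title>Java and XML</JavaXML:Title>
  <JavaXML:Copyright>Copyright text</JavaXML:Copyright>
</JavaXML:Book>
```

Moving the JavaXML:Title element back to the first position is the only change needed for this fragment to satisfy the hierarchy.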
The element type to use for textual data is #PCDATA. This keyword represents Parsed Character Data, and can be used for elements that contain character data that we want our XML parser to handle normally. Using the #PCDATA keyword limits the element to using only character data, though; nested elements are not allowed. We will discuss situations like this a little later. For now, we can modify our title, heading, and topic elements to reflect that textual data should be used within these elements, as in Example 4.7.
Example 4-7. DTD with Element Hierarchy and Character Data Elements
<!ELEMENT JavaXML:Book (JavaXML:Title, JavaXML:Contents, JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak ANY>
<!ELEMENT JavaXML:Copyright ANY>
We are moving right along in our element definitions within DTDs. In addition to elements that contain textual data and elements that contain other elements, we have one element, JavaXML:SectionBreak, which should contain no data. In other words, the element should always be empty. Although it would be legal to specify that this element contained parsed character data and simply never insert any, this isn’t a good use of our constraints. It is better to actually require that the element always be empty, preventing accidental misuse. The keyword EMPTY allows this constraint. This keyword does not need to appear within parentheses, as it denotes a type and cannot be grouped with any other elements, which, as we will soon see, the parentheses allow. We can update our section break element in our DTD now in Example 4.8.
Example 4-8. DTD with EMPTY Element Defined
<!ELEMENT JavaXML:Book (JavaXML:Title,
JavaXML:Contents,
JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents (JavaXML:Chapter, JavaXML:SectionBreak)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading, JavaXML:Topic)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright ANY>
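The effect of the EMPTY keyword can be sketched with a few standalone fragments (they are alternatives, not one document). The third reuses the text from Example 4-5.

```xml
<!-- Valid: both forms have no content at all -->
<JavaXML:SectionBreak/>
<JavaXML:SectionBreak></JavaXML:SectionBreak>

<!-- Invalid once the element is declared EMPTY: it contains character data.
     Even a lone space between the tags would count as content. -->
<JavaXML:SectionBreak>Here's a Section</JavaXML:SectionBreak>
```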
The last element we have to define more rigidly is the JavaXML:Copyright element. As you recall, this is actually a container for an entity reference to another file that should be included. When an XML parser sees &OReillyCopyright;, it will attempt to look up the OReillyCopyright entity within the DTD, which in our case should reference an external file. This external file should hold a shared copyright for all books being documented in XML. The DTD has the job of specifying where the external file is located, and how it should be accessed. In our case, the copyright file lives on a remote server, and we want to reference that file by its URL. Entities are declared in DTDs with the notation:
<!ENTITY [Entity Name] "[Replacement Characters/Identifier]">
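You will notice that the notation indicated that a set of replacement characters could be specified, allowing substitution similar to using an external file. In fact, this is how the “escape” characters within XML are handled. The entity name is written without the leading ampersand and trailing semicolon, and the replacement text uses numeric character references so that < and & are not misread as markup:
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
...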
So if our copyright was a very short piece of text, we could use something like:
<!ENTITY OReillyCopyright "Copyright O'Reilly and Associates, 2000">
However, the copyright we expect to use is a longer piece of text, more appropriately stored in an external file for easy modification. This also allows it to be used in multiple XML documents without duplication of the data within each document’s DTD. This requires us to specify a system-level resource as the resolution for the entity reference. The notation for this type of reference is:
<!ENTITY [Entity Name] SYSTEM "[URI]">
As in the case of parsing our XML document and our discussion on namespaces, the URI specified can be either a local resource or a network-accessible resource. In our case, we want to use a file located on an external server, so the entity would reference that file through a URL:
<!ENTITY OReillyCopyright SYSTEM "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">
With this reference set up, an XML parser could now handle the OReillyCopyright reference within an XML document and properly resolve it within the parsing process. This section of the XML had to be commented out in Chapter 3 for this very reason, and in the next chapter, we will uncomment the reference and see how a validating parser handles the entity and uses a DTD to resolve it.
Finally, we need to let our containing element know it should expect parsed character data:
<!ELEMENT JavaXML:Copyright (#PCDATA)>
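The short-text variant of the entity can be tried out in isolation. The sketch below assumes an internal DTD subset (declarations placed inside the DOCTYPE) purely so that the example needs no external files; the chapter's own DTD lives in a separate file and uses the external form.

```xml
<?xml version="1.0"?>
<!DOCTYPE JavaXML:Copyright [
  <!ELEMENT JavaXML:Copyright (#PCDATA)>
  <!-- Internal entity: the name carries no & or ; and the replacement text is inline -->
  <!ENTITY OReillyCopyright "Copyright O'Reilly and Associates, 2000">
]>
<JavaXML:Copyright>&OReillyCopyright;</JavaXML:Copyright>
```

After the parser expands the reference, the element simply contains the copyright text, which is why (#PCDATA) is a suitable type for it in this case.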
The last major construct in DTD element specifications we will look at is the variety of combinations of grouping, multiple occurrences, and choices within an element. For example, we may need to say that element X appears once or, alternatively, that element Y appears followed by element Z. These structures are critical to DTDs; by default, an element must appear exactly once when specified without any modifiers in the DTD:
<!ELEMENT MyElement (NestedElement, AnotherElement)>
Here NestedElement must appear exactly once, and must always be followed by exactly one AnotherElement. If this were not the structure of the corresponding XML document, the document would be invalid. A special set of modifiers must be applied to elements to change this default constraining behavior.
The most common modifier applied to an element is a recurrence operator. These operators allow an element to appear zero or more times, one or more times, or optionally not at all, in addition to the default, which requires an element to appear exactly one time. Table 4.3 lists each of the recurrence operators and what recurrence they indicate.
Each operator can be appended to the end of an element name. In our previous example, to allow NestedElement to appear one or more times, and then require that AnotherElement appear either once or not at all, we would use the following within the DTD:
<!ELEMENT MyElement (NestedElement+, AnotherElement?)>
This would make the following XML perfectly valid:
<MyElement>
  <NestedElement>One</NestedElement>
  <NestedElement>Two</NestedElement>
</MyElement>
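The remaining operator, the asterisk, permits zero or more occurrences. The sketch below reuses the placeholder names from the example above and assumes an internal DTD subset so that it stands alone; the (#PCDATA) types for the two children are assumptions as well.

```xml
<?xml version="1.0"?>
<!DOCTYPE MyElement [
  <!-- * : zero or more;  + : one or more;  ? : once or not at all -->
  <!ELEMENT MyElement (NestedElement*, AnotherElement)>
  <!ELEMENT NestedElement (#PCDATA)>
  <!ELEMENT AnotherElement (#PCDATA)>
]>
<!-- Valid with no NestedElement at all; AnotherElement is still required once -->
<MyElement>
  <AnotherElement>Only child</AnotherElement>
</MyElement>
```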
In the DTD we have been building, we have a similar situation within the JavaXML:Chapter element. We would like to allow a chapter heading (JavaXML:Heading) to either appear once, or optionally be omitted, and to allow one or more JavaXML:Topic elements to appear. We can now make this change using our recurrence operators:
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
This easy change makes our XML chapter representation much more realistic. We also need to make a change to the JavaXML:Contents element definition. A chapter or set of chapters should appear, and then possibly a section break. The section break must be optional, as a book may only contain chapters. We can define the recurrence of chapters and the section break elements like this:
<!ELEMENT JavaXML:Contents (JavaXML:Chapter+,JavaXML:SectionBreak?)>
However, we still have not let the DTD know that more chapters can appear after the JavaXML:SectionBreak element. In fact, if we look at the structure of the XML, we would like to allow this structure to occur multiple times. Chapters followed by a section break can be followed by more chapters followed by another section break! We need a concept of grouping within our element.
Grouping allows us to solve problems like the element nesting within JavaXML:Contents. Often, recurrence occurs for a block or group of elements, rather than a single element. For this reason, any of the recurrence operators can be applied to a group of elements. Enclosing a set of elements within parentheses signifies the group. If you are starting to remember your old LISP classes in college, don’t worry; it stays fairly simple in our examples, and the parentheses don’t get out of hand. Nested parentheses are, of course, acceptable. So to group a set of elements the following notation would be used:
<!ELEMENT GroupingExample ((Group1El1, Group1El2), (Group2El1, Group2El2))>
An operator can then be applied to the group, rather than to a single element. In the scenario we are currently looking at, we need to apply the operator allowing multiple occurrences to the group containing our chapter and section break element. This would then allow repetition of the entire construct:
<!ELEMENT JavaXML:Contents (JavaXML:Chapter+,JavaXML:SectionBreak?)+>
This now accurately allows the various combinations: a set of chapters followed by one section break, and then the structure repeating multiple times or optionally not repeating at all. It also allows the case where only chapters are included, without any section breaks. However, this is not particularly clear from the DTD. What would be better is to specify that one or more chapters could occur, or this structure could occur. Although this is not going to result in different behavior, it certainly would make more sense to readers other than the DTD author. To accomplish this, though, we need to introduce an “or” function.
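DTDs do conveniently offer an “or” function, signified by the pipe operator. This allows one thing or the other to occur, and the pipe is often used in conjunction with groupings. One common, although not necessarily good, use of the “or” operator is to allow a certain element or elements to appear within an enclosing element, or for textual data to appear. XML 1.0 permits only one form for such mixed content: #PCDATA must come first in a list of alternatives, and the whole group must carry the * operator, so the order and number of the nested elements cannot be constrained:
<!ELEMENT AggregateElement (#PCDATA | Element1 | Element2)*>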
For this DTD, both of the following XML document fragments would be valid:
<AggregateElement>
  <Element1>One</Element1>
  <Element2>Two</Element2>
</AggregateElement>

<AggregateElement>
  Textual Data
</AggregateElement>
Using this type of constraint is discouraged, though, as the meaning of the enclosing element becomes obscure. An element should typically include textual, parsed data, or other elements; it should not allow both.
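Part of the reason such content is hard to reason about is that XML 1.0 allows text and elements to be combined only in the form (#PCDATA | A | B)*, which cannot fix the order or count of the children. The following sketch, with an internal DTD subset and (#PCDATA) child types assumed for self-containment, shows a document that is also valid under such a declaration:

```xml
<?xml version="1.0"?>
<!DOCTYPE AggregateElement [
  <!ELEMENT AggregateElement (#PCDATA | Element1 | Element2)*>
  <!ELEMENT Element1 (#PCDATA)>
  <!ELEMENT Element2 (#PCDATA)>
]>
<!-- Valid: text and elements interleaved, with Element2 before Element1 -->
<AggregateElement>
  Some text <Element2>Two</Element2> more text <Element1>One</Element1>
</AggregateElement>
```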
In our document, we want to show a clearer representation of our
JavaXML:Contents
element. We can now do that:
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) | (JavaXML:Chapter+,JavaXML:SectionBreak?)+)>
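It is now clear that either multiple chapters may appear, or that chapters followed by a section break may appear. This greatly adds to the documentation that our DTD provides, while maintaining the same constraints. One caveat: both alternatives begin with JavaXML:Chapter, which makes the content model nondeterministic in the sense of the XML 1.0 specification, and some validating parsers may report it as an error. The single-group form shown earlier accepts exactly the same documents and avoids the problem.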
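As a hedged illustration, the following fragment (with headings and topics borrowed from the table of contents) should satisfy the JavaXML:Contents and JavaXML:Chapter declarations above: two chapters, a section break, then a further chapter with no trailing break.

```xml
<JavaXML:Contents>
  <!-- The heading is optional, so this chapter has topics only -->
  <JavaXML:Chapter>
    <JavaXML:Topic>What Is It?</JavaXML:Topic>
  </JavaXML:Chapter>
  <JavaXML:Chapter>
    <JavaXML:Heading>Creating XML</JavaXML:Heading>
    <JavaXML:Topic>An XML Document</JavaXML:Topic>
  </JavaXML:Chapter>
  <JavaXML:SectionBreak/>
  <!-- A second group of chapters, this time without a closing break -->
  <JavaXML:Chapter>
    <JavaXML:Heading>Parsing XML</JavaXML:Heading>
    <JavaXML:Topic>SAX Readers</JavaXML:Topic>
  </JavaXML:Chapter>
</JavaXML:Contents>
```

Starting the contents with a JavaXML:SectionBreak, or placing two breaks in a row, would make the fragment invalid, since every group must begin with at least one chapter.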
We have now completely specified and constrained our XML elements. The DTD shown in Example 4.9 should function in regard to our elements, and only attribute definitions are left, which we will look at next.
Example 4-9. DTD with Elements Specified
<!ELEMENT JavaXML:Book (JavaXML:Title, JavaXML:Contents, JavaXML:Copyright)>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) |
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM
  "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">
With element specifications thoroughly covered, we can move on to defining attributes. Because there are not complicated nesting scenarios with attributes, defining them is somewhat simpler than dealing with element specifications. In addition, whether the presence of an attribute is required is specified by a keyword, so no recurrence operators are needed. Attribute definitions are in the following form:
<!ATTLIST [Enclosing Element] [Attribute Name] [type] [Modifier] ... >
The first two parameters, the element name and the attribute name, are simple to define. For any element, the ATTLIST construct allows multiple attributes to be defined within the same structure. We can add this portion of the attribute definition for the attributes we are using within our XML document, creating placeholders for the rest of the definition. Best practice is to include the attribute definitions right after the element specification, again in the spirit of a DTD being as self-documenting as possible (see Example 4.10).
Example 4-10. DTD with Elements and Attribute Placeholders
<!ELEMENT JavaXML:Book (JavaXML:Title, JavaXML:Contents, JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
  xmlns:JavaXML [type] [Modifier]
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) |
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
  focus [type] [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
  subSections [type] [Modifier]
>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM
  "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">
We now need to define the types allowed for each attribute.
For many attributes, the value can be any textual data. This is the simplest type of attribute value, but also the least constrained. This type is signified by the keyword CDATA, representing Character Data. And yes, this is the same keyword that marks CDATA sections within XML documents themselves, where it represents “escaped” character data; as an attribute type it simply means the value is an arbitrary string. This is the type generally used when an attribute can take on any value and may represent a comment or additional information about an element. We will soon see that a better solution, where possible, is to define the set of values that an attribute is allowed to take on. In our document, the xmlns:JavaXML attribute should be character data.
You may wonder why we need to define this as an allowed attribute. Although xmlns is an XML keyword that signifies a namespace declaration, it is still an attribute that must be validated. Therefore, we include it to ensure our document validity. The subSections attribute of JavaXML:Topic should be character data, as well:
<!ATTLIST JavaXML:Book
  xmlns:JavaXML CDATA [Modifier]
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) |
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
  focus [type] [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
  subSections CDATA [Modifier]
>
The next type of attribute, and one of the most commonly used, is an enumeration. This type allows any of the specified values to be used, but any other value for the attribute results in an invalid document. This is useful when the set of values for an attribute can be determined at authoring time, as it tightly constrains the XML document. This is the type our focus attribute should take on, as the only allowed foci for the book are “Java” and “XML.” The allowed values are set within parenthetical notation, separated by the “or” operator, similar to the way element nestings can be specified:
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
  focus (XML|Java) [Modifier]
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
The final question that should be answered in defining an attribute is whether the attribute is required within an element. This is specified with one of three possible keywords: #IMPLIED, #REQUIRED, or #FIXED. An implied attribute can remain unspecified. We can make this modification to the subSections attribute, as it is not required for the document to remain valid:
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
subSections CDATA #IMPLIED
>
For our xmlns attribute, we want to ensure that a content author always specifies the namespace for the book. Otherwise, our namespace prefixes become useless. In this case, we want to use the #REQUIRED keyword. If this attribute were not included within the JavaXML:Book element, the document would be invalid, as it doesn’t specify a required attribute:
<!ELEMENT JavaXML:Book (JavaXML:Title,
JavaXML:Contents,
JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
xmlns:JavaXML CDATA #REQUIRED
>
The final keyword, #FIXED, is not frequently used for applications. Most common in backend systems, this keyword states that the attribute, whenever it appears, must carry the value given in the DTD; a document author can never change it, and if the attribute is omitted the parser supplies the fixed value. The format of this type of notation is:
<!ATTLIST [Element Name] [Attribute Name] [type] #FIXED "[Fixed Value]">
Because of its irrelevance in highly dynamic applications (an attribute whose value cannot change does not help us much in representing dynamic data!), we will not spend more time on it.
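For completeness, a fixed attribute might be declared as in the following sketch. The edition attribute is hypothetical and is not part of the DTD built in this chapter; it only illustrates where the type and the quoted value go.

```xml
<!-- Hypothetical "edition" attribute, shown only to illustrate #FIXED -->
<!ATTLIST JavaXML:Book
  edition CDATA #FIXED "1"
>
<!-- <JavaXML:Book edition="1"> is valid, and so is leaving the attribute out,
     in which case the value "1" is supplied from the DTD.
     <JavaXML:Book edition="2"> is invalid. -->
```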
We have still not addressed the focus attribute. We have enumerated the possible values it can take on, but because the book is primarily focused on Java, we would like to allow the content author not to have to explicitly define the attribute as “Java” in chapters where that is the focus. In a book with twenty or thirty chapters, this becomes tedious. Imagine a listing of a science library’s books where each book had to notate that its primary subject was “science”! This data duplication is not very efficient, so requiring the attribute is not a great solution. Using the #IMPLIED keyword is no better: it lets the author omit the attribute, but then no value is assigned at all, and assigning a value is precisely what we want to happen when none is specified. What we want is to provide a default value; if no attribute value is given, we want the parser to supply the default. Fortunately, this is an allowed construct within XML DTDs. Instead of one of the keyword modifiers, a default value can be given. This value should be in quotes, and if an enumeration is the type for the attribute, the default must be one of the enumerated values. We can now use this to define our focus attribute:
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?,JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
focus (XML|Java) "Java"
>
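The effect of the default can be sketched with a small standalone document. The internal DTD subset is assumed here only so that the example is self-contained; the declarations themselves are the ones developed above.

```xml
<?xml version="1.0"?>
<!DOCTYPE JavaXML:Chapter [
  <!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
  <!ATTLIST JavaXML:Chapter
    focus (XML|Java) "Java"
  >
  <!ELEMENT JavaXML:Heading (#PCDATA)>
  <!ELEMENT JavaXML:Topic (#PCDATA)>
]>
<!-- No focus attribute is written; a parser that reads the DTD reports focus="Java" -->
<JavaXML:Chapter>
  <JavaXML:Heading>Parsing XML</JavaXML:Heading>
  <JavaXML:Topic>SAX Readers</JavaXML:Topic>
</JavaXML:Chapter>
```

Writing focus="XML" explicitly would override the default, while any value outside the enumeration, such as focus="SGML", would make the document invalid.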
With this attribute definition, we have completed our DTD! Although the syntax may have seemed awkward and a bit clumsy, hopefully you were able to easily follow along and understand how elements and attributes, as well as entities, are defined within DTDs. We certainly have not thoroughly covered DTDs, as this is primarily a book on Java and XML, not just XML; however, you should feel comfortable with our sample DTD and be able to create simple DTDs for your own XML documents. Before we move on to schemas, let’s take a final look at our complete DTD in Example 4.11.
Example 4-11. Completed DTD
<!ELEMENT JavaXML:Book (JavaXML:Title, JavaXML:Contents, JavaXML:Copyright)>
<!ATTLIST JavaXML:Book
  xmlns:JavaXML CDATA #REQUIRED
>
<!ELEMENT JavaXML:Title (#PCDATA)>
<!ELEMENT JavaXML:Contents ((JavaXML:Chapter+) |
                            (JavaXML:Chapter+, JavaXML:SectionBreak?)+)>
<!ELEMENT JavaXML:Chapter (JavaXML:Heading?, JavaXML:Topic+)>
<!ATTLIST JavaXML:Chapter
  focus (XML|Java) "Java"
>
<!ELEMENT JavaXML:Heading (#PCDATA)>
<!ELEMENT JavaXML:Topic (#PCDATA)>
<!ATTLIST JavaXML:Topic
  subSections CDATA #IMPLIED
>
<!ELEMENT JavaXML:SectionBreak EMPTY>
<!ELEMENT JavaXML:Copyright (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM
  "http://www.oreilly.com/catalog/javaxml/docs/copyright.xml">
In comparing this XML document to its DTD, you should start to notice some unnecessary complexities in the DTD’s structure. The DTD that defines the organization of this XML file (and other XML files like it) has a structure completely unlike the XML file itself. You will also see that the DTD’s structure is different from a schema, an XSL stylesheet, and nearly every other XML-related document. Unfortunately, XML DTDs were developed as part of the XML 1.0 specification, and some design decisions made in that specification still cause XML users and developers grief. Much of the basis for XML DTDs came from the way DTDs are used in SGML, a much older specification. However, the structure of an SGML DTD is not necessarily appropriate or in the spirit of the XML specification. The result is that DTDs are not one of the best design decisions made in the formation of the XML specification. Fortunately, XML Schema looks to correct these structural differences, making constraining XML more of an XML-centric process, rather than a break from XML format. We will discuss XML Schema next. Although XML Schema is likely to replace DTDs, the process will be a slow and cautious one, as many applications have already embraced XML in production systems, and those systems use documents constrained by DTDs. For this reason, understanding DTDs is important, even if they will be phased out of heavy use.
Strangely enough, there is a need for a section on things left out of a DTD. Although all of the elements within an XML document must be specified, and their attributes defined, processing instructions do not have to be part of a DTD. In fact, there is no way to constrain the PIs and XML declaration found at the top of XML files. What a DTD constrains begins with the root element of an XML file. This probably seems quite natural to you; why specify that an XML document may have this processing instruction, but not that one? The rationale behind this decision is portability.
There are some good arguments for allowing the specification of PIs within a DTD. For example, it is plausible that a content author might want to make sure his XML document is always transformed, and require an xml-stylesheet PI. But which type of stylesheet is required? Well, this can be defined too. And what type of engine should be used for transformations? Cocoon? James Clark’s Servlet? Another framework? Again, these items can be defined. However, by the time all of these details have been specified and constrained, the document has lost all its portability! It can only be used for one specific purpose on one specific framework, and can no longer be transformed iteratively and easily moved from one platform or framework or application to another. For this reason, PIs and initial XML declarations are left unconstrained within DTDs. We only have to consider the elements and attributes within the document, beginning with the root element.