http://www.w3.org. This means
that XML is a standard. When you send XML, it conforms to this
standard; when some other application receives it, the XML still
conforms to that standard. The receiving application can count on
that. This is essentially what Java provides: any JVM knows what to
expect, and as long as code conforms to those expectations, it will
run. By using XML, you get portable data. In fact, recently you may
have heard the phrase "portable code, portable data" in
reference to the combination of Java and XML. It's a good
saying, because it turns out (as not all marketing-type slogans do)
to be true.
http://java.sun.com), make some small
modifications, and be up and running.
http://www.w3.org/TR/REC-xml. Example 2-1 shows a simple XML document that conforms to
this specification. It's a portion of the XML table of contents
for this book (I've only included part of it because it's
long!). The complete file is included with the samples for the book,
available online at http://www.oreilly.com/catalog/javaxml2 and
http://www.newInstance.com.
I'll use it to illustrate several important concepts.
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "DTD/JavaXML.dtd">
<!-- Java and XML Contents -->
<book xmlns="http://www.oreilly.com/javaxml2"
xmlns:ora="http://www.oreilly.com"
>
<title ora:series="Java">Java and XML</title>
<!-- Chapter List -->
<contents>
<chapter title="Introduction" label="1">
<topic name="XML Matters" />
<topic name="What's Important" />
<topic name="The Essentials" />
<topic name="What's Next?" />
</chapter>
<chapter title="Nuts and Bolts" label="2">
<topic name="The Basics" />
<topic name="Constraints" />
<topic name="Transformations" />
<topic name="And More..." />
<topic name="What's Next?" />
</chapter>
<chapter title="SAX" label="3">
<topic name="Getting Prepared" />
<topic name="SAX Readers" />
<topic name="Content Handlers" />
<topic name="Gotcha!" />
<topic name="What's Next?" />
</chapter>
<chapter title="Advanced SAX" label="4">
<topic name="Properties and Features" />
<topic name="More Handlers" />
<topic name="Filters and Writers" />
<topic name="Even More Handlers" />
<topic name="Gotcha!" />
<topic name="What's Next?" />
</chapter>
<chapter title="DOM" label="5">
<topic name="The Document Object Model" />
<topic name="Serialization" />
<topic name="Mutability" />
<topic name="Gotcha!" />
<topic name="What's Next?" />
</chapter>
<!-- And so on... -->
</contents>
<ora:copyright>&OReillyCopyright;</ora:copyright>
</book>
<!ELEMENT book (title, contents, ora:copyright)>
<!ATTLIST book
xmlns CDATA #REQUIRED
xmlns:ora CDATA #REQUIRED
>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title
ora:series (C | Java | Linux | Oracle |
Perl | Web | Windows)
#REQUIRED
>
<!ELEMENT contents (chapter+)>
<!ELEMENT chapter (topic+)>
<!ATTLIST chapter
title CDATA #REQUIRED
number CDATA #REQUIRED
>
<!ELEMENT topic EMPTY>
<!ATTLIST topic
name CDATA #REQUIRED
>
<!-- Copyright Information -->
<!ELEMENT ora:copyright (copyright)>
<!ELEMENT copyright (year, content)>
<!ATTLIST copyright
xmlns CDATA #REQUIRED
>
<!ELEMENT year EMPTY>
<!ATTLIST year
value CDATA #REQUIRED
>
<!ELEMENT content (#PCDATA)>
<!ENTITY OReillyCopyright SYSTEM
"http://www.newInstance.com/javaxml2/copyright.xml"
>
http://www.w3.org/XML and see what looks
interesting.
http://xml.apache.org, this C- and Java-based
parser is already one of the most widely contributed-to parsers
available (not that hardcore Java developers like us care about C,
though, right?). In addition, using an open source parser such as
Xerces allows you to send questions or bug reports to the
parser's authors, resulting in a better product, as well as
helping you use the software quickly and correctly. To subscribe to
the general list and request help on the Xerces parser, send a blank
email to
SAXTreeViewer class. This class uses SAX to parse an XML document
supplied on the command line, and displays the document visually as a
Swing JTree. If you don't know anything about Swing, don't worry; I
don't focus on that, but just use it for visual purposes. The focus
will remain on SAX, and how events within parsing can be used to
perform customized actions. All that really happens is that a JTree,
which provides a nice simple tree model, is used to display the XML
input document. The key to this tree is the DefaultMutableTreeNode
class, which you'll get quite used to in using this example, as well
as the DefaultTreeModel that takes care of the layout.
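To make the idea concrete, here is a minimal sketch of how SAX events can be turned into a tree of DefaultMutableTreeNode objects held by a DefaultTreeModel. It is not the SAXTreeViewer source: the class name TreeModelSketch, the node labels, and the use of the JDK's built-in JAXP parser are all assumptions made for illustration, and the Swing display itself is left out.

```java
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration; not part of the book's samples
class TreeModelSketch {

    /** Parses the XML text and returns a tree model mirroring its elements. */
    static DefaultTreeModel buildModel(String xml) {
        final DefaultMutableTreeNode root =
            new DefaultMutableTreeNode("XML Document");

        DefaultHandler handler = new DefaultHandler() {
            // The node that new elements are currently attached to
            private DefaultMutableTreeNode current = root;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Add a child for this element, then descend into it
                DefaultMutableTreeNode node =
                    new DefaultMutableTreeNode("Element: " + qName);
                current.add(node);
                current = node;
            }

            @Override
            public void endElement(String uri, String localName,
                                   String qName) {
                // Element finished: walk back up to its parent
                current = (DefaultMutableTreeNode) current.getParent();
            }
        };

        try {
            // Assumption: the JDK's default parser, obtained through JAXP
            XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return new DefaultTreeModel(root);
    }

    public static void main(String[] args) {
        DefaultTreeModel model = buildModel(
            "<book><title>Java and XML</title><contents/></book>");
        DefaultMutableTreeNode root =
            (DefaultMutableTreeNode) model.getRoot();
        System.out.println(root.getChildAt(0));
    }
}
```

The resulting model could be handed to a JTree constructor for display; only the model classes are used here, so the sketch runs without a screen.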
org.xml.sax.XMLReader interface. This interface
defines parsing behavior and allows us to set features and properties
(which I'll cover later in this chapter). For those of you
familiar with SAX 1.0,
this interface replaces the
org.xml.sax.Parser
interface.
org.xml.sax.ContentHandler, org.xml.sax.ErrorHandler,
org.xml.sax.DTDHandler, and org.xml.sax.EntityResolver. In this
chapter, I will discuss ContentHandler and ErrorHandler. I'll leave
discussion of DTDHandler and EntityResolver for the next chapter; it
is enough for now to understand that EntityResolver works just like
the other handlers, and is built specifically for resolving external
entities specified within an XML document. Custom application classes
that perform specific actions within the parsing process can
implement each of these interfaces. These implementation classes can
be registered with the reader using the methods setContentHandler( ),
setErrorHandler( ), setDTDHandler( ), and setEntityResolver( ). Then
the reader invokes the callback methods on the
appropriate handlers during parsing.
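As a rough illustration of that registration step, the sketch below registers one object for all four handler roles and lets the reader call back into it. The class names are hypothetical, and it assumes the JDK's built-in parser obtained through JAXP rather than a specific vendor class. It leans on org.xml.sax.helpers.DefaultHandler, which supplies do-nothing implementations of all four interfaces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class HandlerRegistrationSketch {

    /** Counts startElement callbacks; everything else is inherited. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    static int countElements(String xml) {
        CountingHandler handler = new CountingHandler();
        try {
            // Assumption: the JDK's default parser, obtained through JAXP
            XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

            // DefaultHandler implements all four handler interfaces,
            // so the same object can be registered for each role
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.setDTDHandler(handler);
            reader.setEntityResolver(handler);

            // The reader now invokes the callbacks as it parses
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.elements;
    }

    public static void main(String[] args) {
        System.out.println(countElements(
            "<contents><chapter><topic/><topic/></chapter></contents>"));
    }
}
```

In a real application you would more likely register separate classes for each role; sharing one object here just keeps the sketch short.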
SAXTreeViewer example, a good start is to
implement the ContentHandler interface. This
interface defines several important methods within the parsing
lifecycle that our application can react to. Since all the necessary
import statements are in place (I cheated and put them in already),
all that is needed is to code an implementation of the
ContentHandler
interface. For simplicity, I'll do this as a nonpublic class,
still within the ContentHandler interface
for handling parsing events, SAX provides an
ErrorHandler interface that can be implemented to
treat various error conditions that may arise during parsing. This
interface works in the same manner as the content handler already
constructed, but defines only three callback methods. Through these
three methods, all possible error conditions are handled and reported
by SAX parsers. Here's a look at the
ErrorHandler interface:
public interface ErrorHandler {
    public abstract void warning (SAXParseException exception)
        throws SAXException;
    public abstract void error (SAXParseException exception)
        throws SAXException;
    public abstract void fatalError (SAXParseException exception)
        throws SAXException;
}
SAXParseException.
This object holds the line number where the trouble was encountered,
the URI of the document being treated (which could be the parsed
document or an external reference within that document), and normal
exception details such as a message and a printable stack trace. In
addition, each method can throw a SAXException.
This may seem a bit odd at first; an exception handler that throws an
exception? Keep in mind that each handler receives a parsing
exception. This can be a warning that should not cause the parsing
process to stop or an error that needs to be resolved for parsing to
continue; however, the callback may need to perform system I/O or
another operation that can throw an exception, and it needs to be
able to send any problems resulting from these actions up the
application chain. It can do this through the
SAXException the error handler callback is allowed
to throw.
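Here is a small sketch of an ErrorHandler implementation that records what it is told and rethrows fatal errors. The class names and the report format are made up for illustration, and the parser is the JDK default obtained through JAXP (an assumption; any SAX 2.0 reader should behave similarly).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical class for illustration
class ErrorHandlerSketch {

    static class RecordingErrorHandler implements ErrorHandler {
        final List<String> reports = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            record("warning", e);
        }

        @Override
        public void error(SAXParseException e) {
            record("error", e);
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            record("fatalError", e);
            // Pass the problem up the application chain
            throw e;
        }

        private void record(String kind, SAXParseException e) {
            reports.add(kind + " at line " + e.getLineNumber()
                + ": " + e.getMessage());
        }
    }

    /** Parses the text and returns whatever the handler was told. */
    static List<String> parse(String xml) {
        RecordingErrorHandler handler = new RecordingErrorHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by the handler; nothing more to do here
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return handler.reports;
    }

    public static void main(String[] args) {
        // The <b> element is never closed, so this is not well-formed
        for (String report : parse("<a>\n<b>\n</a>")) {
            System.out.println(report);
        }
    }
}
```

Because SAXParseException extends SAXException, the fatalError( ) callback can simply rethrow the exception it receives.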
org.xml.sax.helpers.ParserAdapter,
which can actually cause a SAX 1.0 Parser
implementation to behave like a SAX 2.0 XMLReader
implementation. This handy class takes in a 1.0
Parser implementation as an argument and then can
be used instead of that implementation. It allows a
ContentHandler to be set (which is a SAX 2.0
construct), and handles all namespace callbacks properly (also a
feature of SAX 2.0). The only functionality loss you will see is
that skipped entities will not be reported, as
this capability was not available in a 1.0 implementation in any
form, and cannot be emulated by a 2.0 adapter class. Example 3-3 shows this behavior in action.
try {
    // Register a parser with SAX
    Parser parser =
        ParserFactory.makeParser(
            "org.apache.xerces.parsers.SAXParser");
    ParserAdapter myParser = new ParserAdapter(parser);
    // Register the content handler
    myParser.setContentHandler(contentHandler);
    // Register the error handler
    myParser.setErrorHandler(errHandler);
    // Parse the document
    myParser.parse(uri);
} catch (ClassNotFoundException e) {
    System.out.println(
        "The parser class could not be found.");
} catch (IllegalAccessException e) {
    System.out.println(
        "Insufficient privileges to load the parser class.");
} catch (InstantiationException e) {
    System.out.println(
        "The parser class could not be instantiated.");
} catch (ClassCastException e) {
    System.out.println(
        "The parser does not implement org.xml.sax.Parser");
} catch (IOException e) {
    System.out.println("Error reading URI: " + e.getMessage( ));
} catch (SAXException e) {
    System.out.println("Error in parsing: " + e.getMessage( ));
}
EntityResolver and DTDHandler
interfaces. Additionally, you'll take a look at many less used
(but still valuable) features of the Simple API for XML, as well as
the optional add-ons to SAX, such as filters and the
org.xml.sax.ext package. This should get those of
you who are using SAX in applications up, running, and even flying
past developers around you. That's always good. Keep that
editor humming, and turn the page.
EntityResolver and DTDHandler
left over from the last chapter. At that point, you should have a
comprehensive understanding of the standard SAX 2.0 distribution.
However, we'll push on to look at some SAX extensions,
beginning with the writers that can be coupled with SAX, as well as
some filtering mechanisms. Finally, I'll introduce some new
handlers to you, the LexicalHandler and
DeclHandler, and show you how they are used. When
all is said and done (including another "Gotcha!"
section), you should be ready to take on the world with just your
parser and the SAX classes. So slip into your shiny spacesuit and
grab the flightstick—ahem. Well, I got carried away with the
taking on the world. In any case, let's get down to it.
XMLReader interface, the methods for setting
document and schema validation, namespace support, and other core
features are not standard across parser implementations. To address
this, SAX 2.0 defines a
standard mechanism for setting important properties and features of a
parser that allows the addition of new properties and features as
they are accepted by the W3C without the use of proprietary
extensions or methods.
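A short sketch of this mechanism in use: features are identified by URI and set through setFeature( ), and a reader that does not know a given URI says so with an exception. The helper and class names are hypothetical, the second URI in main( ) is deliberately bogus, and the reader comes from the JDK's JAXP factory (an assumption).

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical class for illustration
class FeatureSketch {

    // Standard SAX 2.0 feature URI
    static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";

    static XMLReader newReader() {
        try {
            // Assumption: the JDK's default parser, obtained through JAXP
            return SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No parser available", e);
        }
    }

    /**
     * Tries to set a feature. Returns false if the reader does not
     * recognize the URI or cannot switch the feature to that state.
     */
    static boolean trySetFeature(XMLReader reader, String featureID,
                                 boolean state) {
        try {
            reader.setFeature(featureID, state);
            return reader.getFeature(featureID) == state;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        System.out.println("namespaces set: "
            + trySetFeature(reader, NAMESPACES, true));
        System.out.println("bogus feature set: "
            + trySetFeature(reader,
                "http://example.invalid/features/no-such-feature", true));
    }
}
```

Properties work the same way through setProperty( ) and getProperty( ), except that the value is an Object rather than a boolean.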
XMLReader
interface. This means you have to change little of your existing code
to request validation, set the namespace separator, and handle other
feature and property requests. The methods used for these purposes
are outlined in Table 4-1.
| Method | Returns | Parameters | Syntax |
|---|---|---|---|
| setProperty( ) | void | String propertyID |
ContentHandler and ErrorHandler
interfaces and briefly mentioned the
EntityResolver and DTDHandler
interfaces as well. Now that you've got a good understanding of
SAX basics, you're ready to look at these two other
handlers. You'll find
that you use EntityResolver every now and then
(more if you're writing applications to be resold), and that
the DTDHandler is something rarely ever pulled out
of your bag of tricks.
org.xml.sax.EntityResolver. This interface does
exactly what it says: resolves entities (or at least declares a
method that resolves entities, but you get the idea). The interface
defines only a single method, and it looks like this:
public InputSource resolveEntity(String publicID, String systemID)
throws SAXException, IOException;
XMLReader instance (through
setEntityResolver( ), not surprisingly). Once
that's done, every time the reader comes across an entity
reference, it passes the public ID and system ID for that entity to
the resolveEntity( ) method of your
implementation. Now you can change the normal process of entity
resolution.
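To see that callback in action, here is a sketch that registers a resolver and swaps the remote copyright entity for text held in memory. Everything apart from the entity's system ID is invented for illustration: the class names, the sample document, the replacement text, and the choice of the JDK's default JAXP parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class EntityResolverSketch {

    static final String COPYRIGHT_URI =
        "http://www.newInstance.com/javaxml2/copyright.xml";

    // Made-up document that declares and uses the external entity
    static final String SAMPLE_DOCUMENT =
        "<!DOCTYPE book [\n"
        + "  <!ENTITY OReillyCopyright SYSTEM \"" + COPYRIGHT_URI + "\">\n"
        + "]>\n"
        + "<book>&OReillyCopyright;</book>";

    static class InMemoryResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicID, String systemID) {
            if (COPYRIGHT_URI.equals(systemID)) {
                // Serve placeholder content instead of going online
                return new InputSource(new StringReader(
                    "<copyright>placeholder text</copyright>"));
            }
            // In the default case, return null
            return null;
        }
    }

    /** Returns the element names seen, in document order. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
            reader.setEntityResolver(new InMemoryResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE_DOCUMENT));
    }
}
```

The content handler sees the elements from the substituted text as though they had been in the document all along, which is the whole point of entity resolution.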
resolveEntity( )
method is null, this process executes unchanged.
As a result, you should always make sure that whatever code you add
to your resolveEntity( ) implementation, it
returns null in the default case. In other words,
start with an implementation class that looks like Example 4-1.
package javaxml2;
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class SimpleEntityResolver implements EntityResolver {
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
        // In the default case, return null
        return null;
    }
}
http://www.megginson.com/SAX), you can add
some fairly advanced behavior to your SAX applications. This will
also get you in the mindset of using SAX as a pipeline of events,
rather than a single layer of processing. I'll explain this
concept in more detail, but suffice it to say that it really is the
key to writing efficient and modular SAX code.
org.xml.sax.XMLFilter. This interface extends the
XMLReader interface, and adds two new
methods to it:
public void setParent(XMLReader parent);
public XMLReader getParent( );
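As a sketch of how a filter sits between the parser and your handler, the example below extends org.xml.sax.helpers.XMLFilterImpl, a helper class that implements XMLFilter and passes every event through unchanged by default. The upper-casing filter and the class names are hypothetical, chosen only because the effect is easy to see, and the parent reader is the JDK's default JAXP parser (an assumption).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class for illustration
class FilterPipelineSketch {

    /** Upper-cases element names before passing the events along. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) {
            // Equivalent to calling setParent(parent)
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
            throws SAXException {
            super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
            throws SAXException {
            super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Returns the element names as the final handler sees them. */
    static List<String> filteredNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

            // parser -> filter -> content handler
            XMLFilter filter = new UpperCaseFilter(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });

            // Parsing is started on the filter, not on the parser
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(filteredNames("<book><title/></book>"));
    }
}
```

Since a filter is itself an XMLReader, another filter can take it as a parent, which is how longer chains are built.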
XMLReaders through this filtering mechanism, you can build up a
processing chain, or pipeline, of events. To understand what I mean
by a pipeline, here's the normal flow of a SAX parse:
org.xml.sax.ext package to indicate they are extensions to SAX.
However, most parsers (such as Apache Xerces) include these two
classes for use. Check your vendor documentation, and if you don't
have these classes, you can download them from the SAX web site. I
warn you that not all SAX drivers support these extensions, so if
your vendor doesn't include them, you may want to find out why, and
see if an upcoming version of the vendor's software will support the
SAX extensions.
org.xml.sax.ext.LexicalHandler. This handler provides methods that
can receive notification of several lexical events such as comments,
entity declarations, DTD declarations, and CDATA sections. In
ContentHandler, these lexical events are essentially ignored, and you
just get the data and declarations without notification of when or
how they were provided.
CDATA
section or not. However, if you are working with an XML editor,
serializer, or other component that must know the exact
format of the input document, not just its
contents, the LexicalHandler can really help you
out. To see this guy in action, you first need to add an import
statement for org.xml.sax.ext.LexicalHandler to
your SAXTreeViewer.java source
file. Once that's done, you can add
LexicalHandler to the
implements clause in the nonpublic class
JTreeContentHandler in that source file:
class JTreeContentHandler implements ContentHandler, LexicalHandler {
// Callback implementations
}
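For a standalone view of what these callbacks deliver, here is a sketch that records lexical events in a list instead of a JTree. The class names and event labels are hypothetical, and the parser is the JDK default obtained through JAXP (an assumption). The handler is registered through the standard SAX property URI http://xml.org/sax/properties/lexical-handler, since XMLReader has no dedicated setter for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration
class LexicalEventsSketch {

    static class Recorder extends DefaultHandler implements LexicalHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDTD(String name, String publicID, String systemID) {
            events.add("startDTD " + name);
        }

        @Override
        public void endDTD() {
            events.add("endDTD");
        }

        @Override
        public void startEntity(String name) {
            events.add("startEntity " + name);
        }

        @Override
        public void endEntity(String name) {
            events.add("endEntity " + name);
        }

        @Override
        public void startCDATA() {
            events.add("startCDATA");
        }

        @Override
        public void endCDATA() {
            events.add("endCDATA");
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            events.add("comment:" + new String(ch, start, length));
        }
    }

    static List<String> lexicalEvents(String xml) {
        Recorder recorder = new Recorder();
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
            reader.setContentHandler(recorder);
            // Lexical handlers are registered as a property
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", recorder);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        System.out.println(
            lexicalEvents("<r><!--note--><![CDATA[a < b]]></r>"));
    }
}
```

The text inside the CDATA section still arrives through the ordinary characters( ) callback; the lexical handler only marks where the section starts and ends.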
JTree for visual
display of these lexical callbacks. So now you need to add
implementations for all the methods defined in
EntityResolvers,
you should always ensure that you return null as a starting point for
resolveEntity( ) method implementations. Luckily, Java ensures that
you return something from the method, but I've often seen code like
this:
public InputSource resolveEntity(String publicID, String systemID)
    throws IOException, SAXException {
    InputSource inputSource = new InputSource( );
    // Handle references to online version of copyright.xml
    if (systemID.equals(
        "http://www.newInstance.com/javaxml2/copyright.xml")) {
        inputSource.setSystemId(
            "file:///c:/javaxml2/ch04/xml/copyright.xml");
    }
    // In the default case, return null
    return inputSource;
}
InputSource is created
initially and then the system ID is set on that source. The problem
here is that if no if blocks are entered, an
InputSource with no system or public ID, as well
as no specified Reader or
InputStream, is returned. This can lead to
unpredictable results; in some parsers, things continue with no
problems. In other parsers, though, returning an empty
InputSource results in entities being ignored, or
in exceptions being thrown. In other words, return null at the end of
every resolveEntity( ) implementation, and you won't have to worry
about these details.
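One way to restructure the problem code, as a sketch: create the InputSource only inside the branch that needs it, so that falling through every if block really does return null. The class name NullDefaultResolver is hypothetical, and the throws clause is omitted only because nothing in this version can throw.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical class for illustration
class NullDefaultResolver implements EntityResolver {

    @Override
    public InputSource resolveEntity(String publicID, String systemID) {
        // Handle references to online version of copyright.xml
        if ("http://www.newInstance.com/javaxml2/copyright.xml"
                .equals(systemID)) {
            // The InputSource exists only when there is a redirect
            return new InputSource(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }

        // In the default case, return null
        return null;
    }
}
```

Putting the string literal on the left of equals( ) also avoids a NullPointerException if the parser ever passes a null system ID.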
DTDHandler
interface. In all that discussion of DTDs and validation, it's
possible you got a few things mixed up; I want to be clear that the
http://www.w3.org/TR/REC-DOM-Level-1/. Level
1 details the functionality and navigation of content within a
document. A document in the DOM is not limited to XML, but can
be HTML or other content models as well! Level 2, which was
finalized in November of 2000, builds upon Level 1 by supplying modules
and options aimed at specific content models, such as XML, HTML, and
Cascading Style Sheets (CSS). These less-generic modules begin to
"fill in the blanks" left by the more general tools
provided in DOM Level 1. You can view the current Level 2
Recommendation at http://www.w3.org/TR/DOM-Level-2/. Level
3 is already being worked on, and should add even more facilities
for specific types of documents, such as validation handlers for XML,
and other features that I'll discuss in Chapter 6.
org.w3c.dom.Document class. Go ahead and
download this class from the book's web site, or enter in the
code as shown in Example 5-1, for the
SerializerTest
class.
package javaxml2;
import java.io.File;
import org.w3c.dom.Document;
// Parser import
import org.apache.xerces.parsers.DOMParser;
public class SerializerTest {
    public void test(String xmlDocument, String outputFilename)
        throws Exception {
        File outputFile = new File(outputFilename);
        DOMParser parser = new DOMParser( );
        // Get the DOM tree as a Document object
        // Serialize
    }
    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println(
                "Usage: java javaxml2.SerializerTest " +
                "[XML document to read] " +
                "[filename to write out to]");
            return;
        }
        try {
            SerializerTest tester = new SerializerTest( );
            tester.test(args[0], args[1]);
        } catch (Exception e) {
            e.printStackTrace( );
        }
    }
}
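The two placeholder comments in test( ) mark where the DOM tree is obtained and then written back out. As a rough, self-contained sketch of those two steps, the example below uses only classes that ship with the JDK: a JAXP DocumentBuilder stands in for the Xerces DOMParser, and an identity Transformer stands in for a hand-written serializer. Both substitutions, and the class name DomRoundTripSketch, are assumptions made for illustration rather than the approach this chapter builds.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class for illustration
class DomRoundTripSketch {

    /** Gets the DOM tree for the XML text as a Document object. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    /** Writes the DOM tree back out as text. */
    static String serialize(Document doc) {
        try {
            // A Transformer with no stylesheet copies input to output
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book><title>Java and XML</title></book>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(serialize(doc));
    }
}
```

Unlike SAX, nothing here is callback-driven: the whole document is in memory as a tree before the first line of output is produced.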