In the last chapter, I showed you the ContentHandler and ErrorHandler interfaces and briefly mentioned the EntityResolver and DTDHandler interfaces as well. Now that you’ve got a good understanding of SAX basics, you’re ready to look at these two other handlers.[5] You’ll find that you use EntityResolver every now and then (more if you’re writing applications to be resold), and that DTDHandler is something you rarely pull out of your bag of tricks.
The first of these new handlers is org.xml.sax.EntityResolver. This interface does exactly what it says: resolves entities (or at least declares a method that resolves entities, but you get the idea). The interface defines only a single method, and it looks like this:
public InputSource resolveEntity(String publicID, String systemID) throws SAXException, IOException;
You can create an implementation of this interface and register it with your XMLReader instance (through setEntityResolver( ), not surprisingly). Once that’s done, every time the reader comes across a reference to an external entity, it passes the public ID and system ID for that entity to the resolveEntity( ) method of your implementation. Now you can change the normal process of entity resolution.
Typically, the XML reader resolves the entity through the specified public or system ID, whether it be a file, URL, or other resource. If the return value from the resolveEntity( ) method is null, this process executes unchanged. As a result, you should always make sure that whatever code you add to your resolveEntity( ) implementation returns null in the default case. In other words, start with an implementation class that looks like Example 4-1.
Example 4-1. Simple implementation of EntityResolver
package javaxml2;

import java.io.IOException;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SimpleEntityResolver implements EntityResolver {

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        // In the default case, return null
        return null;
    }
}
You can compile this class with no problems, and register it with the reader implementation used in the SAXTreeViewer class within the buildTree( ) method:
    // Create instances needed for parsing
    XMLReader reader =
        XMLReaderFactory.createXMLReader(vendorParserClass);
    ContentHandler jTreeContentHandler =
        new JTreeContentHandler(treeModel, base, reader);
    ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

    // Register content handler
    reader.setContentHandler(jTreeContentHandler);

    // Register error handler
    reader.setErrorHandler(jTreeErrorHandler);

    // Register entity resolver
    reader.setEntityResolver(new SimpleEntityResolver( ));

    // Other instructions and parsing...
Recompiling and rerunning the example class creates no change. Of course, that’s exactly what was predicted, so don’t be too surprised. By always returning a null value, the process of entity resolution proceeds normally. If you don’t believe that anything is happening, though, you can make this small change to echo what’s going on to the system output:
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        System.out.println("Found entity with public ID " + publicID +
            " and system ID " + systemID);

        // In the default case, return null
        return null;
    }
Recompile this class and run the sample tree viewer. Once the Swing GUI comes up, move it out of the way and check out the shell or command prompt output; it should look similar to Example 4-2.
Example 4-2. Output from SAXTreeViewer with verbose output
C:\javaxml2\build>java javaxml2.SAXTreeViewer c:\javaxml2\ch04\xml\contents.xml
Found entity with public ID null and system ID file:///c:/javaxml2/ch04/xml/DTD/JavaXML.dtd
Found entity with public ID null and system ID http://www.newInstance.com/javaxml2/copyright.xml
As always, the line breaks are purely for display purposes. In any case, you can see that both references in the XML document, for the DTD and the OReillyCopyright entity reference, are passed to the resolveEntity( ) method.
At this point, you might be scratching your head; a DTD is an entity? The term “entity” is a bit vague as it is used in EntityResolver. Perhaps a better name would have been ExternalReferenceResolver, but that wouldn’t be very fun to type. In any case, keep in mind that any external reference in your XML is going to be passed on to this method.
So what’s the point, you may be asking yourself. Remember the reference for OReillyCopyright, and how it accesses an Internet URL (http://www.newInstance.com/javaxml2/copyright.xml)? What if you don’t have Internet access? What if you have a local copy you already downloaded, and want to save time by using that copy? What if you simply want to put your own copyright in place? All of these are viable questions, and real-world problems that you may have to solve in your applications. The answer, of course, is the resolveEntity( ) method I’ve been talking about.
If you return a valid InputSource (instead of null) from this method, that InputSource is used as the value for the entity reference, rather than the public or system ID specified. In other words, you can specify your own data instead of letting the reader handle resolution on its own. As an example, create a copyright.xml file on your local machine, as shown in Example 4-3.
Example 4-3. Local copy of copyright.xml
<copyright xmlns="http://www.oreilly.com">
  <year value="2001" />
  <content>This is my local version of the copyright.</content>
</copyright>
Save this in a directory you can access from your Java code (I used the same directory as my contents.xml file), and make the following change to the resolveEntity( ) method:
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        // Handle references to online version of copyright.xml
        if (systemID.equals(
            "http://www.newInstance.com/javaxml2/copyright.xml")) {
            return new InputSource(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }

        // In the default case, return null
        return null;
    }
You can see that instead of allowing resolution to the online resource, an InputSource that provides access to the local version of copyright.xml is returned. If you recompile your source file and run the tree viewer, you can visually verify that this local copy is used. Figure 4-1 shows the ora:copyright element expanded, including the local copyright document’s content.
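If you’d like to see the same redirection without the tree viewer or a network connection, here is a minimal, self-contained sketch. It is not part of the sample code for this chapter: the RedirectDemo class, the example.invalid system ID, and the inline replacement text are all made up for illustration, and it assumes the parser bundled with your JDK loads external general entities by default.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class RedirectDemo {

    // Hypothetical system ID; nothing is ever fetched from it
    static final String REMOTE_ID = "http://example.invalid/copyright.xml";

    // In-memory stand-in for a local copy of the copyright document
    static final String LOCAL_COPY = "<copyright>Local copy</copyright>";

    /** Parses a tiny document and returns the character data reported. */
    static String parseWithRedirect() throws Exception {
        String doc =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE book [\n"
            + "  <!ENTITY copyright SYSTEM \"" + REMOTE_ID + "\">\n"
            + "]>\n"
            + "<book>&copyright;</book>";

        final StringBuilder text = new StringBuilder();

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        reader.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicID, String systemID)
                throws IOException, SAXException {
                // Swap in the in-memory copy for the one ID we care about
                if (REMOTE_ID.equals(systemID)) {
                    return new InputSource(new StringReader(LOCAL_COPY));
                }
                // In the default case, return null
                return null;
            }
        });

        // Collect character data so the substitution can be inspected
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(doc)));
        return text.toString();
    }
}
```

Because the resolver answers for the one system ID it recognizes, the reader should take the entity’s content from the in-memory string and never try to open the URL. Any other external reference still falls through to the null return and is resolved normally.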
In real-world applications, this method tends to become a lengthy laundry list of if/else blocks, each one handling a specific system or public ID. And this brings up an important point: try to avoid this class and method becoming a kitchen sink for IDs. If you no longer need a specific resolution to occur, remove the if clause for it. Additionally, try to use different EntityResolver implementations for different applications, rather than trying to use one generic implementation for all your applications. Doing this avoids code bloat and, more importantly, speeds up entity resolution. If you have to wait for your reader to run through fifty or a hundred String.equals( ) comparisons, you can really bog down an application. Be sure to put references accessed often at the top of the if/else stack, so they are encountered first and result in quicker entity resolution.
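One way to keep that if/else stack from growing is to drive the resolver from a lookup table. What follows is only a sketch of that alternative, not the approach used in this chapter’s sample code: MappedEntityResolver and addMapping( ) are hypothetical names, and it assumes every redirection can be expressed as a system ID mapped to a replacement URI.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class MappedEntityResolver implements EntityResolver {

    // System ID as it appears in the document -> URI of the copy to use
    private final Map<String, String> localCopies = new HashMap<>();

    /** Registers a replacement URI for a given system ID. */
    void addMapping(String systemID, String replacementURI) {
        localCopies.put(systemID, replacementURI);
    }

    @Override
    public InputSource resolveEntity(String publicID, String systemID) {
        String replacement = localCopies.get(systemID);

        // In the default case (no mapping registered), return null
        if (replacement == null) {
            return null;
        }
        return new InputSource(replacement);
    }
}
```

A single hash lookup replaces the chain of comparisons, so the order in which mappings are added no longer matters. Dropping a resolution you no longer need becomes a matter of not registering it.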
Finally, I want to make one more recommendation concerning your EntityResolver implementations. You’ll notice that I defined my implementation in a separate class file, while the ErrorHandler, ContentHandler, and (in the next section) DTDHandler implementations were in the same source file as parsing occurred in. That wasn’t an accident! You’ll find that the way you deal with content, errors, and DTDs is fairly static. You write your program, and that’s it. When you make changes, you’re making a larger rewrite, and need to make big changes anyway. However, you’ll make many changes to the way you want your application to resolve entities. Depending on the machine you’re on, the type of client you’re deploying to, and what and where documents are available, you’ll need different versions of an EntityResolver implementation. To allow for rapid changes to this implementation without causing editing or recompilation of your core parsing code, I use a separate source file for EntityResolver implementations; I suggest you do the same. And with that, you should know all that you need to know about resolving entities in your applications using SAX.
After a rather extensive look at EntityResolver, I’m going to cruise through DTDHandler pretty quickly. In two years of extensive XML programming, I’ve used this interface only once, in writing JDOM, and even then it was a rather obscure case. More often than not, you won’t work with it much unless you have lots of unparsed entities in your XML documents.
The DTDHandler interface allows you to receive notification when a reader encounters an unparsed entity or notation declaration. Of course, both of these events occur in DTDs, not XML documents, which is why this is called DTDHandler. Rather than go on and on, let me just show you what the interface looks like. It’s right here for you to check out in Example 4-4.
Example 4-4. The DTDHandler interface
package org.xml.sax;

public interface DTDHandler {

    public void notationDecl(String name, String publicID,
                             String systemID)
        throws SAXException;

    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName)
        throws SAXException;
}
These two methods do exactly what you would expect. The first reports a notation declaration, including its name, public ID, and system ID. Remember the NOTATION structure in DTDs?
<!NOTATION jpeg SYSTEM "images/jpeg">
The second method provides information about an unparsed entity declaration, which looks as follows:
<!ENTITY stars_logo SYSTEM "http://www.nhl.com/img/team/dal38.gif" NDATA jpeg>
In both cases, you can take action at these occurrences if you create an implementation of DTDHandler and register it with your reader through the XMLReader’s setDTDHandler( ) method. This is generally useful when writing low-level applications that must either reproduce XML content (such as an XML editor) or build up some Java representation of a DTD’s constraints (such as for data binding, covered in Chapter 15). In most other situations, it isn’t something you will need very often.
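To make those two callbacks concrete, here is a small sketch of a DTDHandler implementation that simply records what the reader reports. The DeclarationCollector class and its collect( ) helper are hypothetical names invented for this illustration, and the test document just wraps the two declarations shown above in an internal DTD subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class DeclarationCollector implements DTDHandler {

    // One entry per declaration the reader reports
    final List<String> declarations = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicID, String systemID) {
        declarations.add("notation " + name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicID,
                                   String systemID, String notationName) {
        declarations.add("unparsed entity " + name
            + " (notation " + notationName + ")");
    }

    /** Parses a document with an internal DTD subset and returns the log. */
    static List<String> collect() throws Exception {
        String doc =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE team [\n"
            + "  <!ELEMENT team EMPTY>\n"
            + "  <!NOTATION jpeg SYSTEM \"images/jpeg\">\n"
            + "  <!ENTITY stars_logo SYSTEM"
            + " \"http://www.nhl.com/img/team/dal38.gif\" NDATA jpeg>\n"
            + "]>\n"
            + "<team/>";

        DeclarationCollector collector = new DeclarationCollector();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Register the DTD handler before parsing starts
        reader.setDTDHandler(collector);
        reader.parse(new InputSource(new StringReader(doc)));
        return collector.declarations;
    }
}
```

The reader never tries to load the unparsed entity itself; it only reports the declaration, which is why this runs without network access. The sketch deliberately leaves the system IDs out of the log, since a reader may rewrite relative ones as absolute URIs.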
Before finishing up with handlers (for now, at least), there’s one other important handler-related class you should know about. This class is org.xml.sax.helpers.DefaultHandler, and it can be a very good friend to you SAX developers out there. Remember that so far, implementing the various handler interfaces required a class to implement ContentHandler, one to implement ErrorHandler, one to implement EntityResolver (this one is OK for all the reasons I discussed about keeping that implementation in a separate source file), and one to implement DTDHandler, if needed. Additionally, you get the joy of implementing the numerous methods in ContentHandler, even if you don’t need them all to do anything.
And here comes DefaultHandler to the rescue. This class doesn’t define any significant behavior of its own; however, it does implement ContentHandler, ErrorHandler, EntityResolver, and DTDHandler, and provides a minimal default implementation of each method of each interface (nearly all of them simply do nothing). So you can have a single class (call it, for example, MyHandlerClass) that extends DefaultHandler. This class only needs to override the methods in which it needs to take action. You might implement startElement( ), characters( ), endElement( ), and fatalError( ), for example. Whatever combination of methods you implement, though, you’ll save tons of lines of code for methods you don’t need to provide action for, and make your code a lot clearer too. Then, the argument to setErrorHandler( ), setContentHandler( ), and setDTDHandler( ) would be the same instance of this MyHandlerClass. Theoretically, you could pass the instance to setEntityResolver( ) as well, although (for about the fourth time!) I discourage mixing the resolveEntity( ) method in with these other interface methods.
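As a rough sketch of what such a class might look like, the following MyHandlerClass overrides only three methods and leaves everything else to DefaultHandler. The fields, the static parse( ) helper, and the error message are assumptions made for this illustration rather than anything prescribed by SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class MyHandlerClass extends DefaultHandler {

    int elementCount;
    final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        elementCount++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Add the line number, then stop the parse
        throw new SAXException(
            "Fatal error at line " + e.getLineNumber(), e);
    }

    /** Parses the supplied XML with a single handler instance. */
    static MyHandlerClass parse(String xml) throws Exception {
        MyHandlerClass handler = new MyHandlerClass();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // The same instance serves in all three roles
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.setDTDHandler(handler);

        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```

Every callback not overridden here, from startDocument( ) to warning( ), falls back to the inherited implementation, so the class stays focused on the events it cares about. In keeping with the advice above, the instance is not passed to setEntityResolver( ).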
[5] For the picky reader, I know that technically EntityResolver isn’t a “handler,” per se. Of course, I could easily argue that the interface might be named EntityHandler, so it’s close enough for me!