More Handlers

In the last chapter, I showed you the ContentHandler and ErrorHandler interfaces and briefly mentioned the EntityResolver and DTDHandler interfaces as well. Now that you’ve got a good understanding of SAX basics, you’re ready to look at these two other handlers.[5] You’ll find that you use EntityResolver every now and then (more if you’re writing applications to be resold), and that the DTDHandler is something rarely ever pulled out of your bag of tricks.

Using an EntityResolver

The first of these new handlers is org.xml.sax.EntityResolver. This interface does exactly what it says: resolves entities (or at least declares a method that resolves entities, but you get the idea). The interface defines only a single method, and it looks like this:

public InputSource resolveEntity(String publicID, String systemID)
    throws SAXException, IOException;

You can create an implementation of this interface and register it with your XMLReader instance (through setEntityResolver( ), not surprisingly). Once that’s done, every time the reader comes across a reference to an external entity, it passes the public ID and system ID for that entity to the resolveEntity( ) method of your implementation. This is your chance to change the normal process of entity resolution.

Typically, the XML reader resolves the entity through the specified public or system ID, whether it refers to a file, URL, or other resource. If the return value from the resolveEntity( ) method is null, this process executes unchanged. As a result, whatever code you add to your resolveEntity( ) implementation, you should always make sure that it returns null in the default case. In other words, start with an implementation class that looks like Example 4-1.

Example 4-1. Simple implementation of EntityResolver

package javaxml2;

import java.io.IOException;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SimpleEntityResolver implements EntityResolver {
    
    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
        
        // In the default case, return null
        return null;    
    }
}

You can compile this class with no problems, and register it with the reader implementation used in the SAXTreeViewer class within the buildTree( ) method:

        // Create instances needed for parsing
        XMLReader reader = 
            XMLReaderFactory.createXMLReader(vendorParserClass);
        ContentHandler jTreeContentHandler = 
            new JTreeContentHandler(treeModel, base, reader);
        ErrorHandler jTreeErrorHandler = new JTreeErrorHandler( );

        // Register content handler
        reader.setContentHandler(jTreeContentHandler);

        // Register error handler
        reader.setErrorHandler(jTreeErrorHandler);
            
        // Register entity resolver
        reader.setEntityResolver(new SimpleEntityResolver( ));

        // Other instructions and parsing...

Recompiling and rerunning the example class creates no change. Of course, that’s exactly what was predicted, so don’t be too surprised. By always returning a null value, the process of entity resolution proceeds normally. If you don’t believe that anything is happening, though, you can make this small change to echo what’s going on to the system output:

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
            
        System.out.println("Found entity with public ID " + publicID +
            " and system ID " + systemID);
        
        // In the default case, return null
        return null;    
    }

Recompile this class and run the sample tree viewer. Once the Swing GUI comes up, move it out of the way and check out the shell or command prompt output; it should look similar to Example 4-2.

Example 4-2. Output from SAXTreeViewer with verbose output

C:\javaxml2\build>java javaxml2.SAXTreeViewer 
    c:\javaxml2\ch04\xml\contents.xml
Found entity with public ID null and 
    system ID file:///c:/javaxml2/ch04/xml/DTD/JavaXML.dtd
Found entity with public ID null and 
    system ID http://www.newInstance.com/javaxml2/copyright.xml

As always, the line breaks are purely for display purposes. In any case, you can see that both references in the XML document, for the DTD and the OReillyCopyright entity reference, are passed to the resolveEntity( ) method.

At this point, you might be scratching your head; a DTD is an entity? The term “entity” is a bit vague as it is used in EntityResolver. Perhaps a better name would have been ExternalReferenceResolver, but that wouldn’t be very fun to type. In any case, keep in mind that any external reference in your XML is going to be passed on to this method.

So what’s the point, you may be asking yourself. Remember the reference for OReillyCopyright, and how it accesses an Internet URL (http://www.newInstance.com/javaxml2/copyright.xml)? What if you don’t have Internet access? What if you have a local copy you already downloaded, and want to save time by using that copy? What if you simply want to put your own copyright in place? All of these are valid questions, and real-world problems that you may have to solve in your applications. The answer, of course, is the resolveEntity( ) method I’ve been talking about.

If you return a valid InputSource (instead of null) from this method, that InputSource is used as the value for the entity reference, rather than the public or system ID specified. In other words, you can specify your own data instead of letting the reader handle resolution on its own. As an example, create a copyright.xml file on your local machine, as shown in Example 4-3.

Example 4-3. Local copy of copyright.xml

<copyright xmlns="http://www.oreilly.com">
  <year value="2001" />
  <content>This is my local version of the copyright.</content>
</copyright>

Save this in a directory you can access from your Java code (I used the same directory as my contents.xml file), and make the following change to the resolveEntity( ) method:

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {
         
        // Handle references to online version of copyright.xml
        // (literal first, so a null system ID cannot cause an exception)
        if ("http://www.newInstance.com/javaxml2/copyright.xml"
                .equals(systemID)) {
            return new InputSource(
                "file:///c:/javaxml2/ch04/xml/copyright.xml");
        }
        
        // In the default case, return null
        return null;    
    }

You can see that instead of allowing resolution to the online resource, an InputSource that provides access to the local version of copyright.xml is returned. If you recompile your source file and run the tree viewer, you can visually verify that this local copy is used. Figure 4-1 shows the ora:copyright element expanded, including the local copyright document’s content.

Figure 4-1. SAXTreeViewer running with local copyright.xml
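The replacement content doesn’t have to live in a file, either. What follows is a minimal sketch rather than part of the SAXTreeViewer code: the class names InMemoryCopyrightResolver and InMemoryCopyrightDemo, the inline test document, and the use of the JDK’s SAXParserFactory to obtain a reader are all assumptions made for illustration. The sketch also borrows DefaultHandler (covered at the end of this chapter) to keep the content handler short. It returns an InputSource wrapped around a string, so the example runs without network access or any files on disk.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: swaps the online copyright for an in-memory copy
class InMemoryCopyrightResolver implements EntityResolver {

    static final String REMOTE_ID =
        "http://www.newInstance.com/javaxml2/copyright.xml";

    // Stand-in content, invented for this sketch
    static final String LOCAL_COPY =
        "<copyright><content>This is my local version of the copyright."
        + "</content></copyright>";

    public InputSource resolveEntity(String publicID, String systemID)
        throws IOException, SAXException {

        if (REMOTE_ID.equals(systemID)) {
            InputSource source =
                new InputSource(new StringReader(LOCAL_COPY));
            // Keep the original system ID for error messages
            source.setSystemId(systemID);
            return source;
        }

        // In the default case, return null
        return null;
    }
}

class InMemoryCopyrightDemo {

    // Parses a tiny document and returns all character data it contains
    static String parseAndCollectText() throws Exception {
        String document =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE book [\n"
            + "  <!ENTITY OReillyCopyright SYSTEM \""
            + InMemoryCopyrightResolver.REMOTE_ID + "\">\n"
            + "]>\n"
            + "<book>&OReillyCopyright;</book>";

        final StringBuilder text = new StringBuilder();

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(new InMemoryCopyrightResolver());
        reader.setContentHandler(new DefaultHandler() {
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });

        reader.parse(new InputSource(new StringReader(document)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseAndCollectText());
    }
}
```

Because the resolver supplies the data itself, the reader should never try to contact www.newInstance.com. Any system ID the resolver doesn’t recognize still falls through to the null return and is resolved normally.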

In real-world applications, this method tends to become a lengthy laundry list of if/else blocks, each one handling a specific system or public ID. And this brings up an important point: try to keep this class and method from becoming a kitchen sink for IDs. If you no longer need a specific resolution to occur, remove the if clause for it. Additionally, try to use different EntityResolver implementations for different applications, rather than one generic implementation for all of them. Doing this avoids code bloat and, more importantly, speeds up entity resolution. If your reader has to run through fifty or a hundred String.equals( ) comparisons for every external reference, you can bog down an application. Be sure to put the references accessed most often at the top of the if/else stack, so they are encountered first and result in quicker entity resolution.
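One way to keep the list of IDs manageable is to replace the chain of comparisons with a lookup table. This is a different technique from the if/else stack described above, sketched here under stated assumptions: the class name MappedEntityResolver and its addMapping( ) method are hypothetical, and the sketch matches on system IDs only.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that looks up local copies in a table
class MappedEntityResolver implements EntityResolver {

    // Maps a system ID found in a document to the URI of a local copy
    private final Map<String, String> localCopies =
        new HashMap<String, String>();

    void addMapping(String systemID, String localURI) {
        localCopies.put(systemID, localURI);
    }

    public InputSource resolveEntity(String publicID, String systemID) {
        String localURI = localCopies.get(systemID);
        if (localURI != null) {
            return new InputSource(localURI);
        }

        // In the default case, return null
        return null;
    }
}
```

A mapping is then added once per reference, for example addMapping("http://www.newInstance.com/javaxml2/copyright.xml", "file:///c:/javaxml2/ch04/xml/copyright.xml"), and removing a resolution you no longer need means deleting a single line. A single map lookup also takes about the same time however many mappings you register.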

Finally, I want to make one more recommendation concerning your EntityResolver implementations. You’ll notice that I defined my implementation in a separate class file, while the ErrorHandler, ContentHandler, and (in the next section) DTDHandler implementations were in the same source file as the parsing code. That wasn’t an accident! You’ll find that the way you deal with content, errors, and DTDs is fairly static. You write your program, and that’s it. When you do change that code, you’re usually in the middle of a larger rewrite anyway. However, you’ll make many changes to the way you want your application to resolve entities. Depending on the machine you’re on, the type of client you’re deploying to, and what and where documents are available, you’ll need different versions of an EntityResolver implementation. To allow for rapid changes to this implementation without editing or recompiling your core parsing code, I use a separate source file for EntityResolver implementations; I suggest you do the same. And with that, you should know all that you need to know about resolving entities in your applications using SAX.

Using a DTDHandler

After a rather extensive look at EntityResolver, I’m going to cruise through DTDHandler pretty quickly. In two years of extensive XML programming, I’ve used this interface only once, in writing JDOM, and even then it was a rather obscure case. More often than not, you won’t work with it much unless you have lots of unparsed entities in your XML documents.

The DTDHandler interface allows you to receive notification when a reader encounters an unparsed entity or notation declaration. Of course, both of these events occur in DTDs, not XML documents, which is why this is called DTDHandler. Rather than go on and on, let me just show you what the interface looks like. It’s right here for you to check out in Example 4-4.

Example 4-4. The DTDHandler interface

package org.xml.sax;

public interface DTDHandler {

    public void notationDecl(String name, String publicId,
                             String systemId)
        throws SAXException;

    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName)
        throws SAXException;
}

These two methods do exactly what you would expect. The first reports a notation declaration, including its name, public ID, and system ID. Remember the NOTATION structure in DTDs?

<!NOTATION jpeg SYSTEM "images/jpeg">

The second method provides information about an unparsed entity declaration, which looks as follows:

<!ENTITY stars_logo SYSTEM "http://www.nhl.com/img/team/dal38.gif"
                    NDATA jpeg>

In both cases, you can take action at these occurrences if you create an implementation of DTDHandler and register it with your reader through the XMLReader’s setDTDHandler( ) method. This is generally useful when writing low-level applications that must either reproduce XML content (such as an XML editor) or build up some Java representation of a DTD’s constraints (such as for data binding, covered in Chapter 15). In most other situations, it isn’t something you will need very often.
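To see these callbacks fire, you can feed a reader a small document that carries both declarations in its internal DTD subset. This is only a sketch: the class names RecordingDTDHandler and DTDHandlerDemo and the team document are invented for illustration, and the reader comes from the JDK’s SAXParserFactory rather than the vendor class used in SAXTreeViewer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical handler that simply records what it is told about
class RecordingDTDHandler implements DTDHandler {

    final List<String> events = new ArrayList<String>();

    public void notationDecl(String name, String publicId,
                             String systemId) {
        events.add("notation " + name);
    }

    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        events.add("unparsed entity " + name
            + " (notation " + notationName + ")");
    }
}

class DTDHandlerDemo {

    // Parses a document with one notation and one unparsed entity
    static List<String> collectDeclarations() throws Exception {
        String document =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE team [\n"
            + "  <!NOTATION jpeg SYSTEM \"images/jpeg\">\n"
            + "  <!ENTITY stars_logo SYSTEM "
            + "\"http://www.nhl.com/img/team/dal38.gif\" NDATA jpeg>\n"
            + "]>\n"
            + "<team/>";

        RecordingDTDHandler handler = new RecordingDTDHandler();

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setDTDHandler(handler);
        reader.parse(new InputSource(new StringReader(document)));

        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectDeclarations()) {
            System.out.println(event);
        }
    }
}
```

Note that the reader only reports the declarations; it makes no attempt to fetch the image that the unparsed entity points to. What to do with that system ID is left entirely to your application.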

The DefaultHandler Class

Before finishing up with handlers (for now, at least), there’s one other important handler-related class you should know about. This class is org.xml.sax.helpers.DefaultHandler, and can be a very good friend to you SAX developers out there. Remember that so far, implementing the various handler interfaces required a class to implement ContentHandler, one to implement ErrorHandler, one to implement EntityResolver (this one is OK for all the reasons I discussed about keeping that implementation in a separate source file), and one to implement DTDHandler, if needed. Additionally, you get the joy of implementing the numerous methods in ContentHandler, even if you don’t need them all to do anything.

And here comes DefaultHandler to the rescue. This class doesn’t define any real behavior of its own; however, it does implement ContentHandler, ErrorHandler, EntityResolver, and DTDHandler, and provides a default, essentially empty implementation of each method of each interface. So you can have a single class (call it, for example, MyHandlerClass) that extends DefaultHandler. This class needs to override only the methods in which it actually does something. You might implement startElement( ), characters( ), endElement( ), and fatalError( ), for example. Whatever combination of methods you implement, you’ll save many lines of code for the methods you don’t care about, and make your code a lot clearer too. Then, the argument to setErrorHandler( ), setContentHandler( ), and setDTDHandler( ) would be the same instance of this MyHandlerClass. Theoretically, you could pass the instance to setEntityResolver( ) as well, although (for about the fourth time!) I discourage mixing the resolveEntity( ) method in with these other interface methods.
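Here is roughly what such a class might look like. Treat it as a sketch under assumptions: the body of MyHandlerClass (counting elements and collecting text) is invented for illustration, as are the helper class MyHandlerDemo and its use of the JDK’s SAXParserFactory to obtain a reader.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Overrides only the four callbacks it cares about
class MyHandlerClass extends DefaultHandler {

    int started = 0;
    int ended = 0;
    final StringBuilder text = new StringBuilder();

    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        started++;
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        ended++;
    }

    public void fatalError(SAXParseException e) throws SAXException {
        // Stop parsing and report where the problem occurred
        throw new SAXException("Fatal error at line "
            + e.getLineNumber() + ": " + e.getMessage(), e);
    }
}

class MyHandlerDemo {

    static MyHandlerClass parse(String xml) throws Exception {
        MyHandlerClass handler = new MyHandlerClass();

        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // One instance serves as all three handlers
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.setDTDHandler(handler);

        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        MyHandlerClass handler = parse("<a><b>hi</b><c/></a>");
        System.out.println(handler.started + " elements, text: "
            + handler.text);
    }
}
```

Every other callback, from startDocument( ) to warning( ), falls back to the version inherited from DefaultHandler, so none of them has to appear in your source.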



[5] For the picky reader, I know that technically EntityResolver isn’t a “handler,” per se. Of course, I could easily argue that the interface might be named EntityHandler, so it’s close enough for me!
