Creating the XML

The first step is to create XML data to be unmarshalled into Java. You’ll find that you spend as much time creating XML documents as you do on any other aspect of data binding, because those documents provide the data for your application. Additionally, it’s often easier to open up an editor like Notepad or vi than it is to code a program to populate Java objects and then marshal them (although I’ll talk about that approach in the next chapter, which focuses on marshalling Java to XML). So let’s talk XML.

Authoring an Instance Document

I’ve spent a lot of time talking about constraint models, setting up your data structure, and other conceptual ideas. In this section, you get to move a little closer to the practical. Once you’ve got your constraint model set up (as shown in Chapter 3), you need to model your actual data. In this case, the modeling part of that task is done, and all that is left is filling a document with data. With the emerging XML editor scene, this becomes a piece of cake. For example, Figure 4-3 shows a screenshot of XML Spy, which lets you fill in the elements and attributes allowed by your constraints with data; as you can see, this is a trivial task.

Figure 4-3. Editing XML with XML Spy

Many of you will use simpler editors, but the principle is the same: take a DTD, figure out what data goes in the elements and attributes as defined by that DTD, and create an XML document.

One issue that comes up often is the handling of whitespace. Will the level of indentation you use change the data-bound data? What about using tabs versus spaces or single versus double quotes? These issues are important in low-level APIs like SAX because those APIs are intended to give you direct control over the data. However, in higher-level APIs like data binding, these choices become pretty inconsequential. For example, the whitespace between the root and child elements in this document fragment is completely irrelevant when using data binding:

<root>
  <child>Here is some text</child>
</root>

Because the root element has no actual textual value,[8] there is no problem with whitespace used in indenting; it’s tossed out when the data is unmarshalled.
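To make the contrast with lower-level APIs concrete, here is a minimal sketch that parses the same fragment with JAXP’s DOM support rather than a data binding framework. The class name WhitespaceDemo, the method countRootChildren( ), and the inline DTD are assumptions made for illustration; the DTD simply mirrors the definition assumed in the footnote.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhitespaceDemo {

    // The fragment from the text, with an inline DTD (an assumption for
    // this sketch) declaring that root contains only child elements
    static final String XML =
        "<!DOCTYPE root [\n" +
        "  <!ELEMENT root (child)+>\n" +
        "  <!ELEMENT child (#PCDATA)>\n" +
        "]>\n" +
        "<root>\n" +
        "  <child>Here is some text</child>\n" +
        "</root>";

    // Counts the nodes directly under the root element
    static int countRootChildren(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The parser needs the DTD to know which whitespace is ignorable
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);

            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Indentation kept as text nodes alongside the child element
        System.out.println("Keeping whitespace:  " + countRootChildren(false));
        // Indentation dropped, leaving only the child element
        System.out.println("Ignoring whitespace: " + countRootChildren(true));
    }
}
```

With the parser bundled in a typical JDK, the first call should count the indentation as text nodes in addition to the child element, while the second should count the child element alone. A data binding framework makes the second choice for you.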

The only issue left is that of whitespace within a textual element, like that shown here:

<root>
  <child>  Here is some text with leading and trailing spaces.    </child>
</root>

Here, you’re getting into vendor-specific behavior. Some data binding frameworks preserve this space, resulting in the getContent( ) method on the child object returning a value like "  Here is some text with leading and trailing spaces.    ", with the surrounding spaces intact. Other frameworks trim this text automatically, giving you "Here is some text with leading and trailing spaces." Some frameworks give you an option to trim or not to trim this text.

If you know you don’t want leading and trailing whitespace (and you usually don’t), it’s always safe to write code like this:

// Get the list of child objects
List childElements = root.getChild(  );

// Iterate over the children
for (Iterator i = childElements.iterator(  ); i.hasNext(  ); ) {
    Child child = (Child)i.next(  );

    // Get its value, trimmed
    String childValue = child.getContent(  );
    if (childValue != null) {
        childValue = childValue.trim(  );
    } else {
        childValue = "";
    }
  
    // Do something with the value
}

Warning

Notice that this code compares the returned value from getContent( ) to null. While most data binding implementations will not return null here and instead return an empty string, it never hurts to be careful. You may save yourself a lot of frustrating debugging by using this more cautious approach.
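If you find yourself repeating this null check and trim in many places, one option is to factor it into a small helper. The following is only a sketch; the class name ContentUtil and the method trimToEmpty( ) are hypothetical names and not part of any data binding framework.

```java
public class ContentUtil {

    private ContentUtil() {
        // Utility class; not meant to be instantiated
    }

    // Returns the trimmed content, or an empty string if content is null
    public static String trimToEmpty(String content) {
        return (content == null) ? "" : content.trim();
    }

    public static void main(String[] args) {
        String raw = "  Here is some text with leading and trailing spaces.    ";
        // Brackets make the boundaries of the trimmed value visible
        System.out.println("[" + trimToEmpty(raw) + "]");
        System.out.println("[" + trimToEmpty(null) + "]");
    }
}
```

The loop body shown earlier would then reduce to a single call such as ContentUtil.trimToEmpty(child.getContent( )).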

Trimming protects you from extra whitespace regardless of how a particular framework handles it. Other than these minor issues, once your XML documents are created, you only need to validate them and then unmarshal them into Java.

Validation

I want to address the issue of data validity before getting into the semantics of converting XML to Java. Example 4-1 is a reprint of the XML document representing a movie database, which I first showed you in Chapter 3.

Example 4-1. Sample movie database

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE movies SYSTEM "movies.dtd">
  
<movies version="1.1">
  <movie>
    <title>Pitch Black</title>    
    <cast>
      <actor headliner="true">Vin Diesel</actor>
      <actor headliner="true">Radha Mitchell</actor>
      <actor>Vic Wilson</actor>
    </cast>
    <producer>Tom Engelman</producer>
  </movie>
  <movie>
    <title>Memento</title>
    <cast>
      <actor headliner="true">Guy Pearce</actor>
      <actor headliner="true">Carrie-Anne Moss</actor>
    </cast>
    <director>Christopher Nolan</director>
    <producer>Suzanne Todd</producer>
    <producer>Jennifer Todd</producer>
  </movie>
</movies>

This document uses the elements and attributes defined in the movies.dtd constraint set. Because of that, it’s a valid document. In other words, it uses only elements and attributes defined in the DTD and uses the content model specified by that DTD. It could have been created with XML Spy or by hand; in any case, it fits the constraint model defined in Chapter 3.

Just taking my word for it isn’t such a great idea; you need to be able to verify the document’s validity. Many data binding frameworks allow you to validate your XML data as it is read in and unmarshalled. However, this adds processing time, which is probably not desired in your application. In many cases, you want some validation at compile time, but not at runtime.

Warning

While I’m all for making applications as fast as humanly possible, removing validation is a delicate issue. If you know that you are going to use an XML document that you have available at compile time, turning off validation makes a lot of sense.

However, data binding is often used to interpret data that is handed off to an application at runtime; for example, consider an application server that reads in deployment information for applications through data binding. In these cases, you probably want to leave validation on at runtime, despite the performance penalty. You can’t perform the compile-time validation I refer to in this section, so you need assurance that you’re getting valid data and you need to pay whatever price is necessary to get this assurance. Leave validation out, and your data binding may fail with some pretty nasty (and often cryptic!) exceptions.
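For the runtime case, it can help to collect every validity error with its line number rather than letting the first cryptic exception surface. Here is one possible sketch using JAXP and SAX; the class name RuntimeValidation and the method findErrors( ) are hypothetical, and the inline DTDs in main( ) exist only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class RuntimeValidation {

    // Returns the problems found in the document; an empty list means
    // the document is valid against the DTD it declares
    public static List<String> findErrors(String xml) {
        final List<String> errors = new ArrayList<String>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();

            parser.parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    // Validity errors are reported here; record and continue
                    @Override
                    public void error(SAXParseException e) {
                        errors.add("line " + e.getLineNumber() + ": " +
                            e.getMessage());
                    }
                });
        } catch (Exception e) {
            // Fatal problems, such as a document that is not well formed
            errors.add(e.toString());
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE root [ <!ELEMENT root (child)+> " +
                     "<!ELEMENT child (#PCDATA)> ]>";
        String valid = dtd + "<root><child>text</child></root>";
        String invalid = dtd + "<root><other/></root>";

        System.out.println("valid document OK?   " + findErrors(valid).isEmpty());
        System.out.println("invalid document OK? " + findErrors(invalid).isEmpty());
    }
}
```

An application could call a method like this just before unmarshalling and refuse the document, with a readable message, whenever the returned list is not empty.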

Because of this, it’s helpful to have available a simple utility program that will validate a document against the DTD it specifies through the DOCTYPE declaration, as seen in Example 4-1. To help you in this endeavor, Example 4-2 shows a program that uses JAXP to validate a document.

Example 4-2. Simple validation program

package javajaxb.util;
  
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.Reader;
  
// JAXP classes
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
  
// SAX classes
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
  
public class XMLValidator {
  
    public XMLValidator(  ) {
        // Currently, does nothing
    }
  
    public void validate(Reader reader, OutputStream errorStream) {
        PrintStream printStream = new PrintStream(errorStream);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(  );
            factory.setValidating(true);
  
            SAXParser parser = factory.newSAXParser(  );
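            // DefaultHandler silently ignores validity errors, so rethrow
            // them; otherwise an invalid document would be reported as valid
            parser.parse(new InputSource(reader), new DefaultHandler(  ) {
                public void error(org.xml.sax.SAXParseException e)
                    throws org.xml.sax.SAXException {
                    throw e;
                }
            });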
  
            // If we got here, no errors occurred
            printStream.print("XML document is valid.\n");
        } catch (Exception e) {
            e.printStackTrace(printStream);
        }
    }
  
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java javajaxb.util.XMLValidator " +
                "[XML filename]");
            return;
        }
  
        try {
            File xmlFile = new File(args[0]);
            FileReader reader = new FileReader(xmlFile);
  
            XMLValidator validator = new XMLValidator(  );
  
            // Validate, and write errors to system output stream
            validator.validate(reader, System.out);
        } catch (FileNotFoundException e) {
            System.out.println("Could not locate XML document '" +
                args[0] + "'");
        } catch (IOException e) {
            System.out.println("Error processing XML: " + e.getMessage(  ));
            e.printStackTrace(  );
        }
    }
}

You can compile this class and run it on a document like this:

C:\dev\javajaxb\ch04\src\xml>set CLASSPATH=c:\dev\lib\xerces.jar;c:\dev\javajaxb\build
  
C:\dev\javajaxb\ch04\src\xml>java javajaxb.util.XMLValidator movies.xml
XML document is valid.

On Unix, it would look like this:

bmclaugh@FRODO ~/dev/javajaxb/ch04/src/xml
$ export CLASSPATH=~/dev/lib/xerces.jar:~/dev/javajaxb/build
  
bmclaugh@FRODO ~/dev/javajaxb/ch04/src/xml
$ java javajaxb.util.XMLValidator movies.xml
XML document is valid.

As you can see here, I’ve ensured that the movies.xml document is valid with respect to the movies database DTD (movies.dtd).

Note

A quick note on using this program: it assumes that the DOCTYPE reference is relative to the directory from which the program is run. Since the reference in this case is simply movies.dtd, that DTD should be in that same directory. You could also use a path like DTDs/movies.dtd and put the DTD in a subdirectory called DTDs/, and it would work as well.

You’ll also notice that I ensured that a parser (like Xerces) with the JAXP classes, as well as the utility program itself, is included within the classpath. If you forget this step, you’ll end up with annoying ClassNotFoundException problems.
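If you would rather have a relative DOCTYPE reference resolved against the XML document’s own location, one option is to give the parser an InputSource that carries a system ID instead of a bare Reader. This is only a sketch; the class name DocumentSources and the method forFile( ) are hypothetical.

```java
import java.io.File;
import org.xml.sax.InputSource;

public class DocumentSources {

    // Builds an InputSource identified by the file's URI, so the parser can
    // resolve relative references such as "movies.dtd" against that URI
    public static InputSource forFile(File xmlFile) {
        return new InputSource(xmlFile.toURI().toString());
    }

    public static void main(String[] args) {
        InputSource source = forFile(new File("movies.xml"));
        // Shows the absolute file: URI the parser would use as its base
        System.out.println(source.getSystemId());
    }
}
```

An InputSource built this way can be passed to the parser’s parse( ) method in place of new InputSource(reader); the parser then opens the document itself from the system ID.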

Each of your own documents can be run through this simple program to ensure validity at compile time, rather than performing this step repeatedly at runtime. With this step out of the way, you’re now ready to convert your XML data into Java object instances.



[8] I am assuming that this document’s DTD is well written. In other words, the root element has a definition like this: <!ELEMENT root (child)+>. This definition removes the chance that PCDATA slips in and gets turned into a Java object value.
