Java Cookbook by Ian F. Darwin

Parsing XML with SAX

Problem

You want to make one quick pass over an XML file, extracting certain tags or other information as you go.

Solution

Simply use SAX to create a document handler and pass it to the SAX parser.

Discussion

The SAX ContentHandler interface (the SAX 2.0 replacement for the older DocumentHandler) specifies a number of “callbacks” that your code must provide; extending DefaultHandler, which supplies empty implementations of all of them, lets you override only the ones you need. In one sense this is similar to the Listener interfaces in AWT and Swing, as covered briefly in Section 13.5. The most commonly used methods are startElement(), endElement(), and characters(). The first two are called at the start and end of an element, and characters() is called when there is character data. The characters are stored in a large array, and you are passed the array along with the offset and length of the characters that make up your text. Conveniently, there is a String constructor that takes exactly these arguments. Hmmm, I wonder if they thought of that . . .
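To make the callback sequence concrete, here is a minimal sketch that records each event as the parser reports it. The class name CallbackTrace and its trace() method are invented for this illustration, and the parser is obtained through the JDK's JAXP SAXParserFactory, which is an assumption rather than what the recipe's own listing does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: records each SAX callback as a line of text. */
public class CallbackTrace {

    /** Parse the given XML text and return one entry per callback received. */
    public static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String nsURI, String localName,
                    String qName, Attributes attrs) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String nsURI, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // String(char[], int, int) takes exactly the three
                // arguments that characters() receives.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }
        };
        // The default JAXP factory is not namespace-aware, so the
        // element name arrives in qName.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<person><name>Ian</name></person>")) {
            System.out.println(event);
        }
    }
}
```

Because the anonymous class extends DefaultHandler, any callback it does not override (startDocument(), ignorableWhitespace(), and so on) simply does nothing.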

To demonstrate this, I wrote a program that uses SAX to extract names and email addresses from an XML file. It is reasonably simple, and is shown in Example 21-4.

Example 21-4. SaxLister.java

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;

/** Simple lister - extract name and email tags from a user file.
 * Updated for SAX 2.0
 */
public class SaxLister {

    class PeopleHandler extends DefaultHandler {

        boolean name = false;
        boolean mail = false;

        public void startElement(String nsURI, String strippedName, ...
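Example 21-4 is cut off above, so the following is not the book's code. It is a self-contained sketch of the same idea, written under several assumptions: the element names name and email, the class name SaxListerSketch, the list() helper, and the sample data are all invented here, and the parser comes from the JDK's JAXP SAXParserFactory instead of the Xerces SAXParser class that the listing imports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch only, not the book's SaxLister. Element names are assumed. */
public class SaxListerSketch {

    /** Tracks which element we are inside, using two flags as the listing does. */
    static class PeopleHandler extends DefaultHandler {
        final List<String> found = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        boolean name = false;
        boolean mail = false;

        @Override
        public void startElement(String nsURI, String strippedName,
                String tagName, Attributes attributes) {
            // Assumed tag names: <name> and <email>.
            if (tagName.equals("name")) name = true;
            if (tagName.equals("email")) mail = true;
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver one run of text in several calls,
            // so accumulate until the end tag arrives.
            if (name || mail) text.append(ch, start, length);
        }

        @Override
        public void endElement(String nsURI, String strippedName, String tagName) {
            if (name && tagName.equals("name")) {
                found.add("Name: " + text.toString().trim());
                name = false;
            } else if (mail && tagName.equals("email")) {
                found.add("Email: " + text.toString().trim());
                mail = false;
            }
        }
    }

    /** Parse the XML text in one pass and return the extracted lines. */
    public static List<String> list(String xml) throws Exception {
        PeopleHandler handler = new PeopleHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, for illustration only.
        String xml = "<people><person><name>A. Person</name>"
                   + "<email>person@example.com</email></person></people>";
        list(xml).forEach(System.out::println);
    }
}
```

Collecting text in a StringBuilder and emitting it in endElement() is a design choice of this sketch: SAX does not guarantee that all the character data of an element arrives in a single characters() call.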
