SAX Readers

Without spending any further time on the preliminaries, let’s begin to code. Our first program will take an XML file as a command-line parameter and parse that file. We will build document callbacks into the parsing process so that we can display each parsing event as it occurs, which will give us a better idea of what exactly is going on “under the hood.”
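The program just described can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the chapter’s own code. It obtains its XMLReader through the JDK’s JAXP factory (javax.xml.parsers.SAXParserFactory) instead of instantiating Xerces directly, and the class names SaxEventPrinter and EventLogger are hypothetical. The callbacks come from org.xml.sax.helpers.DefaultHandler, which implements ContentHandler with empty methods so that only the events of interest need overriding.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: prints SAX events for the document named on the command line.
public class SaxEventPrinter {

    // Callback handler (hypothetical name): records each event as a line of text.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("startElement: " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement: " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Whitespace-only chunks are skipped to keep the output readable.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("characters: " + text);
            }
        }
    }

    // Parses one document and returns its events in the order they occurred.
    static List<String> parse(InputSource source) throws Exception {
        // Assumption: JAXP's factory stands in for new org.apache.xerces.parsers.SAXParser().
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EventLogger logger = new EventLogger();
        reader.setContentHandler(logger);
        reader.parse(source);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java SaxEventPrinter [XML URI]");
            return;
        }
        for (String event : parse(new InputSource(args[0]))) {
            System.out.println(event);
        }
    }
}
```

Run it as java SaxEventPrinter books.xml (the filename is illustrative). Note that a parser may deliver a single run of text through several characters() calls, so an application that needs whole text values would buffer the chunks rather than log each one separately.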

The first thing we need to do is get an instance of a class that conforms to the SAX org.xml.sax.XMLReader interface. This interface defines parsing behavior and allows us to set features and properties, which we will look at in Chapter 5. For those of you familiar with SAX 1.0, this interface replaces the org.xml.sax.Parser interface.
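As a brief preview of what “setting features” means, the sketch below turns on namespace processing for a reader. It assumes the reader comes from the JDK’s JAXP factory; the feature URI http://xml.org/sax/features/namespaces is a standard SAX 2.0 feature, while the class name ReaderFeatures and the helper namespaceAwareReader() are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical example class showing XMLReader.setFeature/getFeature.
public class ReaderFeatures {
    // Standard SAX 2.0 feature URI controlling namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Hypothetical helper: returns a reader with namespace processing enabled.
    static XMLReader namespaceAwareReader() throws Exception {
        // Assumption: the reader is obtained through JAXP rather than a vendor class.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Features are boolean switches addressed by URI; setFeature throws
        // SAXNotRecognizedException or SAXNotSupportedException if the
        // parser cannot honor the request.
        reader.setFeature(NAMESPACES, true);
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = namespaceAwareReader();
        System.out.println(NAMESPACES + " = " + reader.getFeature(NAMESPACES));
    }
}
```

Because features are identified by URI strings rather than dedicated methods, the XMLReader interface can stay small while individual parsers support whatever switches they choose to recognize.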

Instantiating a Reader

SAX provides an interface that all SAX-compliant XML parsers should implement. Because every compliant parser implements it, an application knows exactly which methods are available for registering callbacks and for parsing, whichever parser it uses. For example, the Xerces main SAX parser class, org.apache.xerces.parsers.SAXParser, implements the org.xml.sax.XMLReader interface. If you have access to the source of your parser, you should see the same interface implemented in your parser’s main SAX parser class. Each XML parser must have one class (sometimes more!) that implements this interface, and that is the class we need to instantiate to allow us to parse XML:

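import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.XMLReader;

// Instantiate Xerces' SAX parser class, referenced through the XMLReader interface
XMLReader parser = new SAXParser();

// Do something with the parser, then parse the document at uri
// (a String holding a system ID, such as a filename or URL)
parser.parse(uri);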
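Since only the instantiation line names a vendor class, the rest of an application can be written purely against XMLReader. The following is a minimal sketch of that idea; the class InterfaceOnly and the method countElements() are hypothetical, and main() obtains its reader from the JDK’s JAXP factory on the assumption that Xerces may not be on the classpath.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: application code that depends only on XMLReader.
public class InterfaceOnly {

    // Works with any XMLReader implementation, whichever vendor supplied it.
    static int countElements(XMLReader reader, InputSource source) throws Exception {
        int[] count = {0};  // array so the anonymous handler can update it
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(source);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java InterfaceOnly [XML URI]");
            return;
        }
        // The only vendor-dependent line; with Xerces on the classpath it could
        // instead read: XMLReader reader = new org.apache.xerces.parsers.SAXParser();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("Reader class: " + reader.getClass().getName());
        System.out.println("Elements: " + countElements(reader, new InputSource(args[0])));
    }
}
```

Swapping parsers then means changing a single line, while countElements() and any other code written against the interface stays untouched.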

For those of you new to SAX entirely, it may be a bit confusing ...
