All of the great ideas XML has brought to us are not much use without some tools to use these ideas within our familiar programming environments. Luckily, XML has been paired with Java since its inception, and Java boasts the most complete set of APIs available to allow use of XML directly within Java code. While C, C++, and Perl are quickly catching up, Java continues to set the standard on how to use XML from applications. There are two basic stages that occur in an XML document’s lifecycle from an application point of view, as shown in Figure 1.1. First, the document is parsed, and then the data within it is manipulated.
As Java developers, we are fortunate to have simple ways to handle these tasks and more.
SAX is the Simple API for XML. It provides an event-based framework for parsing XML data, which is the process of reading through the document and breaking down the data into usable parts; at each step of the way, SAX defines events that can occur. For example, SAX defines an org.xml.sax.ContentHandler interface that defines methods such as startDocument() and endElement(). Implementing this interface allows complete control over these portions of the XML parsing process. There is a similar interface for handling errors and lexical constructs. A set of errors and warnings is defined, allowing handling of the various situations that can occur in XML parsing, such as an invalid document, or one that is not well-formed. Behavior can be added to customize the parsing process, resulting in very application-specific tasks being available for definition, all with a standard interface into XML documents. For the SAX API documentation and other information on SAX, visit http://www.megginson.com/SAX.
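To make the event model concrete, here is a minimal sketch of a handler that records the callbacks it receives. It assumes a current JDK, whose bundled parser is obtained through javax.xml.parsers.SAXParserFactory; the class name SaxEventLogger and its parse() helper are invented for this illustration and are not part of SAX.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: DefaultHandler implements ContentHandler with
// empty methods, so we override only the callbacks we care about.
class SaxEventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses the given XML text and returns the events in the order they fired.
    static List<String> parse(String xml) {
        SaxEventLogger handler = new SaxEventLogger();
        try {
            // Assumption: the JDK's default parser, looked up through JAXP.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(parse("<a><b/></a>"));
    }
}
```

The handler never touches the document as a whole; it only reacts to each event as the parser reports it, which is the essence of the SAX approach.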
Before continuing, it is important to clear up a common misconception about SAX. SAX is often mistaken for an XML parser. We even discuss SAX here as providing a means to parse XML data. However, SAX provides a framework for parsers to use, and defines events within the parsing process to monitor. A parser must be supplied to SAX to perform any XML parsing. This has resulted in many excellent parsers being made available in Java, such as Sun’s Project X, the Apache Software Foundation’s Xerces, Oracle’s XML Parser, and IBM’s XML4J. These can all be plugged into the SAX APIs and result in parsed XML data. SAX APIs provide the means to parse a document, not the XML parser itself.
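The separation between the SAX interfaces and the parser behind them can be sketched as follows. The counting method depends only on org.xml.sax types and accepts whatever XMLReader the caller supplies. The class and method names are hypothetical, and jdkDefaultParser() assumes the parser bundled with a current JDK rather than any of the vendor parsers named above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class illustrating that application code can be
// written against SAX alone, with the parser plugged in from outside.
class PluggableSaxDemo {

    // Counts elements using any SAX2-compliant parser passed in by the caller.
    static int countElements(XMLReader parser, String xml) {
        final int[] count = {0};
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        try {
            parser.parse(new InputSource(new StringReader(xml)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return count[0];
    }

    // One possible parser: the JDK default (an assumption of this sketch).
    static XMLReader jdkDefaultParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(jdkDefaultParser(), "<a><b/><c/></a>"));
    }
}
```

Swapping in a different parser would mean changing only how the XMLReader is obtained; countElements() would stay the same.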
DOM is an API for the Document Object Model. While SAX only provides access to the data within an XML document, DOM is designed to provide a means of manipulating that data. DOM provides a representation of an XML document as a tree. Because a tree is an age-old data representation, traversal and manipulation of tree structures are easy to accomplish in programming languages, Java being no exception. DOM also reads an entire XML document into memory, storing all the data in nodes, so the entire document is very fast to access; it is all in memory for the length of its existence in the DOM tree. Each node represents a piece of the data pulled from the original document.
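As a rough illustration of the tree model, the sketch below loads a document into memory, walks its nodes recursively, and then modifies the tree. It assumes a current JDK and obtains the parser through javax.xml.parsers.DocumentBuilderFactory; the class DomTreeDemo and its methods are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class showing traversal and manipulation of a DOM tree.
class DomTreeDemo {

    // Reads the whole document into an in-memory tree of nodes.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not build DOM tree", e);
        }
    }

    // Recursively counts the element nodes at or below the given node.
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    // Manipulates the tree: appends a new child element to the root.
    static void addChild(Document doc, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        doc.getDocumentElement().appendChild(child);
    }

    public static void main(String[] args) {
        Document doc = load("<book><title>Java and XML</title></book>");
        addChild(doc, "publisher", "O'Reilly");
        System.out.println(countElements(doc));
    }
}
```

Because the entire tree is available, the code can revisit or change any node at any time, which a stream of SAX events does not allow.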
There is a significant drawback to DOM, however. Because DOM reads an entire document into memory, resources can become very heavily taxed, often slowing down or even crippling an application. The larger and more complex the document, the more pronounced this performance degradation becomes. Keep in mind that while DOM is a good, prevalent means of manipulating XML data, it is not the only means of accomplishing this task. We will spend time using DOM, and we will also write code that manipulates data straight from SAX. Your application requirements will most likely define which solution is correct for your specific development project. To read the DOM recommendations at W3C, go to http://www.w3.org/DOM in your web browser.
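For comparison, here is one way data might be pulled straight from SAX without building a tree. The handler keeps only the text of title elements and discards everything else as it streams past, so memory use does not grow with the size of the rest of the document. The element name title and the class SaxTitleCollector are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every <title> element
// while streaming, without holding the document in memory.
class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text run in several chunks, so accumulate.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName) && current != null) {
            titles.add(current.toString());
            current = null;
        }
    }

    // Streams through the XML text and returns the collected titles.
    static List<String> titles(String xml) {
        SaxTitleCollector handler = new SaxTitleCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        System.out.println(titles("<books><book><title>A</title></book><book><title>B</title></book></books>"));
    }
}
```

The trade-off is that the handler must track its own state (here, the current field), since there is no tree to navigate back through.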
JAXP is Sun’s Java API for XML Parsing. A relatively new addition to the XML developer’s arsenal, it attempts to provide cohesiveness to the SAX and DOM APIs. While it does not compete with or replace either of these APIs, it does add some convenience methods to try to make the XML APIs easier to use for Java developers. It conforms to the SAX and DOM specifications, as well as adhering to the namespace Recommendation we discussed earlier. JAXP does not redefine SAX or DOM behavior, but ensures that all XML-conformant parsers can be accessed within Java applications through a standard pluggability layer.
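The pluggability layer can be sketched with the two JAXP factories as they ship in current JDKs, which may differ in detail from the JAXP release described here. Both factories follow the same pattern: ask the factory for a parser, optionally switch on namespace awareness, and then work with plain DOM or SAX types. The class JaxpFactoriesDemo, its method names, and the urn:example:books namespace are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: the same question (what is the root element's
// namespace and local name?) answered through both JAXP entry points.
class JaxpFactoriesDemo {

    // DOM route: JAXP supplies the DocumentBuilder; the result is standard DOM.
    static String rootViaDom(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return root.getNamespaceURI() + "|" + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // SAX route: JAXP supplies the SAXParser; the callbacks are standard SAX.
    static String rootViaSax(String xml) {
        final String[] root = {null};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) { // remember only the first element seen
                        root[0] = uri + "|" + localName;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<b:book xmlns:b='urn:example:books'/>";
        System.out.println(rootViaDom(xml));
        System.out.println(rootViaSax(xml));
    }
}
```

Neither method names a concrete parser class; which implementation is used is decided by the factory lookup, not by the application code.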
It is expected that JAXP will continue to evolve as both SAX and DOM go through revision. It is also assumed that JAXP will eventually be part of other Sun specifications, as both the Tomcat servlet engine and the EJB 1.1 specification require XML-formatted configuration and deployment files. Although the J2EE™ 1.3 and J2SE™ 1.4 specifications do not mention JAXP explicitly, they are expected to have integrated JAXP support as well. For the complete JAXP specification, go to http://java.sun.com/xml.
These three APIs make up the Java developer’s toolkit for handling XML. While this is not a formal designation, these three APIs do provide us with the mechanisms to get XML data and manipulate it, all within normal Java code. These APIs will be our workhorses throughout the book, and we will learn to use every aspect of the classes that each provides.