Chapter 7

Extracting Data from XML


  • How XML is usually represented in memory
  • What the DOW and the XDM are
  • What XPath is
  • How to read and write XPath expressions
  • How to learn more about the XPath language when you need it

There’s quite a lot packed into a small space here, but XPath is both important and useful. Most useful languages for querying and extracting data have fairly powerful expression languages and XPath is no exception. It’s everywhere, too: XPath “engines” are available for pretty much every programming environment, from JavaScript to Java, from PHP and Perl to Python, from C to SQL. XPath is also central to XSLT and XQuery.


XML is a text-based way to represent documents, but once an XML document has been read into memory, it’s usually represented as a tree. To make developers’ lives easier, several standard ways exist to represent and access that tree. All of these ways have differences in implementation, but once you have seen a couple, the others will generally seem very similar.

This chapter briefly introduces three of the most widely used models; you learn more about each of them later in the book. You also learn how to avoid using these data models altogether using XPath (in this chapter) and XQuery (in Chapter 9).

Meet the Models: DOM, XDM, and PSVI

The best-known data model for storing and processing XML trees is called the W3C document object model, or the DOM for short. The ...

Get Beginning XML, 5th Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.