Chapter 6. Tree Processing

Having done just about all we can do with streams, it’s time to move on to another style of XML processing. Instead of letting the XML fly past the program one tiny piece at a time, we will capture the whole document in memory and then start working on it. Having an in-memory representation built behind the scenes for us makes our job much easier, although it tends to require more memory and CPU cycles.

This chapter is an overview of programming with persistent XML objects, better known as tree processing. It looks at a variety of modules and strategies for building and accessing XML trees, including the rigorous, standard Document Object Model (DOM), fast access to internal document parts with XPath, and efficient tree processing methods.

XML Trees

Every XML document can be represented as a collection of data objects linked in an acyclic structure called a tree. Each object, or node, is a small piece of the document, such as an element, a piece of text, or a processing instruction. One node, called the root, links to other nodes, which link to others in turn, and so on down to nodes that link to nothing further. Draw this structure out and it looks like a big, bushy tree, hence the name.
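To make the idea concrete, here is a minimal sketch that parses a small document into a tree and lists its nodes from the root down to the leaves. It is written in Python, using the standard library's xml.dom.minidom module. The sample memo and the helper names label and outline are invented for illustration and are not part of any particular module discussed in this chapter.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document, made up for this illustration.
SAMPLE = (
    '<?xml version="1.0"?>'
    "<memo><?render draft?>"
    "<to>Bob</to>"
    "<body>Lunch at <time>noon</time>?</body>"
    "</memo>"
)


def label(node):
    """Return a short description of a node based on its type."""
    if node.nodeType == Node.ELEMENT_NODE:
        return f"element <{node.nodeName}>"
    if node.nodeType == Node.TEXT_NODE:
        return f"text {node.data!r}"
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return f"processing instruction {node.target}"
    return node.nodeName


def outline(node, depth=0):
    """Collect one indented line per node, starting at `node` and descending."""
    lines = ["  " * depth + label(node)]
    # Leaf nodes (text, processing instructions) have an empty child list,
    # so the recursion stops there on its own.
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


# The document element plays the role of the root described above.
root = parseString(SAMPLE).documentElement
lines = outline(root)
print("\n".join(lines))
```

The indentation in the printed outline mirrors the links between nodes: each node appears one level deeper than the node that links to it, and the text and processing-instruction nodes sit at the tips of the branches.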

A tree structure representing a piece of XML is a handy thing to have. Since a tree is acyclic (it has no circular links), you can use simple traversal methods that won’t get stuck in infinite loops. As in a filesystem directory tree, you can represent the location of a node easily in simple ...
