Parsing XML with DOM
SAX parsing does not build any structure in memory to represent the
XML document. This makes SAX fast and highly scalable, as your
application builds exactly as little or as much in-memory structure as
needed for its specific tasks. However, for particularly complicated
processing tasks involving reasonably small XML documents, you may
prefer to let the library build in-memory structures that represent
the whole XML document, and then traverse those structures. The XML
standards describe the DOM (Document Object Model) for XML. A DOM
object represents an XML document as a tree whose root is the document
object, while other nodes correspond to elements, text contents,
element attributes, and so on.
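
To make the tree picture concrete, here is a minimal sketch that uses the standard library module xml.dom.minidom to parse a small document and walk the resulting tree from the document object downward. The sample XML text and the helper name describe are assumptions made up for this illustration; only the minidom calls themselves come from the library.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML_TEXT = (
    '<catalog>'
    '<book id="b1">Python in a Nutshell</book>'
    '<book id="b2">Learning XML</book>'
    '</catalog>'
)

def describe(node, depth=0):
    """Return indented lines describing node and all of its descendants."""
    indent = '  ' * depth
    lines = []
    if node.nodeType == node.DOCUMENT_NODE:
        # The root of a DOM tree is the document object itself.
        lines.append(indent + 'document')
    elif node.nodeType == node.ELEMENT_NODE:
        # Attributes hang off the element rather than appearing as children.
        attrs = ''.join(
            ' %s="%s"' % (name, value)
            for name, value in sorted(node.attributes.items())
        )
        lines.append(indent + 'element ' + node.tagName + attrs)
    elif node.nodeType == node.TEXT_NODE:
        lines.append(indent + 'text: ' + node.data)
    # Recurse into the children, one level deeper.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines

doc = minidom.parseString(XML_TEXT)
tree_lines = describe(doc)
print('\n'.join(tree_lines))
```

The walk starts at the document object, descends into the catalog element, and finds each book element with its attribute and a single text child. This is only one way to traverse the tree; a real application would usually look for specific elements rather than print everything.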

The Python standard library supplies a minimal implementation of the
XML DOM standard, xml.dom.minidom. minidom builds everything up in
memory, with the typical pros and cons of the DOM approach to parsing.
The Python standard library also supplies a different DOM-like
approach in module xml.dom.pulldom. pulldom occupies an interesting
middle ground between SAX and DOM, presenting the stream of parsing
events as a Python iterator object so that you do not code callbacks,
but rather loop over the events and examine each event to see if it’s
of interest. When you do find an event of interest to your
application, you can ask pulldom to build the DOM subtree rooted in
that event’s node by calling method expandNode, and then work with
that subtree as you would in minidom ...
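
The pulldom workflow described above can be sketched as follows: loop over the event stream, pick out the start of each element of interest, and call expandNode to build just that subtree. The sample document, the element names, and the function name book_titles are hypothetical, chosen only for this illustration.

```python
from xml.dom import pulldom

# Hypothetical sample document, used only for illustration.
XML_TEXT = (
    '<catalog>'
    '<book id="b1"><title>Python in a Nutshell</title></book>'
    '<note>not a book</note>'
    '<book id="b2"><title>Learning XML</title></book>'
    '</catalog>'
)

def book_titles(xml_text):
    """Return (id, title) pairs for each book element in xml_text."""
    titles = []
    events = pulldom.parseString(xml_text)
    # Each item in the stream is an (event, node) pair; no callbacks needed.
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == 'book':
            # Build the DOM subtree rooted at this node; other elements,
            # such as note, are never expanded into a subtree.
            events.expandNode(node)
            title = node.getElementsByTagName('title')[0]
            # Merge adjacent text nodes in case the parser split the text.
            title.normalize()
            titles.append((node.getAttribute('id'), title.firstChild.data))
    return titles

titles = book_titles(XML_TEXT)
print(titles)
```

After expandNode returns, the node behaves like any minidom element, so the usual methods such as getElementsByTagName and getAttribute apply to it. The event stream then resumes after the end of the expanded element.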