The xmllib Module
The xmllib module provides a simple XML parser, using regular expressions to
pull the XML data apart, as shown in Example 5-1. The parser does basic checks on the
document, such as a check to see that there is only one top-level element
and a check to see that all tags are balanced.
You feed XML data to this parser piece by piece (as data arrives over a network, for example). The parser calls its own methods for start tags, data sections, end tags, and entities, among other things.
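The piece-by-piece feeding pattern can be sketched in a form that runs on current Python, where xmllib is no longer available (it was removed in Python 3). This is a minimal sketch that swaps in the standard library's xml.parsers.expat as a stand-in; the function name feed_in_chunks and the chunk size are assumptions made for illustration, not part of xmllib.

```python
import xml.parsers.expat

def feed_in_chunks(document, chunk_size=8):
    """Parse `document` (bytes) a few bytes at a time and return the events seen.

    Hypothetical helper: expat stands in for xmllib here.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # The parser calls these handlers as soon as it has seen enough input.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("data", text))

    # Hand the document over in small pieces, as if it arrived over a network.
    for i in range(0, len(document), chunk_size):
        parser.Parse(document[i:i + chunk_size], False)
    parser.Parse(b"", True)  # signal end of input
    return events

events = feed_in_chunks(b'<quotation id="031">hello</quotation>')
```

Character data may be reported in several fragments when the input is split this way, so code that collects text should join the fragments rather than expect a single callback.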
If you’re only interested in a few tags, you can define special
start_tag and end_tag methods, where tag is the tag name. Each
start_tag method is called with the element’s attributes
given as a dictionary.
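The naming convention amounts to looking up a method by tag name and calling it only if it exists. Here is a rough sketch of that dispatch, assuming Python 3 and using xml.parsers.expat underneath because xmllib itself is not part of Python 3. The classes TagDispatcher and QuotationIds are hypothetical names for this illustration and do not reproduce xmllib's actual implementation.

```python
import xml.parsers.expat

class TagDispatcher:
    """Route element events to start_<tag> / end_<tag> methods, if defined.

    Hypothetical class: it imitates the naming convention, not xmllib's code.
    """

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def _start(self, name, attrs):
        # Look for a method named after the tag; ignore tags without one.
        handler = getattr(self, "start_" + name, None)
        if handler is not None:
            handler(attrs)  # attributes arrive as a dictionary

    def _end(self, name):
        handler = getattr(self, "end_" + name, None)
        if handler is not None:
            handler()

    def feed(self, data):
        self._parser.Parse(data, False)

    def close(self):
        self._parser.Parse(b"", True)

class QuotationIds(TagDispatcher):
    """Collect the id attribute of every quotation element."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def start_quotation(self, attrs):
        self.ids.append(attrs.get("id"))

p = QuotationIds()
p.feed(b'<quotations><quotation id="031">text</quotation></quotations>')
p.close()
```

Only the quotation element has a handler, so the enclosing quotations element is parsed but otherwise ignored.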
Example 5-1. Using the xmllib Module to Extract Information from an Element
File: xmllib-example-1.py
import xmllib

class Parser(xmllib.XMLParser):
    # get quotation number

    def __init__(self, file=None):
        xmllib.XMLParser.__init__(self)
        if file:
            self.load(file)

    def load(self, file):
        while 1:
            s = file.read(512)
            if not s:
                break
            self.feed(s)
        self.close()

    def start_quotation(self, attrs):
        print "id =>", attrs.get("id")
        raise EOFError

try:
    c = Parser()
    c.load(open("samples/sample.xml"))
except EOFError:
    pass

id => 031

Example 5-2 contains a simple (and incomplete) rendering
engine. The parser maintains an element stack
(__tags), which it passes to the renderer, together with text fragments. The renderer looks up the current tag hierarchy in a style dictionary, and if it isn’t already there, ...