Native Parser Interfaces
Now that we’ve looked at how SAX can be used and have seen just how regular the code is to set up the parser and the ContentHandler, you may be wondering how much of that ease comes from using SAX and how much is a matter of convenience functions in the Python libraries. While we won’t delve deeply into the native interfaces of the individual parsers, this is a good question, and can lead to some interesting observations.
The key advantage to using SAX is that the callback methods have the same names and significance regardless of the actual parser you use. There are at least two nice results of this: changing parsers does not affect your application, and your code is more maintainable because someone new to the code is more likely to know the SAX interface than any particular parser-specific interface.
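To make the point about parser independence concrete, here is a minimal sketch using the standard library's xml.sax module. The ElementCounter class and the count_elements function are hypothetical names chosen for illustration. The handler relies only on the standard SAX callback startElement, so it should work unchanged with any SAX-compliant parser passed in.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Standard SAX callback: same name and meaning for every parser.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes, parser=None):
    """Parse xml_bytes with the given SAX parser (or the default one)."""
    # make_parser() returns whichever SAX driver is available; the
    # handler code does not need to know which one it is.
    parser = parser or xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.counts


if __name__ == "__main__":
    print(count_elements(b"<a><b/><b/></a>"))
```

Swapping parsers in this sketch means passing a different object as the parser argument; ElementCounter itself stays the same.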
So just how do the native interfaces to the individual parsers differ from SAX, and why would we choose to use them instead? Let’s take a quick look at the PyExpat parser to get a taste of the differences.
Using PyExpat Directly
Of course, to use PyExpat, you need to have it installed. It is included as part of the Python installer for Windows, and is built automatically on Unix if you have the Expat library installed. If PyExpat was not installed along with Python, you can also get it as part of the PyXML package.
PyExpat resides in the xml.parsers.expat module. If we want to modify our last example to use PyExpat directly, we don’t have a lot of work to do, ...
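As a rough illustration of what driving xml.parsers.expat directly looks like, here is a minimal sketch. It is not the example referred to above; the collect_events function and the sample document are hypothetical. Note how the callbacks are attached as attributes on the parser object (StartElementHandler, EndElementHandler, CharacterDataHandler) rather than being methods on a ContentHandler.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with Expat and return a list of event tuples."""
    events = []

    def start_element(name, attrs):
        # attrs arrives as a plain dict of attribute names to values.
        events.append(("start", name, attrs))

    def end_element(name):
        events.append(("end", name))

    def char_data(data):
        # Skip whitespace-only text to keep the event list short.
        if data.strip():
            events.append(("text", data))

    parser = xml.parsers.expat.ParserCreate()
    # Ask Expat to deliver contiguous character data in one call.
    parser.buffer_text = True
    # Expat-specific handler names, assigned as parser attributes.
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for event in collect_events("<greeting lang='en'>hello</greeting>"):
        print(event)
```

The handler names and signatures here are specific to Expat, which is the trade-off against SAX: code written this way is tied to one parser.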