Python & XML

by Christopher A. Jones, Fred L. Drake
December 2001
380 pages
O'Reilly Media, Inc.

Reading an Article

In this example, we use SAX to extract information from an XML document. The documents our script works with are simple news articles, but they are enough to show how to work with elements, attributes, and textual content.

The trade-offs of using SAX depend on what you’re trying to accomplish and how the XML is structured. SAX treats XML as a continuous stream, firing events to your handler as the parser encounters each piece of the document. Example 3-1 shows article.xml.
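To make the stream-of-events idea concrete, the following minimal sketch records each event in the order the parser reports it. It uses Python's standard xml.sax module; the EventLogger class and the tiny sample document are hypothetical names invented for illustration and are not taken from the book.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX event in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Fired when the parser reaches an opening tag.
        self.events.append(("start", name))

    def endElement(self, name):
        # Fired when the parser reaches the matching closing tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Fired for text between tags; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))


# A tiny stand-in document (an assumption, not the book's article.xml).
SAMPLE = b'<note kind="demo"><to>Reader</to></note>'

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

The handler never sees the document as a whole. It receives one callback at a time, so any state it needs later has to be stored as the events arrive.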

Example 3-1. article.xml

<?xml version="1.0"?>
<webArticle category="news" subcategory="technical">
    <header title="NASA Builds Warp Drive"
           length="3k"
           author="Joe Reporter"
           distribution="all"/>
    <body>Seattle, WA - Today an anonymous individual
           announced that NASA has completed building a
           Warp Drive and has parked a ship that uses
           the drive in his back yard.  This individual
           claims that although he hasn't been contacted by
           NASA concerning the parked space vessel, he assumes
           that he will be launching it later this week to
           mount an expedition to the Andromeda Galaxy.
    </body>
</webArticle>

Example 3-1 structures its data in a few different ways: attributes on the root element, an empty header element that carries all of its data in attributes, and a body element that holds text. That variety makes it an interesting document to parse with SAX. Because SAX reports only a stream of events, we need to understand how article.xml is structured before writing a handler to parse it. As a result, the handler is tightly coupled to the document’s structure.
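The coupling between handler and document can be sketched as follows. This is not the book's ArticleHandler; ArticleSketchHandler is a hypothetical class written only to show how each callback ends up hard-coding knowledge of the element names in Example 3-1. The body text is abbreviated here to keep the sketch short.

```python
import xml.sax

# Abbreviated copy of article.xml (the body text is shortened for this sketch).
ARTICLE = b"""<?xml version="1.0"?>
<webArticle category="news" subcategory="technical">
    <header title="NASA Builds Warp Drive"
            length="3k"
            author="Joe Reporter"
            distribution="all"/>
    <body>Seattle, WA - Today an anonymous individual
           announced that NASA has completed building a
           Warp Drive.
    </body>
</webArticle>
"""


class ArticleSketchHandler(xml.sax.ContentHandler):
    """Hypothetical handler; every branch assumes the layout of article.xml."""

    def __init__(self):
        super().__init__()
        self.category = None
        self.header = {}
        self.in_body = False
        self.body_parts = []

    def startElement(self, name, attrs):
        if name == "webArticle":
            # Root element: data lives in attributes.
            self.category = attrs.get("category")
        elif name == "header":
            # Empty element: copy all of its attributes into a plain dict.
            self.header = dict(attrs.items())
        elif name == "body":
            self.in_body = True

    def endElement(self, name):
        if name == "body":
            self.in_body = False

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self.in_body:
            self.body_parts.append(content)

    @property
    def body(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join("".join(self.body_parts).split())


handler = ArticleSketchHandler()
xml.sax.parseString(ARTICLE, handler)
print(handler.category)
print(handler.header["title"])
print(handler.body)
```

If the document renamed header or moved the title into a child element, this handler would silently collect nothing, which is the practical cost of the coupling described above.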

Writing a Simple Handler

You can write the ArticleHandler class to a new file, handlers.py; we’ll keep adding ...

ISBN: 0596001282