Chapter 12. Processing XML

Introduction

Credit: Paul Prescod, co-author of XML Handbook (Prentice-Hall)

XML has become a central technology for all kinds of information exchange. Today, most new file formats that are invented are based on XML. Most new protocols are based upon XML. It simply isn’t possible to work with the emerging Internet infrastructure without supporting XML. Luckily, Python has had XML support since many versions ago, and Python’s support for XML has kept growing and maturing year after year.

Python and XML are perfect complements. XML is an open standards way of exchanging information. Python is an open source language that processes the information. Python excels at text processing and at handling complicated data structures. XML is text based and is, above all, a way of exchanging complicated data structures.

That said, working with XML is not so seamless that it requires no effort. There is always somewhat of a mismatch between the needs of a particular programming language and a language-independent information representation. So there is often a requirement to write code that reads (i.e., deserializes or parses) and writes (i.e., serializes) XML.

Parsing XML can be done with code written purely in Python, or with a module that is a C/Python mix. Python comes with the fast Expat parser written in C. Many XML applications use the Expat parser, and one of these recipes accesses Expat directly to build its own concept of an ideal in-memory Python representation ...

Get Python Cookbook, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.