Python Standard Library

by Fredrik Lundh
May 2001
Intermediate to advanced
304 pages
6h 12m
English
O'Reilly Media, Inc.

Chapter 5. File Formats

Overview

This chapter describes a number of modules that are used to parse different file formats.

Markup Languages

Python comes with extensive support for the Extensible Markup Language (XML) and Hypertext Markup Language (HTML) file formats. Python also provides basic support for Standard Generalized Markup Language (SGML).

All these formats share the same basic structure because both HTML and XML are derived from SGML. Each document contains a mix of start tags, end tags, plain text (also called character data), and entity references, as shown in the following:

<document name="sample.xml">
    <header>This is a header</header>
    <body>This is the body text.  The text can contain
    plain text (&quot;character data&quot;), tags, and
    entities.
    </body>
</document>

In the previous example, <document>, <header>, and <body> are start tags. For each start tag, there’s a corresponding end tag that looks similar, but has a slash before the tag name. The start tag can also contain one or more attributes, like the name attribute in this example.
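As a rough illustration of start tags, end tags, and attributes, the sketch below feeds the sample document to an event-based parser and records each tag as it is seen. It assumes a current Python 3 interpreter and uses xml.parsers.expat, which may not be the module this chapter goes on to describe; the list_tags helper and the SAMPLE name are made up for this example.

```python
import xml.parsers.expat

# The sample document from the text above.
SAMPLE = """<document name="sample.xml">
    <header>This is a header</header>
    <body>This is the body text.  The text can contain
    plain text (&quot;character data&quot;), tags, and
    entities.
    </body>
</document>"""


def list_tags(text):
    """Return (kind, name, attributes) tuples for each start and end tag."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per start tag, with the attributes as a dictionary.
    parser.StartElementHandler = lambda name, attrs: events.append(
        ("start", name, attrs)
    )
    # Called once per end tag; end tags never carry attributes.
    parser.EndElementHandler = lambda name: events.append(("end", name, {}))
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return events


for kind, name, attrs in list_tags(SAMPLE):
    print(kind, name, attrs)
```

Each start tag is paired with an end tag of the same name, and only the document start tag reports an attribute.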

Everything between a start tag and its matching end tag is called an element. In the previous example, the document element contains two other elements: header and body.
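The nesting of elements can be seen by loading the same sample into a tree. This is a minimal sketch that assumes a current Python 3 interpreter and the xml.etree.ElementTree module, which is not necessarily the approach this chapter takes; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The sample document from the text above.
SAMPLE = """<document name="sample.xml">
    <header>This is a header</header>
    <body>This is the body text.  The text can contain
    plain text (&quot;character data&quot;), tags, and
    entities.
    </body>
</document>"""

root = ET.fromstring(SAMPLE)

# The outermost element, together with the attributes of its start tag.
print(root.tag, root.attrib)

# Iterating over an element yields the elements nested directly inside it.
child_tags = [child.tag for child in root]
print(child_tags)

# The character data inside the header element.
header_text = root.find("header").text
print(header_text)
```

Here the document element is the root of the tree, and header and body are its two children.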

Finally, &quot; is a character entity. Entities are used to represent reserved characters in the text sections. In this case, it’s a double quotation mark ("). The ampersand (&) is reserved as well, because it is used to start the entity itself. Other common entities include &lt; for less than (<).
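To see entities at work, the following sketch converts reserved characters to entities and back again. It assumes a current Python 3 interpreter and the escape and unescape helpers from xml.sax.saxutils; the raw string is invented for this example. By default those helpers handle only the ampersand and the angle brackets, so the quotation mark is passed in as an extra mapping.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical text containing characters that are reserved in markup.
raw = 'plain text ("character data") & <tags>'

# escape() replaces &, <, and > by default; the second argument adds
# a mapping for the double quotation mark.
encoded = escape(raw, {'"': "&quot;"})
print(encoded)

# unescape() reverses the process, given the inverse mapping.
decoded = unescape(encoded, {"&quot;": '"'})
print(decoded)
```

The ampersand has to be escaped too, since a bare & in the text would otherwise be read as the start of an entity.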

ISBN: 0596000960