Skip to Main Content
Python & XML
book

Python & XML

by Christopher A. Jones, Fred L. Drake
December 2001
Intermediate to advanced content levelIntermediate to advanced
380 pages
11h 54m
English
O'Reilly Media, Inc.
Content preview from Python & XML

Searching File Information

In this section, we create a file indexing script that can generate an XML document representing your entire filesystem or a specific portion of it. Indexing files with XML is a powerful way to keep track of information, or perform bulk operations on groups of particular files on a disk. You can create an XML-generating indexing routine easily in Python. The index.py program in Example 3-4 (which shows up a little later in the chapter) starts in any directory you specify and generates an element for each file or directory that exists beneath the starting point. Once we have the index of file information, we look at how to use SAX to search the information to filter the list of files for whatever criteria interests us at the time.

Creating the Index Generator

The main part of this routine works by just checking each file in a starting directory, and then recursing into any directories it finds beneath the starting directory. Recursion allows it to index an entire filesystem if you choose. On Unix, the program performs a lot of work, as it does content checking via a popen call to the file command for each file. (While this could be made more efficient by calling find less often and requiring it to operate on more than one file at a time, that isn’t the topic of this book.) One of the key methods of this class is indexDirectoryFiles:

def indexDirectoryFiles(self, dir): """Index a directory structure and creates an XML output file.""" # prepare output XML ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

XML Processing with Python

XML Processing with Python

Sean McGrath
Python One-Liners

Python One-Liners

Christian Mayer

Publisher Resources

ISBN: 0596001282Supplemental ContentErrata Page