Searching File Information
In this section, we create a file indexing script that can generate an XML document representing your entire filesystem or a specific portion of it. Indexing files with XML is a powerful way to keep track of information, or perform bulk operations on groups of particular files on a disk. You can create an XML-generating indexing routine easily in Python. The index.py program in Example 3-4 (which shows up a little later in the chapter) starts in any directory you specify and generates an element for each file or directory that exists beneath the starting point. Once we have the index of file information, we look at how to use SAX to search the information to filter the list of files for whatever criteria interests us at the time.
Creating the Index Generator
The main part of this routine works by just checking
each file in a starting directory, and then recursing into any
directories it finds beneath the starting directory. Recursion allows
it to index an entire filesystem if you choose. On Unix, the program
performs a lot of work, as it does content checking via a popen
call to the file
command for each file. (While this
could be made more efficient by calling find
less often and requiring it to operate
on more than one file at a time, that isn’t the topic of this book.)
One of the key methods of this class is indexDirectoryFiles
:
def indexDirectoryFiles(self, dir): """Index a directory structure and creates an XML output file.""" # prepare output XML ...
Get Python & XML now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.