In this section, we create a file indexing script that can generate an XML document representing your entire filesystem or a specific portion of it. Indexing files with XML is a powerful way to keep track of information, or perform bulk operations on groups of particular files on a disk. You can create an XML-generating indexing routine easily in Python. The index.py program in Example 3-4 (which shows up a little later in the chapter) starts in any directory you specify and generates an element for each file or directory that exists beneath the starting point. Once we have the index of file information, we look at how to use SAX to search the information to filter the list of files for whatever criteria interests us at the time.
The main part of this routine works by just checking
each file in a starting directory, and then recursing into any
directories it finds beneath the starting directory. Recursion allows
it to index an entire filesystem if you choose. On Unix, the program
performs a lot of work, as it does content checking via a
popen call to the
file command for each file. (While this
could be made more efficient by calling
find less often and requiring it to operate
on more than one file at a time, that isn’t the topic of this book.)
One of the key methods of this class is
def indexDirectoryFiles(self, dir): """Index a directory structure and creates an XML output file.""" # prepare output XML ...