Module: Versioned Backups

Credit: Mitch Chapman

Before overwriting an existing file, it is often desirable to make a backup. Example 4-1 emulates the behavior of Emacs by saving versioned backups. It’s also compatible with the marshal module, so you can use versioned output files for output in marshal format. If you find other file-writing modules that, like marshal, type-test rather than using file-like objects polymorphically, the class supplied here will stand you in good stead.

When Emacs saves a file foo.txt, it first checks to see if foo.txt already exists. If it does, the current file contents are backed up. Emacs can be configured to use versioned backup files, so, for example, foo.txt might be backed up to foo.txt.~1~. If other versioned backups of the file already exist, Emacs saves to the next available version. For example, if the largest existing version number is 19, Emacs will save the new version to foo.txt.~20~. Emacs can also prompt you to delete old versions of your files. For example, if you save a file that has six backups, Emacs can be configured to delete all but the three newest backups.

Example 4-1 emulates the versioning backup behavior of Emacs. It saves backups with version numbers (e.g., backing up foo.txt to foo.txt.~n~ when the largest existing backup number is n-1. It also lets you specify how many old versions of a file to save. A value that is less than zero means not to delete any old versions.

The marshal module lets you marshal an object to a file by way of the dump function, but dump insists that the file object you provide actually be a Python file object, rather than any arbitrary object that conforms to the file-object interface. The versioned output file shown in this recipe provides an asFile method for compatibility with marshal.dump. In many (but, alas, far from all) cases, you can use this approach to use wrapped objects when a module type-tests and thus needs the unwrapped object, solving (or at least ameliorating) the type-testing issue mentioned in Recipe 5.9. Note that Example 4-1 can be seen as one of many uses of the automatic-delegation idiom mentioned there.

The only true solution to the problem of modules using type tests rather than Python’s smooth, seamless polymorphism is to change those errant modules, but this can be hard in the case of errant modules that you did not write (particularly ones in Python’s standard library).

Example 4-1. Saving backups when writing files

""" This module provides versioned output files. When you write to such
a file, it saves a versioned backup of any existing file contents. """

import sys, os, glob, string, marshal

class VersionedOutputFile:
    """ Like a file object opened for output, but with versioned backups
    of anything it might otherwise overwrite """

    def _ _init_ _(self, pathname, numSavedVersions=3):
        """ Create a new output file. pathname is the name of the file to 
        [over]write. numSavedVersions tells how many of the most recent
        versions of pathname to save. """
        self._pathname = pathname
        self._tmpPathname = "%s.~new~" % self._pathname
        self._numSavedVersions = numSavedVersions
        self._outf = open(self._tmpPathname, "wb")

    def _ _del_ _(self):
        self.close(  )

    def close(self):
        if self._outf:
            self._outf.close(  )
            self._replaceCurrentFile(  )
            self._outf = None

    def asFile(self):
        """ Return self's shadowed file object, since marshal is
        pretty insistent on working with real file objects. """
        return self._outf

    def _ _getattr_ _(self, attr):
        """ Delegate most operations to self's open file object. """
        return getattr(self._outf, attr)

    def _replaceCurrentFile(self):
        """ Replace the current contents of self's named file. """
        self._backupCurrentFile(  )
        os.rename(self._tmpPathname, self._pathname)

    def _backupCurrentFile(self):
        """ Save a numbered backup of self's named file. """
        # If the file doesn't already exist, there's nothing to do
        if os.path.isfile(self._pathname):
            newName = self._versionedName(self._currentRevision(  ) + 1)
            os.rename(self._pathname, newName)

            # Maybe get rid of old versions
            if ((self._numSavedVersions is not None) and
                (self._numSavedVersions > 0)):
                self._deleteOldRevisions(  )

    def _versionedName(self, revision):
        """ Get self's pathname with a revision number appended. """
        return "%s.~%s~" % (self._pathname, revision)

    def _currentRevision(self):
        """ Get the revision number of self's largest existing backup. """
        revisions = [0] + self._revisions(  )
        return max(revisions)

    def _revisions(self):
        """ Get the revision numbers of all of self's backups. """
        revisions = []
        backupNames = glob.glob("%s.~[0-9]*~" % (self._pathname))
        for name in backupNames:
            try:
                revision = int(string.split(name, "~")[-2])
                revisions.append(revision)
            except ValueError:
                # Some ~[0-9]*~ extensions may not be wholly numeric
                pass
        revisions.sort(  )
        return revisions

    def _deleteOldRevisions(self):
        """ Delete old versions of self's file, so that at most
        self._numSavedVersions versions are retained. """
        revisions = self._revisions(  )
        revisionsToDelete = revisions[:-self._numSavedVersions]
        for revision in revisionsToDelete:
            pathname = self._versionedName(revision)
            if os.path.isfile(pathname):
                os.remove(pathname)

def main(  ):
    """ mainline module (for isolation testing) """
    basename = "TestFile.txt"
    if os.path.exists(basename):
        os.remove(basename)
    for i in range(10):
        outf = VersionedOutputFile(basename)
        outf.write("This is version %s.\n" % i)
        outf.close(  )

    # Now there should be just four versions of TestFile.txt:
    expectedSuffixes = ["", ".~7~", ".~8~", ".~9~"]
    expectedVersions = []
    for suffix in expectedSuffixes:
        expectedVersions.append("%s%s" % (basename, suffix))
    expectedVersions.sort(  )
    matchingFiles = glob.glob("%s*" % basename)
    matchingFiles.sort(  )
    for filename in matchingFiles:
        if filename not in expectedVersions:
            sys.stderr.write("Found unexpected file %s.\n" % filename)
        else:
            # Unit tests should clean up after themselves:
            os.remove(filename)
            expectedVersions.remove(filename)
    if expectedVersions:
        sys.stderr.write("Not found expected file")
        for ev in expectedVersions:
            sys.sdterr.write(' '+ev)
        sys.stderr.write('\n')

    # Finally, here's an example of how to use versioned
    # output files in concert with marshal:
    import marshal

    outf = VersionedOutputFile("marshal.dat")
    # Marshal out a sequence:
    marshal.dump([1, 2, 3], outf.asFile(  ))
    outf.close(  )
    os.remove("marshal.dat")

if _ _name_ _ == "_ _main_ _":
    main(  )

For a more lightweight, simpler approach to file versioning, see Recipe 4.26.

See Also

Recipe 4.26 and Recipe 5.9; documentation for the marshal module in the Library Reference.

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.