Module: jet2sql—Creating a SQL DDL from an Access Database

Credit: Matt Keranen

If you need to migrate a Jet (Microsoft Access .mdb) database to another DBMS system, or need to understand the Jet database structure in detail, you must reverse engineer from the database a standard ANSI SQL DDL description of its schema.

Example 8-1 reads the structure of a Jet database file using Microsoft’s DAO services via Python COM and creates the SQL DDL necessary to recreate the same structure (schema). Microsoft DAO has long been stable (which in the programming world is almost a synonym for dead) and will never be upgraded, but that’s not really a problem for us here, given the specific context of this recipe’s use case. Additionally, the Jet database itself is almost stable, after all. You could, of course, recode this recipe to use the more actively maintained ADO services instead of DAO (or even the ADOX extensions), but my existing DAO-based solution seems to do all I require, so I was never motivated to do so, despite the fact that ADO and DAO are really close in programming terms.

This code was originally written to aid in migrating Jet databases to larger RDBMS systems through E/R design tools when the supplied import routines of said tools missed objects such as indexes and FKs. A first experiment in Python, it became a common tool.

Note that for most uses of COM from Python, for best results, you need to ensure that Python has read and cached the type library. Otherwise, for example, constant names cannot be used, since only type libraries hold those names. You would have to use numeric literals instead, seriously hampering readability and usability (not to mention the bother of finding out which numeric literals you should use, when all available documentation is written in terms of symbolic constants).

In recent releases of win32all, the simplest way to make sure that the type library has indeed been cached is to substitute, in lieu of the statement in the recipe:

daoEngine = win32com.client.Dispatch('DAO.DBEngine.36')

the equivalent statement:

daoEngine = win32com.client.gencache.EnsureDispatch('DAO.DBEngine.36')

EnsureDispatch ensures the relevant type library is cached, instantiates the requested COM server, and returns a reference just like Dispatch would.

Alternatively, you can use makepy.py, either by hand or through the Tools menu of PythonWin (in this case, from the COM Makepy Utility, select an entry such as Microsoft DAO 3.6 Library). Yet another possibility is calling win32com.client.gencache.EnsureModule, but this is inelegant and unwieldy, because you need to find out the UUID and version numbers for the (registered) type library you want to ensure is cached. The newer EnsureDispatch is far handier, since it takes a good old ProgID string, which is easier to find out, more readable, and more compact.

Microsoft’s widespread Jet (a.k.a. Access) database engine isn’t quite SQL-compliant, but it comes close. Using this engine, all you need to migrate a database to a standard SQL relational database is a little help in reverse engineering the details of the structure, as shown in Example 8-1.

Example 8-1. Creating a SQL DDL from an Access database

# jet2sql.py - Matthew C Keranen <mck@mpinet.net> [07/12/2000]
# --------------------------------------------------------------------
# Creates ANSI SQL DDL from a MS Jet database file.  Useful to reverse
# engineer a database's design in various E/R tools.
#
# Requires DAO 3.6 library.
# --------------------------------------------------------------------
# Usage: python jet2sql.py infile.MDB outfile.SQL

import sys, string, pythoncom, win32com.client

const = win32com.client.constants
daoEngine = win32com.client.Dispatch('DAO.DBEngine.36')

quot = chr(34)
class jetReverse:
    def _ _init_ _(self, infile):
        self.jetfilename=infile
        self.dtbs = daoEngine.OpenDatabase(infile)

    def terminate(self):
        pass

    def writeTable(self, currTabl):
        self.writeLine('\ncreate table '
            + quot + currTabl.Name + quot, "", 1)
        self.writeLine('(', "", 1)

        # Write columns
        cn=0
        for col in currTabl.Fields:
            cn = cn + 1
            self.writeColumn(col.Name, col.Type, col.Size,
                col.Required, col.Attributes, col.DefaultValue,
                col.ValidationRule, currTabl.Fields.Count-cn)

        # Validation rule
        tablRule = currTabl.ValidationRule
        if tablRule != "":
            tablRule = "    check(" + tablRule + ") "
            self.writeLine("", ",", 1) # add a comma and CR
            self.writeLine(tablRule, "", 0)

        # Primary key
        pk=self.getPrimaryKey(currTabl)
        if pk <> "":
            self.writeLine("", ",", 1) # add a comma and CR
            self.writeLine(pk, "", 0)

        # End of table
        self.writeLine("", "", 1) # terminate previous line
        self.writeLine(');', "", 1)

        # Write table comment
        try: sql = currTabl.Properties("Description").Value
        except pythoncom.com_error: sql = ""
        if sql != "":
            sql = ("comment on table " + quot + currTabl.Name + quot +
                " is " + quot + sql + quot + ";")
            self.writeLine(sql, "", 1)

        # Write column comments
        for col in currTabl.Fields:
            try: sql = col.Properties("Description").Value
            except pythoncom.com_error: sql = ""
            if sql != "":
                sql = ("comment on column " + quot + currTabl.Name
                    + quot + "." + quot + col.Name + quot +
                    " is " + quot + sql + quot + ";")
                self.writeLine(sql,"",1)

        # Write indexes
        self.writeIndexes(currTabl)

    def writeColumn(self, colName, colType, length, requird,
        attributes, default, check, colRix):
        # colRix: 0-based index of column from right side
        # 0 indicates rightmost column
        if colType == const.dbByte: dataType = "Byte"
        elif colType == const.dbInteger: dataType = "Integer"
        elif colType == const.dbSingle: dataType = "Single"
        elif colType == const.dbDouble: dataType = "Double"
        elif colType == const.dbDate: dataType = "DateTime"
        elif colType == const.dbLongBinary: dataType = "OLE"
        elif colType == const.dbMemo: dataType = "Memo"
        elif colType == const.dbCurrency: dataType = "Currency"
        elif colType == const.dbLong:
            if attributes & const.dbAutoIncrField:
                dataType = "Counter"
            else:
                dataType = "LongInteger"
        elif colType == const.dbText:
            if length == 0: dataType = "Text"
            else: dataType = "char("+str(length)+")"
        elif colType == const.dbBoolean:
            dataType = "Bit"
            if default == "Yes": default = "1"
            else: default = "0"
        else:
            if length == 0: dataType = "Text"
            else: dataType = "Text("+str(length)+")"

        if default != "":
            defaultStr = "default " + default + " "
        else: defaultStr = ""

        if check != "":
            checkStr = "check(" + check + ") "
        else:
            checkStr = ""

        if requird or (attributes & const.dbAutoIncrField):
            mandatory = "not null "
        else:
            mandatory = ""

        sql = ("    " + quot + colName + quot + " " + dataType
            + " " + defaultStr + checkStr + mandatory)
        if colRix > 0:
            self.writeLine(sql, ",", 1)
        else:
            self.writeLine(sql, "", 0)

    def getPrimaryKey(self, currTabl):
        # Get primary key fields
        sql = ""
        for idx in currTabl.Indexes:
           if idx.Primary:
              idxName = idx.Name
              sql = "    primary key "
              cn = 0
              for col in idx.Fields:
                  cn = cn+1
                  sql = sql + quot + col.Name + quot
                  if idx.Fields.Count > cn: sql = sql + ","
        return sql

    def writeIndexes(self, currTabl):
        # Write index definition
        nIdx = -1
        for idx in currTabl.Indexes:
            nIdx = nIdx + 1
            idxName = idx.Name
            tablName = currTabl.Name
            if idx.Primary:
                idxName = tablName + "_PK"
            elif idxName[:9] == "REFERENCE":
               idxName = tablName + "_FK" + idxName[10:]
            else:
                idxName = tablName + "_IX" + str(nIdx)

            sql = "create "
            if idx.Unique: sql = sql + "unique "
            if idx.Clustered: sql = sql + "clustered "
            sql = sql + "index " + quot + idxName + quot
            sql = sql + " on " + quot + tablName + quot + " ("

            # Write index columns
            cn = 0
            for col in idx.Fields:
                cn = cn + 1
                sql = sql + quot + col.Name + quot
                if col.Attributes & const.dbDescending:
                    sql = sql + " desc"
                else:
                    sql = sql + " asc"
                if idx.Fields.Count > cn: sql = sql + ","

            sql = sql + " );"

            self.writeLine(sql,"",1)

    def writeForeignKey(self, currRefr):
        # Export foreign key
        sql = "\nalter table " + quot + currRefr.ForeignTable + quot
        self.writeLine(sql, "", 1)

        sql = "    add foreign key ("
        cn = 0
        for col in currRefr.Fields:
            cn = cn + 1
            sql = sql + quot + col.ForeignName + quot
            if currRefr.Fields.Count > cn: sql = sql + ","

        sql = sql + ")"
        self.writeLine(sql, "", 1)

        sql = "    references " + quot + currRefr.Table + quot + " ("
        cn = 0
        for col in currRefr.Fields:
            cn = cn + 1
            sql = sql + quot + col.Name + quot
            if currRefr.Fields.Count > cn: sql = sql + ","

        sql = sql + ")"
        if currRefr.Attributes & const.dbRelationUpdateCascade:
           sql = sql + " on update cascade"
        if currRefr.Attributes & const.dbRelationDeleteCascade:
           sql = sql + " on delete cascade"
        sql = sql + ";"
        self.writeLine(sql, "", 1)

    def writeQuery(self, currQry):
        sql = "\ncreate view " + quot + currQry.Name + quot + " as"
        self.writeLine(sql, "", 1)

        # Write query text
        sql = string.replace(currQry.SQL, chr(13), "") # Remove extra \ns
        self.writeLine(sql, "", 1)

        # Write query comment
        try: sql = currQry.Properties("Description").Value
        except pythoncom.com_error: sql = ""
        if sql <> "":
            sql = ("comment on table " + quot + currQry.Name +
                quot + " is " + quot + sql + quot)
            self.writeLine(sql,"",1)

    def writeLine(self, strLine, delimit, newline):
        # Used for controlling where lines terminate with a comma
        # or other continuation mark
        sqlfile.write(strLine)
        if delimit: sqlfile.write(delimit)
        if newline: sqlfile.write('\n')

if _ _name_ _ == '_ _main_ _':
    if len(sys.argv) != 3:
        print "Usage: jet2sql.py infile.mdb outfile.sql"
    else:
        jetEng = jetReverse(sys.argv[1])
        outfile = sys.argv[2]
        sqlfile = open(outfile,'w')
        print "\nReverse engineering %s to %s" % (
            jetEng.jetfilename, outfile)

        # Tables
        sys.stdout.write("\n   Tables")
        for tabl in jetEng.dtbs.TableDefs:
            if tabl.Name[:4] != "MSys" and tabl.Name[:4] != "~TMP":
                sys.stdout.write(".")
                jetEng.writeTable(tabl)
            else:
                sys.stdout.write(",")

        # Relations/FKs
        sys.stdout.write("\n   Relations")
        for fk in jetEng.dtbs.Relations:
            sys.stdout.write(".")
            jetEng.writeForeignKey(fk)

        # Queries
        sys.stdout.write("\n   Queries")
        for qry in jetEng.dtbs.QueryDefs:
            sys.stdout.write(".")
            jetEng.writeQuery(qry)

        print "\n   Done\n"

        # Done
        sqlfile.close(  )
        jetEng.terminate(  )

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.