Credit: Matt Keranen
If you need to migrate a
Jet (Microsoft Access
.mdb
) database to
another DBMS system, or need to understand the Jet database structure
in detail, you must reverse engineer from the database a standard
ANSI SQL DDL description of its schema.
Example 8-1 reads the structure of a Jet database file using Microsoft’s DAO services via Python COM and creates the SQL DDL necessary to recreate the same structure (schema). Microsoft DAO has long been stable (which in the programming world is almost a synonym for dead) and will never be upgraded, but that’s not really a problem for us here, given the specific context of this recipe’s use case. Additionally, the Jet database itself is almost stable, after all. You could, of course, recode this recipe to use the more actively maintained ADO services instead of DAO (or even the ADOX extensions), but my existing DAO-based solution seems to do all I require, so I was never motivated to do so, despite the fact that ADO and DAO are really close in programming terms.
This code was originally written to aid in migrating Jet databases to larger RDBMS systems through E/R design tools when the supplied import routines of said tools missed objects such as indexes and FKs. A first experiment in Python, it became a common tool.
Note that for most uses of COM from Python, for best results, you need to ensure that Python has read and cached the type library. Otherwise, for example, constant names cannot be used, since only type libraries hold those names. You would have to use numeric literals instead, seriously hampering readability and usability (not to mention the bother of finding out which numeric literals you should use, when all available documentation is written in terms of symbolic constants).
In recent releases of win32all
, the simplest way
to make sure that the type library has indeed been cached is to
substitute, in lieu of the statement in the recipe:
daoEngine = win32com.client.Dispatch('DAO.DBEngine.36')
the equivalent statement:
daoEngine = win32com.client.gencache.EnsureDispatch('DAO.DBEngine.36')
EnsureDispatch
ensures the relevant type library is cached, instantiates the
requested COM server, and returns a reference just like
Dispatch
would.
Alternatively, you can use
makepy.py
,
either by hand or through the Tools menu of PythonWin (in this case,
from the COM Makepy Utility, select an entry such as Microsoft DAO
3.6 Library). Yet another possibility is calling
win32com.client.gencache.EnsureModule
,
but this is inelegant and unwieldy, because you need to find out the
UUID and version numbers for the (registered) type library you want
to ensure is cached. The newer EnsureDispatch
is
far handier, since it takes a good old ProgID
string, which is easier to find out, more readable, and more compact.
Microsoft’s widespread Jet (a.k.a. Access) database engine isn’t quite SQL-compliant, but it comes close. Using this engine, all you need to migrate a database to a standard SQL relational database is a little help in reverse engineering the details of the structure, as shown in Example 8-1.
Example 8-1. Creating a SQL DDL from an Access database
# jet2sql.py - Matthew C Keranen <mck@mpinet.net> [07/12/2000] # -------------------------------------------------------------------- # Creates ANSI SQL DDL from a MS Jet database file. Useful to reverse # engineer a database's design in various E/R tools. # # Requires DAO 3.6 library. # -------------------------------------------------------------------- # Usage: python jet2sql.py infile.MDB outfile.SQL import sys, string, pythoncom, win32com.client const = win32com.client.constants daoEngine = win32com.client.Dispatch('DAO.DBEngine.36') quot = chr(34) class jetReverse: def _ _init_ _(self, infile): self.jetfilename=infile self.dtbs = daoEngine.OpenDatabase(infile) def terminate(self): pass def writeTable(self, currTabl): self.writeLine('\ncreate table ' + quot + currTabl.Name + quot, "", 1) self.writeLine('(', "", 1) # Write columns cn=0 for col in currTabl.Fields: cn = cn + 1 self.writeColumn(col.Name, col.Type, col.Size, col.Required, col.Attributes, col.DefaultValue, col.ValidationRule, currTabl.Fields.Count-cn) # Validation rule tablRule = currTabl.ValidationRule if tablRule != "": tablRule = " check(" + tablRule + ") " self.writeLine("", ",", 1) # add a comma and CR self.writeLine(tablRule, "", 0) # Primary key pk=self.getPrimaryKey(currTabl) if pk <> "": self.writeLine("", ",", 1) # add a comma and CR self.writeLine(pk, "", 0) # End of table self.writeLine("", "", 1) # terminate previous line self.writeLine(');', "", 1) # Write table comment try: sql = currTabl.Properties("Description").Value except pythoncom.com_error: sql = "" if sql != "": sql = ("comment on table " + quot + currTabl.Name + quot + " is " + quot + sql + quot + ";") self.writeLine(sql, "", 1) # Write column comments for col in currTabl.Fields: try: sql = col.Properties("Description").Value except pythoncom.com_error: sql = "" if sql != "": sql = ("comment on column " + quot + currTabl.Name + quot + "." + quot + col.Name + quot + " is " + quot + sql + quot + ";") self.writeLine(sql,"",1) # Write indexes self.writeIndexes(currTabl) def writeColumn(self, colName, colType, length, requird, attributes, default, check, colRix): # colRix: 0-based index of column from right side # 0 indicates rightmost column if colType == const.dbByte: dataType = "Byte" elif colType == const.dbInteger: dataType = "Integer" elif colType == const.dbSingle: dataType = "Single" elif colType == const.dbDouble: dataType = "Double" elif colType == const.dbDate: dataType = "DateTime" elif colType == const.dbLongBinary: dataType = "OLE" elif colType == const.dbMemo: dataType = "Memo" elif colType == const.dbCurrency: dataType = "Currency" elif colType == const.dbLong: if attributes & const.dbAutoIncrField: dataType = "Counter" else: dataType = "LongInteger" elif colType == const.dbText: if length == 0: dataType = "Text" else: dataType = "char("+str(length)+")" elif colType == const.dbBoolean: dataType = "Bit" if default == "Yes": default = "1" else: default = "0" else: if length == 0: dataType = "Text" else: dataType = "Text("+str(length)+")" if default != "": defaultStr = "default " + default + " " else: defaultStr = "" if check != "": checkStr = "check(" + check + ") " else: checkStr = "" if requird or (attributes & const.dbAutoIncrField): mandatory = "not null " else: mandatory = "" sql = (" " + quot + colName + quot + " " + dataType + " " + defaultStr + checkStr + mandatory) if colRix > 0: self.writeLine(sql, ",", 1) else: self.writeLine(sql, "", 0) def getPrimaryKey(self, currTabl): # Get primary key fields sql = "" for idx in currTabl.Indexes: if idx.Primary: idxName = idx.Name sql = " primary key " cn = 0 for col in idx.Fields: cn = cn+1 sql = sql + quot + col.Name + quot if idx.Fields.Count > cn: sql = sql + "," return sql def writeIndexes(self, currTabl): # Write index definition nIdx = -1 for idx in currTabl.Indexes: nIdx = nIdx + 1 idxName = idx.Name tablName = currTabl.Name if idx.Primary: idxName = tablName + "_PK" elif idxName[:9] == "REFERENCE": idxName = tablName + "_FK" + idxName[10:] else: idxName = tablName + "_IX" + str(nIdx) sql = "create " if idx.Unique: sql = sql + "unique " if idx.Clustered: sql = sql + "clustered " sql = sql + "index " + quot + idxName + quot sql = sql + " on " + quot + tablName + quot + " (" # Write index columns cn = 0 for col in idx.Fields: cn = cn + 1 sql = sql + quot + col.Name + quot if col.Attributes & const.dbDescending: sql = sql + " desc" else: sql = sql + " asc" if idx.Fields.Count > cn: sql = sql + "," sql = sql + " );" self.writeLine(sql,"",1) def writeForeignKey(self, currRefr): # Export foreign key sql = "\nalter table " + quot + currRefr.ForeignTable + quot self.writeLine(sql, "", 1) sql = " add foreign key (" cn = 0 for col in currRefr.Fields: cn = cn + 1 sql = sql + quot + col.ForeignName + quot if currRefr.Fields.Count > cn: sql = sql + "," sql = sql + ")" self.writeLine(sql, "", 1) sql = " references " + quot + currRefr.Table + quot + " (" cn = 0 for col in currRefr.Fields: cn = cn + 1 sql = sql + quot + col.Name + quot if currRefr.Fields.Count > cn: sql = sql + "," sql = sql + ")" if currRefr.Attributes & const.dbRelationUpdateCascade: sql = sql + " on update cascade" if currRefr.Attributes & const.dbRelationDeleteCascade: sql = sql + " on delete cascade" sql = sql + ";" self.writeLine(sql, "", 1) def writeQuery(self, currQry): sql = "\ncreate view " + quot + currQry.Name + quot + " as" self.writeLine(sql, "", 1) # Write query text sql = string.replace(currQry.SQL, chr(13), "") # Remove extra \ns self.writeLine(sql, "", 1) # Write query comment try: sql = currQry.Properties("Description").Value except pythoncom.com_error: sql = "" if sql <> "": sql = ("comment on table " + quot + currQry.Name + quot + " is " + quot + sql + quot) self.writeLine(sql,"",1) def writeLine(self, strLine, delimit, newline): # Used for controlling where lines terminate with a comma # or other continuation mark sqlfile.write(strLine) if delimit: sqlfile.write(delimit) if newline: sqlfile.write('\n') if _ _name_ _ == '_ _main_ _': if len(sys.argv) != 3: print "Usage: jet2sql.py infile.mdb outfile.sql" else: jetEng = jetReverse(sys.argv[1]) outfile = sys.argv[2] sqlfile = open(outfile,'w') print "\nReverse engineering %s to %s" % ( jetEng.jetfilename, outfile) # Tables sys.stdout.write("\n Tables") for tabl in jetEng.dtbs.TableDefs: if tabl.Name[:4] != "MSys" and tabl.Name[:4] != "~TMP": sys.stdout.write(".") jetEng.writeTable(tabl) else: sys.stdout.write(",") # Relations/FKs sys.stdout.write("\n Relations") for fk in jetEng.dtbs.Relations: sys.stdout.write(".") jetEng.writeForeignKey(fk) # Queries sys.stdout.write("\n Queries") for qry in jetEng.dtbs.QueryDefs: sys.stdout.write(".") jetEng.writeQuery(qry) print "\n Done\n" # Done sqlfile.close( ) jetEng.terminate( )
Get Python Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.