Data Laundering with Python
We have covered a wide range of database APIs and data sources, and shown that Python can connect to almost any modern database. Now we will look at some areas in which Python can do useful things with the data.
The first major area of work is what we call data laundering. This involves writing programs to acquire data from a source database, reshape it in some way, and load it into a destination database. One major difference between database development and general application development is that databases are live; you can’t just switch them off for a few months. This means that what would be a simple upgrade for a Windows application becomes a much more complex process of repeatedly migrating data and running the old and new systems in parallel. Here are some examples of areas where this type of work is needed:
- Database upgrades and changes
When a database is replaced, the new database structure is almost always different. The new database needs to be developed with sample data available, then tested extensively, then run in parallel with the old one while all the users and client applications are moved across. Scripts are needed to migrate the data repeatedly (usually daily) from source to destination, often performing validity checks on the way in.
- Connecting databases
Businesses often have databases whose areas of interest overlap. A fund manager might have a core system for processing deals in its funds, and a marketing database for tracking sales calls; marketing ...
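The acquire, reshape, and load cycle described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes the standard-library sqlite3 module as a stand-in for the real source and destination databases, and the table names (client, customer), the columns, and the launder function are all hypothetical. The job reloads the destination in full each time, so it can be re-run daily while the two systems operate in parallel.

```python
import sqlite3


def launder(source, dest):
    """Copy rows from source to dest, reshaping and validating on the way in.

    Returns a list of (id, reason) pairs for rows that failed validation.
    """
    rejected = []
    # Full reload: empty the destination so the job can be re-run safely.
    dest.execute("DELETE FROM customer")
    rows = source.execute("SELECT id, name, balance FROM client ORDER BY id")
    for cust_id, name, balance in rows:
        # Validity check: the (hypothetical) new schema requires a name.
        if name is None or not name.strip():
            rejected.append((cust_id, "missing name"))
            continue
        # Reshape: split the single name column and store money as whole cents.
        first, _, last = name.strip().partition(" ")
        dest.execute(
            "INSERT INTO customer (id, first_name, last_name, balance_cents) "
            "VALUES (?, ?, ?, ?)",
            (cust_id, first, last, round(balance * 100)),
        )
    dest.commit()
    return rejected


# Hypothetical source database in the old structure.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
source.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [(1, "Ada Lovelace", 12.5), (2, "  ", 3.0), (3, "Grace Hopper", 100.0)],
)

# Hypothetical destination database in the new structure.
dest = sqlite3.connect(":memory:")
dest.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, balance_cents INTEGER)"
)

rejected = launder(source, dest)
loaded = dest.execute(
    "SELECT id, first_name, last_name, balance_cents FROM customer ORDER BY id"
).fetchall()
```

Returning the rejected rows rather than raising on the first bad one lets a nightly run finish and report every problem at once, which tends to suit repeated migrations. A real job would more likely connect through the appropriate DB-API driver for each database and might update rows incrementally instead of reloading everything.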