12.3. DATA EXTRACTION

As an IT professional, you have probably participated in data extractions and conversions when implementing operational systems. When you went from a VSAM file-oriented order entry system to a new order processing system using relational database technology, you may have written data extraction programs to capture data from the VSAM files and prepare it for loading into the relational database.

Two major factors differentiate data extraction for a new operational system from data extraction for a data warehouse. First, for a data warehouse, you have to extract data from many disparate sources. Second, for a data warehouse, you have to extract data on the changes for ongoing incremental loads as well as for a one-time initial full load. For operational systems, all you need are one-time extractions and data conversions.
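The difference between a one-time full load and an ongoing incremental load can be illustrated with a minimal sketch. It is not taken from the book: the `orders` table, its `updated_at` column, the function names, and the use of an in-memory SQLite database as a stand-in for a source system are all assumptions made for illustration. It also assumes the source records a last-modified timestamp on every row, which is only one of several ways to detect changes.

```python
import sqlite3


def build_demo_source():
    """Create a hypothetical source table standing in for an operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 100.0, "2024-01-01T09:00:00"),
            (2, 250.0, "2024-01-02T10:30:00"),
            (3, 75.5, "2024-01-03T14:15:00"),
        ],
    )
    conn.commit()
    return conn


def full_extract(conn):
    """Initial full load: read every row from the source table."""
    return conn.execute(
        "SELECT order_id, amount, updated_at FROM orders ORDER BY order_id"
    ).fetchall()


def incremental_extract(conn, last_extracted_at):
    """Incremental load: read only rows changed since the previous extraction.

    Assumes updated_at holds ISO 8601 strings, which compare correctly as text.
    """
    return conn.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY order_id",
        (last_extracted_at,),
    ).fetchall()


if __name__ == "__main__":
    source = build_demo_source()
    print(len(full_extract(source)), "rows in the initial full load")
    changed = incremental_extract(source, "2024-01-02T00:00:00")
    print([row[0] for row in changed], "order ids in the incremental load")
```

An operational conversion would run something like `full_extract` once and be done. A data warehouse would run it once and then keep calling something like `incremental_extract`, saving the time of each run so that the next run picks up only later changes.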

These two factors increase the complexity of data extraction for a data warehouse and, therefore, warrant the use of third-party data extraction tools in addition to in-house programs or scripts. Third-party tools are generally more expensive than in-house programs, but they record their own metadata. In-house programs, on the other hand, are hard to keep current as source systems change, which increases the cost of maintenance. If your company is in an industry where frequent changes to business conditions are the norm, then you may want to minimize the use of in-house programs. Third-party tools usually provide built-in flexibility. All you ...
