Recall our discussion of the functions and services of the technical architecture of the data warehouse, where we divided the environment into three functional areas: data acquisition, data storage, and information delivery. Data extraction, transformation, and loading (ETL) encompass the areas of data acquisition and data storage. These are back-end processes. First, they cover the extraction of data from the source systems. Next, they include all the functions and procedures for changing the source data into the exact formats and structures appropriate for storage in the data warehouse database. Finally, after the transformation of the data, they include all the functions for physically moving the data into the data warehouse repository.
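As a rough illustration of these three back-end steps, here is a minimal sketch in Python. It is not a prescribed implementation: the source records, the `order_fact` table, its columns, and the function names are all hypothetical, and an in-memory SQLite database merely stands in for the data warehouse repository.

```python
import sqlite3
from datetime import datetime

# Hypothetical source records, standing in for rows read from an
# operational order-entry system (all values arrive as raw strings).
SOURCE_ORDERS = [
    {"order_id": "1001", "order_date": "03/15/2024", "amount": "250.00", "status": " shipped "},
    {"order_id": "1002", "order_date": "03/16/2024", "amount": "75.50", "status": "OPEN"},
]


def extract(source_rows):
    """Extraction: read the records from the source system."""
    return list(source_rows)


def transform(rows):
    """Transformation: convert source formats into the warehouse's formats."""
    transformed = []
    for row in rows:
        transformed.append((
            int(row["order_id"]),
            # Source dates are MM/DD/YYYY; store them as ISO 8601 text.
            datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            float(row["amount"]),
            # Normalize inconsistent status codes.
            row["status"].strip().lower(),
        ))
    return transformed


def load(conn, rows):
    """Loading: physically move the transformed rows into the repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_fact ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source_rows):
    """Run the three steps in order: extract, then transform, then load."""
    load(conn, transform(extract(source_rows)))
```

Calling `run_etl(sqlite3.connect(":memory:"), SOURCE_ORDERS)` would populate the hypothetical `order_fact` table. Keeping each step in its own function mirrors the separation described above, so that the extraction logic can change with the source systems without touching the transformation or loading logic.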

Data extraction, of course, precedes all other functions. But what is the scope and extent of the data you will extract from the source systems? Do you not think that the users of your data warehouse are interested in all of the operational data for some type of query or analysis? So, why not extract all of the operational data and dump it into the data warehouse? This seems to be a straightforward approach. Nevertheless, the scope of extraction should be driven by the user requirements. Your requirements definition should guide you as to what data you need to extract and from which source systems. Avoid creating a data junkhouse by dumping all the available data from the source systems and waiting to see what ...

