12.5. DATA LOADING

It is generally agreed that transformation functions end as soon as load images are created. The next major set of functions consists of the ones that take the prepared data, apply it to the data warehouse, and store it in the database there. You create load images to correspond to the target files to be loaded in the data warehouse database.

The whole process of moving data into the data warehouse repository is referred to in several ways. You must have heard the phrases applying the data, loading the data, and refreshing the data. For the sake of clarity we will use the phrases as indicated below:

Initial Load—populating all the data warehouse tables for the very first time

Incremental Load—applying ongoing changes as necessary in a periodic manner

Full Refresh—completely erasing the contents of one or more tables and reloading with fresh data (initial load is a refresh of all the tables)

Because loading the data warehouse may take an inordinate amount of time, loads are generally cause for great concern. During the loads, the data warehouse has to be offline. You need to find a window of time when the loads may be scheduled without affecting your data warehouse users. Therefore, consider dividing up the whole load process into smaller chunks and populating a few files at a time. This will give you two benefits. You may be able to run the smaller loads in parallel. Also, you might be able to keep some parts of the data warehouse up and running while loading ...

Get DATA WAREHOUSING FUNDAMENTALS: A Comprehensive Guide for IT Professionals now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.