6.1. The Source Data for Aggregate Tables

In Chapter 5, you saw that the warehouse load is best decomposed into different load programs or processes for each warehouse table. Compartmentalizing the processing limits the impact of changes to the load process and improves manageability.

You also saw that these same guidelines extend to aggregate processing. Attempting to process aggregates in the same program that handles base tables generates complexities that damage the maintainability of the load process and complicates the process of instituting changes to the aggregate portfolio. Unless there is a requirement for real-time loads, these complications are best avoided. Instead, the base schema is processed first; aggregate tables are processed after the base schema has been loaded.

If aggregates are to be loaded through a separate process, they are best loaded from the base schema, rather than from the transaction systems. As hinted in Chapters 3 and 4, this practice greatly simplifies the load process. Changed data identification is greatly simplified, redundant processing is eliminated, and the consistency of the representation of dimensional data is guaranteed. If a real-time load is required, it may be necessary to load aggregates in the same process that loads the base schema.

6.1.1. Changed Data Identification

The efficiency of the ETL process is always a matter of concern. The load must take place within a certain window of time, and hardware resources are not unlimited. ...

Get Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.