6.4. Dropping and Rebuilding Aggregates
The aggregate load processes described in the previous sections perform incremental processing. Because they consider only data that is new or changed in the base schema, they limit the volume of data to be processed. An alternative approach is to drop the aggregate tables at the start of each load and rebuild them in their entirety. As discussed in Chapter 5, this approach increases the volume of data processed and does not guarantee the consistency of keys. But it also simplifies the load process, particularly for dimension tables.
6.4.1. Dropping and Rebuilding Aggregate Dimension Tables
In a drop-and-rebuild scenario, the load process for aggregate dimensions is dramatically simplified. All information added to the dimension table is new, so there is no need to process slowly changing dimensions. The comparisons that are costly in terms of processing time are eliminated, and every aggregate dimension row is an insert.
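To make this concrete, the following is a minimal Python sketch of a drop-and-rebuild load for an aggregate dimension. The table and column names (base "day" rows aggregated to a "month" dimension) are illustrative assumptions, not taken from the book; the point is that every output row is an insert with a freshly assigned surrogate key, and no change detection is performed.

```python
# Sketch of a drop-and-rebuild load for an aggregate dimension.
# Names ("day" base rows, "month" aggregate dimension) are
# hypothetical examples, not from the source text.

def rebuild_month_dimension(day_rows):
    """Drop and rebuild: the aggregate table starts empty, every
    row is an insert, and no slowly-changing-dimension (type 1 /
    type 2) comparisons are needed."""
    month_dim = []            # the "dropped" table starts empty
    seen = set()
    for row in day_rows:
        natural_key = (row["year"], row["month"])
        if natural_key in seen:
            continue          # one aggregate row per month
        seen.add(natural_key)
        month_dim.append({
            "month_key": len(month_dim) + 1,  # fresh surrogate key
            "year": row["year"],
            "month": row["month"],
        })
    return month_dim

days = [
    {"date": "2006-01-15", "year": 2006, "month": 1},
    {"date": "2006-01-16", "year": 2006, "month": 1},
    {"date": "2006-02-01", "year": 2006, "month": 2},
]
print(rebuild_month_dimension(days))
```

Note that because surrogate keys are reassigned on every load, the aggregate fact tables that reference them must be rebuilt as well; this is the key-consistency caveat discussed in Chapter 5.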
For an incremental load of an aggregate dimension table, much of the processing centered on identifying new and changed records. This was done through numerous comparisons of incoming data to the existing aggregate dimension. In Figure 6.3, these comparisons identified new records (step 2.1.2), type 1 changes (step 2.1.3), and type 2 changes (step 2.1.5). These comparisons are costly in terms of computing resources. Each may require a query against the base schema, or the establishment and use of a cache by the ETL tool. When ...
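For contrast, the comparisons that incremental processing requires can be sketched as follows. This hypothetical routine classifies each incoming row as a new record, a type 1 change, or a type 2 change by comparing it to the existing aggregate dimension; the attribute names and the assignment of attributes to type 1 versus type 2 rules are assumptions for illustration only.

```python
def classify_incoming(existing, incoming):
    """Classify incoming aggregate-dimension rows against the
    existing table (a dict keyed by natural key). Attribute roles
    below are illustrative assumptions: a type 1 attribute is
    overwritten in place; a changed type 2 attribute produces a
    new row (a new version)."""
    TYPE1 = {"brand_manager"}   # corrections: overwrite in place
    TYPE2 = {"category"}        # history: insert a new version
    new_rows, type1_changes, type2_changes = [], [], []
    for row in incoming:
        current = existing.get(row["brand"])
        if current is None:
            new_rows.append(row)          # new record (step 2.1.2)
        elif any(row[a] != current[a] for a in TYPE2):
            type2_changes.append(row)     # type 2 change (step 2.1.5)
        elif any(row[a] != current[a] for a in TYPE1):
            type1_changes.append(row)     # type 1 change (step 2.1.3)
    return new_rows, type1_changes, type2_changes
```

Every call to `existing.get` stands in for a lookup that, at warehouse scale, becomes a query against the base schema or a cache maintained by the ETL tool; this per-row comparison work is exactly what the drop-and-rebuild approach eliminates.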