Chapter 8. Handling Dimension Tables

In this chapter, we take a closer look at how you can use Kettle to manage dimension tables. In particular, we'll look at those features in Kettle that are particularly useful to transform and/or generate data to fit the format of typical dimension tables, as well as the actual loading of the data into the tables. Before we discuss the details, let's consider which of the ETL subsystems discussed in Chapter 5 are involved in the management of dimension tables:

  • Change Data Capture (subsystem 2): There are a few specific issues with regard to change data capture when loading typical denormalized star schema dimension tables, so this is briefly discussed in this chapter.

  • Extraction (subsystem 3), Data Cleaning and Quality Screen Handler System (subsystem 4), and Error Event Handler (subsystem 5): These three are generic subsystems that apply to both dimension and fact tables. These subsystems were covered in Chapters 6 and 7, and won't be covered here in depth.

  • Audit Dimension Assembler (subsystem 6): Functionally, the audit dimension is a special dimension that provides data about the ETL process itself, as opposed to information that has a business context. Loading the audit dimension was covered in Chapter 7.

  • Slowly Changing Dimension Processor (subsystem 9): In many cases, data stored in dimension tables is not fixed and static: it typically changes over time, albeit typically at a much slower pace than any of the fact tables. This chapter describes ...

Get Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.