Chapter 9. Loading Fact Tables

Fact tables are the storage-hungry collections of data where the analysis details are stored. In a typical data warehouse environment, it's the fact tables that take up the most space, which is measured in gigabytes, terabytes, and sometimes even petabytes. In order to get this data from a source system into the data warehouse, you need a fast loading mechanism, consisting of several components, the basics of which are covered in this chapter. First we discuss bulk loaders, special Kettle plugins that allow you to take advantage of the bulk-loading capabilities of the various types of databases.

Note

Although the subject of this chapter is loading fact tables, the Kettle bulk loading features can be used for other bulk load tasks as well, for instance to load flat files in staging tables.

Before you can load the data into its final destination it needs to undergo other operations as well, such as looking up the correct dimension surrogate keys. We cover the techniques available in Kettle for doing this in this chapter; in Chapter 19 we'll show another technique that's purely based on using SQL.

In the final section of this chapter, we cover three different types of fact tables, as identified by Ralph Kimball. We also introduce a fourth type of fact table, the state-oriented fact table, as described by Dutch mathematician and data modeling guru Dr. Harm van der Lek. The chapter ends with a short introduction in the use of aggregate tables and how to load ...

Get Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.