Chapter 19. Data Vault Management

This chapter was written by Kasper de Graaf of DIKW-Academy, a well-known expert in the field of Data Vault modeling.

Data warehousing took shape in the 1990s, when Bill Inmon and Ralph Kimball began publishing their ideas on the subject. Either of their approaches can be used to create an environment that supports analysis and reporting. However, Inmon and Kimball differ on some points, a disagreement sometimes referred to as "The Big Debate."

Before we dive into the differences, let us start with the similarities. Inmon and Kimball do not disagree about the use of data marts. A data mart is a database aimed at end users. It is usually modeled using star schemas (Chapter 4 contains an example, the Rental Star Schema) and optimized for analysis and reporting.

The biggest difference between the two architectures concerns the need for an enterprise data warehouse (EDW). Inmon says you need one; Kimball says you don't. An EDW is basically a large database that contains integrated, historical data from several other databases. The EDW is not queried by end users. It is used solely for complete, transparent, and auditable storage of all data that is considered relevant for reporting. In Bill Inmon's vision, an EDW sits between the source databases and the data marts and thus acts as the single source for the data marts.
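To make the data flow in the Inmon-style architecture concrete, here is a minimal Python sketch. It is a toy illustration only, not a Data Vault model and not how Kettle does it: the source systems (`crm_rows`, `billing_rows`), the functions `load_edw` and `build_revenue_mart`, and all field names are hypothetical. The point is that the EDW stores every source row unchanged, tagged with its origin and load date, and that the data mart is derived from the EDW alone.

```python
from datetime import date

# Hypothetical source systems (assumed for illustration only).
crm_rows = [{"customer_id": 1, "name": "Alice", "city": "Utrecht"}]
billing_rows = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
]


def load_edw(edw, source_name, rows, load_date):
    """Append source rows to the EDW unchanged, tagged for auditability."""
    for row in rows:
        edw.append({"source": source_name, "load_date": load_date, **row})


def build_revenue_mart(edw):
    """Derive a reporting-friendly table using the EDW as the only input."""
    names = {r["customer_id"]: r["name"] for r in edw if r["source"] == "crm"}
    totals = {}
    for r in edw:
        if r["source"] == "billing":
            cid = r["customer_id"]
            totals[cid] = totals.get(cid, 0.0) + r["amount"]
    return [
        {"customer": names[cid], "revenue": total}
        for cid, total in sorted(totals.items())
    ]


edw = []
load_edw(edw, "crm", crm_rows, date(2010, 1, 1))
load_edw(edw, "billing", billing_rows, date(2010, 1, 1))

# The mart never reads the source systems directly.
mart = build_revenue_mart(edw)
print(mart)
```

In a Kimball-style setup without an EDW, the equivalent of `build_revenue_mart` would read from the source systems directly.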

Note

For a more elaborate explanation about data warehousing please see Chapter 6 of the book
