Chapter 5. ETL Subsystems

As surprising as it may sound, until a few years ago there was no book available that was solely dedicated to the challenges involved with ETL. Sure, ETL was covered as part of delivering a BI solution, but many people needed more in-depth guidance to help them successfully implement an ETL solution, independent of the tools used. The book The Data Warehouse ETL Toolkit by Ralph Kimball and Joe Caserta (Wiley Publishing, 2004) filled that gap. A bit later, the ideas of that book found their way into an article, "The 38 Subsystems of ETL," which added more structure to the various tasks that are part of an ETL project.


The original article can still be found online. The most recent version can be found in The Kimball Group Reader, article 11.2, "The 34 Subsystems of ETL," pp. 430–434 (Wiley 2010). The subsystem names in this book are taken from the latter reference, since those names differ slightly from the ones used in earlier publications.

In 2008, Wiley published the second edition of one of the best-selling BI books ever: The Data Warehouse Lifecycle Toolkit, also by Ralph Kimball and his colleagues in the Kimball Group. In that book, the subsystems were restructured a second time, resulting in a slightly condensed list consisting of 34 ETL subsystems. We were fortunate to get Ralph's permission to use this list as the foundation for Part II of this book, ...
