Chapter 11. ETL Development Lifecycle

In previous chapters, we introduced several parts of Kettle and showed how they fit into the 34 ETL subsystems identified by Ralph Kimball. This chapter is broader in scope: it covers the entire development lifecycle rather than individual pieces, although we still examine some specific topics in detail.

Developing ETL solutions is an important part of building and maintaining a data warehouse and, as such, should be considered part of a process rather than an individual project. A project has one or more predefined goals and deliverables and a clearly defined start and end point. A process is an ongoing effort with activities that repeat periodically. Creating ETL solutions is usually conducted as part of a project; monitoring, maintaining, and adapting solutions is part of an ongoing process, although adapting existing ETL jobs can also be done in a more project-oriented setting.

This chapter focuses on the initial part of the lifecycle, where a new solution is being built. ETL solutions go through analysis, design, build, test, documentation, and delivery stages, just like any other piece of software. The challenge is to go through these phases as quickly as possible, at the lowest possible cost, and preferably with zero rework due to errors. As you will see in the subsequent sections, Pentaho's Agile BI tools are ideally suited to support this.

Solution Design

Any solution should be based on a design; jumping ...
