Chapter 2. Kettle Concepts

In this chapter, we cover the concepts behind Kettle. We look at the general design principles and describe the data integration building blocks. First, we show you how row-level data integration is performed using transformations. Then we explain how you can handle basic workflow using jobs.

You will learn about the following Kettle concepts:

  • Database connections

  • Tools and utilities

  • Repositories

  • Virtual File Systems

  • Parameters and variables

  • Visual programming

Design Principles

Let's start with a look at some of the core design principles that have been in place since the very beginning of Kettle's development. Negative experiences with other tools and frameworks clearly colored the decisions that were made. However, it's especially interesting to look at the positive things that were retained from these experiences.

  • Ease of development: As a data warehouse and ETL developer, you want to spend your time creating a business intelligence solution. Every hour you spend installing software is wasted, and the same applies to configuration. For example, when Kettle came on the market, pretty much every Java-based tool forced the user to explicitly specify the Java driver class name and JDBC URL just to create a database connection. This problem can be overcome with a few Internet searches, but it draws attention away from ...
