Chapter 14. Lineage and Auditing

When you create complex data integration solutions with many Kettle jobs and transformations, keeping track of the results can be challenging. At the same time, it is extremely important to keep an audit trail so that you can identify and diagnose problems after they occur. You need to know exactly what was executed, where errors occurred, and how long each job took to run. In this chapter, we show you how to perform all these tasks and more on the topics of lineage, impact analysis, and auditing.

As you may recall from the ETL subsystem overview in Chapter 5, subsystem 29 covers the lineage and dependency analyzer. Lineage looks "backward" to the process and transformation steps that created the result data set you are analyzing, whereas impact analysis looks "forward" from the start of the process to everything that depends on it. Roughly speaking, impact analysis is done from the source, and lineage analysis is done from the target.
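To make the two directions concrete, here is a minimal Python sketch that treats a data flow as a directed graph and walks it forward for impact analysis and backward for lineage. The step names and the `HOPS` list are hypothetical and invented for illustration; this is not how Kettle stores or exposes its metadata, only a way to picture the difference between the two analyses.

```python
from collections import defaultdict, deque

# Hypothetical (source step, target step) pairs; all names are illustrative.
HOPS = [
    ("read_customers", "clean_names"),
    ("read_orders", "join_orders"),
    ("clean_names", "join_orders"),
    ("join_orders", "load_fact_sales"),
    ("clean_names", "load_dim_customer"),
]


def _reachable(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(source, hops):
    """Impact analysis: walk forward from a source to everything it feeds."""
    return _reachable(source, hops)


def lineage(target, hops):
    """Lineage: walk backward from a target to everything that produced it."""
    reversed_hops = [(dst, src) for src, dst in hops]
    return _reachable(target, reversed_hops)


if __name__ == "__main__":
    print("impact of read_customers:", sorted(impact("read_customers", HOPS)))
    print("lineage of load_fact_sales:", sorted(lineage("load_fact_sales", HOPS)))
```

Both functions share the same traversal; lineage simply reverses every edge first, which mirrors the idea that the two analyses differ only in where you start and which way you look.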

For both impact and lineage analysis you need metadata, and this chapter begins by showing how you can use a transformation to read the Kettle metadata. This allows you to automate the extraction of lineage information so you can make it part of your nightly batches or even share this lineage information with third-party software.

Next, you'll learn more about the various kinds of lineage information. You will see where field level lineage information and database impact analysis can be obtained in Spoon, and learn how to write a transformation ...

