Copying rows

At any point in a Transformation, you may decide to split the main stream into two or more streams. When you do so, you have to decide what to do with the data that leaves the last step: copy it or distribute it.

To copy means that the whole dataset is copied to each of the destination steps. Why would you copy the whole dataset? Mainly because you want to apply different treatments to the same set of data. For example, with our Excel file exported from the JIRA platform, we may want to generate two different outputs:

  • A detailed file with the issues
  • A spreadsheet with some statistics, such as the number of issues per severity per status

This is a typical situation where we want to copy the dataset. Copying rows is a very straightforward ...
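
The difference between the two options can be illustrated outside of PDI with a short Python sketch. This is only a conceptual model, not PDI code: the function names and the sample issue rows are hypothetical, and the sketch assumes that distributing means handing out rows to the destinations in round-robin order.

```python
from copy import deepcopy
from itertools import cycle


def copy_rows(rows, n_targets):
    """Copy: every destination receives its own copy of the whole dataset."""
    return [deepcopy(rows) for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distribute: each row goes to exactly one destination.

    Round-robin order is an assumption made for this sketch.
    """
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


# Hypothetical rows standing in for the issues read from the Excel file.
issues = [
    {"key": "PRJ-1", "severity": "High", "status": "Open"},
    {"key": "PRJ-2", "severity": "Low", "status": "Closed"},
    {"key": "PRJ-3", "severity": "High", "status": "Open"},
]

# Copy: both the detailed output and the statistics output see all rows.
detail_stream, stats_stream = copy_rows(issues, 2)

# Distribute: the rows are split between the two destinations.
first_stream, second_stream = distribute_rows(issues, 2)
```

With copying, both destination streams hold all three issues, which is what the detailed file and the statistics spreadsheet each need. With distributing, every destination sees only part of the data, so statistics computed on one stream would be incomplete.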
