Copying rows

At any point in a Transformation, you may decide to split the main stream into two or more streams. When you do so, you have to decide what to do with the data that leaves the step where the split occurs: copy it or distribute it.

To copy means that the whole dataset is copied to each of the destination steps. Why would you copy the whole dataset? Mainly because you want to apply different treatments to the same set of data. For example, with our Excel file exported from the JIRA platform, we may want to generate two different outputs:

  • A detailed file with the issues
  • A spreadsheet with some statistics, such as the number of issues per severity and status

This is a typical situation where we want to copy the dataset. Copying rows is a very straightforward ...
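
Outside the PDI designer, the difference between the two options can be sketched in a few lines of Python. This is only an illustration of the idea, not how PDI implements it: the sample rows, the field names (key, severity, status), and the helper functions copy_rows and distribute_rows are all hypothetical, and the sketch assumes that distributing means handing rows out to the destinations in turn.

```python
from collections import Counter

# Hypothetical sample rows standing in for the issues exported from JIRA.
issues = [
    {"key": "PRJ-1", "severity": "High", "status": "Open"},
    {"key": "PRJ-2", "severity": "High", "status": "Open"},
    {"key": "PRJ-3", "severity": "Low", "status": "Closed"},
    {"key": "PRJ-4", "severity": "High", "status": "Closed"},
]


def copy_rows(rows, n_targets):
    """Copy: every destination receives its own copy of the whole dataset."""
    return [[dict(row) for row in rows] for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distribute (assumed round-robin): each row goes to exactly one destination."""
    targets = [[] for _ in range(n_targets)]
    for index, row in enumerate(rows):
        targets[index % n_targets].append(row)
    return targets


# Copying: both destinations see all the issues, so each one can
# apply a different treatment to the same set of data.
detail_stream, stats_stream = copy_rows(issues, 2)
stats = Counter((row["severity"], row["status"]) for row in stats_stream)

# Distributing: the issues are split between the two destinations.
first_half, second_half = distribute_rows(issues, 2)
```

Here detail_stream would feed the detailed file, while stats holds the count of issues per severity and status for the spreadsheet. With distribute_rows, neither destination sees the complete dataset, which is why copying is the right choice for this scenario.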
