Chapter 6. Transformation: Enriching

Enriching transformations result in a net addition of information to your dataset. When you enrich a dataset, you either insert additional records or fields from other, related datasets, or you use formulas to calculate new fields.

You might wonder how enriching transformations differ from structuring transformations (discussed in Chapter 5). Although both types of transformations can involve creating new fields or records, structuring transformations create new fields or records based on data already present in the dataset. Enriching transformations, in contrast, create new fields or records using new data—information that was not previously present in the dataset in any form.
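To make the contrast concrete, here is a minimal sketch in plain Python, assuming records are stored as a list of dictionaries. The field names (`client`, `date`, `month`, `region`), the helper functions `add_month` and `add_region`, and the client-to-region lookup are all hypothetical. The structuring step builds its new field purely from data already in each record. The enriching step pulls its new field from a separate dataset that the orders never contained.

```python
# Hypothetical order records; field names are illustrative only.
orders = [
    {"client": "acme", "date": "2023-01-15"},
    {"client": "globex", "date": "2023-02-03"},
]

# Structuring: the new "month" field is derived entirely from data
# already present in each record (the first 7 characters of "date").
def add_month(records):
    return [{**r, "month": r["date"][:7]} for r in records]

# Enriching: the new "region" field comes from a separate, related
# dataset (a hypothetical client-to-region lookup) that the orders
# do not contain in any form.
client_regions = {"acme": "West", "globex": "East"}

def add_region(records, lookup):
    # Clients missing from the lookup get None rather than an error.
    return [{**r, "region": lookup.get(r["client"])} for r in records]

structured = add_month(orders)
enriched = add_region(orders, client_regions)
print(structured[0]["month"])  # 2023-01
print(enriched[1]["region"])   # East
```

Both functions add a field, but only `add_region` adds information: no manipulation of the original order records alone could have produced the region values.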

There are three primary types of enriching transformations:

  • Unions

  • Joins

  • Deriving new fields

We discuss each type of enriching transformation in this chapter.

Unions

Unions append additional records to an existing dataset. In other words, when you perform a union, you are taking two related datasets and stacking them vertically to create a single dataset.
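The stacking operation can be sketched in a few lines of plain Python, assuming each dataset is a list of dictionaries with one dictionary per record. The `union` function and the sample records are hypothetical. Because a union only makes sense when both datasets describe the same kind of record, this sketch refuses to stack records whose field names differ. Note that it keeps duplicate records, which matches the behavior of SQL's `UNION ALL` rather than `UNION` (which removes duplicates).

```python
def union(first, second):
    """Stack the records of `second` beneath those of `first`.

    Hypothetical helper: assumes both datasets are lists of dicts and
    that every record should share the same set of field names.
    Duplicate records are kept, as with SQL's UNION ALL.
    """
    combined = list(first) + list(second)
    # Collect each distinct set of field names seen across all records.
    field_sets = {frozenset(record) for record in combined}
    if len(field_sets) > 1:
        raise ValueError("datasets have different fields; align them before a union")
    return combined


# Hypothetical monthly extracts with identical fields.
jan = [{"client": "acme", "order_id": 1}, {"client": "globex", "order_id": 2}]
feb = [{"client": "acme", "order_id": 3}]

both = union(jan, feb)
print(len(both))  # 3
```

Checking field names up front is one design choice among several; a more forgiving variant could instead fill missing fields with an empty value before stacking.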

Why might you want to perform a union? Let’s imagine that you work for an organization that receives monthly orders from your clients. At the end of each quarter, you need to produce a summary analysis that records the total number of orders placed by each client over the previous three months. Because each month’s orders are contained in a separate dataset, you need to combine them into a single dataset ...
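A minimal sketch of this quarterly scenario, assuming each month's orders arrive as a list of dictionaries with one record per order. The month names, client names, order IDs, and the `union_all` helper are hypothetical. The three monthly datasets are first appended into one quarterly dataset, and the per-client totals are then counted with `collections.Counter` from the Python standard library.

```python
from collections import Counter

# Hypothetical monthly order datasets; each record is one order.
january = [{"client": "acme", "order_id": 101}, {"client": "globex", "order_id": 102}]
february = [{"client": "acme", "order_id": 201}]
march = [{"client": "initech", "order_id": 301}, {"client": "acme", "order_id": 302}]

def union_all(*datasets):
    """Append the records of each dataset, in order, into one list."""
    combined = []
    for dataset in datasets:
        combined.extend(dataset)
    return combined

# Union: stack the three months vertically into one quarterly dataset.
quarter = union_all(january, february, march)

# Summary: total number of orders per client over the quarter.
orders_per_client = Counter(record["client"] for record in quarter)
print(orders_per_client)  # Counter({'acme': 3, 'globex': 1, 'initech': 1})
```

Without the union step, the per-client count would have to be computed separately for each month and then added together by hand.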
