Chapter 5. Transformation: Structuring

Overview of Structuring

You might remember our discussion of structure as a metadata element from Chapter 2. Structuring as a transformation action involves changing your dataset’s structure or granularity. In other words, structuring consists of any actions that change the form or schema of your data.

At a high level, there are two groups of structuring actions that you might need to apply to your datasets. The first group involves manipulating individual records and fields. We call this intrarecord structuring. Intrarecord structuring transformations fall roughly into three buckets:

  • Reordering record fields (moving columns)

  • Creating new record fields through extracting values

  • Combining multiple record fields into a single record field
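
To make these three buckets concrete, here is a minimal sketch in plain Python that treats a record as a dictionary. The record, its field names, and the helper functions (reorder_fields, extract_domain, combine_location) are hypothetical and exist only for illustration; they are not taken from the book or from any particular tool.

```python
import re

# Hypothetical record; the field names are illustrative assumptions.
record = {
    "id": 7,
    "name": "Ada Lovelace",
    "contact": "ada@example.com",
    "city": "London",
    "country": "UK",
}


def reorder_fields(rec, order):
    """Reordering: return a copy of rec with its fields in the given order."""
    return {key: rec[key] for key in order}


def extract_domain(rec):
    """Extracting: derive a new field from part of an existing field."""
    match = re.search(r"@(.+)$", rec["contact"])
    out = dict(rec)
    out["email_domain"] = match.group(1) if match else None
    return out


def combine_location(rec):
    """Combining: merge two fields into a single new field."""
    out = {k: v for k, v in rec.items() if k not in ("city", "country")}
    out["location"] = f'{rec["city"]}, {rec["country"]}'
    return out


reordered = reorder_fields(record, ["name", "id", "contact", "city", "country"])
extracted = extract_domain(record)
combined = combine_location(record)
```

Each helper returns a new dictionary instead of modifying its input, so the original record remains available for comparison after every step. Note that none of these operations changes the number of records; only the fields within a record change.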

The second group of structuring transformations involves operating on multiple records and fields at once. We call this interrecord structuring. These transformations fall roughly into two types:

  • Filtering datasets by removing sets of records

  • Shifting the granularity of the dataset and the fields associated with records through aggregations and pivots
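
The two interrecord types can be sketched in the same spirit, using a list of dictionaries as the dataset. The sales data and the helper names (filter_records, aggregate_by, pivot) are assumptions made for this example, and summing is only one of many possible aggregations.

```python
from collections import defaultdict

# Hypothetical dataset: one record per individual sale.
sales = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 150},
    {"region": "West", "quarter": "Q1", "amount": 80},
    {"region": "West", "quarter": "Q1", "amount": 20},
    {"region": "West", "quarter": "Q2", "amount": 0},
]


def filter_records(records, predicate):
    """Filtering: keep only the records for which predicate is true."""
    return [r for r in records if predicate(r)]


def aggregate_by(records, key, value):
    """Aggregating: collapse records to one summed total per key value."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r[value]
    return dict(totals)


def pivot(records, row_key, col_key, value):
    """Pivoting: turn the values of col_key into fields, summing duplicates."""
    table = defaultdict(dict)
    for r in records:
        row = table[r[row_key]]
        row[r[col_key]] = row.get(r[col_key], 0) + r[value]
    return {k: dict(v) for k, v in table.items()}


nonzero = filter_records(sales, lambda r: r["amount"] > 0)
by_region = aggregate_by(sales, "region", "amount")
by_region_quarter = pivot(sales, "region", "quarter", "amount")
```

Filtering leaves the granularity untouched (each remaining record is still one sale), whereas the aggregation and the pivot both shift it to one record per region. The pivot additionally changes the fields associated with each record, because the quarter values become field names.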

We will discuss each set of structuring transformations in turn so you can understand when you might want to apply these operations to your datasets.

Intrarecord Structuring: Extracting Values

Extraction involves creating a new record field from an existing record field. Frequently, this involves identifying ...
