Chapter 5: Data Processing and Transformations

Now that we have our initial raw dataset, we can start transforming the data into its final state. Processing and transformation form the core of any data pipeline, and the work often requires separating the data into multiple subsets for different applications.

Core data processing is the simplest part of this work. We started on it in Chapter 4, Sourcing the Data, where we began building the pipeline by taking the raw data, cleansing the titles and information headers, and setting the data types. That gives us only an initial dataset to work with, not a final dataset ready for use. When we look at the column headers, we see ...
