Data Pipelines Pocket Reference

by James Densmore
February 2021
Beginner to intermediate
274 pages
5h
English
O'Reilly Media, Inc.

Chapter 6. Transforming Data

In the ELT pattern defined in Chapter 3, once data has been ingested into a data lake or data warehouse (Chapter 4), the next step in a pipeline is data transformation. Data transformation can include both noncontextual manipulation of data and modeling of data with business context and logic in mind.

If the purpose of the pipeline is to produce business insight or analysis, then in addition to any noncontextual transformations, data is further transformed into data models. Recall from Chapter 2 that a data model structures and defines data in a format that is understood and optimized for data analysis. A data model is represented as one or more tables in a data warehouse.
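To make the distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. The table and column names (`orders_raw`, `orders_clean`, `customer_revenue`) are hypothetical and chosen only for illustration; they are not taken from the book. The first statement is a noncontextual transformation (removing exact duplicate rows), and the second builds a simple data model that applies business context (revenue per customer).

```python
import sqlite3


def build_example(conn):
    """Load a tiny raw table, deduplicate it, then build a data model from it."""
    cur = conn.cursor()

    # Hypothetical raw table, as it might look right after ingestion.
    cur.execute(
        "CREATE TABLE orders_raw ("
        "order_id INTEGER, customer_id INTEGER, "
        "order_total REAL, order_date TEXT)"
    )
    cur.executemany(
        "INSERT INTO orders_raw VALUES (?, ?, ?, ?)",
        [
            (1, 100, 20.0, "2020-06-01"),
            (1, 100, 20.0, "2020-06-01"),  # exact duplicate of the row above
            (2, 100, 35.0, "2020-06-02"),
            (3, 200, 15.0, "2020-06-02"),
        ],
    )

    # Noncontextual transformation: drop exact duplicate rows.
    # No business knowledge is needed to decide that this is correct.
    cur.execute(
        "CREATE TABLE orders_clean AS "
        "SELECT DISTINCT order_id, customer_id, order_total, order_date "
        "FROM orders_raw"
    )

    # Data model: one row per customer, shaped for analysis.
    # Choosing these measures reflects business context (assumed here).
    cur.execute(
        "CREATE TABLE customer_revenue AS "
        "SELECT customer_id, "
        "COUNT(*) AS order_count, "
        "SUM(order_total) AS total_revenue "
        "FROM orders_clean "
        "GROUP BY customer_id"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_example(conn)
rows = conn.execute(
    "SELECT customer_id, order_count, total_revenue "
    "FROM customer_revenue ORDER BY customer_id"
).fetchall()
print(rows)
```

In a real warehouse the same two SQL statements would run against ingested tables rather than an in-memory database; the point of the sketch is only that the data model ends up as a table built on top of the cleaned data.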

Though data engineers at times build noncontextual transformations in a pipeline, it has become typical for data analysts and analytics engineers to handle the vast majority of data transformations. People in these roles are more empowered than ever thanks to the emergence of the ELT pattern (they have the data they need right in the warehouse!) and to supporting tools and frameworks designed with SQL as their primary language.

This chapter explores both the noncontextual transformations that are common to nearly every data pipeline and the data models that power dashboards, reports, and one-time analysis of a business problem. Because SQL is the language of the data analyst and analytics engineer, most transformation code samples are written in SQL. I include a few samples written in Python ...


Publisher Resources

ISBN: 9781492087823