Data Quality Fundamentals
by Barr Moses, Lior Gavish, Molly Vorwerck
September 2022
Beginner to intermediate
308 pages
8h 43m
English
O'Reilly Media, Inc.

Chapter 3. Collecting, Cleaning, Transforming, and Testing Data

Now that we have a better understanding of the various tools necessary to prioritize data reliability, let’s discuss how to ready your data for production use cases with data quality in mind.

In Chapter 2, we discussed some of the domain terminology and walked through a taxonomy of where data quality nuggets (mostly metadata) are to be found. Still, to get a thorough sense of data quality in your data pipeline, you need to look end to end, at the entire life cycle of data as it persists at your organization.

In this chapter, we’ll walk through how to manage data before and while it’s in the pipeline through four key steps that impact overall data quality: data collection, cleaning, transformation, and testing. While data collection and cleaning concern the first step of the production pipeline, transformation and testing tackle data quality while it’s midway through its journey to becoming actionable analytics.

Collecting Data

When it comes to collecting data, perhaps no aspect of the pipeline is as important as the entrypoint, the most upstream location in any data pipeline. We define an entrypoint as an initial point of contact where data from the outside world enters your pipeline. If you’re familiar with Docker containerization, you may know the ENTRYPOINT instruction, which sets the command that runs whenever a container starts. Likewise, “entrypoint” in software engineering parlance ...
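To make the idea of an entrypoint concrete, here is a minimal sketch of a function that acts as the first point of contact for outside records and sorts them by whether they carry the fields the pipeline expects. The field names (`event_id`, `timestamp`) and the names `ingest` and `IngestResult` are hypothetical, chosen only for illustration; they are not taken from the book.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical required fields; a real pipeline would define its own schema.
REQUIRED_FIELDS = ("event_id", "timestamp")


@dataclass
class IngestResult:
    """Records split by whether they passed the entrypoint check."""
    accepted: list[dict[str, Any]] = field(default_factory=list)
    rejected: list[dict[str, Any]] = field(default_factory=list)


def ingest(records: list[dict[str, Any]]) -> IngestResult:
    """Illustrative entrypoint: the first place outside data meets the pipeline.

    A record is accepted only if every required field is present and not None.
    """
    result = IngestResult()
    for record in records:
        missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
        if missing:
            result.rejected.append(record)
        else:
            result.accepted.append(record)
    return result


if __name__ == "__main__":
    batch = [
        {"event_id": 1, "timestamp": "2022-09-01T00:00:00Z"},
        {"event_id": 2},  # missing timestamp, so it is rejected
    ]
    outcome = ingest(batch)
    print(len(outcome.accepted), "accepted;", len(outcome.rejected), "rejected")
```

Checking records at this most upstream location keeps malformed input from travelling further down the pipeline, which is one reason the entrypoint matters so much for data quality.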


ISBN: 9781098112035