Automating Data Quality Monitoring

by Jeremy Stanley, Paige Schwartz
January 2024
Intermediate to advanced
220 pages
6h 3m
English
O'Reilly Media, Inc.

Chapter 4. Automating Data Quality Monitoring with Machine Learning

Machine learning is a statistical approach that, compared to rule-based testing and metrics monitoring, has many advantages: it’s scalable, it can detect unknown-unknown changes, and, at the risk of anthropomorphizing, it’s smart. It can learn from prior inputs, use contextual information to minimize false positives, and understand your data better and better over time.

In the previous chapters, we’ve explored when and how automation with ML makes sense for your data quality monitoring strategy. Now it’s time to explore the core mechanism: how you can train, develop, and use a model to detect data quality issues—and even explain aspects like their severity and where they occur in your data.

In this chapter, we’ll explain which machine learning approach works best for data quality monitoring and show you the algorithm (series of steps) you can follow to implement this approach. We’ll answer questions like how much data you should sample and how to make the model’s outputs explainable. One important caveat: following the steps here won’t result in a model that’s ready to monitor real-world data. In Chapter 5, we’ll turn to the practical aspects of tuning and testing your system so that it functions reliably in an enterprise setting.

Requirements

There are many ML techniques you could potentially apply to a given problem. To figure out the right approach for your use case, it’s essential to define ...



ISBN: 9781098145927