Automating Data Quality Monitoring

by Jeremy Stanley, Paige Schwartz
January 2024
Intermediate to advanced
220 pages
6h 3m
English
O'Reilly Media, Inc.

Chapter 5. Building a Model That Works on Real-World Data

In Chapter 4, we shared an algorithm for data quality monitoring with unsupervised machine learning. It’s one thing to read about these steps, and quite another to build a model that performs well in practice on any arbitrary real-world dataset. If you don’t have strategies to account for nuances like seasonality, time-based features, and correlations across columns, your model will over- or under-alert, often dramatically.

Beyond knowing the pitfalls to look out for, you’ll need to continuously evaluate your model against benchmark data to figure out where and how to improve. We’ll share methods for effective model testing, including thoughts on developing a library to introduce chaos into perfectly well-behaved data (cue evil laugh).

Data Challenges and Mitigations

To make your model truly valuable rather than noisy, you’ll need strategies to overcome the challenges presented by data in the wild.

Seasonality

Humans are very seasonal creatures. We change our behavior patterns by hour of the day and day of the week. We pay bills on roughly the same day every month and go on holiday around the same time every year. Most data, in some way or another, is a reflection of human behavior or is affected by human behavior, and so these seasonality patterns appear in almost all data we care about.
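To make the weekly pattern concrete, here is a minimal sketch in Python. It assumes a hypothetical daily row count that drops on weekends; the counts, dates, and function names are invented for illustration and are not taken from the book. It compares one Saturday with the day before and with the same weekday one week earlier.

```python
from datetime import date, timedelta

# Hypothetical daily row counts (invented values): weekdays busy, weekends quiet.
WEEKDAY_ROWS = 1000
WEEKEND_ROWS = 400


def synthetic_daily_counts(start, days):
    """Return {date: row_count} with a simple weekday/weekend pattern."""
    counts = {}
    for offset in range(days):
        day = start + timedelta(days=offset)
        is_weekend = day.weekday() >= 5  # 5 = Saturday, 6 = Sunday
        counts[day] = WEEKEND_ROWS if is_weekend else WEEKDAY_ROWS
    return counts


def relative_change(counts, day, lag_days):
    """Relative change of `day` versus the day `lag_days` earlier."""
    previous = counts[day - timedelta(days=lag_days)]
    return (counts[day] - previous) / previous


counts = synthetic_daily_counts(date(2024, 1, 1), days=28)  # starts on a Monday
saturday = date(2024, 1, 13)

day_over_day = relative_change(counts, saturday, lag_days=1)    # vs. Friday
week_over_week = relative_change(counts, saturday, lag_days=7)  # vs. previous Saturday

print(f"day over day:   {day_over_day:+.0%}")
print(f"week over week: {week_over_week:+.0%}")
```

In this toy data nothing is wrong, yet the Saturday count is 60% lower than Friday's, while it is unchanged against the previous Saturday. A check that only looks at the previous day would likely flag every weekend.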

As you’ll recall from Chapter 4, our approach relies on comparing data from today to data from yesterday. But because of seasonality, ...

Publisher Resources

ISBN: 9781098145927