Chapter 2. Data Quality Monitoring Strategies and the Role of Automation

There are many ways to approach data quality monitoring. Before evaluating the options, it helps to think about what success looks like. In this chapter, we’ll define the requirements for success. Then we’ll walk through the traditional strategies (manual checks, rule-based testing, and metrics monitoring) and see how they measure up.

Next, we’ll explore the idea of automating data quality monitoring. We’ll explain how unsupervised machine learning can help us meet some of the success criteria that the traditional strategies leave unmet, scaling monitoring to large amounts of data while reducing alert fatigue.

We’ll wrap up by introducing the data quality monitoring strategy we advocate for in this book: a four-pillar approach combining data observability, rule-based testing, metrics monitoring, and unsupervised machine learning. As we’ll show, this approach has many advantages. It allows subject matter experts to enforce essential ...
