Chapter 18. Quality Observability Service

So far, we have covered deployment of insights, and they're now ready to be used in production. Consider a real-world example of a business dashboard deployed in production that shows a spike in one of the metrics (such as gross new subscribers). Data users need to ensure that the spike actually reflects reality and is not the result of a data quality problem. Several things can go wrong and lead to quality issues: uncoordinated source schema changes, changes in data element properties, ingestion issues, source and target systems with out-of-sync data, processing failures, incorrect business definitions for generating metrics, and so on.
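To make one of these failure modes concrete, the following is a minimal, illustrative sketch (not a prescribed implementation) of detecting out-of-sync data by reconciling record counts between a source and a target table. The table names, sample data, and zero-difference tolerance are assumptions for the example.

```python
import sqlite3

# Hypothetical example: compare record counts between a "source" and a
# "target" table to detect out-of-sync data after ingestion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_subscribers (id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE target_subscribers (id INTEGER PRIMARY KEY, plan TEXT);
    INSERT INTO source_subscribers VALUES (1, 'basic'), (2, 'pro'), (3, 'pro');
    INSERT INTO target_subscribers VALUES (1, 'basic'), (2, 'pro');  -- one record missing
""")

def count_rows(table: str) -> int:
    """Return the number of rows in a table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

source_count = count_rows("source_subscribers")
target_count = count_rows("target_subscribers")

# Flag the tables as out of sync when counts diverge beyond a tolerance.
tolerance = 0  # exact match expected for this (hypothetical) pipeline
if abs(source_count - target_count) > tolerance:
    print(f"Out of sync: source={source_count}, target={target_count}")
else:
    print("Source and target are in sync")
```

In practice, count reconciliation is only a coarse signal; checksums or column-level comparisons catch discrepancies that counts alone miss.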

Tracking quality in production pipelines is complex. First, there is no E2E unified and standardized tracking of data quality across the multiple sources in the data pipeline, which results in long delays in identifying and fixing data quality issues. Also, because there is currently no standardized platform, teams must build and manage their own hardware and software infrastructure to address the problem. Second, defining quality checks and running them at scale requires significant engineering effort. For instance, a personalization platform requires data quality validation of millions of records each day. Currently, data users rely on one-off checks that do not scale to the large volumes of data flowing across multiple systems. Third, it's important not just to detect data quality issues, ...
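To give a sense of what such quality checks look like before they are standardized into a platform, here is a hedged sketch of a few record-level checks (null rate, uniqueness, freshness) applied to a small batch of records. The field names, sample data, and thresholds are illustrative assumptions, not the book's implementation; a production pipeline would run equivalent checks over millions of records per day.

```python
from datetime import datetime, timezone

# Hypothetical batch of new-subscriber events.
records = [
    {"subscriber_id": 1, "plan": "basic", "signup_ts": "2021-03-01T10:00:00+00:00"},
    {"subscriber_id": 2, "plan": None,    "signup_ts": "2021-03-01T10:05:00+00:00"},
    {"subscriber_id": 3, "plan": "pro",   "signup_ts": "2021-03-01T10:07:00+00:00"},
]

def null_rate(rows, field):
    """Fraction of records where `field` is missing or None."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def freshness_lag_hours(rows, ts_field):
    """Hours elapsed since the most recent timestamp in the batch."""
    latest = max(datetime.fromisoformat(r[ts_field]) for r in rows)
    return (datetime.now(timezone.utc) - latest).total_seconds() / 3600

# Thresholds are illustrative; real pipelines tune them per dataset.
checks = {
    "plan_null_rate_below_5pct": null_rate(records, "plan") < 0.05,
    "subscriber_ids_unique": len({r["subscriber_id"] for r in records}) == len(records),
    "data_fresh_within_24h": freshness_lag_hours(records, "signup_ts") < 24,
}

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Even this small sketch shows why one-off checks become unmanageable: each dataset needs its own fields, thresholds, and schedules, which is the gap a quality observability service is meant to close.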
