Chapter 7. The Data Quality and Data Governance Imperatives
In this chapter, we will examine the following:
- The risk of bad data leading to bad outcomes
- What data quality means for different data types
- Data quality tooling
- Data monitoring and observability
- Data governance is not an afterthought
I’ve been using the terms “data quality” and “data governance” throughout this book, and both are an important part of the AI Readiness Model discussed in Chapter 3. Here I want to step back and introduce the concepts formally. Quality and trust in data are not just operational concerns; they are foundational for any AI initiative.
The Risk of Bad Data Leading to Bad Outcomes
Data quality refers to the overall suitability of a dataset to serve its intended purpose. It’s a measure of how well the data meets the requirements and expectations of its users, particularly in terms of accuracy, completeness, reliability, and relevance. Poor data quality produces inaccurate, incomplete, or inconsistent outputs, which ultimately erode trust in results. For instance, a medical AI model that relies on poor-quality patient data could produce incorrect diagnoses. Similarly, if the pricing model behind an online auto auction site was trained on incomplete data and can’t recognize value signals such as make, model year, and trim package, it may price cars incorrectly for auction.
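To make the completeness dimension concrete, here is a minimal sketch of how one might measure it for the auto auction example. The field names (`make`, `model_year`, `trim`), the sample listings, and the `completeness` helper are all hypothetical and chosen only for illustration; a real pipeline would likely rely on a dedicated data quality tool rather than hand-rolled checks.

```python
from typing import Any

# Hypothetical fields a vehicle pricing model might depend on (illustrative only).
REQUIRED_FIELDS = ("make", "model_year", "trim")


def completeness(
    records: list[dict[str, Any]],
    fields: tuple[str, ...] = REQUIRED_FIELDS,
) -> dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    if not records:
        return {field: 0.0 for field in fields}
    report: dict[str, float] = {}
    for field in fields:
        # Treat a missing key, None, and an empty string as "not present".
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = present / len(records)
    return report


# Made-up sample listings with some gaps, purely for demonstration.
listings = [
    {"make": "Toyota", "model_year": 2019, "trim": "XLE"},
    {"make": "Ford", "model_year": None, "trim": ""},
    {"make": "Honda", "model_year": 2021},
    {"make": "", "model_year": 2018, "trim": "Sport"},
]

report = completeness(listings)
for field, ratio in report.items():
    print(f"{field}: {ratio:.0%} complete")
```

In this made-up sample, only half of the listings carry a trim package, which is the kind of gap that could leave a pricing model blind to a value signal. A check like this covers completeness only; accuracy, reliability, and relevance need their own tests.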
Here’s a real-world use case that I heard at a conference. The speaker described how a sales AI assistant ...