CHAPTER 8: Data Quality
So far, we’ve discussed the fundamentals of databases, data storage systems, and data pipelines. While these are essential components of the data engineering process, even the best designs or storage solutions are meaningless if the data delivered to downstream users is bad. So, how do we ensure quality data in our data engineering process?
IN THIS CHAPTER, YOU WILL LEARN ABOUT THE FOLLOWING:
- Causes of bad data and their impact
- What data quality means and why it is important
- The various dimensions of data quality
- How to identify data quality issues
- Common data quality checks
- Your role in ensuring data quality in your organization, using best practices
Data quality is an important aspect of the data engineering process, but let’s step back and look at the broader picture. Data engineering serves multiple business needs across various departments in an organization. To truly understand the impact of bad data, and why we need to ensure data quality, we can examine how it affects overall business performance and decision-making.
The truth is that many organizations don’t care about the quality of their data. Although quality data is an important part of their processes, it is still overlooked. Most organizations want to start initiatives that can have a positive impact on their business, but because their data is bad, they would rather build workarounds or makeshift solutions.
In this era of innovation, poor data quality is a major roadblock for businesses ...