Chapter 9. Data Quality in the Real World: Conversations and Case Studies

It’s great to talk about data quality in theory, but what does this desired state actually look like in practice?

Over the past several chapters, we’ve walked through what it takes to achieve data reliability at scale, from designing a DataOps workflow to writing common SQL tests that check the volume and freshness of your data assets. We’ve sprinkled in a dose of real-world case studies, but as we all know, data quality isn’t achieved in a textbook, and getting to “reliable data” depends on several other elements of your data analytics and engineering practice. As technologies advance and companies become more data-reliant, we need to consider how other industry-defining processes and technologies affect our ability to increase data reliability.

In this chapter, we’ll discuss five topics that are top of mind for many of today’s data leaders and share how data quality plays a critical part:

  • The data mesh and where data quality fits in

  • Data quality’s role in the cloud-based data stack journey

  • Knowledge graphs as the key to more accessible data

  • Data discovery for distributed data architectures

  • When to get started with data quality

Over the past several years, these five topics have become increasingly common, often giving organizations the advantage they need to tackle data reliability in a more scalable and repeatable way. Let’s dive in.

Building a Data Mesh ...
