9 Data quality

This chapter covers

  • Testing data to ensure quality
  • Different types of data quality checks
  • Executing data tests
  • Considerations for scaling out data testing

The insights generated by a data platform are only as good as the quality of the underlying data, so a good data platform needs to provide some guarantees around data quality. Those guarantees are the focus of this chapter.

At the time of writing, data quality testing isn’t yet offered “as a service” by all major cloud providers. Unlike some of the previous topics we’ve covered in this book—such as storage, data processing, or machine learning (ML)—we don’t have an out-of-the-box PaaS (platform as a service) solution, so we’ll have to stitch something together ourselves.

We’ll ...
