4

Getting to Know Your Data

“Truth, like gold, is to be obtained not by its growth, but by washing away from it all that is not gold.”

―Leo Tolstoy

In this chapter, we explore features within the Databricks DI Platform that help improve and monitor data quality and facilitate data exploration. There are numerous approaches to getting to know your data better with Databricks. First, we cover how to oversee data quality with Delta Live Tables (DLT) to catch quality issues early and prevent the contamination of entire pipelines. We’ll take our first close look at Lakehouse Monitoring, which helps us analyze data changes over time and can alert us to changes that concern us. Lakehouse Monitoring is a big time-saver, allowing you to focus on mitigating ...

