Preface
If you’ve experienced any of the following scenarios, raise your hand (or, you can just nod in solidarity—there’s no way we’ll know otherwise):
Five thousand rows in a critical (and relatively predictable) table suddenly turn into five hundred, with no rhyme or reason.
A broken pipeline causes an executive dashboard to spit out null values.
A hidden schema change breaks a downstream pipeline.
And the list goes on.
This book is for everyone who has suffered from unreliable data, silently or with muffled screams, and wants to do something about it. We expect that these readers will come from data engineering, data analytics, or data science backgrounds, and be actively involved in building, scaling, and managing their company’s data pipelines.
On the surface, it may seem like Data Quality Fundamentals is a manual about how to clean, wrangle, and generally make sense of data—and it is. But more so, this book tackles best practices, technologies, and processes around building more reliable data systems and, in the process, cultivating data trust with your team and stakeholders.
In Chapter 1, we’ll discuss why data quality deserves attention now, and how architectural and technological trends are contributing to an overall decrease in governance and reliability. We’ll introduce the concept of “data downtime,” and explain how it harkens back to the early days of site reliability engineering (SRE) teams and how these same DevOps principles can apply to your data engineering ...