Chapter 10. Pioneering the Future of Reliable Data Systems
If Data Quality Fundamentals taught you anything about the larger state of analytics and data engineering, it’s likely that data as an industry is going through a massive, irreversible sea change.
Only five years ago, it wasn’t uncommon for data to live in silos, accessed only by functional teams on an ad hoc basis for discrete tasks, such as understanding how internal systems were being used or querying data about application usage over time. Now, analytical data is becoming the modern business’s most critical and competitive form of currency. It’s no longer a matter of whether your company relies on data, but how much and for what use cases.
Still, it’s simply not enough to collect more data; you also have to trust it. Cloud data warehouses and lakes, data catalogs, open source testing frameworks, and data observability solutions are building out additional features and functionality to bring data reliability to the center of the conversation. Warehouses like Snowflake and Redshift make it easy to pull data quality metrics for freshness and volume, while open source tools like dbt and Great Expectations enable practitioners to quickly unit test their most critical data sets. Even catalogs like Alation and Collibra can provide some insight into data integrity and discovery at static points in time.
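To make the freshness and volume piece concrete, the sketch below queries a warehouse’s INFORMATION_SCHEMA for a table’s row count and last-altered timestamp. It is only an illustration, not a prescribed method from this book: the analytics.PROD.ORDERS table, the environment variables, and the choice of Snowflake’s Python connector are all assumptions made for the example.

# A minimal sketch of pulling rough freshness and volume signals from Snowflake's
# INFORMATION_SCHEMA. Connection details and the monitored table are placeholders.
import os

import snowflake.connector  # assumes the snowflake-connector-python package is installed

FRESHNESS_VOLUME_SQL = """
SELECT table_name,
       row_count,      -- volume: approximate row count
       last_altered    -- freshness: last time the table changed
FROM   analytics.information_schema.tables
WHERE  table_schema = 'PROD'
  AND  table_name   = 'ORDERS'   -- hypothetical table to monitor
"""

def check_orders_table() -> None:
    """Print simple freshness and volume metrics for a single table."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],   # assumed environment variables
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
    )
    try:
        for name, row_count, last_altered in conn.cursor().execute(FRESHNESS_VOLUME_SQL):
            print(f"{name}: {row_count} rows, last altered {last_altered}")
    finally:
        conn.close()

if __name__ == "__main__":
    check_orders_table()

A scheduled check like this catches only the symptoms it was written to look for; the broader argument of this chapter is that such point solutions are a starting place rather than an end state for reliability.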
While these exciting new technologies have given data engineering teams more leverage ...