Chapter 2. Assembling the Building Blocks of a Reliable Data System
While solving data quality issues in production is a critical skill for any data practitioner, data downtime can often be prevented almost entirely with the right systems and processes in place.
Like software, data is subject to any number of operational, programmatic, or even data-related influences at various stages of the pipeline, and all it takes is one schema change or code push to throw a downstream report into disarray.
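To make this concrete, here is a minimal sketch of how a single upstream schema change can break a downstream report. All names (the expected schema, the `schema_drift` helper, the field names) are hypothetical, not from a specific tool:

```python
# Hypothetical example: detecting a breaking schema change before it
# reaches a downstream report.
EXPECTED_SCHEMA = {"order_id", "customer_id", "amount", "created_at"}

def schema_drift(record: dict) -> dict:
    """Compare an incoming record's fields against the expected schema."""
    fields = set(record.keys())
    return {
        "missing": EXPECTED_SCHEMA - fields,    # fields a report depends on
        "unexpected": fields - EXPECTED_SCHEMA,  # new fields added upstream
    }

# An upstream code push renames "amount" to "order_total"...
drift = schema_drift({"order_id": 1, "customer_id": 7,
                      "order_total": 19.99, "created_at": "2023-01-01"})
print(drift["missing"])  # {'amount'} -- the downstream report breaks
```

A check like this, run at ingestion, turns a silent report failure into an explicit alert.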
As we'll discuss in Chapter 8, solving for data quality and building more reliable pipelines breaks down into three key components: process, technology, and people. In this chapter, we'll tackle the technology component of this equation, mapping out the disparate pieces of the data pipeline and what it takes to measure, fix, and prevent data downtime at each step.
Data systems are ridiculously complex, with each stage of the data pipeline contributing to this chaos. And as companies increasingly invest in data and analytics, the demand to build at scale puts serious pressure on data engineers to account for quality before data even enters the pipeline.
In this chapter, we'll highlight the various metadata-powered building blocks, from data catalogs to data warehouses and lakes, that set your data infrastructure up for success in delivering high-quality data at each stage of the pipeline.