Chapter 10. Measuring and Monitoring Pipeline Performance
Even the best-designed data pipelines are not meant to be “set and forget.” Measuring and monitoring pipeline performance is essential. You owe it to your team and stakeholders to set, and live up to, expectations for the reliability of your pipelines.
This chapter outlines some tips and best practices for doing something that data teams deliver to others but surprisingly don’t always invest in themselves: collecting data and measuring performance of their work.
Key Pipeline Metrics
Before you can determine what data you need to capture throughout your pipelines, you must first decide what metrics you want to track.
Choosing metrics should start with identifying what matters to you and your stakeholders. Some examples include the following:
- How many validation tests (see Chapter 8) are run, and what percentage of them pass
- How frequently a specific DAG runs successfully
- The total runtime of a pipeline over the course of weeks, months, and years
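To make these three example metrics concrete, here is a minimal Python sketch that computes each one from in-memory records. The record layouts, field names (`passed`, `dag_id`, `state`, `start`, `end`), the DAG name `orders_etl`, and the sample values are all assumptions made up for illustration; in practice this data would come from wherever your validation framework and orchestrator log their results.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical validation test results (field names are assumptions).
validation_results = [
    {"test": "row_count_match", "passed": True},
    {"test": "no_null_ids", "passed": True},
    {"test": "no_duplicates", "passed": False},
    {"test": "value_range", "passed": True},
]

# Hypothetical DAG run history (field names and values are assumptions).
dag_runs = [
    {"dag_id": "orders_etl", "state": "success",
     "start": datetime(2021, 1, 4, 2, 0), "end": datetime(2021, 1, 4, 2, 30)},
    {"dag_id": "orders_etl", "state": "failed",
     "start": datetime(2021, 1, 5, 2, 0), "end": datetime(2021, 1, 5, 2, 10)},
    {"dag_id": "orders_etl", "state": "success",
     "start": datetime(2021, 1, 6, 2, 0), "end": datetime(2021, 1, 6, 2, 35)},
    {"dag_id": "orders_etl", "state": "success",
     "start": datetime(2021, 2, 1, 2, 0), "end": datetime(2021, 2, 1, 2, 40)},
]


def validation_pass_rate(results):
    """Return (number of tests run, percentage of those tests that passed)."""
    total = len(results)
    if total == 0:
        return 0, 0.0
    passed = sum(1 for r in results if r["passed"])
    return total, 100.0 * passed / total


def dag_success_rate(runs, dag_id):
    """Return the percentage of runs of one DAG that ended in success."""
    selected = [r for r in runs if r["dag_id"] == dag_id]
    if not selected:
        return 0.0
    succeeded = sum(1 for r in selected if r["state"] == "success")
    return 100.0 * succeeded / len(selected)


def total_runtime_by_month(runs, dag_id):
    """Return total runtime in minutes per 'YYYY-MM' bucket for one DAG."""
    totals = defaultdict(float)
    for r in runs:
        if r["dag_id"] == dag_id:
            month = r["start"].strftime("%Y-%m")
            totals[month] += (r["end"] - r["start"]).total_seconds() / 60
    return dict(totals)


tests_run, pass_pct = validation_pass_rate(validation_results)
success_pct = dag_success_rate(dag_runs, "orders_etl")
runtime_minutes = total_runtime_by_month(dag_runs, "orders_etl")
```

With the sample data above, three of four tests pass and three of four runs succeed, so both rates come out to 75 percent. Bucketing runtime by month is only one choice; the same function could group by week or year by changing the `strftime` format.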