Chapter 10. Data Observability Design Patterns
The data quality design patterns from the previous chapter are crucial to guaranteeing the relevance of your datasets. However, as they focus mainly on the data itself, relying only on data quality solutions won’t be enough for you to have end-to-end control of your data engineering stack.
Let’s take a look at an example to understand this better. The Audit-Write-Audit-Publish (AWAP) pattern is a great protection mechanism against processing data of poor quality. Unfortunately, even if your AWAP job perfectly detects all issues, you may still be in trouble. This happens, for example, when an upstream flow interruption prevents your AWAP job from running and you are not aware of it.
There is good news, though: the data observability design patterns from this chapter fill the gaps left by their data quality counterparts by adding monitoring and alerting capabilities to the system. To address these extra issues, the observability pattern solutions rely on two pillars: detection and tracking.
The detection design patterns spot any problems related to the data or time. They would be great candidates to handle the AWAP’s data flow interruption issue mentioned previously. They will also be useful for notifying you whenever your batch job takes too much time to complete.
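To make the detection idea more concrete, here is a minimal Python sketch, assuming the pipeline records when the last batch of data arrived and when each job run started and finished. It checks the two situations mentioned above: an upstream flow interruption (no new data for longer than an expected window) and a batch job running longer than its time budget. The `JobRun` record, the detector functions, the job names, and the thresholds are all hypothetical illustrations, not a definitive implementation of the chapter's patterns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobRun:
    """One execution of a batch job (hypothetical record)."""
    name: str
    started_at: datetime
    finished_at: Optional[datetime] = None  # None while the job is still running


def detect_flow_interruption(last_arrival: datetime, now: datetime,
                             max_silence: timedelta) -> bool:
    """Flag an interruption when no new data arrived within max_silence."""
    return now - last_arrival > max_silence


def detect_slow_run(run: JobRun, now: datetime, max_duration: timedelta) -> bool:
    """Flag a run that took (or is still taking) longer than max_duration."""
    end = run.finished_at if run.finished_at is not None else now
    return end - run.started_at > max_duration


def collect_alerts(last_arrival: datetime, runs: List[JobRun], now: datetime,
                   max_silence: timedelta, max_duration: timedelta) -> List[str]:
    """Run both detectors and return human-readable alert messages."""
    alerts = []
    if detect_flow_interruption(last_arrival, now, max_silence):
        alerts.append(f"No new data since {last_arrival.isoformat()}")
    for run in runs:
        if detect_slow_run(run, now, max_duration):
            alerts.append(f"Job {run.name} exceeded {max_duration}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    runs = [
        # Hypothetical runs: one finished in 20 minutes, one still running.
        JobRun("awap_hourly", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 20)),
        JobRun("awap_daily", datetime(2024, 1, 1, 10, 0)),
    ]
    for alert in collect_alerts(datetime(2024, 1, 1, 8, 0), runs, now,
                                max_silence=timedelta(hours=2),
                                max_duration=timedelta(hours=1)):
        print(alert)
```

Running the script prints two alerts: one because no data arrived for four hours, and one because `awap_daily` has been running for two hours against a one-hour budget. In practice, such checks would run on a schedule independent of the monitored job; otherwise an interrupted pipeline would also silence its own detector, which is exactly the blind spot described for the AWAP job.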
Tracking design patterns focus on understanding the relationships among datasets, columns, and the data processing layer. They will help you discover the data generation ...