Chapter 2. What Is Data Observability—and Why Do We Need It?
In your organization, has someone ever looked at a report and said the numbers were wrong? Likely this has happened more than once. No matter how advanced your data analytics and modeling tools are, if the data you’re ingesting, transforming, and moving through your pipelines isn’t correct, the results won’t be reliable.
Even a single new field that one team adds to a table can leave another team, which relies on the same data for a different use case, with inconsistent data. If this issue isn’t quickly identified, the downstream impact could include compliance risks, poor decision making, or lost revenue. If it happens too often, business users will also lose confidence in the data.
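To make the new-field scenario concrete, here is a minimal sketch of how a consuming team might notice that a table's columns no longer match what it expects. It is an illustration, not a method described in this chapter: the `SchemaDrift` class, the `detect_schema_drift` function, and the column names are all hypothetical, and the check assumes the consumer keeps its own list of expected columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDrift:
    """Columns that appeared or disappeared relative to what a consumer expects."""
    added: frozenset
    removed: frozenset

    @property
    def has_drift(self) -> bool:
        # Drift exists if any column was added or removed.
        return bool(self.added or self.removed)


def detect_schema_drift(expected_columns, observed_columns) -> SchemaDrift:
    """Compare the consumer's expected columns with the columns actually observed."""
    expected = frozenset(expected_columns)
    observed = frozenset(observed_columns)
    return SchemaDrift(added=observed - expected, removed=expected - observed)


# Hypothetical column names: the producing team has added "discount_code".
expected = ["order_id", "customer_id", "amount"]
observed = ["order_id", "customer_id", "amount", "discount_code"]

drift = detect_schema_drift(expected, observed)
if drift.has_drift:
    print(f"schema drift: added={sorted(drift.added)}, removed={sorted(drift.removed)}")
```

A check like this only compares column names. It would not catch a field whose meaning or values changed while its name stayed the same.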
However, discovering these kinds of data failures is much more challenging than discovering typical application or system failures. When something isn’t working in an application, the symptoms are more evident. For instance, if an application crashes, freezes, or restarts without warning, you know you’ve got a problem. Data issues, by contrast, are generally hard to notice because the data won’t freeze, crash, restart, or send any other signal that there’s a problem.
At the usage level, data also can’t just be “fixed.” You can only correct it by asking the producer to publish corrected data or by examining the application(s) producing the data to determine what’s gone wrong. For instance, if a column or row is missing ...