Fundamentals of Data Observability

Name: Fundamentals of Data Observability
Author: Andy Petrella
ISBN: 9781098133290

August 2023

Beginner to intermediate

264 pages

7h 15m

English

O'Reilly Media, Inc.

Preface
  Overview of the Book
  Who Should Read This Book
  Conventions Used in This Book
  Using Code Examples
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments
I. Introducing Data Observability
1. Introducing Data Observability
  Scaling Data Teams
  Challenges of Scaling Data Teams
  Segregated Roles and Responsibilities and Organizational Complexity
  Anatomy of Data Issues and Consequences
  Impact of Data Issues on Data Team Dynamics
  Scaling AI Roadblocks
  Challenges with Current Data Management Practices
  Effects of Data Governance at Scale
  Data Observability to the Rescue
  The Areas of Observability
  How Data Teams Can Leverage Data Observability Now
  Low Latency Data Issues Detection
  Efficient Data Issues Troubleshooting
  Preventing Data Issues
  Decentralized Data Quality Management
  Complementing Existing Data Governance Capabilities
  The Future and Beyond
  Conclusion
2. Components of Data Observability
  Channels of Data Observability Information
  Logs
  Traces
  Metrics
  Observations Model
  Physical Space
  Server
  User
  Static Space
  Dynamic Space
  Expectations
  Rules
  Automatic Anomaly Detection
  Prevent Garbage In, Garbage Out
  Conclusion
3. Roles of Data Observability in a Data Organization
  Data Architecture
  Where Does Data Observability Fit in a Data Architecture?
  Data Architecture with Data Observability
  How Data Observability Helps with Data Engineering Undercurrents
  Security
  Data Management
  Support for Data Mesh’s Data as Products
  Conclusion
II. Implementing Data Observability
4. Generate Data Observations
  At the Source
  Generating Data Observations at the Source
  Low-Level API in Python
  Description of the Data Pipeline
  Definition of the Status of the Data Pipeline
  Data Observations for the Data Pipeline
  Generate Contextual Data Observations
  Generate Data-Related Observations
  Generate Lineage-Related Data Observations
  Wrap-Up: The Data-Observable Data Pipeline
  Using Data Observations to Address Failures of the Data Pipeline
  Conclusion
5. Automate the Generation of Data Observations
  Abstraction Strategies
  Event Listeners
  Aspect-Oriented Programming
  High-Level Applications
  No-Code Applications
  Low-Code Applications
  Differences Among Monitoring Alternatives
  Conclusion
6. Implementing Expectations
  Introducing Expectations
  Shift-Left Data Quality
  Corner Cases Discovery
  Lifting Service Level Indicators
  Using Data Profilers
  Maintaining Expectations
  Overarching Practices
  Fail Fast and Fail Safe
  Simplify Tests and Extend CI/CD
  Conclusion
III. Data Observability in Action

7. Integrating Data Observability in Your Data Stack
  Ingestion Stage
  Ingestion Stage Data Observability Recipes
  Airbyte Agent
  Transformation
  Transformation Stage Data Observability Recipes
  Apache Spark
  dbt Agent
  Serving
  Recipes
  BigQuery in Python
  Orchestrated SQL with Airflow
  Analytics
  Machine Learning Recipes
  Business Intelligence Recipes
  Conclusion
8. Making Opaque Systems Translucent
  Data Translucence
  Opaque Systems
  SaaS
  Don’t Touch It; It (Kinda) Works
  Inherited Systems
  Strategies for Data Translucence
  Strategies
  The Data Observability Connector
  Example: Building a dbt Data Observability Connector (SaaS)
  Conclusion
Afterword: Future Observations
  Unification of Processing
  Generative Milestones
  Trustable Expanded Creativity
  Conclusion
Index
About the Author

Overview

Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work.

Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.

Learn the core principles and benefits of data observability
Use data observability to detect, troubleshoot, and prevent data issues
Follow the book's recipes to implement observability in your data projects
Use data observability to create a trustworthy communication framework with data consumers
Learn how to educate your peers about the benefits of data observability

Publisher Resources

ISBN: 9781098133283
Errata Page
