Data Contracts

Book description

Poor data quality can cause major problems for data teams, from breaking revenue-generating data pipelines to losing the trust of data consumers. Despite the importance of data quality, many data teams still struggle to avoid these issues—especially when their data is sourced from upstream workflows outside of their control. The solution: data contracts. Data contracts enable high-quality, well-governed data assets by documenting expectations of the data, establishing ownership of data assets, and then automatically enforcing these constraints within the CI/CD workflow.

This practical book introduces data contract architecture with a clear definition of data contracts, explains why the data industry needs them, and shares real-world use cases of data contracts in production. In addition, you'll learn how to implement components of the data contract architecture and understand how they're used in the data lifecycle. Finally, you'll build a case for implementing data contracts in your organization.

Authors Chad Sanderson and Mark Freeman will help you:

  • Explore real-world applications of data contracts within the industry
  • Understand how to apply each component of this architecture, such as CI/CD, monitoring, version control, and more
  • Learn how to implement data contracts using open source tools
  • Examine ways to resolve data quality issues using data contract architecture
  • Measure the impact of implementing a data contract in your organization
  • Develop a strategy to determine how data contracts will be used in your organization

Table of contents

  1. Brief Table of Contents (Not Yet Final)
  2. 1. Why the Industry Now Needs Data Contracts
    1. Garbage-In Garbage-Out Cycle
      1. Modern Data Management
      2. What is data debt?
      3. Garbage In / Garbage Out
    2. The Death of Data Warehouses
      1. The Pre-Modern Era
      2. Software Eats the World
      3. A Move Towards Microservices
      4. Data Architecture in Disrepair
    3. Rise of the Modern Data Stack
      1. The Big Players
      2. Rapid Growth
      3. Problems in Paradise
    4. The Shift to Data-centric AI
      1. Diminishing ROI of Improving ML Models
      2. Commoditization of Data Science Workflows
      3. Data’s Rise Over ML in Creating a Competitive Advantage
    5. Conclusion
    6. Additional Resources
    7. References
  3. 2. Data Quality Isn’t About Pristine Data
    1. Defining Data Quality
    2. OLTP Versus OLAP and Its Implications for Data Quality
      1. A Brief Summary of OLTP and OLAP
      2. Translation Issues Between OLTP and OLAP Data Worldviews
    3. The Cost of Poor Data Quality
      1. Measuring Data Quality
      2. Who Is Impacted
    4. Conclusion
    5. Additional Resources
    6. References
  4. 3. The Challenges of Scaling Data Infrastructure
    1. How Data Development Is Not Like Software Development
      1. How Software Engineers Build Products
      2. How Data Developers Build Products
    2. Core Challenges for Modern Data Engineering Teams
    3. Why Data Development Needs a Design Surface
      1. Prevention first
      2. Communicative
      3. Contextual
      4. At the right time
      5. Including the right people
    4. The Cost of Large-Scale Refactors
      1. Large-Scale Refactor Considerations
      2. Use Case: Alan’s Large-Scale Refactor
    5. The Dangers of Database Migrations
      1. Data Loss
      2. Introduction of Data Quality Issues
      3. Massive Amounts of Change Management
      4. Staff Pulled Away From Main Roles
      5. Untangling Business Logic is Painful
      6. Data Debt Pain
      7. Changing Business Models
      8. Regulatory Changes
      9. Skyrocketing Cloud Costs
      10. Opportunities with New Technologies
    6. The Role of Change Management in Data Quality
      1. The Entropic Behavior of Data
      2. How Data Drifts from Established Business Logic
      3. Change Management Needs to Align with the Needs of the Business for It to Be Accepted
    7. How Infrastructure Needs Change at Scale
      1. Dunbar’s Number & Conway’s Law
      2. Case Study: Atlassian Engineering Team
      3. How Data Contracts Enable Change Management at Scale
    8. Conclusion
    9. Additional Resources
    10. References
  5. About the Authors

Product information

  • Title: Data Contracts
  • Author(s): Chad Sanderson, Mark Freeman
  • Release date: March 2025
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098157630