Getting DataOps Right

Book description

Over the years, many large organizations have accumulated dozens of disconnected data sources to serve different lines of business. These sources might be useful to one area of the enterprise, but they’re usually inaccessible to other data consumers in the organization. In this short report, five data industry thought leaders explore DataOps: the automated, process-oriented methodology for making clean, reliable data available to teams throughout your company.

Andy Palmer, Michael Stonebraker, Nik Bates-Haus, Liam Cleary, and Mark Marinelli from Tamr use real-world examples to explain how DataOps works. DataOps is as much about changing people’s relationship to data as it is about technology, infrastructure, and process. This report provides an organizational approach to implementing this discipline in your company—including various behavioral, process, and technology changes.

Through individual essays, you’ll learn how to:

  • Move toward scalable data unification (Michael Stonebraker)
  • Understand DataOps as a discipline (Nik Bates-Haus)
  • Explore the key principles of a DataOps ecosystem (Andy Palmer)
  • Identify the key components of a DataOps ecosystem (Andy Palmer)
  • Build a DataOps toolkit (Liam Cleary)
  • Build a team and prepare for future trends (Mark Marinelli)

Table of contents

  1. Introduction
    1. DevOps and DataOps
    2. The Catalyst for DataOps: “Data Debt”
    3. Paying Down the Data Debt
    4. From Data Debt to Data Asset
    5. DataOps to Drive Repeatability and Value
    6. Organizing by Logical Entity
  2. Moving Toward Scalable Data Unification
    1. A Brief History of Data Unification Systems
    2. Unifying Data
      1. Rules for Scalable Data Unification
  3. DataOps as a Discipline
    1. DataOps: Building Upon Agile
      1. The Agile Manifesto
      2. Agile Practices
    2. Agile Operations for Data and Software
      1. DataOps Tenets
      2. DataOps Practices
    3. DataOps Challenges
      1. Application Data Interface
      2. Data Processing Architecture
      3. Query Interface
      4. Resource Intensive
      5. Schema Change
      6. Governance
    4. The Agile Data Organization
  4. Key Principles of a DataOps Ecosystem
    1. Highly Automated
    2. Open
    3. Best of Breed
    4. Table(s) In/Table(s) Out Protocol
      1. Three Core Styles of Interfaces for Components
    5. Tracking Data Lineage and Provenance
      1. Data Integration: Deterministic, Probabilistic, and Humanistic
      2. Combining Aggregated and Federated Storage
      3. Processing Data in Both Batch and Streaming Modes
    6. Conclusion
  5. Key Components of a DataOps Ecosystem
    1. Catalog/Registry
    2. Movement/ETL
    3. Alignment/Unification
    4. Storage
    5. Publishing
    6. Feedback
    7. Governance
  6. Building a DataOps Toolkit
    1. Interoperability
      1. Composable Agile Units
      2. Results Import
      3. Metadata Exchange
    2. Automation
      1. Continuous Automation
      2. Batch Automation
  7. Embracing DataOps: How to Build a Team and Prepare for Future Trends
    1. Building a DataOps Team
      1. Data Supply
      2. Data Preparation
      3. Data Consumption
      4. So Where Are They?
    2. The Future of DataOps
      1. The Need for Smart, Automated Data Analysis
      2. Custom Solutions from Purpose-Built Components
      3. Increased Approachability of Advanced Tools
      4. Subject Matter Experts Will Become Data Curators and Stewards
    3. A Final Word

Product information

  • Title: Getting DataOps Right
  • Author(s): Andy Palmer, Michael Stonebraker, Nik Bates-Haus, Liam Cleary, Mark Marinelli
  • Release date: July 2019
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492031758