Building an Event-Driven Data Mesh

Book description

The exponential growth of data combined with the need to derive real-time business value is a critical issue today. An event-driven data mesh can power real-time operational and analytical workloads, all from a single set of data product streams. With practical real-world examples, this book shows you how to successfully design and build an event-driven data mesh.

Building an Event-Driven Data Mesh provides:

  • Practical tips for iteratively building your own event-driven data mesh, including hurdles you'll experience, possible solutions, and how to obtain real value as soon as possible
  • Solutions to pitfalls you may encounter when moving your organization from monoliths to event-driven architectures
  • A clear understanding of how events relate to systems and other events in the same stream and across streams
  • A realistic look at event modeling options, such as fact, delta, and command type events, including how these choices will impact your data products
  • Best practices for handling events at scale, protecting privacy, and meeting regulatory compliance requirements
  • Advice on asynchronous communication and handling eventual consistency

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. O’Reilly Online Learning
    3. How to Contact Us
    4. Acknowledgments
  2. 1. Event-Driven Data Communication
    1. What Is Data Mesh?
    2. An Event-Driven Data Mesh
    3. Using Data in the Operational Plane
      1. The Data Monolith
      2. The Difficulties of Communicating Data for Operational Concerns
      3. The Analytical Plane: Data Warehouses and Data Lakes
      4. The Organizational Impact of Schema on Read
      5. Bad Data: The Costs of Inaction
      6. Can We Unify Analytical and Operational Workflows?
    4. Rethinking Data with Data Mesh
    5. Common Objections to an Event-Driven Data Mesh
      1. Producers Cannot Model Data for Everyone’s Use Cases
      2. Making Multiple Copies of Data Is Bad
      3. Eventual Consistency Is Too Difficult to Manage
    6. Summary
  3. 2. Data Mesh
    1. Principle 1: Domain Ownership
      1. Domain-Driven Design in Brief
      2. Selecting the Data to Expose from Your Domain
    2. Principle 2: Data as a Product
      1. Data Products Provide Immutable and Time-Stamped Data
      2. Data Products Are Multimodal
      3. Accessing a Data Product Via Push or Pull
      4. The Three Data Product Alignment Types
      5. Event-Driven Data Products as Inputs for Operational Systems
    3. Principle 3: Federated Governance
      1. Specifying Data Product Language, Framework, and API Support
      2. Establishing Data Product Life Cycle Requirements
      3. Establishing Data Handling and Infosec Policies
      4. Identifying and Standardizing Cross-Domain Polysemes
      5. Formalizing Self-Service Platform Requirements
    4. Principle 4: Self-Service Platform
      1. Discovering Data Products and Dependencies
      2. Data Product Management Controls
      3. Data Product Access Controls
      4. Compute and Storage Resources for Building and Using Data Products
      5. Providing Self-Service Through SaaS
    5. Summary
  4. 3. Event Streams for Data Mesh
    1. Events, Messages, and Records
    2. What’s an Event Stream? What Is It Not?
      1. Ephemeral Message-Passing
      2. Queuing
    3. Consuming and Using Event-Driven Data Products
      1. State Events and Event-Carried State Transfer
      2. Materializing Events
      3. Aggregating Events
    4. The Kappa Architecture
    5. The Lambda Architecture and Why It Doesn’t Work for Data Mesh
    6. Supporting the Requirements for Kappa Architecture
    7. Selecting an Event Broker
    8. Summary
  5. 4. Federated Governance
    1. Forming a Federated Governance Team
    2. Implementing Standards
      1. Supporting Multimodal Data Product Types
      2. Supporting Data Product Schemas
      3. Supporting Programming Languages and Frameworks
      4. Metadata Standards and Requirements
    3. Ensuring Cross-Domain Data Product Compatibility and Interoperability
      1. Defining and Using Common Entities
      2. Event Stream Keying and Partitioning
      3. Time and Time Zones
    4. What Does a Governance Meeting Look Like?
      1. 1. Identifying Existing Problems
      2. 2. Drafting Proposals
      3. 3. Reviewing Proposals
      4. 4. Implementing Proposals
      5. 5. Archiving Proposals
    5. Data Security and Access Policies
      1. Disable Data Product Access by Default
      2. Consider End-to-End Encryption
      3. Field-Level Encryption
      4. Data Privacy, the Right to Be Forgotten, and Crypto-Shredding
    6. Data Product Lineage
      1. Topology-Based Lineage
      2. Record-Based Lineage
    7. Summary
  6. 5. Self-Service Data Platform
    1. The Self-Service Platform Maturity Model
    2. Level 1: The Minimal Viable Platform
      1. The Schema Registry
      2. An Extremely Basic Metadata Catalog
      3. Connectors
      4. Level 1 Wrap-Up: How Does It Work?
    3. Level 2: The Expanded Platform
      1. Full-Featured Metadata Catalog
      2. The Data Product Management Service and UI
      3. Service and User Identities
      4. Basic Access Controls
      5. Stream Processing for Building Data Products
      6. Level 2 Wrap-Up: How Does It Work?
    4. Level 3: The Mature Platform
      1. Authentication, Identification, and Access Management
      2. Integration with Existing Application Delivery Processes
      3. Programmatic Data Product Management API
      4. Monitoring and Alerting
      5. Multiregion and Multicloud Data Products
      6. Level 3 Wrap-Up: How Does It Work?
    5. Summary
  7. 6. Event Schemas
    1. A Brief Introduction to Serialization and Deserialization
    2. What Is a Schema?
    3. What Are Our Schema Technology Options?
      1. Google’s Protocol Buffers, aka Protobuf
      2. Apache Avro
      3. JSON Schema
    4. Schema Evolution: Changing Your Schemas Through Time
    5. Negotiating a Breaking Schema Change
      1. Step 1: Design the New Data Model
      2. Step 2: Iterate with Your Existing Consumers and the Federated Governance Team
      3. Step 3. Create a Release Schedule, a Data Migration Plan, and a Deprecation Plan
      4. Step 4. Execute the Release
    6. The Role of the Schema Registry
    7. Best Practices for Managing Schemas in Your Codebase
    8. Choosing a Schema Technology
    9. Summary
  8. 7. Designing Events
    1. Introduction to Event Types
    2. Expanding on State Events and Event-Carried State Transfer
      1. Current State Events
      2. Before/After State Events
    3. Delta Events
      1. Event Sourcing with Delta Events
      2. Why Delta Events Don’t Work for Event-Driven Data Products
    4. Measurement Events
      1. Measurement Events Often Form Aggregate-Aligned Data Products
      2. Measurement Event Sources May Be Lossy
      3. Measurement Events May Power Time-Sensitive Applications
    5. Hybrid Events—State with a Bit of Delta
    6. Notification Events
    7. Summary
  9. 8. Bootstrapping Data Products
    1. Getting Started: Bootstrapping with Connectors
    2. Dual Writes
    3. Polling the Database to Create Data Products
    4. Change-Data Capture
      1. Change-Data Capture Using a Transactional Outbox
    5. Denormalization and Eventification
      1. Eventification at the Transactional Outbox
      2. Eventification in a Dedicated Service
      3. What Should Go In the Event? And What Should Stay Out?
      4. Slowly Changing Dimensions
    6. Bootstrapping Cloud Storage Files to an Event Stream
    7. Summary
  10. 9. Integrating Event-Driven Data into Data at Rest
    1. Analytics and the Medallion Architecture
    2. Connecting Event Streams Into Existing Batch-Data Flows
      1. Through the Lens of Data Mesh: What’s Going On?
      2. Through the Lens of Data Mesh: How Do We Solve It?
      3. Balancing File Sizes, SLAs, and Latency
      4. Budget Blues: A Tale of Overspending
    3. Extending the Self-Service Platform for Nonstreaming Data Products
    4. Summary
  11. 10. Eventual Consistency
    1. Converging on Consistency, One Event at a Time
    2. Strategies for Dealing with Eventual Consistency
      1. Prevent Failures to Avoid Inconsistency
      2. Use Event-Driven Data Products Instead of Request-Response Server API Calls
      3. Expose Eventual Consistency in the Server Response
      4. Plan for New Services and Reprocessing of Data
      5. Synchronize Data Products on Time Boundaries
    3. Out-of-Order Events
    4. Resolving Late-Arriving Events
    5. Summary
  12. 11. Bringing It All Together
    1. Event Streams for Data Mesh
    2. Integrating with Existing Systems
    3. Operations, Analytics, and Everything in Between
    4. Summary
  13. Index
  14. About the Author

Product information

  • Title: Building an Event-Driven Data Mesh
  • Author(s): Adam Bellemare
  • Release date: April 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098127602