Data Management at Scale, 2nd Edition

Book description

As data management continues to evolve rapidly, managing all of your data in a central place, such as a data warehouse, is no longer scalable. Today's world is about quickly turning data into value. This requires a paradigm shift in the way we federate responsibilities, manage data, and make it available to others. With this practical book, you'll learn how to design a next-gen data architecture that takes into account the scale you need for your organization.

Executives, architects and engineers, analytics teams, and compliance and governance staff will learn how to build a next-gen data landscape. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

  • Examine data management trends, including regulatory requirements, privacy concerns, and new developments such as data mesh and data fabric
  • Go deep into building a modern data architecture, including cloud data landing zones, domain-driven design, data product design, and more
  • Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Table of contents

  1. Foreword
  2. Preface
    1. Why I Wrote This Book and Why Now
      1. Who Is This Book For?
    2. How to Read or Use This Book
    3. Conventions Used in This Book
    4. O’Reilly Online Learning
    5. How to Contact Us
    6. Acknowledgments
  3. 1. The Journey to Becoming Data-Driven
    1. Recent Technology Developments and Industry Trends
    2. Data Management
    3. Analytics Is Fragmenting the Data Landscape
    4. The Speed of Software Delivery Is Changing
    5. The Cloud’s Impact on Data Management Is Immeasurable
    6. Privacy and Security Concerns Are a Top Priority
    7. Operational and Analytical Systems Need to Be Integrated
    8. Organizations Operate in Collaborative Ecosystems
    9. Enterprises Are Saddled with Outdated Data Architectures
      1. The Enterprise Data Warehouse: A Single Source of Truth
      2. The Data Lake: A Centralized Repository for Structured and Unstructured Data
      3. The Pain of Centralization
    10. Defining a Data Strategy
    11. Wrapping Up
  4. 2. Organizing Data Using Data Domains
    1. Application Design Starting Points
      1. Each Application Has a Data Store
      2. Applications Are Always Unique
      3. Golden Sources
      4. The Data Integration Dilemma
      5. Application Roles
    2. Inspirations from Software Architecture
    3. Data Domains
      1. Domain-Driven Design
      2. Business Architecture
      3. Domain Characteristics
    4. Principles for Distributed and Domain-Oriented Data Management
      1. Design Principles for Data Domains
      2. Best Practices for Data Providers
      3. Domain Ownership Responsibilities
    5. Transitioning Toward Distributed and Domain-Oriented Data Management
    6. Wrapping Up
  5. 3. Mapping Domains to a Technology Architecture
    1. Domain Topologies: Managing Problem Spaces
      1. Fully Federated Domain Topology
      2. Governed Domain Topology
      3. Partially Federated Domain Topology
      4. Value Chain–Aligned Domain Topology
      5. Coarse-Grained Domain Topology
      6. Coarse-Grained and Partially Governed Domain Topology
      7. Centralized Domain Topology
      8. Picking the Right Topology
    2. Landing Zone Topologies: Managing Solution Spaces
      1. Single Data Landing Zone
      2. Source- and Consumer-Aligned Landing Zones
      3. Hub Data Landing Zone
      4. Multiple Data Landing Zones
      5. Multiple Data Management Landing Zones
      6. Practical Landing Zones Example
    3. Wrapping Up
  6. 4. Data Product Management
    1. What Are Data Products?
      1. Problems with Combining Code, Data, Metadata, and Infrastructure
      2. Data Products as Logical Entities
    2. Data Product Design Patterns
      1. What Is CQRS?
      2. Read Replicas as Data Products
    3. Design Principles for Data Products
      1. Resource-Oriented Read-Optimized Design
      2. Data Product Data Is Immutable
      3. Using the Ubiquitous Language
      4. Capture Directly from the Source
      5. Clear Interoperability Standards
      6. No Raw Data
      7. Don’t Conform to Consumers
      8. Missing Values, Defaults, and Data Types
      9. Semantic Consistency
      10. Atomicity
      11. Compatibility
      12. Abstract Volatile Reference Data
      13. New Data Means New Ownership
      14. Data Security Patterns
      15. Establish a Metamodel
      16. Allow Self-Service
      17. Cross-Domain Relationships
      18. Enterprise Consistency
      19. Historization, Redeliveries, and Overwrites
      20. Business Capabilities with Multiple Owners
      21. Operating Model
    4. Data Product Architecture
      1. High-Level Platform Design
      2. Capabilities for Capturing and Onboarding Data
      3. Data Quality
      4. Data Historization
    5. Solution Design
      1. Real-World Example
      2. Alignment with Storage Accounts
      3. Alignment with Data Pipelines
      4. Capabilities for Serving Data
      5. Data Serving Services
      6. File Manipulation Service
      7. De-Identification Service
      8. Distributed Orchestration
      9. Intelligent Consumption Services
      10. Direct Usage Considerations
    6. Getting Started
    7. Wrapping Up
  7. 5. Services and API Management
    1. Introducing API Management
    2. What Is Service-Oriented Architecture?
      1. Enterprise Application Integration
      2. Service Orchestration
      3. Service Choreography
      4. Public Services and Private Services
      5. Service Models and Canonical Data Models
      6. Parallels with Enterprise Data Warehousing Architecture
    3. A Modern View of API Management
      1. Federated Responsibility Model
      2. API Gateway
      3. API as a Product
      4. Composite Services
      5. API Contracts
      6. API Discoverability
    4. Microservices
      1. Functions
      2. Service Mesh
      3. Microservice Domain Boundaries
    5. Ecosystem Communication
    6. Experience APIs
      1. GraphQL
      2. Backend for Frontend
    7. Practical Example
    8. Metadata Management
    9. Read-Oriented APIs Serving Data Products
    10. Wrapping Up
  8. 6. Event and Notification Management
    1. Introduction to Events
      1. Notifications Versus Carried State
      2. The Asynchronous Communication Model
    2. What Do Modern Event-Driven Architectures Look Like?
      1. Message Queues
      2. Event Brokers
      3. Event Processing Styles
      4. Event Producers
      5. Event Consumers
      6. Event Streaming Platforms
      7. Governance Model
      8. Event Stores as Data Product Stores
      9. Event Stores as Application Backends
    3. Streaming as the Operational Backbone
    4. Guarantees and Consistency
      1. Consistency Level
      2. Processing Methods
      3. Message Order
      4. Dead Letter Queue
      5. Streaming Interoperability
    5. Governance and Self-Service
    6. Wrapping Up
  9. 7. Connecting the Dots
    1. Cross-Domain Interoperability
      1. Quick Recap
      2. Data Distribution Versus Application Integration
      3. Data Distribution Patterns
      4. Application Integration Patterns
      5. Consistency and Discoverability
    2. Inspiring, Motivating, and Guiding for Change
      1. Setting Domain Boundaries
      2. Exception Handling
    3. Organizational Transformation
      1. Team Topologies
      2. Organizational Planning
    4. Wrapping Up
  10. 8. Data Governance and Data Security
    1. Data Governance
      1. The Governance Framework
      2. Processes: Data Governance Activities
      3. Making Governance Effective and Pragmatic
      4. Supporting Services for Data Governance
      5. Data Contracts
    2. Data Security
      1. Current Siloed Approach
      2. Trust Boundaries
      3. Data Classifications and Labels
      4. Data Usage Classifications
      5. Unified Data Security
      6. Identity Providers
      7. Real-World Example
      8. Typical Security Process Flow
      9. Securing API-Based Architectures
      10. Securing Event-Driven Architectures
    3. Wrapping Up
  11. 9. Democratizing Data with Metadata
    1. Metadata Management
    2. The Enterprise Metadata Model
      1. Practical Example of a Metamodel
      2. Data Domains and Data Products
      3. Data Models
      4. Data Lineage
      5. Other Metadata Areas
    3. The Metalake Architecture
      1. Role of the Catalog
      2. Role of the Knowledge Graph
    4. Wrapping Up
  12. 10. Modern Master Data Management
    1. Master Data Management Styles
    2. Data Integration
    3. Designing a Master Data Management Solution
    4. Domain-Oriented Master Data Management
      1. Reference Data
      2. Master Data
      3. MDM and Data Quality as a Service
    5. MDM and Data Curation
      1. Knowledge Exchange
      2. Integrated Views
      3. Reusable Components and Integration Logic
      4. Republishing Data Through Integration Hubs
      5. Republishing Data Through Aggregates
    6. Data Governance Recommendations
    7. Wrapping Up
  13. 11. Turning Data into Value
    1. The Challenges of Turning Data into Value
    2. Domain Data Stores
      1. Granularity of Consumer-Aligned Use Cases
      2. DDSs Versus Data Products
    3. Best Practices
      1. Business Requirements
      2. Target Audience and Operating Model
      3. Nonfunctional Requirements
      4. Data Pipelines and Data Models
      5. Scoping the Role Your DDSs Play
    4. Business Intelligence
      1. Semantic Layers
      2. Self-Service Tools and Data
      3. Best Practices
    5. Advanced Analytics (MLOps)
      1. Initiating a Project
      2. Experimentation and Tracking
      3. Data Engineering
      4. Model Operationalization
      5. Exceptions
    6. Wrapping Up
  14. 12. Putting Theory into Practice
    1. A Brief Reflection on Your Data Journey
    2. Centralized or Decentralized?
    3. Making It Real
      1. Opportunistic Phase: Set Strategic Direction
      2. Transformation Phase: Lay Out the Foundation
      3. Optimization Phase: Professionalize Your Capabilities
    4. Data-Driven Culture
      1. DataOps
      2. Governance and Literacy
    5. The Role of Enterprise Architects
      1. Blueprints and Diagrams
      2. Modern Skills
      3. Control and Governance
    6. Last Words
  15. Index
  16. About the Author

Product information

  • Title: Data Management at Scale, 2nd Edition
  • Author(s): Piethein Strengholt
  • Release date: April 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098138868