Data Management at Scale

Book description

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you’ll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption.

Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

  • Examine data management trends, including technological developments, regulatory requirements, and privacy concerns
  • Go deep into the Scaled Architecture and learn how the pieces fit together
  • Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Table of contents

  1. Foreword
  2. Preface
    1. Who Is This Book For?
    2. What Will I Learn?
    3. Navigating Through This Book
    4. Conventions Used in This Book
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  3. 1. The Disruption of Data Management
    1. Data Management
    2. Analytics Is Fragmenting the Data Landscape
    3. Speed of Software Delivery Is Changing
    4. Networks Are Getting Faster
    5. Privacy and Security Concerns Are a Top Priority
    6. Operational and Transactional Systems Need to Be Integrated
    7. Data Monetization Requires an Ecosystem-to-Ecosystem Architecture
    8. Enterprises Are Saddled with Outdated Data Architectures
      1. Enterprise Data Warehouse and Business Intelligence
      2. Data Lake
      3. Centralized View
    9. Summary
  4. 2. Introducing the Scaled Architecture: Organizing Data at Scale
    1. Universally Acknowledged Starting Points
      1. Each Application Has an Application Database
      2. Applications Are Specific and Have Unique Context
      3. Golden Source
      4. There’s No Escape from the Data Integration Dilemma
      5. Applications Play the Roles of Data Providers and Data Consumers
    2. Key Theoretical Considerations
      1. Object-Oriented Programming Principles
      2. Domain-Driven Design
      3. Business Architecture
    3. Communication and Integration Patterns
      1. Point-to-Point
      2. Silos
      3. Hub-Spoke Model
    4. Scaled Architecture
      1. Golden Sources and Domain Data Stores
      2. Data Delivery Contracts and Data Sharing Agreements
      3. Eliminating the Siloed Approach
      4. Domain-Driven Design on an Enterprise Scale
      5. Read-Optimized Data
      6. Data Layer as a Holistic Picture
      7. Metadata and the Target Operating Model
    5. Summary
  5. 3. Managing Vast Amounts of Data: The Read-Only Data Stores Architecture
    1. Introducing the RDS Architecture
    2. Command and Query Responsibility Segregation
      1. What Is CQRS?
      2. CQRS at Scale
    3. Read-Only Data Store Components and Services
      1. Metadata
      2. Data Quality
      3. RDS Tiers
      4. Data Ingestion
      5. Integrating Commercial Off-the-Shelf Solutions
      6. Extracting Data from External APIs and SaaSs
      7. Historical Data Service
      8. Design Variations
      9. Data Replication
      10. Access Layer
      11. File Manipulation Service
      12. Delivery Notification Service
      13. De-Identification Service
      14. Distributed Orchestration
    4. Intelligent Consumption Services
    5. Populating RDSs on Demand
    6. RDS Direct Usage Considerations
    7. Summary
  6. 4. Services and API Management: The API Architecture
    1. Introducing the API Architecture
    2. What Is Service-Oriented Architecture?
      1. Enterprise Application Integration
      2. Service Orchestration
      3. Service Choreography
      4. Public Services and Private Services
      5. Service Models and Canonical Data Models
      6. Similarities Between SOA and Enterprise Data Warehousing Architecture
    3. Modern View on SOA
      1. API Gateway
      2. Responsibility Model
      3. The New Role of the ESB
      4. Service Contracts
      5. Service Discovery
    4. Microservices
      1. The Role of the API Gateway Within Microservices
      2. Functions
      3. Service Mesh
      4. Microservices Boundaries
      5. Microservices Within the API Reference Architecture
    5. Ecosystem Communication
    6. API-Based Communication Channels
      1. GraphQL
      2. Backend for Frontend
    7. Metadata
    8. Using RDSs for Real-Time and Intensive Reads
    9. Summary
  7. 5. Event and Response Management: The Streaming Architecture
    1. Introducing the Streaming Architecture
    2. The Asynchronous Event Model Makes the Difference
    3. What Do Event-Driven Architectures Look Like?
      1. Mediator Topology
      2. Broker Topology
      3. Event Processing Styles
    4. A Gentle Introduction to Apache Kafka
      1. Distributed Event Data
      2. Apache Kafka Features
    5. The Streaming Architecture
      1. Event Producers
      2. Event Consumers
      3. Event Platform
      4. Event Sourcing and Command Sourcing
      5. Governance Model
      6. Business Streams
      7. Streaming Consumption Patterns
      8. Event-Carried State Transfer
      9. Playing the Role of an RDS
      10. Using Streaming to Populate RDSs
      11. Controls and Policies for Guiding the Domains
    6. Streaming as the Operational Backbone
    7. Guarantees and Consistency
      1. Consistency Level
      2. “At Least Once, Exactly Once, and at Most Once” Processing
      3. Message Order
      4. Dead Letter Queue
      5. Streaming Interoperability
    8. Metadata for Governance and Self-Service Models
    9. Summary
  8. 6. Connecting the Dots
    1. Recap of the Architectures
      1. RDS Architecture
      2. API Architecture
      3. Streaming Architecture
      4. Strengthening Patterns
    2. Enterprise Interoperability Standards
      1. Stable Data Endpoints
      2. Data Delivery Contracts
      3. Accessible and Addressable Data
      4. Crossing Network Principles
    3. Enterprise Data Standards
      1. Consumption-Optimization Principles
      2. Discoverability of Metadata
      3. Semantic Consistency
      4. Supplying the Corresponding Metadata
      5. Data Origination and Movements
    4. Reference Architecture
    5. Summary
  9. 7. Sustainable Data Governance and Data Security
    1. Data Governance
      1. Organization: Data Governance Roles
      2. Processes: Data Governance Activities
      3. People: Trust and Ethical, Social, and Economic Considerations
      4. Technology: Golden Source, Ownership, and Application Administration
      5. Data: Golden Sources, Golden Datasets, and Classifications
    2. Data Security
      1. Current Siloed Approach
      2. Unified Data Security for Architectures
      3. Identity Providers
      4. Security Reference Architecture and Data Context Approach
      5. Security Process Flow
    3. Practical Guidance
      1. RDS Architecture
      2. API Architecture
      3. Streaming Architecture
      4. Intelligent Learning Engine
    4. Summary
  10. 8. Turning Data into Value
    1. Consumption Patterns
      1. Using Read-Only Data Stores Directly
      2. Domain Data Stores
    2. Target Operating Model
    3. Data Professionals as a Target User Group
    4. Business Requirements
    5. Nonfunctional Requirements
    6. Building the Data Pipeline and Data Model
    7. Distributing Integrated Data
    8. Business Intelligence Capabilities
    9. Self-Service Capabilities
    10. Analytical Capabilities
      1. Standard Infrastructure for Automated Deployments
      2. Stateless Models
      3. Prescripted and Configured Workbenches
      4. Standardize on Model Integration Patterns
      5. Automation
      6. Model Metadata
    11. Advanced Analytics Reference Architecture
    12. Summary
  11. 9. Mastering Enterprise Data Assets
    1. Demystifying Master Data Management
    2. Master Data Management Styles
    3. MDM Reference Architecture
      1. Designing a Master Data Management Solution
      2. MDM Distribution
      3. Master Identification Numbers
      4. Reference Data Versus Master Data
    4. Determining the Scope of Your Enterprise Data
    5. MDM and Data Quality as a Service
    6. Curated Data
      1. Metadata Exchange
      2. Integrated Views
      3. Reusable Components and Integration Logic
      4. Data Republishing
    7. Relation to Data Governance
    8. Summary
  12. 10. Democratizing Data with Metadata
    1. Metadata Management
    2. Enterprise Metadata Model
    3. Enterprise Knowledge Graph
    4. Architectural Approaches for Metadata Management
      1. Metadata Interoperability
      2. Metadata Repositories
    5. Marketplace to Provide Rapid Access to Authorized Data
    6. Summary
  13. 11. Conclusion
    1. Delivery Model
      1. Fully Decentralized Approach
      2. Partially Decentralized Approach
      3. Structuring Teams
      4. InnerSource Strategy
    2. Culture
    3. Technology Choices
    4. The Decline of Traditional Enterprise Architecture
      1. Blueprints and Diagrams
      2. Modern Skills
      3. Control and Governance
    5. Last Words
  14. Glossary
  15. Index

Product information

  • Title: Data Management at Scale
  • Author(s): Piethein Strengholt
  • Release date: July 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492054788