Making Sense of Stream Processing

Book Description

How can event streams help make your application more scalable, reliable, and maintainable? In this report, O’Reilly author Martin Kleppmann shows you how stream processing can make your data storage and processing systems more flexible and less complex. Structuring data as a stream of events isn’t new, but with the advent of open source projects such as Apache Kafka and Apache Samza, stream processing is finally coming of age.

Using several case studies, Kleppmann explains how these projects can help you reorient your database architecture around streams and materialized views. The benefits of this approach include better data quality, faster queries through precomputed caches, and real-time user interfaces. Learn how to open up your data for richer analysis and make your applications more scalable and robust in the face of failures.

  • Understand stream processing fundamentals and their similarities to event sourcing, CQRS, and complex event processing
  • Learn how logs can make search indexes and caches easier to maintain
  • Explore the integration of databases with event streams, using the new Bottled Water open source tool
  • Turn your database architecture inside out by orienting it around streams and materialized views

Table of Contents

  1. Foreword
  2. Preface
  3. Chapter 1: Events and Stream Processing
    1. Implementing Google Analytics: A Case Study
      1. Aggregated Summaries
    2. Event Sourcing: From the DDD Community
    3. Bringing Together Event Sourcing and Stream Processing
      1. Twitter
      2. Facebook
      3. Immutable Facts and the Source of Truth
      4. Wikipedia
      5. LinkedIn
    4. Using Append-Only Streams of Immutable Events
    5. Tools: Putting Ideas into Practice
    6. CEP, Actors, Reactive, and More
  4. Chapter 2: Using Logs to Build a Solid Data Infrastructure
    1. Case Study: Web Application Developers Driven to Insanity
      1. Dual Writes
    2. Making Sure Data Ends Up in the Right Places
    3. The Ubiquitous Log
    4. How Logs Are Used in Practice
      1. 1) Database Storage Engines
      2. 2) Database Replication
      3. 3) Distributed Consensus
      4. 4) Kafka
    5. Solving the Data Integration Problem
    6. Transactions and Integrity Constraints
    7. Conclusion: Use Logs to Make Your Infrastructure Solid
    8. Further Reading
  5. Chapter 3: Integrating Databases and Kafka with Change Data Capture
    1. Introducing Change Data Capture
    2. Database = Log of Changes
    3. Implementing the Snapshot and the Change Stream
    4. Bottled Water: Change Data Capture with PostgreSQL and Kafka
      1. Why Kafka?
      2. Why Avro?
    5. The Logical Decoding Output Plug-In
      1. The Client Daemon
      2. Concurrency
    6. Status of Bottled Water
  6. Chapter 4: The Unix Philosophy of Distributed Data
    1. Simple Log Analysis with Unix Tools
    2. Pipes and Composability
    3. Unix Architecture versus Database Architecture
    4. Composability Requires a Uniform Interface
    5. Bringing the Unix Philosophy to the Twenty-First Century
  7. Chapter 5: Turning the Database Inside Out
    1. How Databases Are Used
      1. 1. Replication
      2. 2. Secondary Indexes
      3. 3. Caching
      4. 4. Materialized Views
      5. Summary: Four Database-Related Ideas
    2. Materialized Views: Self-Updating Caches
      1. Example: Implementing Twitter
      2. The Unbundled Database
    3. Streaming All the Way to the User Interface
    4. Conclusion