Streaming Databases

Book description

Real-time applications are becoming the norm today. But building a model that works properly requires real-time data from the source, in-flight stream processing, and low-latency serving of its analytics. With this practical book, data engineers, data architects, and data analysts will learn how to use streaming databases to build real-time solutions.

Authors Hubert Dulay and Ralph M. Debusmann take you through streaming database fundamentals, including how these databases reduce infrastructure for real-time solutions. You'll learn the difference between streaming databases, stream processing, and real-time online analytical processing (OLAP) databases. And you'll discover when to use push queries versus pull queries, and how to serve synchronous and asynchronous data emanating from streaming databases.

This guide helps you:

  • Explore stream processing and streaming databases
  • Learn how to build a real-time solution with a streaming database
  • Understand how to construct materialized views from any number of streams
  • Learn how to serve synchronous and asynchronous data
  • Get started building low-complexity streaming solutions with minimal setup

Table of contents

  1. Brief Table of Contents (Not Yet Final)
  2. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Hubert’s Acknowledgments
    6. Ralph’s Acknowledgments
  3. 1. Streaming Foundations
    1. Turning The Database Inside-Out
    2. Externalizing Database Features
      1. Write Ahead Log
      2. Streaming Platforms
      3. Materialized Views
    3. Simple Use Case
      1. Understanding Transactions and Events
      2. Domain Driven Design
    4. Context Enrichment
    5. Change Data Capture (CDC)
    6. Connectors
    7. Summary
  4. 2. Stream Processing Platforms
    1. Stateful Transformations
    2. Data Pipelines
      1. ELT Limitations
      2. Stream processing with ELT
    3. Stream Processors
    4. Apache Spark Limitation
    5. Two Types of Streams
      1. Append Stream
      2. Debezium Change Data
      3. Materialized Views
    6. Summary
  5. 3. Serving Real-time Data
    1. Real-Time Expectations
    2. Service Level Agreements
    3. Choosing An Analytical Data Store
    4. Sourcing from a Topic
    5. Ingestion Transformations
    6. OLTP vs. OLAP
      1. ACID
      2. Row vs. Columnar Base Optimization
    7. Queries Per Second (QPS) and Concurrency
    8. Indexing
      1. StarTree Index
    9. Serving Analytical Results
      1. Synchronous Queries
      2. Asynchronous Queries
      3. Push vs Pull Queries
    10. Summary
  6. 4. Materialized Views
    1. Views, Materialized Views, and Incremental Updates
    2. Change Data Capture
    3. Push vs Pull Queries
    4. CDC and UPSERT
    5. Joining Streams
      1. Apache Calcite
      2. Click Stream Use Case
    6. Summary
  7. 5. Introduction to Streaming Database
    1. Identifying The Streaming Database
      1. Column-Based Streaming Database
      2. Row-Based Streaming Database
      3. Edge Streaming-Like Databases
    2. SQL Expressivity
    3. Streaming Debuggability
      1. Advantages of Debugging in Streaming Databases
      2. SQL Is Not a Silver Bullet
    4. Streaming Database Implementations
    5. Streaming Database Architecture
    6. ELT with Streaming Databases
    7. Summary
  8. 6. Consistency
    1. A Toy Example
      1. Transactions
      2. Analyzing the Transactions
    2. Comparing Consistency across Stream Processing Systems
      1. Flink SQL
      2. ksqlDB
      3. Proton (TimePlus)
      4. RisingWave
      5. Materialize
      6. Pathway
      7. Late-Arriving and Out-of-Order Messages
    3. Going Beyond Eventual Consistency
      1. Why Do Eventually Consistent Stream Processors Fail the Toy Example?
      2. How Do Internally Consistent Stream Processing Systems Pass the Toy Example?
      3. How Can We Fix Eventually Consistent Stream Processing Systems to Pass the Toy Example?
    4. Consistency vs. Latency
    5. Summary
  9. 7. Emergence of Other Hybrid Data Systems
    1. Data Planes
    2. Hybrid Transactional and Analytical Database (HTAP)
    3. Other Hybrid Databases
    4. Motivations for Hybrid Systems
    5. The Influence of PostgreSQL on Hybrid Databases
    6. Near Edge Analytics
    7. Next Generation (NG) Hybrid Databases
      1. Next-Generation Streaming OLTP Database
      2. Next-Generation Streaming Real-Time OLAP Databases
      3. Next-Generation HTAP Database
      4. Next-Generation Real-Time Databases
    8. Summary
  10. 8. Zero ETL or Near Zero ETL
    1. ETL Model
    2. Zero ETL
    3. Near-Zero ETL
      1. PeerDB
      2. Proton
      3. Embedded OLAP
      4. Data Gravity and Replication
      5. Analytical Data Reduction
    4. Lambda Architecture
      1. Apache Pinot Hybrid Tables
      2. Pipeline Configurations
    5. Summary
  11. 9. The Streaming Plane
    1. Data Gravity
    2. Components of the Streaming Plane
    3. Streaming Plane Infrastructure
    4. Operational Analytics
    5. Data Mesh
      1. Pillars of a Data Mesh
      2. Challenge of a Data Mesh
    6. Streaming Data Mesh with Streaming Plane and Streaming Databases
      1. Data Locality
      2. Data Replication
    7. Summary
  12. 10. Deployment Models
    1. Consistent Streaming Database
    2. Consistent Streaming Processor and Real-time OLAP
    3. Eventually Consistent OLAP Streaming Database
    4. Eventually Consistent Stream Processor and Real-Time OLAP
    5. Eventually Consistent Stream Processor and HTAP
    6. ksqlDB
    7. Incremental View Maintenance
    8. Postgres Multicorn Foreign Data Wrapper
      1. When to Use Code-based Stream Processors
      2. When to Use Lakehouse/Streamhouse Technologies
      3. Caching Technologies
    9. Where to Do Processing and Querying in General?
      1. The Four “Where”-Questions
      2. An Analytical Use Case
      3. Consequences
    10. Summary
  13. 11. Future State of Real-Time Data
    1. The Convergence of the Data Planes
    2. Graph Databases
      1. Memgraph
      2. ThatDot/Quine
    3. Vector Databases
      1. Milvus 2.x - Streaming as the Central Backbone
      2. Real-time OLAP Databases - Adding Vector Search
    4. Incremental View Maintenance
      1. PG_IVM
      2. Hydra
      3. Epsio
      4. Feldera
      5. PeerDB
    5. Data Wrapping and Postgres Multicorn
    6. Classical Databases
    7. Data Warehouses
      1. BigQuery
      2. Redshift
      3. Snowflake
    8. Lakehouse
      1. Delta Lake
      2. Apache Paimon
      3. Apache Iceberg
      4. Apache Hudi
      5. OneTable
      6. The Relationship of Streaming and Lakehouses
    9. Conclusion

Product information

  • Title: Streaming Databases
  • Author(s): Hubert Dulay, Ralph Matthias Debusmann
  • Release date: August 2024
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098154837