Mastering Kafka Streams and ksqlDB

Book description

Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide explores the world of real-time data systems through the lens of these popular technologies and explains important stream processing concepts against a backdrop of interesting business problems.

Mitch Seymour, senior data systems engineer at Mailchimp, introduces you to both Kafka Streams and ksqlDB so that you can choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing. In this book, you’ll learn:

  • Basic and advanced uses of Kafka Streams and ksqlDB
  • How to transform, enrich, and process event streams
  • How to build both stateless and stateful stream processing applications
  • The different notions of time and the role it plays in stream processing
  • How to build event-driven microservices on top of continuous event streams
  • Features, operational characteristics, deployment patterns, and configuration tips for both technologies

Table of contents

  1. Getting Started with Kafka Streams
    1. The Kafka Ecosystem
      1. Before Kafka Streams
      2. Enter Kafka Streams
    2. Features at a Glance
    3. Operational Characteristics
    4. Comparison to Other Systems
    5. Use Cases
    6. Processor Topologies
      1. Sub-topologies
      2. Depth-first processing
      3. Benefits of Dataflow Programming
      4. Tasks and Stream Threads
    7. High-level DSL vs Low-level Processor API
    8. Installation
    9. Tutorial: Hello Streams
      1. DSL
      2. Processor API
    10. Streams and Tables
      1. Stream / Table Duality
      2. KStream, KTable, GlobalKTable
    11. Development Workflows
      1. Pre-creating topics
      2. Empirical Testing
      3. Generating Data
    12. Summary
  2. Stateless Processing
    1. Stateless vs Stateful Processing
    2. Introducing Our Tutorial: Processing a Twitter Stream
    3. Project Setup
    4. Adding a KStream Source Processor
    5. Serialization / Deserialization
      1. Building a Custom Serdes
      2. Defining Data Classes
      3. Implementing a Custom Deserializer
      4. Implementing a Custom Serializer
      5. Building the Tweet Serdes
    6. Filtering Data
    7. Branching Data
    8. Translating Tweets
    9. Merging Streams
    10. Enriching Tweets
      1. Avro Data Class
      2. Sentiment Analysis
    11. Serializing Avro Data
      1. Registryless Avro Serdes
      2. Schema Registry-aware Avro Serdes
    12. Adding a Sink Processor
    13. Running the Code
    14. Empirical Verification
    15. Summary
  3. Stateful Processing
    1. Benefits of Stateful Processing
    2. Preview of Stateful Operators
    3. State Stores
      1. Common Characteristics
      2. Persistent vs In-Memory stores
    4. Tutorial: Video game Leaderboard
    5. Project Setup
    6. Data Models
    7. Adding the Source Processors
      1. KStream
      2. KTable
      3. GlobalKTable
    8. Registering Streams and Tables
    9. Joins
      1. Join Operators
      2. Types
      3. Co-partitioning
      4. Value joiners
      5. KStream to KTable Join
      6. KStream to GlobalKTable Join
    10. Grouping records
      1. Grouping Streams
      2. Grouping Tables
    11. Aggregations
      1. Aggregating Streams
      2. Aggregating Tables
    12. Putting it all together
    13. Interactive Queries
      1. Materialized Stores
      2. Accessing Read-only State Stores
      3. Querying Key-value Stores
      4. Local Queries
      5. Remote Queries
    14. Summary
  4. Windows and Time
    1. Tutorial
    2. Project Setup
    3. Data Models
    4. Time Semantics
    5. Timestamp Extractors
      1. Custom Timestamp Extractors
      2. Registering Streams with a Timestamp Extractor
    6. Windowed Aggregations
      1. Window types
      2. Selecting a Window
      3. Windowed Aggregation
    7. Emitting Window Results
      1. Suppression
      2. Late Data
    8. Filtering and Rekeying Windowed KTables
    9. Windowed joins
    10. Time-driven Data Flow
      1. Alerts Sink
      2. Querying Windowed key-value stores
    11. Summary
  5. Getting Started with ksqlDB
    1. What is ksqlDB?
    2. When to use ksqlDB
    3. Evolution of a New Kind of Database
      1. Kafka Streams Integration
      2. Connect Integration
      3. How does ksqlDB compare to a traditional SQL database?
      4. Similarities
      5. Differences
    4. Architecture
      1. ksqlDB Server
      2. ksqlDB Clients
    5. Deployment modes
      1. Interactive mode
      2. Headless mode
    6. Tutorial
      1. Installing ksqlDB
      2. Running a ksqlDB server
      3. Precreating Topics
      4. Using the ksqlDB CLI
      5. Summary
  6. Data Integration with ksqlDB
    1. Kafka Connect Overview
    2. External vs Embedded Connect
      1. External Mode
      2. Embedded Mode
    3. Configuring Connect Workers
      1. Converters and Serialization Formats
    4. Tutorial
    5. Installing Connectors
      1. Creating Connectors with ksqlDB
      2. Showing Connectors
      3. Describing Connectors
      4. Dropping Connectors
    6. Verifying the Source Connector
    7. Interacting with the Kafka Connect Cluster Directly
    8. Introspecting Managed Schemas
    9. Summary

Product information

  • Title: Mastering Kafka Streams and ksqlDB
  • Author(s): Mitch Seymour
  • Release date: March 2021
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492062493