Building Real-Time Analytics Systems

by Mark Needham
Released October 2023
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098138790

Book description

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly.

In the first part of this book, authors Mark Needham and Dunith Dhanushka from StarTree provide an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service.

With this book, you will:

  • Learn common architectures for real-time analytics
  • Discover how event processing differs from real-time analytics
  • Ingest event data from Apache Kafka into Apache Pinot
  • Combine event streams with static data using Kafka Streams
  • Write real-time queries against event data stored in Apache Pinot
  • Build a real-time dashboard, fraud detection pipeline, order tracking app, and anomaly detection system
  • Learn how organizations like Uber, Stripe, and Just Eat use real-time analytics

Table of contents

  1. 1. Introduction to Real-Time Analytics
    1. What is an event stream?
    2. Making sense of streaming data
    3. What is Real-Time Analytics?
    4. Benefits of Real-Time Analytics
    5. Real-Time Analytics Use Cases
    6. Building Real-Time Analytics Applications
    7. Summary
  2. 2. The Real-Time Analytics Ecosystem
    1. Defining the Real-Time Analytics Ecosystem
    2. The classic streaming stack
      1. Complex Event Processing
      2. The Big Data Era
    3. The modern streaming stack
      1. Event Producers
      2. Streaming data platform
      3. Stream processing layer
      4. Serving layer
      5. Front End
    4. Summary
  3. 3. Introducing All About That Dough: Real-Time Analytics on Pizza
    1. Existing Architecture
    2. Setup
      1. MySQL
      2. Apache Kafka
      3. Zookeeper
      4. Orders Service
      5. Spinning up the components
    3. Inspecting the data
    4. Applications of Real-Time Analytics
    5. Summary
  4. 4. Querying Kafka with Kafka Streams
    1. What is Kafka Streams?
    2. What is Quarkus?
    3. Quarkus Application
      1. Installing Quarkus CLI
      2. Create Quarkus Application
      3. Topology
      4. Querying the key value store
      5. HTTP endpoint
    4. Running the application
    5. Querying the HTTP endpoint
    6. Limitations of Kafka Streams
    7. Summary
  5. 5. The Serving Layer: Apache Pinot
    1. Why can’t we use another stream processor?
    2. Why can’t we use a data warehouse?
    3. What is Apache Pinot?
      1. Pinot’s components
    4. How does Pinot model and store data?
      1. Schema
      2. Table
    5. Setup
    6. Data Ingestion
    7. Pinot Data Explorer
    8. Indexes
    9. Updating the web app
    10. Summary
  6. 6. Building a Real-Time Analytics Dashboard
    1. Dashboard Architecture
    2. What is Streamlit?
    3. Setup
    4. Building the dashboard
    5. Summary
  7. 7. Real-Time Analytics in the Real World
    1. Content Recommendation (Professional Social Network)
    2. Operational Analytics (Streaming Service)
    3. Real-Time Ad Analytics (Online Marketplace)
    4. User Facing Analytics (Collaboration Platform)
    5. Conclusion
  8. About the Author

