Building Real-Time Analytics Systems

by Mark Needham
September 2023
Beginner to intermediate
220 pages
4h 36m
English
O'Reilly Media, Inc.

Chapter 7. Product Changes Captured with Change Data Capture

The operations team at AATD can now get a solid overview of the number of orders and the revenue that the business is making. What they still lack is visibility into what’s happening at the product level. Complaints from other parts of the business indicate that some products are seeing surges in orders while other items are overstocked.

The data about individual products is currently stored in the MySQL database, but we need to get it out of there and into our real-time analytics architecture. In this chapter, we’ll learn how to do this using a technique called change data capture (CDC).

Capturing Changes from Operational Databases

Businesses often record their transactions in operational, or OLTP, databases. They also want to analyze that operational data, but how should they go about doing that?

Traditionally, ETL pipelines have been used to move data from operational databases to analytical databases like data warehouses. Those pipelines were executed periodically, extracting data from source databases in large batches. The data was then transformed before being loaded into the analytics database.

The problem with this classic approach was the significant latency between data collection and decision making. For example, a typical batch pipeline would take minutes, hours, or days to generate insights from operational data.
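To make the batch pattern and its latency concrete, here is a minimal sketch of one periodic ETL run. It is an illustration only, not the book's code: two in-memory SQLite databases stand in for the MySQL source and the analytics database, and the table names (`products`, `dim_products`), the columns, and the `run_batch_etl` function are all hypothetical.

```python
import sqlite3


def run_batch_etl(source, warehouse):
    """One scheduled run: extract everything, transform, then load."""
    # Extract: pull the whole source table in a single large batch.
    rows = source.execute(
        "SELECT id, name, price_cents FROM products"
    ).fetchall()

    # Transform: tidy the name and convert cents to a decimal price.
    transformed = [
        (pid, name.strip().title(), cents / 100)
        for pid, name, cents in rows
    ]

    # Load: replace the warehouse snapshot with the new batch.
    warehouse.execute("DELETE FROM dim_products")
    warehouse.executemany(
        "INSERT INTO dim_products (id, name, price) VALUES (?, ?, ?)",
        transformed,
    )
    warehouse.commit()
    return len(transformed)


# Hypothetical operational (source) database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)"
)
source.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, " margherita ", 899), (2, "veggie", 1049)],
)
source.commit()

# Hypothetical analytics database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

loaded = run_batch_etl(source, warehouse)

# A change made after the run stays invisible until the next scheduled run.
source.execute("UPDATE products SET price_cents = 999 WHERE id = 1")
source.commit()
stale_price = warehouse.execute(
    "SELECT price FROM dim_products WHERE id = 1"
).fetchone()[0]
```

After the update, the analytics copy still holds the old price. It only catches up when `run_batch_etl` is called again, so the staleness is bounded by the schedule interval plus the time the run itself takes.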

What if there was a mechanism to capture changes made to source ...


ISBN: 9781098138783