Chapter 2. How Change Data Capture Works

Change data capture (CDC) identifies and captures only the most recent changes to production data and metadata that the source has registered during a given time period, typically measured in seconds or minutes, and then enables replication software to copy those changes to a separate data repository. A variety of technical mechanisms enable CDC to minimize time and overhead in the manner best suited to the type of analytics or application it supports. CDC can accompany batch load replication to ensure that the target is synchronized with the source when the load completes and stays synchronized afterward. Like batch loads, CDC helps replication software copy data from one source to one target, or from one source to multiple targets. CDC also identifies and replicates changes to the source schema (that is, data definition language [DDL] changes), enabling targets to adapt dynamically to structural updates. This eliminates the risk that other data management and analytics processes become brittle and require time-consuming manual updates.
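The capture-then-replicate cycle described above can be illustrated with a small in-memory sketch. It is not how any particular CDC product works: the Change record, the Target class, and the capture and apply_changes functions are hypothetical names invented for this illustration, and the "source log" is just a Python list standing in for a real change log. The sketch assumes each change carries a sequence number, so the target can ask for only the changes it has not yet applied, and it treats an added column as one more change event so that the target's structure follows the source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Change:
    """One entry in a hypothetical source change log."""
    seq: int                          # position in the log
    op: str                           # "insert", "update", "delete", or "add_column"
    key: Optional[int] = None         # primary key for row-level changes
    row: Dict[str, Any] = field(default_factory=dict)
    column: Optional[str] = None      # column name for the DDL change


@dataclass
class Target:
    """A toy target table that remembers how far it has replicated."""
    columns: List[str]
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    last_seq: int = 0                 # highest sequence number already applied


def capture(log: List[Change], after_seq: int) -> List[Change]:
    """Return only the changes registered after the given log position."""
    return [c for c in log if c.seq > after_seq]


def apply_changes(target: Target, changes: List[Change]) -> None:
    """Apply captured changes to the target in log order."""
    for c in sorted(changes, key=lambda ch: ch.seq):
        if c.op == "add_column":
            # DDL change: extend the schema and backfill existing rows.
            if c.column not in target.columns:
                target.columns.append(c.column)
            for existing in target.rows.values():
                existing.setdefault(c.column, None)
        elif c.op == "insert":
            target.rows[c.key] = {col: c.row.get(col) for col in target.columns}
        elif c.op == "update":
            blank = {col: None for col in target.columns}
            target.rows.setdefault(c.key, blank).update(c.row)
        elif c.op == "delete":
            target.rows.pop(c.key, None)
        else:
            raise ValueError(f"unknown operation: {c.op}")
        target.last_seq = c.seq


# A hypothetical source log: row changes interleaved with one schema change.
log = [
    Change(1, "insert", key=1, row={"id": 1, "name": "Ada"}),
    Change(2, "add_column", column="email"),
    Change(3, "update", key=1, row={"email": "ada@example.com"}),
    Change(4, "insert", key=2, row={"id": 2, "name": "Bob"}),
    Change(5, "delete", key=2),
]

target = Target(columns=["id", "name"])
apply_changes(target, capture(log, target.last_seq))
```

After the final call, the toy target holds a single row for key 1 that includes the new email column, and its last_seq is 5, so a later capture against the same log returns nothing until the source registers more changes. Tracking a log position is one simple way to copy only what has changed; real CDC tools use a variety of mechanisms.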

Source, Target, and Data Types

Traditional CDC sources include operational databases, applications, and mainframe systems, most of which maintain transaction logs that CDC can readily access. More recently, these traditional repositories have come to serve as landing zones for new types of data created by Internet of Things (IoT) sensors, social media message streams, and other data-emitting technologies.

Targets, meanwhile, commonly include not ...
