Kafka Connect

Book description

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you move data between your existing systems and Kafka so you can process it in real time.

With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.

  • Learn Kafka Connect's capabilities, main concepts, and terminology
  • Design data and event streaming pipelines that use Kafka Connect
  • Configure and operate Kafka Connect environments at scale
  • Deploy secured and highly available Kafka Connect clusters
  • Build source and sink connectors, single message transforms, and converters

Table of contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book
    2. Kafka Versions
    3. Navigating This Book
    4. Conventions Used in This Book
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgements
  3. I. Introduction to Kafka Connect
  4. 1. Meet Kafka Connect
    1. Kafka Connect Features
      1. Pluggable Architecture
      2. Scalability and Reliability
      3. Declarative Pipeline Definition
      4. Part of Apache Kafka
    2. Use Cases
      1. Capturing Database Changes
      2. Mirroring Kafka Clusters
      3. Building Data Lakes
      4. Aggregating Logs
      5. Modernizing Legacy Systems
    3. Alternatives to Kafka Connect
    4. Summary
  5. 2. Apache Kafka Basics
    1. A Distributed Event Streaming Platform
      1. Open Source
      2. Distributed
      3. Event Streaming
      4. Platform
    2. Kafka Concepts
      1. Publish-Subscribe
      2. Brokers and Records
      3. Topics and Partitions
      4. Replication
      5. Retention and Compaction
      6. KRaft and ZooKeeper
    3. Interacting with Kafka
      1. Producers
      2. Consumers
      3. Kafka Streams
    4. Getting Started with Kafka
      1. Starting Kafka
      2. Sending and Receiving Records
      3. Running a Kafka Streams Application
    5. Summary
  6. II. Developing Data Pipelines with Kafka Connect
  7. 3. Components in a Kafka Connect Data Pipeline
    1. Kafka Connect Runtime
      1. Running Kafka Connect
      2. Kafka Connect REST API
      3. Installing Plug-Ins
      4. Deployment Modes
    2. Source and Sink Connectors
      1. Connectors and Tasks
      2. Configuring Connectors
      3. Running Connectors
    3. Converters
      1. Data Format and Schemas
      2. Configuring Converters
      3. Using Converters
    4. Transformations and Predicates
      1. Transformation Use Cases
      2. Predicates
      3. Configuring Transformations and Predicates
      4. Using Transformations and Predicates
    5. Summary
  8. 4. Designing Effective Data Pipelines
    1. Choosing a Connector
      1. Pipeline Direction
      2. Licensing and Support
      3. Connector Features
    2. Defining Data Models
      1. Data Transformation
      2. Mapping Data Between Systems
    3. Formatting Data
      1. Data Formats
      2. Schemas
    4. Exploring Kafka Connect Internals
      1. Internal Topics
      2. Group Membership
      3. Rebalance Protocols
    5. Handling Failures in Kafka Connect
      1. Worker Failure
      2. Connector/Task Failure
      3. Kafka/External Systems Failure
      4. Dead Letter Queues
    6. Understanding Processing Semantics
      1. Sink Connectors
      2. Source Connectors
    7. Summary
  9. 5. Connectors in Action
    1. Confluent S3 Sink Connector
      1. Configuring the Connector
      2. Exactly-Once Semantics
      3. Running the Connector
    2. Confluent JDBC Source Connector
      1. Configuring the Connector
      2. Running the Connector
    3. Debezium MySQL Source Connector
      1. Configuring the Connector
      2. Event Formats
      3. Running the Connector
    4. Summary
  10. 6. Mirroring Clusters with MirrorMaker
    1. Introduction to Mirroring
      1. Exploring Mirroring Use Cases
      2. Mirroring in Practice
    2. Introduction to MirrorMaker
      1. Common Concepts
      2. Deployment Modes
    3. MirrorMaker Connectors
      1. MirrorSourceConnector
      2. MirrorCheckpointConnector
      3. MirrorHeartbeatConnector
    4. Running MirrorMaker
      1. Disaster Recovery Example
      2. Geo-Replication Example
    5. Summary
  11. III. Running Kafka Connect in Production
  12. 7. Deploying and Operating Kafka Connect Clusters
    1. Preparing the Kafka Connect Environment
      1. Building a Kafka Connect Environment
      2. Installing Plug-Ins
      3. Networking and Permissions
    2. Worker Plug-Ins
      1. Configuration Providers
      2. REST Extensions
      3. Connector Client Configuration Override Policies
    3. Sizing and Planning Capacity
      1. Understanding Kafka Connect Resource Utilization
      2. How Many Workers and Tasks?
    4. Operating Kafka Connect Clusters
      1. Adding Workers
      2. Removing Workers
      3. Upgrading and Applying Maintenance to Workers
      4. Restarting Failed Tasks and Connectors
      5. Resetting Offsets of Connectors
    5. Administering Kafka Connect Using the REST API
      1. Creating and Deleting a Connector
      2. Connector and Task Configuration
      3. Controlling the Lifecycle of Connectors
      4. Listing Connector Offsets
      5. Debugging Issues
    6. Summary
  13. 8. Configuring Kafka Connect
    1. Configuring the Runtime
      1. Configurations for Production
      2. Fine-Tuning Configurations
    2. Configuring Connectors
      1. Topic Configurations
      2. Client Overrides
      3. Configurations for Exactly-Once
      4. Configurations for Error Handling
    3. Configuring Kafka Connect Clusters for Security
      1. Securing the Connection to Kafka
      2. Configuring Permissions
      3. Securing the REST API
    4. Summary
  14. 9. Monitoring Kafka Connect
    1. Monitoring Logs
      1. Logging Configuration
      2. Understanding Startup Logs
      3. Analyzing Logs
    2. Monitoring Metrics
      1. Metrics Reporters
      2. Analyzing Metrics
      3. Exploring Metrics
    3. Key Metrics
      1. Kafka Connect Runtime Metrics
      2. Other System Metrics
    4. Summary
  15. 10. Administering Kafka Connect on Kubernetes
    1. Introduction to Kubernetes
      1. Virtualization Technologies
      2. Kubernetes Fundamentals
    2. Running Kafka Connect on Kubernetes
      1. Container Image
      2. Deploying Workers
      3. Networking and Monitoring
      4. Configuration
    3. Using a Kubernetes Operator to Deploy Kafka Connect
      1. Introduction to Kubernetes Operators
      2. Kubernetes Operators for Kafka Connect
    4. Strimzi
      1. Getting a Kubernetes Environment
      2. Starting the Operator
      3. Kafka Connect CRDs
      4. Deploying a Kafka Connect Cluster and Connectors
      5. MirrorMaker CRD
    5. Summary
  16. IV. Building Custom Connectors and Plug-Ins
  17. 11. Building Source and Sink Connectors
    1. Common Concepts and APIs
      1. Building a Custom Connector
      2. The Connector API
      3. Configurations
      4. The Task API
      5. Kafka Connect Records
      6. The ConnectorContext API
    2. Implementing Source Connectors
      1. The SourceTask API
      2. Source Records
      3. The SourceConnectorContext and SourceTaskContext APIs
      4. Exactly-Once Support
    3. Implementing Sink Connectors
      1. The SinkTask API
      2. Sink Records
      3. The SinkConnectorContext and SinkTaskContext APIs
    4. Summary
  18. 12. Extending Kafka Connect with Connector and Worker Plug-Ins
    1. Implementing Connector Plug-Ins
      1. The Transformation API
      2. The Predicate API
      3. The Converter and HeaderConverter APIs
    2. Implementing Worker Plug-Ins
      1. The ConfigProvider API
      2. The ConnectorClientConfigOverridePolicy API
      3. The ConnectRestExtension APIs
    3. Summary
  19. Index
  20. About the Authors

Product information

  • Title: Kafka Connect
  • Author(s): Mickael Maison, Kate Stanley
  • Release date: September 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098126537