Kafka Connect

by Mickael Maison, Kate Stanley
Released October 2023
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098126537

Book description

Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you flow data between your existing systems and Kafka to process data in real time.

With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.

  • Learn Connect's capabilities, main concepts, and terminology
  • Design data and event streaming pipelines that use Connect
  • Configure and operate Connect environments at scale
  • Deploy secured and highly available Connect clusters
  • Build sink and source connectors and single message transforms and converters

Table of contents

  1. Introducing Kafka Connect
    1. Introducing Kafka Connect
      1. Pluggable Architecture
      2. Scalability and Reliability
      3. Kafka Connect’s REST Management API
      4. Part of Apache Kafka
    2. Use Cases
      1. Capturing Databases Changes
      2. Mirroring Kafka Clusters
      3. Building Data Lakes
      4. Aggregating Logs
      5. Modernizing legacy systems
    3. Alternatives to Kafka Connect
    4. Summary
  2. Apache Kafka Basics
    1. A Distributed Event Streaming Platform
      1. Open Source
      2. Distributed
      3. Event Streaming
      4. Platform
    2. Use Cases
      1. Log/Metrics Aggregation
      2. Stream Processing
      3. Messaging
    3. How Kafka Works
      1. Brokers and Records
      2. Topics and Partitions
      3. Replication
      4. Retention and Compaction
      5. Kafka Clients
      6. Producers
      7. Consumers
      8. Streams
    4. Getting Started with Kafka
      1. Starting Kafka
    5. Summary
  3. Components in a Connect Data Pipeline
    1. Kafka Connect Runtime
      1. Binaries and Scripts
      2. Running Kafka Connect in Distributed Mode
      3. Plugins
      4. Kafka Connect REST API
    2. Source and Sink Connectors
      1. How Do Connectors Work?
      2. Finding Connectors for Your Use Case
      3. How Do You Run Connectors?
    3. Converters
      1. Why Is Data Format Important?
      2. Converters and Schemas
      3. Configuring Connect with Converters
      4. Example
    4. Transformations
      1. What Can Transformations Do?
      2. Configuring Transformations
      3. Enabling Transformations in Your Pipeline
    5. Summary
  4. Building Effective Data Pipelines
    1. Choosing a Connector
      1. Flow Direction
      2. Licensing and Support
      3. Connector Features
    2. Defining Data Models
      1. Data Transformation
      2. Mapping Data Between Systems
    3. Formatting Data
      1. Data Format
      2. Schemas
      3. Schema Registry
    4. Exploring Connect Internals
      1. Internal Topics
      2. Group Membership
      3. Rebalancing Protocols
    5. Handling Failures in Connect
      1. Worker Failure
      2. Connector/Task Failure
      3. Kafka/External Systems Failure
      4. Dead Letter Queues
    6. Understanding Delivery Semantics
      1. Sink Connectors
      2. Source Connectors
    7. Summary
  5. Connectors in Action
    1. Confluent S3 Sink Connector
      1. Configuring the Connector
      2. Exactly Once Semantics
      3. Running the connector
    2. Confluent JDBC Source Connector
      1. Configuring the Connector
      2. Running the connector
    3. Debezium MySQL Source Connector
      1. Configuring the Connector
      2. Event Formats
      3. Running the Connector
    4. Summary
  6. Mirroring clusters with MirrorMaker
    1. Introduction to Mirroring
      1. Exploring Mirroring Use Cases
      2. Mirroring in Practice
    2. Introduction to MirrorMaker
      1. Common Concepts
      2. Deployment Modes
    3. MirrorMaker Connectors
      1. MirrorSourceConnector
      2. MirrorCheckpointConnector
      3. MirrorHeartbeatConnector
    4. Running MirrorMaker
      1. Disaster Recovery Example
      2. Geo Replication Example
    5. Summary
  7. Deploying and Operating Kafka Connect Clusters
    1. Preparing the Kafka Connect environment
      1. Building a Connect environment
      2. Installing plugins
      3. Networking and permissions
    2. Worker Plugins
      1. Configuration Providers
      2. REST Extensions
    3. Sizing and Planning capacity
      1. Understanding Connect resources utilization
      2. How many workers and tasks?
      3. Single Cluster vs Separate Clusters
    4. Operating Connect clusters
      1. Adding Workers
      2. Removing Workers
      3. Upgrading and Applying Maintenance to Workers
      4. Restarting failed tasks and connectors
      5. Resetting offsets of Connectors
    5. Administering Connect using the REST API
      1. Creating and deleting a connector
      2. Connector and task configuration
      3. Controlling the lifecycle of connectors
      4. Debugging issues
    6. Summary
  8. Configuring Kafka Connect
    1. Configuring the Runtime
      1. Configurations for Production
      2. Fine Tuning Configurations
    2. Configuring Connectors
      1. Topic Configurations
      2. Client Overrides
      3. Configurations for Exactly Once
      4. Configurations for Error Handling
    3. Configuring Connect Clusters for Security
      1. Securing the Connection to Kafka
      2. Configuring Permissions
      3. Securing the REST API
    4. Summary
  9. Monitoring Kafka Connect
    1. Monitoring Logs
      1. Logging Configuration
      2. Understanding Startup Logs
      3. Analyzing Logs
    2. Monitoring Metrics
      1. Metrics Reporters
      2. Analyzing Metrics
      3. Exploring Metrics
    3. Key Metrics
      1. Connect Runtime Metrics
      2. Other System Metrics
    4. Summary
  10. Administering Connect on Kubernetes
    1. Introduction to Kubernetes
      1. Kubernetes Fundamentals
    2. Running Connect on Kubernetes
      1. Container Image
      2. Deploying Workers
      3. Networking and Monitoring
      4. Configuration
    3. Using a Kubernetes Operator to deploy Connect
      1. Introduction to Kubernetes Operators
      2. Kubernetes Operators for Connect
    4. Strimzi
      1. Kubernetes Environment
      2. Starting the Operator
      3. Connect CRDs
      4. Deploying a Connect Cluster and Connectors
      5. MirrorMaker CRD
    5. Summary
  11. Building Source and Sink Connectors
    1. Common Concepts and APIs
      1. Building a Custom Connector
      2. The Connector API
      3. Configurations
      4. The Task API
      5. Connect Records
      6. The ConnectorContext API
    2. Implementing Source Connectors
      1. The Source Task API
      2. Source Records
      3. The SourceConnectorContext and SourceTaskContext APIs
      4. Exactly Once Support
    3. Implementing Sink Connectors
      1. The Sink Task API
      2. Sink Records
      3. The SinkConnectorContext and SinkTaskContext APIs
    4. Summary
  12. Extending Connect with Connector and Worker Plugins
    1. Implementing Connector Plugins
      1. The Transformation API
      2. The Predicate API
      3. The Converter and HeaderConverter APIs
    2. Implementing Worker Plugins
      1. The ConfigProvider API
      2. The ConnectorClientConfigOverridePolicy API
      3. The ConnectRestExtension APIs
    3. Summary
