Foreword
Consensus protocols, stream processing, distributed systems—amid all the exciting ideas in the streaming world, it can be easy to overlook the role of the humble connector. But connectors solve the most fundamental problem in streaming: in a world of data at rest, how do you access streams at all? How do you plug your data-streaming platform into the rest of the business?
Kafka Connect’s aim is to make that easier. Before the Kafka Connect framework existed, we saw many people build integrations with Apache Kafka and repeat the same mistakes. Reading data from one system and writing it to another seems simple enough, but the process can have a lot of hidden complexity. What happens if a machine fails? What happens when requests time out? How do you scale up your integration? Each unique Kafka integration had to solve these problems from scratch. Kafka Connect was designed to separate out the logic of reading and writing to a particular system from a general framework for building, operating, and scaling these integrations.
Kafka Connect is different from other integration or connector layers in a lot of important ways:
It’s designed for streaming first.
It works with Kafka’s semantics to enable exactly-once delivery with systems that support it, and the strongest semantics possible with systems that don’t.
It lets you not just capture bytes, but also propagate some of the semantic structure of data.
It solves a lot of the complex problems in partitioning, ...