Kafka in Action

Name: Kafka in Action
ISBN: 9781617295232

by Dylan Scott, Viktor Gamov, Dave Klein

February 2022

Beginner

272 pages

7h 48m

English

Manning Publications


Kafka in Action
Copyright
Dedication
Brief contents
contents
Front matter
foreword
preface
acknowledgments
about this book
Who should read this book?
How this book is organized: A roadmap
About the code
liveBook discussion forum
Other online resources
about the authors
about the cover illustration
Part 1. Getting started
1 Introduction to Kafka
1.1 What is Kafka?
1.2 Kafka usage
1.2.1 Kafka for the developer
1.2.2 Explaining Kafka to your manager
1.3 Kafka myths
1.3.1 Kafka only works with Hadoop®
1.3.2 Kafka is the same as other message brokers
1.4 Kafka in the real world
1.4.1 Early examples
1.4.2 Later examples
1.4.3 When Kafka might not be the right fit
1.5 Online resources to get started
Summary
References
2 Getting to know Kafka
2.1 Producing and consuming a message
2.2 What are brokers?
2.3 Tour of Kafka
2.3.1 Producers and consumers
2.3.2 Topics overview
2.3.3 ZooKeeper usage
2.3.4 Kafka’s high-level architecture
2.3.5 The commit log
2.4 Various source code packages and what they do
2.4.1 Kafka Streams
2.4.2 Kafka Connect
2.4.3 AdminClient package
2.4.4 ksqlDB
2.5 Confluent clients
2.6 Stream processing and terminology
2.6.1 Stream processing
2.6.2 What exactly-once means
Summary
References
Part 2. Applying Kafka

3 Designing a Kafka project
3.1 Designing a Kafka project
3.1.1 Taking over an existing data architecture
3.1.2 A first change
3.1.3 Built-in features
3.1.4 Data for our invoices
3.2 Sensor event design
3.2.1 Existing issues
3.2.2 Why Kafka is the right fit
3.2.3 Thought starters on our design
3.2.4 User data requirements
3.2.5 High-level plan for applying our questions
3.2.6 Reviewing our blueprint
3.3 Format of your data
3.3.1 Plan for data
3.3.2 Dependency setup
Summary
References
4 Producers: Sourcing data
4.1 An example
4.1.1 Producer notes
4.2 Producer options
4.2.1 Configuring the broker list
4.2.2 How to go fast (or go safer)
4.2.3 Timestamps
4.3 Generating code for our requirements
4.3.1 Client and broker versions
Summary
References
5 Consumers: Unlocking data
5.1 An example
5.1.1 Consumer options
5.1.2 Understanding our coordinates
5.2 How consumers interact
5.3 Tracking
5.3.1 Group coordinator
5.3.2 Partition assignment strategy
5.4 Marking our place
5.5 Reading from a compacted topic
5.6 Retrieving code for our factory requirements
5.6.1 Reading options
5.6.2 Requirements
Summary
References
6 Brokers
6.1 Introducing the broker
6.2 Role of ZooKeeper
6.3 Options at the broker level
6.3.1 Kafka’s other logs: Application logs
6.3.2 Server log
6.3.3 Managing state
6.4 Partition replica leaders and their role
6.4.1 Losing data
6.5 Peeking into Kafka
6.5.1 Cluster maintenance
6.5.2 Adding a broker
6.5.3 Upgrading your cluster
6.5.4 Upgrading your clients
6.5.5 Backups
6.6 A note on stateful systems
6.7 Exercise
Summary
References
7 Topics and partitions
7.1 Topics
7.1.1 Topic-creation options
7.1.2 Replication factors
7.2 Partitions
7.2.1 Partition location
7.2.2 Viewing our logs
7.3 Testing with EmbeddedKafkaCluster
7.3.1 Using Kafka Testcontainers
7.4 Topic compaction
Summary
References
8 Kafka storage
8.1 How long to store data
8.2 Data movement
8.2.1 Keeping the original event
8.2.2 Moving away from a batch mindset
8.3 Tools
8.3.1 Apache Flume
8.3.2 Red Hat® Debezium™
8.3.3 Secor
8.3.4 Example use case for data storage
8.4 Bringing data back into Kafka
8.4.1 Tiered storage
8.5 Architectures with Kafka
8.5.1 Lambda architecture
8.5.2 Kappa architecture
8.6 Multiple cluster setups
8.6.1 Scaling by adding clusters
8.7 Cloud- and container-based storage options
8.7.1 Kubernetes clusters
Summary
References
9 Management: Tools and logging
9.1 Administration clients
9.1.1 Administration in code with AdminClient
9.1.2 kcat
9.1.3 Confluent REST Proxy API
9.2 Running Kafka as a systemd service
9.3 Logging
9.3.1 Kafka application logs
9.3.2 ZooKeeper logs
9.4 Firewalls
9.4.1 Advertised listeners
9.5 Metrics
9.5.1 JMX console
9.6 Tracing option
9.6.1 Producer logic
9.6.2 Consumer logic
9.6.3 Overriding clients
9.7 General monitoring tools
Summary
References
Part 3. Going further
10 Protecting Kafka
10.1 Security basics
10.1.1 Encryption with SSL
10.1.2 SSL between brokers and clients
10.1.3 SSL between brokers
10.2 Kerberos and the Simple Authentication and Security Layer (SASL)
10.3 Authorization in Kafka
10.3.1 Access control lists (ACLs)
10.3.2 Role-based access control (RBAC)
10.4 ZooKeeper
10.4.1 Kerberos setup
10.5 Quotas
10.5.1 Network bandwidth quota
10.5.2 Request rate quotas
10.6 Data at rest
10.6.1 Managed options
Summary
References
11 Schema registry
11.1 A proposed Kafka maturity model
11.1.1 Level 0
11.1.2 Level 1
11.1.3 Level 2
11.1.4 Level 3
11.2 The Schema Registry
11.2.1 Installing the Confluent Schema Registry
11.2.2 Registry configuration
11.3 Schema features
11.3.1 REST API
11.3.2 Client library
11.4 Compatibility rules
11.4.1 Validating schema modifications
11.5 Alternative to a schema registry
Summary
References
12 Stream processing with Kafka Streams and ksqlDB
12.1 Kafka Streams
12.1.1 KStreams API DSL
12.1.2 KTable API
12.1.3 GlobalKTable API
12.1.4 Processor API
12.1.5 Kafka Streams setup
12.2 ksqlDB: An event-streaming database
12.2.1 Queries
12.2.2 Local development
12.2.3 ksqlDB architecture
12.3 Going further
12.3.1 Kafka Improvement Proposals (KIPs)
12.3.2 Kafka projects you can explore
12.3.3 Community Slack channel
Summary
References
Appendix A. Installation
A.1 Operating system (OS) requirements
A.2 Kafka versions
A.3 Installing Kafka on your local machine
A.3.1 Prerequisite: Java
A.3.2 Prerequisite: ZooKeeper
A.3.3 Prerequisite: Kafka download
A.3.4 Starting a ZooKeeper server
A.3.5 Creating and configuring a cluster by hand
A.4 Confluent Platform
A.4.1 Confluent command line interface (CLI)
A.4.2 Docker
A.5 How to work with the book examples
A.5.1 Building from the command line
A.6 Troubleshooting
References
Appendix B. Client example
B.1 Python Kafka clients
B.1.1 Installing Python
B.1.2 Python producer example
B.1.3 Python consumer
B.2 Client testing
B.2.1 Unit testing in Java
B.2.2 Kafka Testcontainers
References
index

Overview

Master the wicked-fast Apache Kafka streaming platform through hands-on examples and real-world projects.

In Kafka in Action, you will learn about:

Understanding Apache Kafka concepts
Setting up and executing basic ETL tasks using Kafka Connect
Using Kafka as part of a large data project team
Performing administrative tasks
Producing and consuming event streams
Working with Kafka from Java applications
Implementing Kafka as a message queue

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow, and start digging into even more advanced Kafka topics.

About the Technology
Think of Apache Kafka as a high-performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large- and small-scale applications.

About the Book
Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What's Inside

Kafka as an event streaming platform
Kafka producers and consumers from Java applications
Kafka as part of a large data project

About the Reader
For intermediate Java developers or data engineers. No prior knowledge of Kafka required.

About the Authors
Dylan Scott is a software developer in the insurance industry. Viktor Gamov is a Kafka-focused developer advocate. At Confluent, Dave Klein helps developers, teams, and enterprises harness the power of event streaming with Apache Kafka.

Quotes
The authors have had many years of real-world experience using Kafka, and this book’s on-the-ground feel really sets it apart.
- From the foreword by Jun Rao, Confluent Cofounder

A surprisingly accessible introduction to a very complex technology. Developers will want to keep a copy close by.
- Conor Redmond, InComm Payments

A comprehensive and practical guide to Kafka and the ecosystem.
- Sumant Tambe, LinkedIn

It quickly gave me insight into how Kafka works, and how to design and protect distributed message applications.
- Gregor Rayman, Cloudfarms

