Kafka in Action

Book description

Master the wicked-fast Apache Kafka streaming platform through hands-on examples and real-world projects.

In Kafka in Action you will learn:

  • Understanding Apache Kafka concepts
  • Setting up and executing basic ETL tasks using Kafka Connect
  • Using Kafka as part of a large data project team
  • Performing administrative tasks
  • Producing and consuming event streams
  • Working with Kafka from Java applications
  • Implementing Kafka as a message queue

Kafka in Action is a fast-paced introduction to every aspect of working with Apache Kafka. Starting with an overview of Kafka’s core concepts, you’ll immediately learn how to set up and execute basic data movement tasks and how to produce and consume streams of events. Advancing quickly, you’ll soon be ready to use Kafka in your day-to-day workflow and start digging into even more advanced Kafka topics.

About the Technology
Think of Apache Kafka as a high-performance software bus that facilitates event streaming, logging, analytics, and other data pipeline tasks. With Kafka, you can easily build features like operational data monitoring and large-scale event processing into both large- and small-scale applications.

About the Book
Kafka in Action introduces the core features of Kafka, along with relevant examples of how to use it in real applications. In it, you’ll explore the most common use cases such as logging and managing streaming data. When you’re done, you’ll be ready to handle both basic developer- and admin-based tasks in a Kafka-focused team.

What’s Inside
  • Kafka as an event streaming platform
  • Kafka producers and consumers from Java applications
  • Kafka as part of a large data project


About the Reader
For intermediate Java developers or data engineers. No prior knowledge of Kafka required.

About the Authors
Dylan Scott is a software developer in the insurance industry. Viktor Gamov is a Kafka-focused developer advocate. At Confluent, Dave Klein helps developers, teams, and enterprises harness the power of event streaming with Apache Kafka.

Quotes
The authors have had many years of real-world experience using Kafka, and this book’s on-the-ground feel really sets it apart.
- From the foreword by Jun Rao, Confluent Cofounder

A surprisingly accessible introduction to a very complex technology. Developers will want to keep a copy close by.
- Conor Redmond, InComm Payments

A comprehensive and practical guide to Kafka and the ecosystem.
- Sumant Tambe, LinkedIn

It quickly gave me insight into how Kafka works, and how to design and protect distributed message applications.
- Gregor Rayman, Cloudfarms

Table of contents

  1. Kafka in Action
  2. Copyright
  3. Dedication
  4. Brief contents
  5. contents
  6. Front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
      1. Who should read this book?
      2. How this book is organized: A roadmap
      3. About the code
      4. liveBook discussion forum
      5. Other online resources
    5. about the authors
    6. about the cover illustration
  7. Part 1. Getting started
  8. 1 Introduction to Kafka
    1. 1.1 What is Kafka?
    2. 1.2 Kafka usage
      1. 1.2.1 Kafka for the developer
      2. 1.2.2 Explaining Kafka to your manager
    3. 1.3 Kafka myths
      1. 1.3.1 Kafka only works with Hadoop®
      2. 1.3.2 Kafka is the same as other message brokers
    4. 1.4 Kafka in the real world
      1. 1.4.1 Early examples
      2. 1.4.2 Later examples
      3. 1.4.3 When Kafka might not be the right fit
    5. 1.5 Online resources to get started
    6. Summary
    7. References
  9. 2 Getting to know Kafka
    1. 2.1 Producing and consuming a message
    2. 2.2 What are brokers?
    3. 2.3 Tour of Kafka
      1. 2.3.1 Producers and consumers
      2. 2.3.2 Topics overview
      3. 2.3.3 ZooKeeper usage
      4. 2.3.4 Kafka’s high-level architecture
      5. 2.3.5 The commit log
    4. 2.4 Various source code packages and what they do
      1. 2.4.1 Kafka Streams
      2. 2.4.2 Kafka Connect
      3. 2.4.3 AdminClient package
      4. 2.4.4 ksqlDB
    5. 2.5 Confluent clients
    6. 2.6 Stream processing and terminology
      1. 2.6.1 Stream processing
      2. 2.6.2 What exactly-once means
    7. Summary
    8. References
  10. Part 2. Applying Kafka
  11. 3 Designing a Kafka project
    1. 3.1 Designing a Kafka project
      1. 3.1.1 Taking over an existing data architecture
      2. 3.1.2 A first change
      3. 3.1.3 Built-in features
      4. 3.1.4 Data for our invoices
    2. 3.2 Sensor event design
      1. 3.2.1 Existing issues
      2. 3.2.2 Why Kafka is the right fit
      3. 3.2.3 Thought starters on our design
      4. 3.2.4 User data requirements
      5. 3.2.5 High-level plan for applying our questions
      6. 3.2.6 Reviewing our blueprint
    3. 3.3 Format of your data
      1. 3.3.1 Plan for data
      2. 3.3.2 Dependency setup
    4. Summary
    5. References
  12. 4 Producers: Sourcing data
    1. 4.1 An example
      1. 4.1.1 Producer notes
    2. 4.2 Producer options
      1. 4.2.1 Configuring the broker list
      2. 4.2.2 How to go fast (or go safer)
      3. 4.2.3 Timestamps
    3. 4.3 Generating code for our requirements
      1. 4.3.1 Client and broker versions
    4. Summary
    5. References
  13. 5 Consumers: Unlocking data
    1. 5.1 An example
      1. 5.1.1 Consumer options
      2. 5.1.2 Understanding our coordinates
    2. 5.2 How consumers interact
    3. 5.3 Tracking
      1. 5.3.1 Group coordinator
      2. 5.3.2 Partition assignment strategy
    4. 5.4 Marking our place
    5. 5.5 Reading from a compacted topic
    6. 5.6 Retrieving code for our factory requirements
      1. 5.6.1 Reading options
      2. 5.6.2 Requirements
    7. Summary
    8. References
  14. 6 Brokers
    1. 6.1 Introducing the broker
    2. 6.2 Role of ZooKeeper
    3. 6.3 Options at the broker level
      1. 6.3.1 Kafka’s other logs: Application logs
      2. 6.3.2 Server log
      3. 6.3.3 Managing state
    4. 6.4 Partition replica leaders and their role
      1. 6.4.1 Losing data
    5. 6.5 Peeking into Kafka
      1. 6.5.1 Cluster maintenance
      2. 6.5.2 Adding a broker
      3. 6.5.3 Upgrading your cluster
      4. 6.5.4 Upgrading your clients
      5. 6.5.5 Backups
    6. 6.6 A note on stateful systems
    7. 6.7 Exercise
    8. Summary
    9. References
  15. 7 Topics and partitions
    1. 7.1 Topics
      1. 7.1.1 Topic-creation options
      2. 7.1.2 Replication factors
    2. 7.2 Partitions
      1. 7.2.1 Partition location
      2. 7.2.2 Viewing our logs
    3. 7.3 Testing with EmbeddedKafkaCluster
      1. 7.3.1 Using Kafka Testcontainers
    4. 7.4 Topic compaction
    5. Summary
    6. References
  16. 8 Kafka storage
    1. 8.1 How long to store data
    2. 8.2 Data movement
      1. 8.2.1 Keeping the original event
      2. 8.2.2 Moving away from a batch mindset
    3. 8.3 Tools
      1. 8.3.1 Apache Flume
      2. 8.3.2 Red Hat® Debezium™
      3. 8.3.3 Secor
      4. 8.3.4 Example use case for data storage
    4. 8.4 Bringing data back into Kafka
      1. 8.4.1 Tiered storage
    5. 8.5 Architectures with Kafka
      1. 8.5.1 Lambda architecture
      2. 8.5.2 Kappa architecture
    6. 8.6 Multiple cluster setups
      1. 8.6.1 Scaling by adding clusters
    7. 8.7 Cloud- and container-based storage options
      1. 8.7.1 Kubernetes clusters
    8. Summary
    9. References
  17. 9 Management: Tools and logging
    1. 9.1 Administration clients
      1. 9.1.1 Administration in code with AdminClient
      2. 9.1.2 kcat
      3. 9.1.3 Confluent REST Proxy API
    2. 9.2 Running Kafka as a systemd service
    3. 9.3 Logging
      1. 9.3.1 Kafka application logs
      2. 9.3.2 ZooKeeper logs
    4. 9.4 Firewalls
      1. 9.4.1 Advertised listeners
    5. 9.5 Metrics
      1. 9.5.1 JMX console
    6. 9.6 Tracing option
      1. 9.6.1 Producer logic
      2. 9.6.2 Consumer logic
      3. 9.6.3 Overriding clients
    7. 9.7 General monitoring tools
    8. Summary
    9. References
  18. Part 3. Going further
  19. 10 Protecting Kafka
    1. 10.1 Security basics
      1. 10.1.1 Encryption with SSL
      2. 10.1.2 SSL between brokers and clients
      3. 10.1.3 SSL between brokers
    2. 10.2 Kerberos and the Simple Authentication and Security Layer (SASL)
    3. 10.3 Authorization in Kafka
      1. 10.3.1 Access control lists (ACLs)
      2. 10.3.2 Role-based access control (RBAC)
    4. 10.4 ZooKeeper
      1. 10.4.1 Kerberos setup
    5. 10.5 Quotas
      1. 10.5.1 Network bandwidth quota
      2. 10.5.2 Request rate quotas
    6. 10.6 Data at rest
      1. 10.6.1 Managed options
    7. Summary
    8. References
  20. 11 Schema registry
    1. 11.1 A proposed Kafka maturity model
      1. 11.1.1 Level 0
      2. 11.1.2 Level 1
      3. 11.1.3 Level 2
      4. 11.1.4 Level 3
    2. 11.2 The Schema Registry
      1. 11.2.1 Installing the Confluent Schema Registry
      2. 11.2.2 Registry configuration
    3. 11.3 Schema features
      1. 11.3.1 REST API
      2. 11.3.2 Client library
    4. 11.4 Compatibility rules
      1. 11.4.1 Validating schema modifications
    5. 11.5 Alternative to a schema registry
    6. Summary
    7. References
  21. 12 Stream processing with Kafka Streams and ksqlDB
    1. 12.1 Kafka Streams
      1. 12.1.1 KStreams API DSL
      2. 12.1.2 KTable API
      3. 12.1.3 GlobalKTable API
      4. 12.1.4 Processor API
      5. 12.1.5 Kafka Streams setup
    2. 12.2 ksqlDB: An event-streaming database
      1. 12.2.1 Queries
      2. 12.2.2 Local development
      3. 12.2.3 ksqlDB architecture
    3. 12.3 Going further
      1. 12.3.1 Kafka Improvement Proposals (KIPs)
      2. 12.3.2 Kafka projects you can explore
      3. 12.3.3 Community Slack channel
    4. Summary
    5. References
  22. Appendix A. Installation
    1. A.1 Operating system (OS) requirements
    2. A.2 Kafka versions
    3. A.3 Installing Kafka on your local machine
      1. A.3.1 Prerequisite: Java
      2. A.3.2 Prerequisite: ZooKeeper
      3. A.3.3 Prerequisite: Kafka download
      4. A.3.4 Starting a ZooKeeper server
      5. A.3.5 Creating and configuring a cluster by hand
    4. A.4 Confluent Platform
      1. A.4.1 Confluent command line interface (CLI)
      2. A.4.2 Docker
    5. A.5 How to work with the book examples
      1. A.5.1 Building from the command line
    6. A.6 Troubleshooting
    7. References
  23. Appendix B. Client example
    1. B.1 Python Kafka clients
      1. B.1.1 Installing Python
      2. B.1.2 Python producer example
      3. B.1.3 Python consumer
    2. B.2 Client testing
      1. B.2.1 Unit testing in Java
      2. B.2.2 Kafka Testcontainers
    3. References
  24. index

Product information

  • Title: Kafka in Action
  • Author(s): Dylan Scott, Viktor Gamov, Dave Klein
  • Release date: February 2022
  • Publisher(s): Manning Publications
  • ISBN: 9781617295232