Chapter 2. Apache Kafka Basics

Kafka Connect is one of the components of the Apache Kafka project. In this chapter, we give a quick overview of how Kafka works and the concepts you should be familiar with to fully understand the rest of this book. We also discuss the different Kafka clients, including Kafka Streams, and show you how to run them against a local Kafka cluster. You will likely need to run Kafka and related clients in your development environment, even if someone else runs your Kafka cluster in production.

If you already have a good understanding of Kafka, you can skip this chapter and go directly to Chapter 3. If you want a deeper dive into Apache Kafka, we recommend you take a look at Kafka: The Definitive Guide (O’Reilly), by Gwen Shapira et al.

A Distributed Event Streaming Platform

On the official website, Kafka is described as an “open-source distributed event streaming platform.” While that description is technically accurate, most people need more detail to understand what Kafka is and what you can use it for. Let’s look at each part of that description in turn.

Open Source

Because Kafka is open source, the ever-growing Kafka community has created many third-party tools and integrations.

The project was originally created at LinkedIn, which needed a performant and flexible messaging system to process the very large volume of data generated by its users. It was released as an open source ...
