Chapter 14. Stream Processing

Kafka was traditionally seen as a powerful message bus, capable of delivering streams of events but without processing or transformation capabilities. Its reliable stream delivery makes it a perfect source of data for stream processing systems. Apache Storm, Apache Spark Streaming, Apache Flink, Apache Samza, and many other stream processing systems were built with Kafka often serving as their only reliable data source.

With the increased popularity of Apache Kafka, first as a simple message bus and later as a data integration system, many companies found themselves with a system containing many streams of interesting data, stored for long periods of time and perfectly ordered, just waiting for some stream processing framework to show up and process them. In other words, in the same way that data processing was significantly more difficult before databases were invented, stream processing was held back by the lack of a stream processing platform.

Starting with version 0.10.0, Kafka does more than provide a reliable source of data streams to every popular stream processing framework. Kafka now includes a powerful stream processing library, called Kafka Streams (or sometimes the Streams API), as part of its collection of client libraries. This allows developers to consume, process, and produce events in their own apps, without relying on an external processing framework.

We’ll begin the chapter by explaining what we mean by stream processing (since this term is ...

