Chapter 7. Apache Kafka Streams Implementation

As described earlier, unlike Flink, Beam, and Spark, Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds upon important stream-processing concepts, such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state. Because Kafka Streams is a Java Virtual Machine (JVM) framework, implementation of our architecture is fairly simple and straightforward. Because it runs in a single JVM, it is possible to implement state as a static Java object in memory, which can be shared by both streams. This in turn means that it is not necessary to merge streams—they can both execute independently while sharing the same state, which is the current model.

Although this solution will work, it has a serious drawback: unlike engines such as Spark or Flink, it does not provide automated recovery. Fortunately, Kafka Streams allows for implementation of state using custom state stores, which are backed up by a special Kafka topic. I will start this chapter by showing you how you can implement such a state store. Technically it is possible to use the key–value store provided by Kafka Streams, but I decided to create a custom state server to show implementation details.

Implementing the Custom State Store

Implementing the custom state store begins with deciding on the data that it needs to hold. For our implementation, the only thing ...

Get Serving Machine Learning Models now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.