Chapter 29. Other Distributed Real-Time Stream Processing Systems

As we have demonstrated throughout this book, stream processing is a crucial technology for every data-oriented enterprise. Many stream-processing stacks, both proprietary and open source, can help us process streaming data. They differ in capabilities and APIs, and they offer different trade-offs between latency and throughput.

Following the principle of the right tool for the job, we should compare and contrast them against the requirements of each new project to make the right choice.

Furthermore, the evolving importance of the cloud beyond being an infrastructure provider has created a new class of offerings, where the functionality of the system is offered as a managed service (Software as a Service [SaaS]).

In this chapter, we briefly survey the most relevant open source stream processors currently maintained, such as Apache Storm, Apache Flink, Apache Beam, and Kafka Streams, and give an overview of what the dominant cloud providers offer in the streaming arena.

Apache Storm

Apache Storm is an open source, distributed, real-time computation system created originally by Nathan Marz at BackType. It was then used at Twitter and open sourced in 2011, and consists of a mix of Java and Clojure code. It was the first “big data” streaming engine to be fast, scalable, and partially fault-tolerant, ...
