Chapter 15. Experimental Areas: Continuous Processing and Machine Learning

Structured Streaming first appeared in Spark 2.0 as an experimental API, offering a new streaming model aimed at simplifying the way we think about streaming applications. In Spark 2.2, Structured Streaming “graduated” to production-ready, giving the signal that this new model was all set for industry adoption. With Spark 2.3, we saw further improvements in the area of streaming joins, and it also introduced a new experimental continuous execution model for low-latency stream processing.

As with any new successful development, we can expect Structured Streaming to keep advancing at a fast pace. Although industry adoption will contribute evolutionary feedback on important features, market trends such as the increasing popularity of machine learning will drive the roadmap of the releases to come.

In this chapter, we want to provide insights into some of the areas under development that will probably become mainstream in upcoming releases.

Continuous Processing

Continuous processing is an alternative execution mode for Structured Streaming that allows for low-latency processing of individual events. It has been included as an experimental feature in Spark v2.3 and is still under active development, in particular in the areas of delivery semantics, stateful operation support, and monitoring.

Understanding Continuous Processing

The initial streaming API for Spark, Spark Streaming, was conceived with the ...

Get Stream Processing with Apache Spark now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.