Part III. Spark Streaming

In this part, we learn about Spark Streaming.

Spark Streaming was the first streaming API offered on Apache Spark and is currently used in production by many companies around the world. It provides a powerful and extensible functional API based on the core Spark abstractions. Nowadays, Spark Streaming is mature and stable.

Our exploration of Spark Streaming begins with a practical example that gives us an initial feel for its API and programming model. As we progress through this part, we explore the different aspects involved in programming and running robust Spark Streaming applications:

  • Understanding the Discretized Stream (DStream) abstraction

  • Creating applications using the API and programming model

  • Consuming and producing data using streaming sources and Output Operations

  • Combining Spark SQL and other libraries into streaming applications

  • Understanding the fault-tolerance characteristics and how to create robust applications

  • Monitoring and managing streaming applications

After this part, you will have the knowledge required to design, implement, and execute stream-processing applications using Spark Streaming. You will also be prepared for Part IV, which covers more advanced topics such as the application of probabilistic data structures to stream processing and online machine learning.
