P. Singh, Learn PySpark, https://doi.org/10.1007/978-1-4842-4961-1_3

3. Spark Structured Streaming

Pramod Singh¹

(1) Bangalore, Karnataka, India

This chapter discusses how to use Spark’s streaming API to process real-time data. The first part covers the main differences between streaming and batch data, along with their typical applications. The second section describes the Structured Streaming API and its improvements over the earlier RDD-based Spark Streaming API. The final section presents code that applies Structured Streaming to incoming data and shows how to save the output results in memory. We’ll also look at an alternative to Structured Streaming.

Batch vs. Stream

Perhaps most readers of this book ...
