Chapter 10. Structured Streaming

This chapter provides a jump-start on the concepts behind Spark Streaming and how it has evolved into Structured Streaming. An important aspect of Structured Streaming is that it uses Spark DataFrames. This shift in paradigm makes it easier for Python developers to start working with Spark Streaming.

In this chapter, you will learn:

  • What is Spark Streaming?
  • Why do we need Spark Streaming?
  • What is the Spark Streaming application data flow?
  • Simple streaming application using DStream
  • A quick primer on Spark Streaming global aggregations
  • Introducing Structured Streaming

Note that for the initial sections of this chapter, the example code is in Scala, as this was how most Spark Streaming code was written. ...
