This chapter discusses how to use Spark’s streaming API to process real-time data. The first section covers the main differences between streaming and batch data, along with the applications each is suited to. The second section describes the Structured Streaming API and its improvements over the earlier RDD-based Spark Streaming (DStream) API. The final section walks through code that applies Structured Streaming to incoming data and shows how to save the output results in memory. We’ll also look at an alternative to Structured Streaming.
Batch vs. Stream
Perhaps most readers of this book ...