© Raju Kumar Mishra and Sundar Rajan Raman 2019
Raju Kumar Mishra and Sundar Rajan Raman, PySpark SQL Recipes, https://doi.org/10.1007/978-1-4842-4335-0_8

8. Structured Streaming

Raju Kumar Mishra (1) and Sundar Rajan Raman (2)
(1) Bangalore, Karnataka, India
(2) Chennai, Tamil Nadu, India

In this chapter, we look at Apache Spark’s Structured Streaming feature. So far, we have seen the fluent APIs that Apache Spark provides for batch processing. Typical ETL data flows are batch oriented and operate on static data, that is, data that has already been collected and is available for processing. That is one side of the coin; the other is streaming data. For example, websites such as Twitter and Facebook are continuously fed with data from their ...
