Advanced concepts of Spark Streaming

Let's go through some of the important advanced concepts of Spark Streaming.

Using DataFrames

We learned about Spark SQL and DataFrames in Chapter 4, Big Data Analytics with Spark SQL, DataFrames, and Datasets. There are many use cases where you want to convert DStreams to DataFrames for interactive analytics. The RDDs generated by a DStream can be converted to DataFrames and queried with SQL, either internally within the program or from external SQL clients. Refer to the sql_network_wordcount.py program in /usr/lib/spark/examples/lib/streaming for an example of implementing SQL in a Spark Streaming application; a minimal sketch of the same pattern is shown below.
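Here is a minimal Scala sketch of this pattern, assuming a text source listening on localhost port 9999 (for example, started with nc -lk 9999); the application name, 10-second batch interval, and the "words" table name are illustrative choices:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Case class used to map each word to a row of the DataFrame
case class Record(word: String)

object StreamingSQLExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSQLExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read lines from the socket source and split them into words
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    // Convert each RDD produced by the DStream into a DataFrame,
    // register it as a temporary table, and query it with SQL
    words.foreachRDD { rdd =>
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      val wordsDF = rdd.map(w => Record(w)).toDF()
      wordsDF.registerTempTable("words")

      sqlContext.sql(
        "SELECT word, COUNT(*) AS total FROM words GROUP BY word").show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}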

You can also start a JDBC server within the application with the following code:

HiveThriftServer2.startWithContext(hiveContext)
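The following sketch shows one way to wire this up, assuming the application is built with the Hive Thrift server module on its classpath and that sc is the application's SparkContext; the setup around the single startWithContext call is illustrative:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Create a HiveContext from the existing SparkContext (assumption: sc is in scope)
val hiveContext = new HiveContext(sc)

// Any DataFrame registered as a temporary table on this hiveContext,
// for example inside foreachRDD, becomes visible to JDBC/ODBC clients
HiveThriftServer2.startWithContext(hiveContext)

External clients can then connect with a JDBC tool such as beeline (the Thrift server listens on port 10000 by default) and run SQL against the registered tables while the streaming application keeps them up to date.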
