February 2017
Intermediate to advanced
274 pages
5h 58m
English
With Spark 2.0, the Apache Spark community is simplifying streaming by introducing Structured Streaming, which bridges streaming and Datasets/DataFrames (as shown in the following diagram):

As noted in earlier chapters on DataFrames, the execution of SQL and/or DataFrame queries within the Spark SQL Engine (and Catalyst Optimizer) revolves around four steps: building a logical plan, building numerous physical plans, selecting the most efficient physical plan based on the engine's cost optimizer, and then generating the code (that is, code gen) that delivers the results in a performant manner. ...