February 2017
Intermediate to advanced
274 pages
5h 58m
English
As noted in Chapter 1, Understanding Spark, one of the primary reasons the Spark SQL engine is so fast is the Catalyst Optimizer. For readers with a database background, this diagram looks similar to the logical/physical planner and the cost model (cost-based optimization) of a relational database management system (RDBMS):

The significance of this is that, rather than processing a query immediately, the Spark engine's Catalyst Optimizer first compiles and optimizes a logical plan, and then uses a cost optimizer to select the most efficient of the physical plans it generates.
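To make the idea of cost-based plan selection concrete, here is a minimal, pure-Python sketch. It is a toy model, not Catalyst's implementation: the `PhysicalPlan` class, the `candidate_plans` and `choose_physical_plan` helpers, the `workers` parameter, and both cost formulas are assumptions invented for illustration. Only the two strategy names are borrowed from join strategies that Spark actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    name: str              # join strategy label
    estimated_cost: float  # made-up cost units, not real Spark statistics


def candidate_plans(left_rows, right_rows, workers=8):
    """Enumerate toy physical plans for one logical join (hypothetical costs)."""
    return [
        # Assumed cost: shuffle and sort both inputs before merging them.
        PhysicalPlan("sort_merge_join", 3.0 * (left_rows + right_rows)),
        # Assumed cost: scan both inputs once, plus copying the smaller
        # input to every worker.
        PhysicalPlan(
            "broadcast_hash_join",
            left_rows + right_rows + workers * min(left_rows, right_rows),
        ),
    ]


def choose_physical_plan(left_rows, right_rows, workers=8):
    """Return the candidate with the lowest estimated cost."""
    plans = candidate_plans(left_rows, right_rows, workers)
    return min(plans, key=lambda plan: plan.estimated_cost)


if __name__ == "__main__":
    # One large and one small input.
    print(choose_physical_plan(1_000_000, 1_000).name)
    # Two equally large inputs.
    print(choose_physical_plan(1_000_000, 1_000_000).name)
```

With these made-up numbers, the sketch picks the broadcast strategy when one input is small and the sort-merge strategy when both inputs are large. The point is that the same logical operation can map to different physical plans depending on estimated cost. In PySpark itself, calling `explain(True)` on a DataFrame prints the logical plans and the physical plan that the engine selected for a query.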
As noted in earlier chapters, while the Spark SQL ...