September 2021
Beginner to intermediate
452 pages
8h 46m
English
To write efficient Spark applications, we need to have some understanding of how Spark executes queries. Having a good understanding of how Spark executes a given query helps big data developers/engineers work efficiently with large volumes of data.
Query execution is a very broad subject, and, in this chapter, we will start by understanding jobs, stages, and tasks. Then, we will learn how Spark lazy evaluation works. Following this, we will learn how to check and understand the execution plan when working with DataFrames or SparkSQL. Later, we will learn how joins work in Spark and the different types of join algorithms Spark uses while joining two tables. Finally, we will learn about the input, ...
Read now
Unlock full access