March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will redesign the job that performed the join on non-partitioned data, as an example of changing the design of jobs with wide dependencies.
We will use the repartition method on the DataFrame, partitioning by a common partition key. We saw that issuing a join triggers repartitioning under the hood. Often, however, we want to run several operations on the same DataFrame when using Spark. In that case, every join with another dataset has to execute hashPartitioning once again. If ...
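To make the idea of partitioning by a common key concrete, here is a minimal sketch in plain Python. It is a model of hash partitioning, not Spark code: the helper names (hash_partition, join_copartitioned), the partition count, and the sample data are all hypothetical. It shows that once two datasets are placed into partitions by the same key and the same hash function, matching rows always land in the same partition, so a join can proceed partition by partition without moving rows again.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def hash_partition(rows, key_index, num_partitions=NUM_PARTITIONS):
    """Place each row in the partition chosen by hashing its key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def join_copartitioned(left, right, num_partitions=NUM_PARTITIONS):
    """Inner-join two datasets that were partitioned the same way.

    Each partition is joined only against the partition with the same
    index, which models a join that needs no further data movement.
    """
    joined = []
    for p in range(num_partitions):
        # Index the right-hand rows of this partition by key.
        right_by_key = defaultdict(list)
        for key, value in right.get(p, []):
            right_by_key[key].append(value)
        # Probe with the left-hand rows of the same partition.
        for key, value in left.get(p, []):
            for other in right_by_key[key]:
                joined.append((key, value, other))
    return joined


# Hypothetical sample data: (user_id, value) pairs with integer keys.
users = [(1, "alice"), (2, "bob"), (5, "carol")]
orders = [(1, "book"), (5, "pen"), (5, "ink"), (7, "cup")]

# Partition each dataset once by the common key; the result can be
# reused by every later operation that joins on that key.
users_by_id = hash_partition(users, 0)
orders_by_id = hash_partition(orders, 0)

result = sorted(join_copartitioned(users_by_id, orders_by_id))
print(result)
```

In Spark itself, the analogous step would be calling repartition on the DataFrame with the join key column, so that the partitioning is done once up front rather than repeated inside each join.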