January 2019
Beginner to intermediate
154 pages
4h 31m
English
Wide transformations involve a shuffle of the data between the partitions. The groupByKey(), reduceByKey(), join(), distinct(), and intersect() are some examples of wide transformations. In the case of these transformations, the result will be computed using data from multiple partitions and thus requires a shuffle. Wide transformations are similar to the shuffle-and-sort phase of MapReduce. Let's understand the concept with the help of the following example:

We have an RDD-A and we perform a wide transformation such as groupByKey()and we get a new RDD-B with fewer partitions. RDD-B will have data ...
Read now
Unlock full access