July 2017
Intermediate to advanced
796 pages
18h 55m
English
coalesce applies a transformation function to input partitions to combine the input partitions into fewer partitions in the output RDD.
As shown in the following code snippet, this is how we can combine all partitions to a single partition:
scala> val rdd_two = sc.textFile("wiki1.txt")rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24scala> rdd_two.partitions.lengthres21: Int = 2scala> val rdd_three = rdd_two.coalesce(1)rdd_three: org.apache.spark.rdd.RDD[String] = CoalescedRDD[21] at coalesce at <console>:26scala> rdd_three.partitions.lengthres22: Int = 1
The following diagram explains how coalesce works. You can see that a new RDD is created from the original RDD essentially reducing ...
Read now
Unlock full access