July 2017
Intermediate to advanced
796 pages
18h 55m
English
repartition is a transformation that redistributes the elements of the input RDD into fewer or more partitions in the output RDD. It always shuffles the data, and it returns a new RDD rather than modifying the input.
The following code snippet loads a text file into an RDD with two partitions and repartitions it into an RDD with five partitions:
scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.partitions.length
res21: Int = 2

scala> val rdd_three = rdd_two.repartition(5)
rdd_three: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[25] at repartition at <console>:26

scala> rdd_three.partitions.length
res23: Int = 5
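The same behavior can be sketched as a standalone program, which may help when experimenting outside the Spark shell. This is only an illustration under stated assumptions: the object name RepartitionSketch, the helper partitionCounts, the application name, and the local[2] master are all made up for this example, and an in-memory range stands in for wiki1.txt so that the sketch does not depend on a file being present.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object and helper names, used only for this illustration.
object RepartitionSketch {

  // Returns the partition counts of the original RDD, of an RDD repartitioned
  // into more partitions, and of an RDD repartitioned into fewer partitions.
  def partitionCounts(sc: SparkContext): (Int, Int, Int) = {
    // Assumption: an in-memory range replaces the text file from the shell session.
    val base = sc.parallelize(1 to 100, numSlices = 2)

    // repartition returns a new RDD; base itself keeps its two partitions.
    val more = base.repartition(5)
    val fewer = base.repartition(1)

    (base.partitions.length, more.partitions.length, fewer.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, suitable for a quick test.
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (original, increased, decreased) = partitionCounts(sc)
      println(s"original=$original increased=$increased decreased=$decreased")
    } finally {
      sc.stop() // release the local Spark context
    }
  }
}
```

Because the input RDD is left unchanged, the sketch can call repartition on it twice, once to increase and once to decrease the number of partitions.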
The following diagram explains how repartition works. You can see that a new RDD is created ...