July 2017
Intermediate to advanced
796 pages
18h 55m
English
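RangePartitioner works by partitioning the RDD into roughly equal ranges of keys. Because each partition covers a contiguous range between a starting key and an ending key, the keys must have a defined ordering (an implicit Ordering[K] in Scala). The RDD does not have to be sorted beforehand; Spark samples the RDD's keys to estimate the range boundaries.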
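To make the idea of "roughly equal ranges" concrete, here is a minimal plain-Scala sketch of how range boundaries could be derived from a collection of keys. It is not Spark's implementation: the names RangeBoundsSketch and rangeBounds are hypothetical, and for simplicity it sorts all keys instead of sampling them as Spark does for large RDDs.

```scala
// Hypothetical sketch, not Spark's RangePartitioner code.
// Assumption: all keys fit in memory and are sorted directly (Spark samples instead).
object RangeBoundsSketch {
  // Returns up to numPartitions - 1 inclusive upper bounds that split the
  // sorted keys into roughly equal-sized ranges.
  def rangeBounds[K](keys: Seq[K], numPartitions: Int)(implicit ord: Ordering[K]): Vector[K] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val sorted = keys.sorted.toVector
    val n = sorted.length
    if (n == 0 || numPartitions == 1) Vector.empty
    else
      (1 until numPartitions)
        .map { i =>
          // Integer ceil(i * n / numPartitions) - 1: index of the key closing range i
          val idx = ((i.toLong * n + numPartitions - 1) / numPartitions - 1).toInt
          sorted(idx)
        }
        .distinct // duplicate keys could otherwise produce identical bounds
        .toVector
  }

  def main(args: Array[String]): Unit = {
    val keys = (1 to 12).toList
    println(rangeBounds(keys, 3)) // prints Vector(4, 8)
  }
}
```

With bounds 4 and 8, keys up to 4 fall in the first range, keys from 5 to 8 in the second, and the rest in the third, so each of the three ranges holds four of the twelve keys.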
Range partitioning therefore works in three steps. First, it determines reasonable partition boundaries from the RDD's keys. Second, it builds a function that maps a key K to the index of the partition whose range contains that key. Finally, the RDD is repartitioned with the RangePartitioner so that each element moves to the partition matching its range.
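The second and third steps can be sketched in plain Scala as follows. This is an illustration under stated assumptions, not Spark's API: RangePartitionSketch, partitionIndex, and repartition are hypothetical names, the bounds are hard-coded hypothetical values (for example, the output of a sampling step), and the key lookup is a simple linear scan over the bounds.

```scala
// Hypothetical sketch of "key -> partition index" plus redistribution;
// not Spark's RangePartitioner code.
object RangePartitionSketch {
  // bounds(i) is the inclusive upper bound of partition i; keys greater than
  // every bound go to the last partition, whose index is bounds.length.
  def partitionIndex[K](bounds: IndexedSeq[K], key: K)(implicit ord: Ordering[K]): Int = {
    val i = bounds.indexWhere(b => ord.lteq(key, b))
    if (i == -1) bounds.length else i
  }

  // Moves each (key, value) pair into one of bounds.length + 1 partitions,
  // preserving the input order within each partition.
  def repartition[K, V](data: Seq[(K, V)], bounds: IndexedSeq[K])(
      implicit ord: Ordering[K]): Vector[Vector[(K, V)]] = {
    val buckets = Vector.fill(bounds.length + 1)(Vector.newBuilder[(K, V)])
    data.foreach { case kv @ (k, _) => buckets(partitionIndex(bounds, k)) += kv }
    buckets.map(_.result())
  }

  def main(args: Array[String]): Unit = {
    val bounds = Vector(4, 8) // hypothetical boundaries
    val data = Seq(7 -> "g", 1 -> "a", 12 -> "l", 4 -> "d", 9 -> "i")
    repartition(data, bounds).zipWithIndex.foreach { case (p, i) =>
      println(s"partition $i: ${p.map(_._1).mkString(", ")}")
    }
    // prints:
    // partition 0: 1, 4
    // partition 1: 7
    // partition 2: 12, 9
  }
}
```

Treating each bound as inclusive means a key equal to a boundary, such as 4 here, stays in the lower partition. A binary search over the bounds would be the natural replacement for the linear scan when there are many partitions.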
The following example shows how to apply range partitioning to a pair RDD. It also shows how the partitions change after the RDD is repartitioned with a RangePartitioner:
import org.apache.spark.RangePartitioner ...