July 2017
Intermediate to advanced
796 pages
18h 55m
English
groupByKey groups the values for each key in the RDD into a single sequence. groupByKey also allows controlling the partitioning of the resulting key-value pair RDD by passing a partitioner. By default, a HashPartitioner is used but a custom partitioner can be given as an argument. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated.
groupByKey can be invoked either using a custom partitioner or just using the default HashPartitioner as shown in the following code snippet:
def groupByKey(partitioner: ...
Read now
Unlock full access