aggregateByKey
aggregateByKey is quite similar to reduceByKey, but it gives you more flexibility to customize how values are aggregated within partitions and between partitions. This supports more sophisticated use cases, such as generating a list of all <Year, Population> pairs as well as the total population for each State in one function call.
aggregateByKey aggregates the values of each key using two given combine functions and a neutral initial (zero) value. Its biggest difference from reduceByKey is that it can return a result type, U, that differs from the type of the values in the RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's. The former operation is used for merging values ...
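To make the two operations concrete, here is a minimal sketch in plain Python that models the behavior described above without requiring Spark. The helper `aggregate_by_key`, the explicit list of partitions, and the population figures are all hypothetical and exist only for illustration; the parameter names `zero_value`, `seq_func`, and `comb_func` loosely follow PySpark's `aggregateByKey(zeroValue, seqFunc, combFunc)`. Here V is a (year, population) pair and U is a (pairs, total) tuple, so the result type differs from the value type.

```python
def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Toy model of aggregateByKey (not Spark itself).

    partitions: list of partitions, each a list of (key, value) records.
    zero_value: neutral starting accumulator of type U (kept immutable here).
    seq_func:   merges one value V into an accumulator U, within a partition.
    comb_func:  merges two accumulators U, between partitions.
    """
    # Step 1: within each partition, fold every value into a per-key accumulator.
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_func(acc.get(key, zero_value), value)
        per_partition.append(acc)

    # Step 2: between partitions, merge the per-key accumulators.
    merged = {}
    for acc in per_partition:
        for key, u in acc.items():
            merged[key] = comb_func(merged[key], u) if key in merged else u
    return merged


# Hypothetical records: (State, (Year, Population)), already split into two partitions.
partitions = [
    [("Alabama", (2010, 100)), ("Alaska", (2010, 10))],
    [("Alabama", (2011, 110)), ("Alaska", (2011, 12)), ("Alabama", (2012, 120))],
]

# U = (tuple of (year, population) pairs, total population); V = (year, population).
zero = ((), 0)
seq_func = lambda u, v: (u[0] + (v,), u[1] + v[1])        # merge a V into a U
comb_func = lambda a, b: (a[0] + b[0], a[1] + b[1])       # merge two U's

result = aggregate_by_key(partitions, zero, seq_func, comb_func)
for state, (pairs, total) in sorted(result.items()):
    print(state, list(pairs), total)
```

In this model the partitions are merged in list order, so the pairs come out in a fixed order; on a real cluster the order in which partial results are combined should not be relied upon.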