July 2017
Intermediate to advanced
796 pages
18h 55m
English
CoGroupedRDD is an RDD that cogroups its parents. Both parent RDDs must be pairRDDs, because a cogroup generates a pairRDD in which each key is paired with the lists of values for that key from both parent RDDs. A key that appears in only one parent still shows up in the result, with an empty list for the other parent. Take a look at the following code snippet:
class CoGroupedRDD[K] extends RDD[(K, Array[Iterable[_]])]
The following is an example of a CoGroupedRDD where we create a cogroup of two pairRDDs, one having pairs of State, Population and the other having pairs of State, Year:
scala> val pairRDD = statesPopulationRDD.map(record => (record.split(",")(0), record.split(",")(2)))
pairRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[60] at map at <console>:27

scala> val pairRDD2 = statesPopulationRDD.map(record ...
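Because the transcript above is cut off, the shape of a cogroup result can be sketched with plain Scala collections instead. This is a minimal model of the semantics only, not Spark's implementation: `cogroupLocal` is a hypothetical helper, and the sample keys and values are made-up placeholders. In Spark itself, the corresponding call would be `pairRDD.cogroup(pairRDD2)`.

```scala
// Plain-Scala model of cogroup semantics; it has no Spark dependency.
// cogroupLocal is a hypothetical helper, not part of the Spark API.
object CogroupSketch {
  def cogroupLocal[K, V, W](
      left: Seq[(K, V)],
      right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    // Collect each side's values under their key, keeping input order.
    val leftByKey: Map[K, Seq[V]] =
      left.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

    // Every key from either side appears exactly once in the result.
    // A side that lacks the key contributes an empty Seq.
    (leftByKey.keySet ++ rightByKey.keySet).map { k =>
      (k, (leftByKey.getOrElse(k, Seq.empty[V]),
           rightByKey.getOrElse(k, Seq.empty[W])))
    }.toMap
  }

  // Made-up placeholder data standing in for (State, Population)
  // and (State, Year) pairs.
  val populations: Seq[(String, Int)] = Seq(("A", 100), ("B", 200), ("A", 150))
  val years: Seq[(String, Int)] = Seq(("A", 2010), ("C", 2011))

  val cogrouped: Map[String, (Seq[Int], Seq[Int])] =
    cogroupLocal(populations, years)
}
```

Looking up a key in `CogroupSketch.cogrouped` gives one collection of values per parent, which mirrors the `Array[Iterable[_]]` in the class signature: one `Iterable` for each parent RDD.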