January 2019
Beginner to intermediate
154 pages
4h 31m
English
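The join() transformation joins two pair RDDs on their keys. It performs an inner join, so only keys present in both RDDs appear in the result. The following example joins sales and revenue data by country; the UK record has no match in revenueRDD and is therefore dropped: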
//Scala
val salesRDD = spark.sparkContext.parallelize(Array(("US",20),("IND", 30),("UK",10)))
val revenueRDD = spark.sparkContext.parallelize(Array(("US",200),("IND", 300)))
salesRDD.join(revenueRDD).collect()

Output:
Array[(String, (Int, Int))] = Array((US,(20,200)), (IND,(30,300)))
Pair RDDs support several other transformations, including aggregateByKey(), cogroup(), leftOuterJoin(), rightOuterJoin(), and subtractByKey(). Actions specific to pair RDDs include countByKey(), collectAsMap(), and lookup().
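As a rough illustration of a few of these operations, the sketch below reuses the sales and revenue data from the join() example. It is a minimal standalone program, not code from the book: the object name PairRddExtras, the application name, and the local[*] master setting are assumptions made so that it runs outside spark-shell. Inside spark-shell, the existing spark session can be used and the builder lines skipped.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, chosen for this illustration only.
object PairRddExtras {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .appName("pair-rdd-extras")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val salesRDD = sc.parallelize(Seq(("US", 20), ("IND", 30), ("UK", 10)))
    val revenueRDD = sc.parallelize(Seq(("US", 200), ("IND", 300)))

    // leftOuterJoin keeps every key from salesRDD; a missing right-hand value is None.
    // Result type: Array[(String, (Int, Option[Int]))]
    val leftJoined = salesRDD.leftOuterJoin(revenueRDD).collect()
    leftJoined.foreach(println)

    // subtractByKey keeps pairs whose key does not occur in the other RDD (here only UK).
    val salesOnly = salesRDD.subtractByKey(revenueRDD).collect()
    salesOnly.foreach(println)

    // aggregateByKey folds the values of each key, starting from the zero value 0.
    val totals = salesRDD.aggregateByKey(0)(_ + _, _ + _).collect()
    totals.foreach(println)

    // countByKey is an action: it returns a local Map of key -> number of records.
    val counts = salesRDD.countByKey()
    println(counts)

    // collectAsMap is an action: it returns the pairs as a local Map.
    // If a key occurs more than once, only one of its values is kept.
    val salesMap = salesRDD.collectAsMap()
    println(salesMap)

    // lookup is an action: it returns all values stored under the given key.
    val ukSales = salesRDD.lookup("UK")
    println(ukSales)

    spark.stop()
  }
}
```

The actions bring their results back to the driver, so they are best suited to data that is known to be small. The order of the elements returned by collect() is not guaranteed.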