Pair RDD
Pair RDDs are RDDs consisting of key-value tuples, a structure that suits many use cases such as aggregating, sorting, and joining data. The keys and values can be simple types such as integers and strings, or more complex types such as case classes, arrays, lists, and other kinds of collections. This extensible key-value data model offers many advantages and is the fundamental concept behind the MapReduce paradigm.
A PairRDD can be created easily by applying a transformation to any RDD that converts each element into a key-value pair.
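As a minimal sketch of that conversion, the following assumes it is run in spark-shell, where `sc` is already defined. The sample words and the names `words`, `pairs`, and `countsByLetter` are made up for illustration and do not come from the book's dataset.

```scala
// Run in spark-shell, where `sc` (the SparkContext) is predefined.
// The sample data below is a hypothetical placeholder.
val words = sc.parallelize(Seq("apple", "banana", "avocado", "blueberry"))

// map turns each element into a (key, value) tuple:
// key = first letter of the word, value = the word itself
val pairs = words.map(w => (w.head, w))          // RDD[(Char, String)]

// Once the elements are tuples, key-based operations such as
// mapValues and reduceByKey become available on the RDD
val countsByLetter = pairs.mapValues(_ => 1).reduceByKey(_ + _)

// Print the per-key counts (the order of the output is not guaranteed)
countsByLetter.collect().foreach(println)
```

The key here is chosen arbitrarily; any function that returns a two-element tuple produces a pair RDD in the same way.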
Let's read the statesPopulation.csv file into an RDD using the SparkContext, which is available as sc.
The following is an example of a basic RDD of the state population and what the PairRDD looks like for the same ...
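As a rough sketch of this step, the code below reads the file with `sc.textFile` and keys each record by state. It assumes spark-shell, that statesPopulation.csv sits in the working directory, and that the file has a header row followed by comma-separated records of the form State,Year,Population. That layout is an assumption, and `toStatePair` is a hypothetical helper introduced only for illustration.

```scala
// Assumes spark-shell (`sc` predefined) and statesPopulation.csv in the
// working directory. The State,Year,Population layout and the header row
// are assumptions, not confirmed by the text.
def toStatePair(line: String): (String, Long) = {
  val cols = line.split(",")
  // key = state name (column 0), value = population (column 2)
  (cols(0).trim, cols(2).trim.toLong)
}

// Basic RDD: one String per line of the file (evaluated lazily)
val statesPopulationRDD = sc.textFile("statesPopulation.csv")

// Drop the assumed header line before parsing the records
val header = statesPopulationRDD.first()
val pairRDD = statesPopulationRDD
  .filter(_ != header)
  .map(toStatePair)                              // RDD[(String, Long)]

// Inspect a few key-value pairs
pairRDD.take(5).foreach(println)
```

If the real file uses a different column order or has no header, only `toStatePair` and the `filter` step would need to change.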