So far we have seen basic RDDs whose elements were words, numbers, or lines of text. We'll now discuss PairRDDs, which are essentially datasets of key/value pairs. People who have used MapReduce will be familiar with the concept of key/value pairs and their benefits for aggregation, joining, sorting, counting, and other ETL operations. The beauty of key/value pairs is that the data belonging to each key can be processed independently of the other keys, and therefore in parallel, in operations such as aggregation or joining. The simplest example is retail store sales, with StoreId as the key and the sales amount as the value. This lets you run analytics per StoreId, with the data for different stores processed in parallel.
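To make the store sales example concrete, here is a minimal sketch in plain Python of what per-key aggregation over key/value pairs looks like. It is not Spark code: the sample data, the store identifiers, and the helper reduce_by_key are all hypothetical, and the helper only models on a single machine the kind of grouping that a PairRDD performs across a cluster.

```python
from operator import add

# Hypothetical sample data: (StoreId, sales amount) pairs.
sales = [
    ("store-1", 120.0),
    ("store-2", 75.5),
    ("store-1", 30.0),
    ("store-3", 200.0),
    ("store-2", 24.5),
]


def reduce_by_key(pairs, func):
    """Combine all values that share a key using func.

    Illustrative helper, not a Spark API. Each key's values are combined
    without looking at any other key, which is why this kind of work can
    be split across machines.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return result


# Total sales per StoreId.
totals = reduce_by_key(sales, add)
print(sorted(totals.items()))
```

In PySpark, assuming an existing SparkContext named sc, the equivalent aggregation would be written as sc.parallelize(sales).reduceByKey(add), with Spark distributing the per-key work across partitions.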