© Raju Kumar Mishra 2018
Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_5

5. The Power of Pairs: Paired RDDs

Raju Kumar Mishra
Bangalore, Karnataka, India

Key/value pairs lend themselves to solving many problems efficiently in parallel. Apache Mahout, a machine-learning library that was initially developed on top of Apache Hadoop, implements many machine-learning algorithms in the areas of classification, clustering, and collaborative filtering by using the MapReduce key/value-pair architecture. In this chapter, you'll work through recipes that develop skills for solving interesting big data problems from many disciplines.
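To make the key/value idea concrete before the recipes begin, here is a minimal sketch in plain Python (not PySpark) of the map-then-reduce-by-key pattern that MapReduce-style systems build on. The helper names `map_to_pairs` and `reduce_by_key` and the sample sentences are hypothetical, chosen only for illustration; they are not taken from the book or from any library.

```python
from collections import defaultdict
from functools import reduce


def map_to_pairs(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs, func):
    """Reduce step: group values by key, then fold each group with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Hypothetical sample input, for illustration only.
lines = ["spark makes pairs", "pairs make spark fast"]
counts = reduce_by_key(map_to_pairs(lines), lambda a, b: a + b)
print(counts)
```

Because each key's values are combined independently of every other key, the groups could in principle be processed on different machines, which is why the pattern suits parallel execution.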

This chapter covers the following recipes:
  • Recipe 5-1. Create a paired RDD

  • Recipe 5-2. Perform ...
