©  Raju Kumar Mishra 2018
Raju Kumar Mishra, PySpark Recipes, https://doi.org/10.1007/978-1-4842-3141-8_5

5. The Power of Pairs: Paired RDDs

Raju Kumar Mishra
Bangalore, Karnataka, India

Key/value pairs are well suited to solving many problems efficiently in parallel, because records that share a key can be grouped and processed independently of records with other keys. Apache Mahout, a machine-learning library that was initially developed on top of Apache Hadoop, implements many machine-learning algorithms for classification, clustering, and collaborative filtering by using the MapReduce key/value-pair architecture. In this chapter, you’ll work through recipes that develop skills for solving interesting big data problems from many disciplines.
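To make the key/value idea concrete, here is a minimal sketch in plain Python. It is not the PySpark API: the helper `reduce_by_key`, the sample words, and the two-partition split are all hypothetical and chosen only for illustration. It shows how values that share a key can be combined within each partition first and then merged, which is the general pattern that makes key/value data easy to process in parallel.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Combine all values that share a key by applying func pairwise."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Hypothetical (word, 1) pairs, split into two partitions as a cluster might do.
partitions = [
    [("spark", 1), ("hadoop", 1), ("spark", 1)],
    [("mahout", 1), ("spark", 1), ("hadoop", 1)],
]

# Step 1: reduce each partition on its own (this part could run in parallel).
partial = [reduce_by_key(part, add) for part in partitions]

# Step 2: merge the per-partition results with the same function.
merged = reduce_by_key(
    [item for part in partial for item in part.items()], add
)

print(sorted(merged.items()))
```

Because addition is associative and commutative, reducing each partition separately and then merging gives the same totals as reducing all pairs in one pass.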

This chapter covers the following recipes:
  • Recipe 5-1. Create a paired RDD

  • Recipe 5-2. Perform ...
