Chapter 6. Speeding up map and reduce with advanced parallelization

This chapter covers

  • Advanced parallelization with map and starmap
  • Writing parallel reduce and map reduce patterns
  • Accumulation and combination functions

We ended chapter 5 with a paradoxical situation: a parallel method with more compute resources ran slower than a linear approach with fewer. Intuitively, we know this is wrong. If we’re using more resources, we should at the very least be as fast as our low-resource effort, and ideally faster. We should never be slower.

In this chapter, we’ll take a look at how to get the most out of parallelization in two ways:

  1. By optimizing our use of parallel map
  2. By using a parallel reduce
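
To make the second item concrete, here is a minimal sketch of a parallel reduce built from an accumulation function and a combination function. It is an illustration under stated assumptions, not necessarily the approach this chapter develops: the names `chunk`, `add_length`, `count_chars`, `add_totals`, and `parallel_char_count` are hypothetical, and the task (counting characters across a list of words) is chosen only because it is easy to check by hand.

```python
from functools import reduce
from multiprocessing import Pool


def chunk(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def add_length(total, word):
    """Accumulation function: fold one word into a running character count."""
    return total + len(word)


def count_chars(words):
    """Reduce a single chunk to a partial result; runs inside one worker."""
    return reduce(add_length, words, 0)


def add_totals(left, right):
    """Combination function: merge two partial results into one."""
    return left + right


def parallel_char_count(pool, words, n_chunks=4):
    """Reduce each chunk in parallel, then combine the partial results."""
    partials = pool.map(count_chars, chunk(words, n_chunks))
    return reduce(add_totals, partials, 0)


if __name__ == "__main__":
    # Hypothetical input; any list of strings would do.
    words = ["map", "reduce", "parallel", "chunk"] * 1000
    with Pool(4) as pool:
        print(parallel_char_count(pool, words))
```

The accumulation function takes a running result and one raw item, while the combination function takes two running results. In this sketch both return an integer, but the two roles are still distinct: only the combination step ever sees output produced by different workers.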

Parallel map
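
As a rough sketch of what tuning a parallel map can look like, the example below uses the `chunksize` argument of `Pool.map` and `Pool.starmap` from Python's standard `multiprocessing` module. The helper names (`square`, `scale`, `squares`, `scaled`) and the default chunk size of 1000 are assumptions made for illustration, not values taken from the text.

```python
from multiprocessing import Pool


def square(x):
    """A deliberately cheap function: per-task overhead dominates here."""
    return x * x


def scale(x, factor):
    """A two-argument function, which plain map cannot call directly."""
    return x * factor


def squares(pool, numbers, chunksize=1000):
    # chunksize groups many inputs into each task sent to a worker,
    # so fewer messages pass between the parent and worker processes.
    return pool.map(square, numbers, chunksize=chunksize)


def scaled(pool, numbers, factor, chunksize=1000):
    # starmap unpacks each tuple into positional arguments: scale(n, factor).
    pairs = [(n, factor) for n in numbers]
    return pool.starmap(scale, pairs, chunksize=chunksize)


if __name__ == "__main__":
    with Pool(4) as pool:
        print(squares(pool, range(10))[:5])
        print(scaled(pool, range(10), 3)[:5])
```

When each call does very little work, sending inputs to workers one at a time can cost more than the computation itself, which is one plausible source of the slowdown described above. Larger chunks trade some load balancing for less communication, so the best value depends on the workload and is worth measuring.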
