Chapter 6. Speeding up map and reduce with advanced parallelization

This chapter covers

  • Advanced parallelization with map and starmap
  • Writing parallel reduce and map reduce patterns
  • Accumulation and combination functions

We ended chapter 5 with a paradoxical situation: a parallel method with more compute resources ran slower than a linear approach with fewer. Intuitively, we know this is wrong. If we’re using more resources, we should be at least as fast as our low-resource effort, and ideally faster. We never want to be slower.

In this chapter, we’ll take a look at how to get the most out of parallelization in two ways:

  1. By optimizing our use of parallel map
  2. By using a parallel reduce
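As a rough illustration of both ideas, the sketch below uses the standard library's multiprocessing.Pool. It first runs a parallel map with an explicit chunksize, so that each worker receives batches of items rather than one item at a time, and then performs a simple parallel reduce by summing chunks in the workers and combining the partial sums in the parent process. The function names (square, sum_chunk, split, parallel_sum_of_squares) and the chunksize heuristic are assumptions made for this example, not code from the chapter.

```python
from functools import reduce
from multiprocessing import Pool
from operator import add


def square(x):
    """Toy map function; must be defined at module level so it can be pickled."""
    return x * x


def sum_chunk(chunk):
    """Reduce one chunk to a partial sum inside a worker process."""
    return reduce(add, chunk, 0)


def split(seq, n_chunks):
    """Split seq into at most n_chunks contiguous pieces of near-equal size."""
    size = max(1, -(-len(seq) // n_chunks))  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_sum_of_squares(numbers, workers=2):
    """Parallel map (with chunksize) followed by a chunked parallel reduce."""
    numbers = list(numbers)
    if not numbers:
        return 0
    # Hypothetical heuristic: aim for roughly four batches per worker.
    chunksize = max(1, len(numbers) // (workers * 4))
    with Pool(workers) as pool:
        # 1. Parallel map: batching items cuts inter-process communication.
        squares = pool.map(square, numbers, chunksize=chunksize)
        # 2. Parallel reduce: each worker sums one chunk of the results.
        partials = pool.map(sum_chunk, split(squares, workers))
    # Combine the partial results in the parent process.
    return reduce(add, partials, 0)


if __name__ == "__main__":
    print(parallel_sum_of_squares(range(1000)))
```

For a function as cheap as square, the cost of moving data between processes is likely to outweigh any gain, so a sketch like this only pays off when each unit of work is comparatively expensive.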

Parallel map
