Mastering Large Datasets with Python

Chapter 6. Speeding up map and reduce with advanced parallelization

This chapter covers

Advanced parallelization with map and starmap
Writing parallel reduce and map reduce patterns
Accumulation and combination functions
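
The map/starmap distinction in the first item can be sketched with Python's multiprocessing.Pool. This is a minimal illustration, not code from the chapter; square, power, and map_vs_starmap are hypothetical names. Pool.map calls the function with each element as its single argument, while Pool.starmap unpacks each tuple into positional arguments.

```python
from multiprocessing import Pool


def square(x):
    """One-argument function: a natural fit for Pool.map."""
    return x * x


def power(base, exponent):
    """Two-argument function: Pool.map can't unpack tuples for it, Pool.starmap can."""
    return base ** exponent


def map_vs_starmap(pool, numbers, pairs):
    # map passes each element as one argument: square(1), square(2), ...
    squares = pool.map(square, numbers)
    # starmap unpacks each tuple into arguments: power(2, 3), power(3, 2), ...
    powers = pool.starmap(power, pairs)
    return squares, powers


if __name__ == "__main__":
    with Pool() as pool:
        print(map_vs_starmap(pool, [1, 2, 3], [(2, 3), (3, 2), (10, 0)]))
        # ([1, 4, 9], [8, 9, 1])
```

Passing the pool in as a parameter keeps the functions easy to test: multiprocessing.pool.ThreadPool offers the same map and starmap methods, so it can stand in for Pool when you only want to check the logic.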

We ended chapter 5 with a paradoxical situation: a parallel method using more compute resources ran slower than a linear (sequential) approach using fewer. Intuitively, that seems wrong. With more resources, we should be at least as fast as our low-resource approach, and ideally faster. We never want to be slower.
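
One common cause of this kind of slowdown is per-task overhead: every element handed to a worker process has to be pickled and sent over, and its result sent back, and for a cheap function that cost can exceed the work itself. The sketch below illustrates the effect under stated assumptions; it is not the chapter 5 benchmark. increment, timed, and compare are hypothetical helpers, and the data size and chunk sizes are arbitrary choices. It times a plain map against Pool.map with chunksize=1 and with a larger chunksize, which batches elements so each round trip carries more work.

```python
import time
from multiprocessing import Pool


def increment(x):
    # Deliberately tiny unit of work: shipping x to a worker process and
    # shipping the result back can cost more than computing x + 1.
    return x + 1


def timed(label, fn, *args, **kwargs):
    # Call fn, print how long it took, and return its result.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label:<26} {time.perf_counter() - start:.3f}s")
    return result


def compare(pool, data):
    linear = timed("linear map", lambda: list(map(increment, data)))
    # chunksize=1: one inter-process round trip per element
    one_at_a_time = timed("pool.map, chunksize=1", pool.map, increment, data, chunksize=1)
    # larger chunks: each worker receives a batch, amortizing the overhead
    batched = timed("pool.map, chunksize=2000", pool.map, increment, data, chunksize=2000)
    return linear, one_at_a_time, batched


if __name__ == "__main__":
    data = list(range(20_000))
    with Pool() as pool:
        linear, one_at_a_time, batched = compare(pool, data)
    # All three strategies must agree on the result; only the timing differs.
    assert linear == one_at_a_time == batched
```

Actual timings depend on the machine, the number of cores, and the process start method, so none are quoted here; what matters is how the chunksize=1 run compares with the other two on your own hardware.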

In this chapter, we’ll take a look at how to get the most out of parallelization in two ways:

By optimizing our use of parallel map
By using a parallel reduce
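
A parallel reduce can be organized around the two kinds of functions named in the chapter overview: an accumulation function that folds the elements of one chunk into a running result, and a combination function that merges the per-chunk results. The following is a minimal sketch under stated assumptions, not the book's implementation. parallel_reduce, reduce_chunk, chunks, accumulate_length, and combine_totals are hypothetical names, and the initial value is assumed to be an identity for the combination function (0 for addition), since it seeds both the per-chunk reduces and the final merge.

```python
from functools import partial, reduce
from multiprocessing import Pool


def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def accumulate_length(total, word):
    # Accumulation function: (running total, element) -> new running total
    return total + len(word)


def combine_totals(left, right):
    # Combination function: (partial result, partial result) -> merged result
    return left + right


def reduce_chunk(accumulate, initial, chunk):
    # Runs inside a worker: an ordinary sequential reduce over one chunk
    return reduce(accumulate, chunk, initial)


def parallel_reduce(pool, accumulate, combine, items, initial, chunk_size=1_000):
    # Step 1: each worker reduces its own chunk with the accumulation function
    partials = pool.map(partial(reduce_chunk, accumulate, initial), chunks(items, chunk_size))
    # Step 2: merge the per-chunk results with the combination function
    # (assumes `initial` is an identity for `combine`)
    return reduce(combine, partials, initial)


if __name__ == "__main__":
    words = ["map", "reduce", "starmap", "parallel"] * 10_000
    with Pool() as pool:
        total = parallel_reduce(pool, accumulate_length, combine_totals, words, 0)
    print(total)  # 240000
```

The two functions differ in signature: accumulate_length takes a running total and a word, while combine_totals takes two totals. For summing plain numbers they could be the same function, but keeping them separate lets the element type and the result type differ.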

Parallel map

Mastering Large Datasets with Python by John Wolohan