Chapter 2. Accelerating large dataset work: Map and parallel computing

This chapter covers

  • Using map to transform lots of data
  • Using parallel programming to transform lots of data
  • Scraping data from the web in parallel with map

In this chapter, we’ll look at map and how to use it for parallel programming, and we’ll apply those concepts to complete two web scraping exercises. With map, we’ll focus on three primary capabilities:

  1. We can use it to replace for loops.
  2. We can use it to transform data.
  3. Map evaluates only when necessary, not when called.
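Here is a minimal sketch of the three ideas, assuming Python 3, where the built-in `map` returns a lazy iterator. The word list and the `noisy_upper` helper are hypothetical examples for illustration. They are not code from the chapter's exercises.

```python
# 1 and 2: transforming data with a for loop, building a new list element by element.
words = ["map", "filter", "reduce"]
lengths_loop = []
for word in words:
    lengths_loop.append(len(word))

# The same transformation with map: apply len to every element.
lengths_map = list(map(len, words))  # list() forces map to produce its values

# 3: laziness. We record every call so we can see when the function actually runs.
calls = []

def noisy_upper(word):
    """Hypothetical helper: uppercases a word and logs that it was called."""
    calls.append(word)
    return word.upper()

upper_words = map(noisy_upper, words)  # builds an iterator; no calls happen yet
calls_before = len(calls)              # 0: noisy_upper has not run
upper_list = list(upper_words)         # consuming the iterator runs noisy_upper
calls_after = len(calls)               # 3: one call per word

print(lengths_loop, lengths_map)    # [3, 6, 6] [3, 6, 6]
print(calls_before, calls_after)    # 0 3
print(upper_list)                   # ['MAP', 'FILTER', 'REDUCE']
```

Because `map` returns an iterator, it can be consumed only once. Wrapping it in `list()` is one way to force evaluation and keep the results.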

These core ideas about map are also why it’s so useful for us in parallel programming. In parallel programming, we’re using multiple processing units to do partial work on a task and combining ...
