Mastering Large Datasets with Python

by John Wolohan
January 2020
Content level: Intermediate to advanced
312 pages
10h 22m
English
Manning Publications
Content preview from Mastering Large Datasets with Python

Chapter 2. Accelerating large dataset work: Map and parallel computing

This chapter covers

  • Using map to transform lots of data
  • Using parallel programming to transform lots of data
  • Scraping data from the web in parallel with map

In this chapter, we’ll look at map and how to use it for parallel programming, and we’ll apply those concepts to complete two web scraping exercises. With map, we’ll focus on three primary capabilities:

  1. We can use it to replace for loops.
  2. We can use it to transform data.
  3. Map is lazy: it computes each value only when that value is needed, not when map is called.
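The three capabilities listed above can be illustrated with a minimal sketch. The function names (`double`, `record_and_double`) and the sample data are hypothetical, chosen only for illustration; they do not come from the book.

```python
def double(x):
    """Return twice the input; a stand-in for any per-item transformation."""
    return x * 2


numbers = [1, 2, 3, 4]

# 1. A for loop that builds a new list...
doubled_loop = []
for n in numbers:
    doubled_loop.append(double(n))

# ...can be replaced by a single call to map.
doubled_map = list(map(double, numbers))

# 2. map transforms data: here, each word becomes its length.
lengths = list(map(len, ["map", "parallel", "scrape"]))

# 3. map is lazy: calling it returns an iterator and runs nothing yet.
calls = []


def record_and_double(x):
    """Record each input so we can observe when map actually does work."""
    calls.append(x)
    return x * 2


lazy = map(record_and_double, numbers)
calls_before = list(calls)        # still empty: no item has been processed
first = next(lazy)                # requesting a value triggers one call
calls_after_first = list(calls)   # only the first item has been processed
```

Wrapping `map` in `list` forces every value to be computed at once, which is why the first two examples produce complete lists while the third processes only the single item that was requested.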

These core ideas about map are also why it’s so useful for us in parallel programming. In parallel programming, we’re using multiple processing units to do partial work on a task and combining ...
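The idea of splitting a task into partial work and combining the results might be sketched as follows. This is an illustration under stated assumptions, not the chapter's own code: it swaps in a thread pool from the standard library's `concurrent.futures` module to stay self-contained, and the helper names (`chunked`, `partial_sum`, `parallel_sum_of_squares`) are hypothetical. For CPU-bound work in CPython, a process pool such as `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor`, both of which offer a similar `map` method, is usually the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_sum(chunk):
    """Do the partial work: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(numbers, workers=4, chunk_size=250):
    """Map partial_sum over the chunks with a worker pool, then combine."""
    chunks = chunked(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map mirrors the built-in map but hands chunks to the workers.
        partials = list(pool.map(partial_sum, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)


numbers = list(range(1000))
total = parallel_sum_of_squares(numbers)
```

Because each chunk is processed independently of the others, the same function can be handed to a sequential `map` or to a pool's `map` without changing the logic, which is what makes `map` a natural entry point to parallel code.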

ISBN: 9781617296239