Mastering Large Datasets with Python

by John Wolohan
January 2020
Content level: Intermediate to advanced
312 pages
10h 22m
English
Manning Publications
Content preview from Mastering Large Datasets with Python

Chapter 3. Function pipelines for mapping complex transformations

This chapter covers

  • Using map to do complex data transformations
  • Chaining together small functions into pipelines
  • Applying these pipelines in parallel on large datasets

In the last chapter, we saw how map can replace for loops and how it makes parallel computing straightforward: make a small modification to map, and Python takes care of the rest. So far, though, we’ve only used map with simple functions. Even in the Wikipedia scraping example from chapter 2, our hardest-working function only pulled text off the internet. If we want to make parallel programming really useful, we’ll want to use map in more complex ways. This chapter introduces how to do ...
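As a rough illustration of the idea, the sketch below chains a few small functions into one pipeline and applies it with map. It is not taken from the book: the helper names (`compose`, `strip_whitespace`, `lowercase`, `count_words`, `clean_and_count`) and the sample strings are hypothetical, and `multiprocessing.Pool` is assumed here as one standard-library way to get a parallel map.

```python
from functools import reduce
from multiprocessing import Pool


def compose(*funcs):
    """Chain single-argument functions left to right into one callable."""
    def pipeline(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return pipeline


# Hypothetical small steps; each one does a single, simple job.
def strip_whitespace(text):
    return text.strip()


def lowercase(text):
    return text.lower()


def count_words(text):
    return len(text.split())


_steps = compose(strip_whitespace, lowercase, count_words)


def clean_and_count(text):
    """Top-level wrapper so worker processes can pickle it by name."""
    return _steps(text)


def serial_counts(texts):
    # Plain map: one item at a time, in the current process.
    return list(map(clean_and_count, texts))


def parallel_counts(texts, processes=2):
    # Same call shape, but the work is spread across worker processes.
    with Pool(processes) as pool:
        return pool.map(clean_and_count, texts)


if __name__ == "__main__":
    docs = ["  Hello World  ", "Map replaces FOR loops ", "pipelines"]
    print(serial_counts(docs))  # [2, 4, 1]
```

Calling `parallel_counts(docs)` inside the same `if __name__ == "__main__":` guard should return the same list. The only change from the serial version is swapping `map` for `pool.map`, which is the small modification the paragraph refers to.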



Publisher Resources

ISBN: 9781617296239