Chapter 3. Function pipelines for mapping complex transformations

This chapter covers

  • Using map to do complex data transformations
  • Chaining together small functions into pipelines
  • Applying these pipelines in parallel on large datasets

In the last chapter, we saw how you can use map to replace for loops and how map makes parallel computing straightforward: make a small modification to map, and Python takes care of the rest. So far, though, we’ve only used map with simple functions. Even in the Wikipedia scraping example from chapter 2, our hardest-working function only pulled text off the internet. To make parallel programming really useful, we’ll want to use map in more complex ways. This chapter introduces how to do ...
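As a rough illustration of the three ideas listed above, here is a minimal sketch, not the book's own code. The step functions, the `word_count_pipeline` helper, and the sample `pages` list are all hypothetical names chosen for this example. It also assumes that the standard library's `multiprocessing.Pool` is the parallel stand-in for map.

```python
from multiprocessing import Pool


# Three small, single-purpose steps (hypothetical examples).
def strip_whitespace(text):
    return text.strip()


def lowercase(text):
    return text.lower()


def count_words(text):
    return len(text.split())


# The pipeline is just an ordered collection of small functions.
STEPS = (strip_whitespace, lowercase, count_words)


def word_count_pipeline(text):
    """Feed the input through each step in order and return the final result."""
    result = text
    for step in STEPS:
        result = step(result)
    return result


# Hypothetical sample data standing in for a large dataset.
pages = ["  Hello World  ", "MAP replaces for loops", ""]

# Serial version: map applies the whole pipeline to every item.
serial_counts = list(map(word_count_pipeline, pages))


def parallel_word_counts(texts, processes=2):
    """Same transformation, with Pool.map swapped in for the built-in map."""
    with Pool(processes) as pool:
        return pool.map(word_count_pipeline, texts)
```

The pipeline is defined as a module-level function rather than a lambda or closure so that `Pool.map` can pickle it and send it to worker processes. When running this as a script, call `parallel_word_counts(pages)` from under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that spawn new interpreter processes.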
