Chapter 4. Processing large datasets with lazy workflows

This chapter covers

  • Writing lazy workflows for processing large datasets locally
  • Understanding the lazy behavior of map
  • Writing classes with generators for lazy simulations

In chapter 2 (section 2.1.2, to be exact), I introduced the idea that our beloved map function is lazy by default; that is, it evaluates its inputs only when their values are needed downstream (a quick refresher sketch of that behavior follows the list below). In this chapter, we’ll look at a few of the benefits of laziness, including how we can use it to process big data on our laptops. We’ll focus on the benefits of laziness in two contexts:

  1. File processing
  2. Simulations
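
As a quick refresher on what that laziness looks like in code, here’s a minimal sketch: map hands back an iterator and does no work until something downstream asks for a value. (The double function here is purely illustrative, not a listing from later in the chapter.)

  def double(x):
      print(f"doubling {x}")   # the side effect makes each evaluation visible
      return x * 2

  lazy_result = map(double, range(5))   # nothing printed yet: no work has happened
  print(lazy_result)                    # just a <map object ...>, still lazy
  first = next(lazy_result)             # prints "doubling 0" and returns 0
  rest = list(lazy_result)              # forces the remaining four evaluations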

With file processing, we’ll see that laziness allows us to process much more data than could fit in memory without ...
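
To make that concrete, here is a minimal sketch of the kind of lazy file workflow this chapter builds toward. The specifics are hypothetical (the file name big_log.txt and the parse_line and count_long_records helpers are not from later listings), but the shape is the one we’ll keep reusing: a file object is already a lazy iterator over lines, so wrapping it in map never pulls the whole file into memory.

  def parse_line(line):
      # split one CSV-style line into its fields
      return line.rstrip("\n").split(",")

  def count_long_records(path, min_fields=5):
      with open(path) as f:              # lines are read on demand, one at a time
          records = map(parse_line, f)   # still lazy: nothing is parsed yet
          return sum(1 for rec in records if len(rec) >= min_fields)

  # count_long_records("big_log.txt") touches each line once and keeps only
  # one line in memory at a time, no matter how large the file is.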
