Chapter 4. Processing large datasets with lazy workflows

This chapter covers

  • Writing lazy workflows for processing large datasets locally
  • Understanding the lazy behavior of map
  • Writing classes with generators for lazy simulations

In chapter 2 (section 2.1.2, to be exact), I introduced the idea that our beloved map function is lazy by default; that is, it evaluates its inputs only when their values are needed downstream (a quick refresher sketch of that behavior follows the list below). In this chapter, we’ll look at a few of the benefits of laziness, including how we can use it to process big data on our laptops. We’ll focus on the benefits of laziness in two contexts:

  1. File processing
  2. Simulations
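
As a quick refresher on what that laziness looks like in code, here’s a minimal sketch: map hands back an iterator and does no work until something downstream asks for a value. (The double function here is purely illustrative, not a listing from later in the chapter.)

  def double(x):
      print(f"doubling {x}")   # the side effect makes each evaluation visible
      return x * 2

  lazy_result = map(double, range(5))   # nothing printed yet: no work has happened
  print(lazy_result)                    # just a <map object ...>, still lazy
  first = next(lazy_result)             # prints "doubling 0" and returns 0
  rest = list(lazy_result)              # forces the remaining four evaluations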

With file processing, we’ll see that laziness allows us to process much more data than could fit in memory without ...
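
To make that concrete, here is a minimal sketch of the kind of lazy file workflow this chapter builds toward. The specifics are hypothetical (the file name big_log.txt and the parse_line and count_long_records helpers are not from later listings), but the shape is the one we’ll keep reusing: a file object is already a lazy iterator over lines, so wrapping it in map never pulls the whole file into memory.

  def parse_line(line):
      # split one CSV-style line into its fields
      return line.rstrip("\n").split(",")

  def count_long_records(path, min_fields=5):
      with open(path) as f:              # lines are read on demand, one at a time
          records = map(parse_line, f)   # still lazy: nothing is parsed yet
          return sum(1 for rec in records if len(rec) >= min_fields)

  # count_long_records("big_log.txt") touches each line once and keeps only
  # one line in memory at a time, no matter how large the file is.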
