Mastering Large Datasets with Python

by John Wolohan
January 2020
Content level: Intermediate to advanced
312 pages
10h 22m
English
Manning Publications

Chapter 4. Processing large datasets with lazy workflows

This chapter covers

  • Writing lazy workflows for processing large datasets locally
  • Understanding the lazy behavior of map
  • Writing classes with generators for lazy simulations

In chapter 2 (section 2.1.2, to be exact), I introduced the idea that our beloved map function is lazy by default; that is, it evaluates each value only when that value is needed downstream. In this chapter, we’ll look at a few benefits of laziness, including how it lets us process big data on a laptop. We’ll focus on two contexts:
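To make the lazy behavior of map concrete, here is a small sketch using Python 3's built-in `map`, which returns an iterator rather than a list. The `square` helper and the `evaluated` log are hypothetical names added only to show when the function actually runs.

```python
# Record every input that square() has actually processed.
evaluated = []

def square(x):
    """Square x and log that the call happened (hypothetical helper)."""
    evaluated.append(x)
    return x * x

# map returns a lazy iterator: square has not been called on anything yet,
# even though the input range is large.
squares = map(square, range(1_000_000))
print(evaluated)  # []

# Each next() call evaluates exactly one more element.
first = next(squares)
second = next(squares)
print(first, second, evaluated)  # 0 1 [0, 1]
```

Only two of the million squares are ever computed, because nothing downstream asked for the rest.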

  1. File processing
  2. Simulations
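For the simulation context, the chapter outline mentions writing classes with generators. The sketch below assumes a hypothetical `RandomWalk` class (not taken from the book) whose `__iter__` method is a generator, so each step of the simulation is produced only when a caller requests it.

```python
import itertools
import random

class RandomWalk:
    """Hypothetical lazy simulation of a one-dimensional random walk.

    Positions are generated on demand, so the walk can be arbitrarily
    long without storing earlier positions in memory.
    """

    def __init__(self, start=0, seed=None):
        self.start = start
        # A private RNG keeps runs reproducible when a seed is given.
        self.rng = random.Random(seed)

    def __iter__(self):
        # A generator method: execution pauses at each yield and
        # resumes only when the next position is requested.
        position = self.start
        while True:
            yield position
            position += self.rng.choice((-1, 1))

walk = RandomWalk(start=0, seed=42)
# islice takes a finite prefix of the otherwise infinite walk.
first_ten = list(itertools.islice(walk, 10))
print(first_ten)
```

Because the walk never ends on its own, the caller decides how much of it to materialize; here `itertools.islice` stops after ten positions.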

With file processing, we’ll see that laziness allows us to process much more data than could fit in memory without ...
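As a rough illustration of lazy file processing, the following sketch chains `map` over a file object, which Python reads one line at a time. The `line_lengths` function and the three-line sample file are hypothetical, made up for this example; a real workload would use a file far larger than memory.

```python
import os
import tempfile

def line_lengths(path):
    """Lazily yield the length of each line (hypothetical helper).

    Iterating over a file object reads one line at a time, and map
    defers stripping and measuring each line until it is requested.
    """
    with open(path, encoding="utf-8") as f:
        yield from map(len, map(str.rstrip, f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\ngamma\n")
    # sum pulls values through the pipeline one line at a time,
    # so only the current line is held in memory.
    total_chars = sum(line_lengths(path))

print(total_chars)  # 14
```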

ISBN: 9781617296239