Chapter 7. Processing truly big datasets with Hadoop and Spark

This chapter covers

  • Recognizing the reduce pattern for N-to-X data transformations
  • Writing helper functions for reductions
  • Writing lambda functions for simple reductions
  • Using reduce to summarize data

In the previous chapters of the book, we’ve focused on developing a foundational set of programming patterns, in the map and reduce style, that allow us to scale our programming. The techniques we’ve covered so far let us make the most of our laptop’s hardware: I’ve shown you how to work on large datasets using map (chapter 2), reduce (chapter 5), parallelism (chapter 2), and lazy programming (chapter 4). In this chapter, we begin to look at working on truly big datasets, the kind that outgrow a single machine, using distributed frameworks like Hadoop and Spark.
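As a preview of the style of code this chapter builds toward, here is a minimal sketch of the familiar map and reduce pattern expressed against Spark’s RDD API through PySpark. It assumes PySpark is installed and runs against a local Spark context; the application name and the toy data are purely illustrative, not part of the chapter’s examples.

```python
from pyspark import SparkContext

# Start a local Spark context; "local[*]" uses all available cores.
sc = SparkContext("local[*]", "map-reduce-preview")

# Distribute a toy dataset across the cluster (here, just our machine).
numbers = sc.parallelize(range(1, 1001))

# The same pattern we used on a laptop: map a transformation over the
# data, then reduce the results down to a single summary value.
sum_of_squares = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)

print(sum_of_squares)
sc.stop()
```

The point of the sketch is that the shape of the program does not change: we still map a function over our data and reduce the results, but Spark takes responsibility for distributing the work across however much hardware we give it.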
