Chapter 7. Processing truly big datasets with Hadoop and Spark

This chapter covers

  • Recognizing the reduce pattern for N-to-X data transformations
  • Writing helper functions for reductions
  • Writing lambda functions for simple reductions
  • Using reduce to summarize data

In the previous chapters of the book, we’ve focused on developing a foundational set of programming patterns, in the map and reduce style, that allow us to scale our programming. The techniques we’ve covered so far let us make the most of our laptop’s hardware: I’ve shown you how to work on large datasets using map (chapter 2), reduce (chapter 5), parallelism (chapter 2), and lazy programming (chapter 4). In this chapter, we begin to look at working on truly big datasets, the kind that outgrow a single machine, using distributed frameworks like Hadoop and Spark.
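As a preview of the style of code this chapter builds toward, here is a minimal sketch of the familiar map and reduce pattern expressed against Spark’s RDD API through PySpark. It assumes PySpark is installed and runs against a local Spark context; the application name and the toy data are purely illustrative, not part of the chapter’s examples.

```python
from pyspark import SparkContext

# Start a local Spark context; "local[*]" uses all available cores.
sc = SparkContext("local[*]", "map-reduce-preview")

# Distribute a toy dataset across the cluster (here, just our machine).
numbers = sc.parallelize(range(1, 1001))

# The same pattern we used on a laptop: map a transformation over the
# data, then reduce the results down to a single summary value.
sum_of_squares = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)

print(sum_of_squares)
sc.stop()
```

The point of the sketch is that the shape of the program does not change: we still map a function over our data and reduce the results, but Spark takes responsibility for distributing the work across however much hardware we give it.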
