Mastering Large Datasets with Python

Chapter 12. MapReduce in the cloud with Amazon’s Elastic MapReduce

This chapter covers

Launching and configuring cloud compute clusters with Elastic MapReduce
Running Hadoop jobs in the cloud with mrjob
Distributed cloud machine learning with Spark

Throughout this book, we’ve been talking about the ability to scale code up. We started by looking at how to parallelize code locally; then we moved on to distributed computing frameworks; and finally, in chapter 11, we introduced cloud computing technologies. In this chapter, we’ll look at techniques we can use to work with data of any scale. We’ll see how to take the Hadoop and Spark frameworks we covered in the middle of the book (chapters 7 and 8 for Hadoop; chapters 7, 9, and 10 for Spark) ...

Mastering Large Datasets with Python by John Wolohan
