Machine Learning in Action

Chapter 15. Big data and MapReduce

This chapter covers

MapReduce
Using Python with Hadoop Streaming
Automating MapReduce with mrjob
Training support vector machines in parallel with the Pegasos algorithm

I often hear “Your examples are nice, but my data is big, man!” I have no doubt that you work with data sets larger than the examples used in this book. With so many devices connected to the internet and people interested in making data-driven decisions, the amount of data we’re collecting has outpaced our ability to process it. Fortunately, a number of open source software projects allow us to process large amounts of data. One project, called Hadoop, is a Java framework for distributing data processing to multiple ...
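
The core idea of distributing data processing can be sketched in plain Python as three stages: map, shuffle, and reduce. This is a minimal single-process sketch, not Hadoop's API and not the book's code. The function names (mapper, shuffle, reducer, run_mapreduce) are hypothetical, and the list of chunks only stands in for data split across multiple machines. Each mapper computes partial statistics for its own chunk, and the reducer combines them into a global mean and variance.

```python
from __future__ import annotations

from collections import defaultdict


def mapper(chunk):
    """Hypothetical mapper: emit one (key, (count, sum, sum_of_squares)) pair per chunk."""
    n, s, sq = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        s += x
        sq += x * x
    yield "stats", (n, s, sq)


def shuffle(pairs):
    """Group mapper output by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Combine partial statistics into a global count, mean, and population variance."""
    n = sum(v[0] for v in values)
    s = sum(v[1] for v in values)
    sq = sum(v[2] for v in values)
    mean = s / n
    variance = sq / n - mean * mean
    return key, (n, mean, variance)


def run_mapreduce(chunks):
    """Run map on every chunk (each chunk simulates one machine), then shuffle and reduce."""
    mapped = (pair for chunk in chunks for pair in mapper(chunk))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Three chunks simulate data stored on three different nodes.
    print(run_mapreduce([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

Because each mapper emits only three numbers, very little data has to move between machines, which is the property that makes this pattern scale. The sum-of-squares variance formula is simple but can lose precision for large values; a production job might use a numerically stabler combination rule.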


Machine Learning in Action by Peter Harrington
