Chapter 15. Big data and MapReduce


This chapter covers
  • MapReduce
  • Using Python with Hadoop Streaming
  • Automating MapReduce with mrjob
  • Training support vector machines in parallel with the Pegasos algorithm


I often hear “Your examples are nice, but my data is big, man!” I have no doubt that you work with data sets larger than the examples used in this book. With so many devices connected to the internet and people interested in making data-driven decisions, the amount of data we’re collecting has outpaced our ability to process it. Fortunately, a number of open source software projects allow us to process large amounts of data. One project, called Hadoop, is a Java framework for distributing data processing to multiple ...
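The MapReduce pattern that frameworks like Hadoop distribute across a cluster can be sketched in plain Python inside a single process. This is a minimal illustration, not Hadoop's API, Hadoop Streaming, or mrjob. The names `run_mapreduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical, and word counting is simply a familiar example chosen here. The input is split into chunks and each line is mapped to intermediate (key, value) pairs. The pairs are then grouped by key (the "shuffle"), and finally each group is reduced to one result.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical type alias for an intermediate (key, value) pair
KV = Tuple[str, int]


def word_count_mapper(line: str) -> Iterator[KV]:
    # Map step: emit (word, 1) for every whitespace-separated token in one line
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(key: str, values: Iterable[int]) -> KV:
    # Reduce step: combine every count that shares the same key
    return key, sum(values)


def run_mapreduce(
    lines: Sequence[str],
    mapper: Callable[[str], Iterator[KV]],
    reducer: Callable[[str, Iterable[int]], KV],
    num_splits: int = 3,  # must be >= 1
) -> List[KV]:
    # 1. Split the input into chunks; on a cluster each chunk would go to a separate map task
    splits = [lines[i::num_splits] for i in range(num_splits)]

    # 2. Map phase: every split is processed independently of the others
    mapped = chain.from_iterable(mapper(line) for split in splits for line in split)

    # 3. Shuffle/sort phase: group all intermediate values by key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # 4. Reduce phase: one reducer call per key, in sorted key order
    return [reducer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    docs = ["big data is big", "MapReduce splits big jobs", "data moves to reducers"]
    for word, count in run_mapreduce(docs, word_count_mapper, word_count_reducer):
        print(word, count)
```

Because the mapper only ever sees one line and the reducer only ever sees one key's values, neither function needs global state. That independence is what lets a framework run many copies of each on different machines.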