CHAPTER 4

Processing Data with Map Reduce

Hadoop Map Reduce is a system for the parallel processing of very large data sets across very large clusters, built on distributed, fault-tolerant storage. The input data set is split into chunks (whose size is configurable), and these chunks become the inputs to the Map functions, which filter and sort the data on the Hadoop cluster's data nodes. The output of the Map processes is then shuffled and delivered to the Reduce processes, which summarize the data to produce the final output.
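To make the flow concrete with a word count like the one this chapter develops: given two input lines, "the cat" and "the dog", each Map call emits a (word, 1) pair per token, producing (the, 1), (cat, 1), (the, 1), and (dog, 1). The shuffle groups these pairs by key, so a Reduce call sees (the, [1, 1]) and emits (the, 2), while cat and dog each reduce to a count of 1.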

This chapter explores Map Reduce programming through multiple implementations of a simple but flexible word-count example.
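As an orientation before the chapter's own versions, the sketch below shows the classic word-count job written against the standard org.apache.hadoop.mapreduce Java API. It is an illustrative outline rather than the chapter's implementation; the class names are placeholders, and the input and output paths are supplied on the command line.

// A minimal word-count sketch using the org.apache.hadoop.mapreduce API.
// Class names are illustrative; input and output paths come from args.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The Map function: emits a (word, 1) pair for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The Reduce function: after the shuffle it receives all counts for a given
  // word and sums them into a single total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Running the reducer as a combiner pre-aggregates counts on the map side,
    // cutting the volume of data moved during the shuffle.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a JAR, a job like this would typically be launched with a command of the form hadoop jar wordcount.jar WordCount <input dir> <output dir>, where the two arguments name HDFS directories.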
