O'Reilly logo

Python Essentials by Steven F. Lott

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Plugging into a MapReduce framework

For background on the Apache Hadoop server, see https://hadoop.apache.org. Here's the summary:

The Apache Hadoop software library is a framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

One part of the Hadoop distributed processing is the MapReduce module. This module allows us to decompose analysis of data into two complementary operations: map and reduce. These operations are distributed around the Hadoop cluster to be run concurrently. A map operation processes all of the rows of datasets that are scattered around ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required