Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Running the WordCount program in a distributed cluster environment

This recipe describes how to run a MapReduce computation in a distributed Hadoop v2 cluster.

Getting ready

Start the Hadoop cluster by following the "Setting up HDFS" recipe or the "Setting up Hadoop ecosystem in a distributed cluster environment using a Hadoop distribution" recipe.

How to do it...

Now let's run the WordCount sample in the distributed Hadoop v2 setup:

  1. Upload the wc-input directory from the source repository to your home directory in HDFS. Alternatively, you can upload any other set of text documents.
    $ hdfs dfs -copyFromLocal wc-input .
    
  2. Execute the WordCount example from the HADOOP_HOME directory:
    $ hadoop jar hcb-c1-samples.jar \
    chapter1.WordCount \
    wc-input wc-output
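To confirm that the job produced sensible counts, you can compare its output against a rough local word count computed with standard Unix tools. The following is a minimal sketch under a few assumptions: the local_wordcount and compare_with_cluster function names are hypothetical; the sample WordCount is assumed to split lines on whitespace only (as a StringTokenizer-based mapper does) without stripping punctuation; and the job is assumed to write tab-separated part-r-* files, which is the default for reducers written against the org.apache.hadoop.mapreduce API with the default text output format.

```shell
#!/usr/bin/env bash

# local_wordcount (hypothetical helper): counts whitespace-separated tokens
# in the given files and prints "word<TAB>count", one pair per line, sorted
# in byte order. Assumes whitespace-only tokenization; punctuation is kept.
local_wordcount() {
  cat "$@" \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# compare_with_cluster (hypothetical helper): diffs the local count of the
# files in a local input directory against the job output stored in HDFS.
# Requires the hdfs CLI and a running cluster. Both sides are re-sorted,
# because a job with several reducers writes several separately sorted files.
compare_with_cluster() {
  local local_dir=$1
  local hdfs_out=$2
  # The glob is quoted so that HDFS, not the local shell, expands part-r-*.
  diff <(local_wordcount "$local_dir"/* | LC_ALL=C sort) \
       <(hdfs dfs -cat "$hdfs_out/part-r-*" | LC_ALL=C sort)
}
```

For example, running compare_with_cluster wc-input wc-output from the directory that holds the local copy of wc-input prints nothing when the two counts agree; any differing lines point either to a tokenization difference between the sketch and the sample program or to a problem with the job.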
