O'Reilly logo

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Using Hadoop with legacy applications – Hadoop streaming

Hadoop streaming allows us to use any executable or a script as the Mapper or the Reducer of a Hadoop MapReduce job. Hadoop streaming enables us to perform rapid prototyping of the MapReduce computations using Linux shell utility programs or using scripting languages. Hadoop streaming also allows the users with some or no Java knowledge to utilize Hadoop to process data stored in HDFS.

In this recipe, we implement a Mapper for our HTTP log processing application using Python and use a Hadoop aggregate-package-based Reducer.

How to do it...

The following are the steps to use a Python program as the Mapper to process the HTTP server log files:

  1. Write the logProcessor.py python script:
    #!/usr/bin/python ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required