O'Reilly logo

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Executing a Pig script using EMR

Amazon EMR supports executing Apache Pig scripts on the data stored in S3. Refer to the Pig-related recipes in Chapter 7, Hadoop Ecosystem II – Pig, HBase, Mahout, and Sqoop, for more details on using Apache Pig for data analysis.

In this recipe, we are going to execute a simple Pig script using Amazon EMR. This sample will use the Human Development Reports data (http://hdr.undp.org/en/statistics/data/) to print names of countries that have a GNI value greater than $2000 of gross national income per capita (GNI) sorted by GNI.

How to do it...

The following steps show you how to use a Pig script with Amazon Elastic MapReduce to process a dataset stored on Amazon S3:

  1. Use the Amazon S3 console to create a bucket in S3 ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required