Appendix C. Installation and Setup

The application built throughout this book makes use of the open source software Java, Hadoop, Pig, and Hive. Many of these components come preinstalled and configured in Amazon EMR and in the other AWS services used in the examples. However, to build and test many of the examples in this book, you may find it easier, or more in line with your own organizational policies, to install these components locally. For the Java MapReduce jobs, you will need to install Java locally to develop the MapReduce application.

This appendix covers the installation and setup of these software components to help prepare you for developing the components covered in the book.


Many of the book’s examples (and Hadoop itself) are written in Java. To use Hadoop and build the examples in this book, you will need to have Java installed. The examples in this book were built using the Oracle Java Development Kit (JDK). There are now many JDK implementations available, from OpenJDK to GNU Java. The code examples may work with these, but the Oracle JDK is still widely available, free, and the most widely used, owing to the long history of Java development under Sun before Oracle purchased the rights to Java. Depending on the Job Flow type you are creating and which packages you want to install locally, you may need multiple versions of Java installed. Also, a local installation of Pig and Hadoop will require Java v1.6 or greater.
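As a quick sanity check on the Java v1.6 requirement, the following is a minimal sketch of a program that reports whether the JVM running it meets that minimum. The class name JavaVersionCheck and the helper majorVersion are hypothetical names chosen for illustration and are not part of the book's example code. The sketch assumes the standard java.specification.version system property, which reports values such as "1.6" or "1.8" on older JVMs and "9", "11", and so on for later releases.

```java
public class JavaVersionCheck {
    // Minimum major version named in the text (Java v1.6).
    static final int MINIMUM_MAJOR = 6;

    // Hypothetical helper: converts a specification version string such as
    // "1.6", "1.8", or "11" into a single major version number.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        // Older JVMs use the "1.x" scheme, so the major version is the second field.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        // Java 9 and later report the major version as the first field.
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major >= MINIMUM_MAJOR) {
            System.out.println("Java " + spec + " meets the v1.6 minimum.");
        } else {
            System.out.println("Java " + spec + " is older than v1.6; install a newer JDK.");
        }
    }
}
```

Compile it with javac JavaVersionCheck.java and run it with java JavaVersionCheck. If you keep several Java versions installed side by side, running it with each java binary shows which of them satisfy the requirement.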

Hadoop and many of ...
