Appendix A. Installing Apache Hadoop

It’s easy to install Hadoop on a single machine to try it out. (For installation on a cluster, please refer to Chapter 9.) The quickest way is to download and run a binary release from an Apache Software Foundation mirror.

In this appendix, we cover how to install Hadoop Common, HDFS, and MapReduce. Instructions for installing the other projects covered in this book are included at the start of the relevant chapter.


Hadoop is written in Java, so you will need to have Java installed on your machine, version 6 or later. Sun’s JDK is the one most widely used with Hadoop, although others have been reported to work.
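One way to confirm the Java requirement is to inspect the output of java -version. The following is a minimal sketch, not part of the Hadoop distribution: the function name java_major_version is hypothetical, and it assumes the version string takes one of the common forms (1.6.0_45 for older releases, 11.0.2 for newer ones).

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version for a version string.
# Assumption: older JVMs report "1.6.0_45" (major version 6), while newer
# ones report "11.0.2" (major version 11).
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "Java major version: $(java_major_version "$v")"
else
    echo "java not found on PATH"
fi
```

If the reported major version is 6 or later, the requirement stated above is met.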

Hadoop runs on Unix and on Windows. Linux is the only supported production platform, but other flavors of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is supported only as a development platform and additionally requires Cygwin. During the Cygwin installation process, you should include the openssh package if you plan to run Hadoop in pseudo-distributed mode (see following explanation).


Start by deciding which user you’d like to run Hadoop as. For trying out Hadoop or developing Hadoop programs, it is simplest to run Hadoop on a single machine using your own user account.

Download a stable release, which is packaged as a gzipped tar file, from the Apache Hadoop releases page and unpack it somewhere on your filesystem:

% tar xzf hadoop-x.y.z.tar.gz
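If you would rather unpack the release into a specific directory and see which top-level directory it created, something like the following sketch may help. The function name unpack_release is hypothetical, and the sketch assumes a tar that supports the -C option (GNU tar and BSD tar both do) and an archive whose entries all live under a single top-level directory.

```shell
#!/bin/sh
# Hypothetical helper: unpack a gzipped release tarball into a target
# directory and print the name of the top-level directory it contains.
unpack_release() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar xzf "$tarball" -C "$dest" || return 1
    # Take the first archive entry and strip everything after the first slash.
    tar tzf "$tarball" | sed -n '1s|/.*||p'
}
```

For example, unpack_release hadoop-x.y.z.tar.gz "$HOME" would unpack the release under your home directory and print the name of the directory it created.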

Before you can run Hadoop, you ...
