Appendix B: Hadoop 1.x Quick Start

Chapter 10 covers Hadoop, and the example is based on Hadoop's MapReduce version 1 engine (MR1). The instructions in this appendix are for the MR1 engine. You might want to consider the Hadoop 2 version, which has performance improvements; installation is very similar to the following steps.

The following instructions apply if you are running Mac OS X or a Unix-like operating system such as Linux.

Downloading and Installing Hadoop

You can download Hadoop from one of the Apache mirror sites. I used http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/.

Look for the hadoop-1.2.x release, where x is the highest number available, and download the .tar.gz archive. There are also links named “stable1,” “stable2,” and “current” that help guide you to the correct version. Once the download finishes, unpack the archive:

tar xvzf hadoop-x.x.x-bin.tar.gz
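The unpacking step can be wrapped in a small function that also reports which top-level directory the archive created, which is where the conf directory lives. This is only a sketch: unpack_hadoop is a hypothetical helper name, and it assumes the archive contains a single top-level directory, as Hadoop release tarballs normally do.

```shell
# Hypothetical helper (not part of Hadoop): unpack a .tar.gz archive into a
# destination directory and print the name of the archive's top-level directory.
# Usage: unpack_hadoop <archive> [destination]   (destination defaults to ".")
unpack_hadoop() {
  archive="$1"
  dest="${2:-.}"

  # Extract quietly; add the v flag (tar xvzf) to list files as they unpack.
  tar xzf "$archive" -C "$dest" || return 1

  # The first entry in the listing starts with the top-level directory name.
  tar tzf "$archive" | head -n 1 | cut -d/ -f1
}
```

Calling unpack_hadoop with the path to the archive you downloaded prints the directory name to change into before editing the configuration.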

In the configuration directory (called conf), open the hadoop-env.sh file and set the JAVA_HOME line to the location of your Java installation:

export JAVA_HOME=/path/to/wherever/your-java/is
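If you are not sure where Java lives on your machine, the following sketch prints a candidate export line. It assumes either that /usr/libexec/java_home is present (as it is on Mac OS X) or that a java binary is on your PATH. java_home_from_binary is a hypothetical helper name, and the result is a suggestion to check by eye rather than a guaranteed answer.

```shell
# Hypothetical helper (not part of Hadoop): given the full path to a java
# binary, print the directory two levels up, i.e. strip the trailing /bin/java.
java_home_from_binary() {
  dirname "$(dirname "$1")"
}

# Mac OS X ships /usr/libexec/java_home, which prints the JDK home directly.
if [ -x /usr/libexec/java_home ] && mac_home="$(/usr/libexec/java_home 2>/dev/null)"; then
  candidate="$mac_home"
elif java_bin="$(command -v java)"; then
  # Resolve symlinks where readlink -f is supported; otherwise keep the raw path.
  resolved="$(readlink -f "$java_bin" 2>/dev/null || echo "$java_bin")"
  candidate="$(java_home_from_binary "$resolved")"
else
  candidate=""
fi

if [ -n "$candidate" ]; then
  echo "export JAVA_HOME=$candidate"
else
  echo "java not found; set JAVA_HOME by hand" >&2
fi
```

Paste the printed line into hadoop-env.sh only after confirming that the directory exists and contains bin/java.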

Also (this is a part that's very rarely mentioned), if your SSH configuration has a different port set up (the default port is 22, but the paranoid ...
