Appendix A. Manual Installation

In this appendix, we cover the details of installing the tools for the stack used in this book.

Installing Hadoop

You can download the latest version of Hadoop from the Apache Hadoop downloads page. At the time of writing, the latest Hadoop was 2.7.3, but this will probably have changed by the time you’re reading this.

A recipe for a headless install of Hadoop is available in manual_install.sh. In addition to downloading and unpacking Hadoop, we also need to set up our Hadoop environment variables (HADOOP_HOME, HADOOP_CLASSPATH, and HADOOP_CONF_DIR) and put Hadoop’s executables on our PATH. First, set up a PROJECT_HOME variable to help find the right paths. You will need to set this yourself by editing your .bash_profile file:

export PROJECT_HOME=/Users/rjurney/Software/Agile_Data_Code_2
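Because every later path is derived from PROJECT_HOME, it can help to confirm that the variable is set and points at a real directory before going further. The following is a minimal sketch, assuming a POSIX-compatible shell; check_project_home is a hypothetical helper name and is not part of manual_install.sh:

```shell
# Hypothetical helper (not part of manual_install.sh): succeed only if
# PROJECT_HOME is set, non-empty, and names an existing directory.
check_project_home() {
  if [ -z "${PROJECT_HOME:-}" ]; then
    echo "PROJECT_HOME is not set; add it to ~/.bash_profile" >&2
    return 1
  fi
  if [ ! -d "$PROJECT_HOME" ]; then
    echo "PROJECT_HOME is not a directory: $PROJECT_HOME" >&2
    return 1
  fi
  return 0
}

# Example use before running any install steps:
#   check_project_home || exit 1
```

Failing early here is a design choice: an empty PROJECT_HOME would otherwise make HADOOP_HOME resolve to /hadoop without any warning.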

Now we can set up our environment directly. Here is the relevant section of manual_install.sh:

# May need to update this link... see http://hadoop.apache.org/releases.html
curl -Lko /tmp/hadoop-2.7.3.tar.gz \
  http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
mkdir hadoop
tar -xvf /tmp/hadoop-2.7.3.tar.gz -C hadoop --strip-components=1
echo '# Hadoop environment setup' >> ~/.bash_profile
export HADOOP_HOME=$PROJECT_HOME/hadoop
echo 'export HADOOP_HOME=$PROJECT_HOME/hadoop' >> ~/.bash_profile
export PATH=$PATH:$HADOOP_HOME/bin
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bash_profile
export HADOOP_CLASSPATH=$(hadoop classpath)
echo 'export ...
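The script appends its export lines to ~/.bash_profile every time it runs, so running it twice would leave duplicate entries. One way to avoid that, sketched here under the assumption of a POSIX-compatible shell and a standard grep, is to append a line only when it is not already present. The append_once helper is hypothetical and is not part of manual_install.sh:

```shell
# Hypothetical helper (not part of manual_install.sh):
# append_once FILE LINE appends LINE to FILE unless an identical line exists.
append_once() {
  # Make sure the file exists so grep does not fail on a missing path.
  touch "$1"
  # -F: fixed string, -x: match the whole line, -q: no output.
  if ! grep -qxF -e "$2" "$1"; then
    printf '%s\n' "$2" >> "$1"
  fi
}

# Example use, mirroring one of the echo lines in the script:
#   append_once ~/.bash_profile 'export HADOOP_HOME=$PROJECT_HOME/hadoop'
```

The single quotes in the example matter: they keep $PROJECT_HOME unexpanded, so the profile stores the variable reference rather than its current value, just as the original echo commands do.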
