Appendix A. Manual Installation
In this appendix, we cover the details of installing the tools for the stack used in this book.
Installing Hadoop
You can download the latest version of Hadoop from the Apache Hadoop downloads page. At the time of writing, the latest Hadoop release was 2.7.3, but a newer version will probably be available by the time you read this.
A recipe for a headless install of Hadoop is available in manual_install.sh. In addition to downloading and unpacking Hadoop, we also need to set up our Hadoop environment variables (HADOOP_HOME, HADOOP_CLASSPATH, and HADOOP_CONF_DIR), and we need to put Hadoop’s executables in our PATH. First, set up a PROJECT_HOME variable to help find the right paths. You will need to set this yourself by editing your .bash_profile file:
export PROJECT_HOME=/Users/rjurney/Software/Agile_Data_Code_2
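Editing the file by hand is all that is required here. If you would rather add the line from the shell, the following is a minimal sketch of one way to do it. The add_line_once helper is a hypothetical name used for illustration and is not part of manual_install.sh. It appends a line to a file only when that exact line is not already present, so running it twice does not leave duplicate exports in your profile.

```shell
# Hypothetical helper (not from manual_install.sh):
# append LINE to FILE only if FILE does not already contain that exact line.
add_line_once() {
  alo_line="$1"
  alo_file="$2"
  # Make sure the file exists so grep does not fail on a missing path.
  touch "$alo_file"
  # -x matches whole lines, -F treats the pattern as a fixed string, -q is quiet.
  grep -qxF -- "$alo_line" "$alo_file" || printf '%s\n' "$alo_line" >> "$alo_file"
}

# Usage (substitute the path where you checked out the project):
#   add_line_once 'export PROJECT_HOME=/Users/rjurney/Software/Agile_Data_Code_2' ~/.bash_profile
```

After adding the line, open a new terminal or source your .bash_profile so that PROJECT_HOME is defined in the current shell.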
Now we can set up our environment directly. Here is the relevant section of manual_install.sh:
# May need to update this link... see http://hadoop.apache.org/releases.html
curl -Lko /tmp/hadoop-2.7.3.tar.gz \
  http://apache.osuosl.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
mkdir hadoop
tar -xvf /tmp/hadoop-2.7.3.tar.gz -C hadoop --strip-components=1
echo '# Hadoop environment setup' >> ~/.bash_profile
export HADOOP_HOME=$PROJECT_HOME/hadoop
echo 'export HADOOP_HOME=$PROJECT_HOME/hadoop' >> ~/.bash_profile
export PATH=$PATH:$HADOOP_HOME/bin
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bash_profile
export HADOOP_CLASSPATH=$(hadoop classpath)
echo 'export ...
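Once the environment setup has run, it can be useful to confirm that the variables named earlier are actually defined in your shell. The following is a minimal sketch of such a check. The check_env function is a hypothetical helper for illustration and does not appear in manual_install.sh. It prints one line per variable name it is given and returns a nonzero status if any of them is unset or empty.

```shell
# Hypothetical helper (not from manual_install.sh):
# report whether each named environment variable is set and non-empty.
check_env() {
  ce_missing=0
  for ce_name in "$@"; do
    # Indirectly read the variable whose name is stored in ce_name.
    # Only pass trusted variable names, since this uses eval.
    eval "ce_value=\${$ce_name:-}"
    if [ -z "$ce_value" ]; then
      echo "MISSING $ce_name"
      ce_missing=1
    else
      echo "OK $ce_name=$ce_value"
    fi
  done
  return "$ce_missing"
}

# Usage:
#   check_env PROJECT_HOME HADOOP_HOME HADOOP_CLASSPATH HADOOP_CONF_DIR
#   command -v hadoop   # prints the path of the hadoop executable if it is on PATH
```

If any variable is reported as missing, check that the corresponding export line was written to your .bash_profile and that the profile has been sourced in the current shell.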