Chapter 9. Standing Up a Cluster
Now that you have instances up and running in the cloud provider of your choice, they can be set up to run a Hadoop cluster. If you don't have instances at the ready and want to follow along, then go back to Chapter 6 for AWS, Chapter 7 for Google Cloud Platform, or Chapter 8 for Azure first, and then return here.
Hadoop requires a Java runtime, so Java must be installed on each of your new instances. A good strategy is to use the package manager already present on the instances, such as yum on Red Hat Linux or apt on Ubuntu. Cloud providers ensure that these package managers work within their infrastructures, sometimes even providing local mirrors or gateways to help.
Table 9-1 suggests packages to install for some operating systems. As new versions of Java are released, the package names will change.

Operating system     Java package
Debian or Ubuntu     openjdk-8-jdk or openjdk-7-jdk
Red Hat or CentOS    java-1.8.0-openjdk or java-1.7.0-openjdk
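If you provision many instances from a script, the choice in Table 9-1 can be automated by reading the distribution ID from /etc/os-release. The following is a minimal sketch, assuming the OpenJDK 8 package names from the table and the apt-get and yum command-line tools; the helper names java_install_command and parse_os_release are hypothetical, and the sketch only builds the command rather than running it.

```python
import shlex

# Package names from Table 9-1 (Java 8 variants); these change as Java evolves.
JAVA_PACKAGES = {
    "debian": ("apt-get", "openjdk-8-jdk"),
    "rhel": ("yum", "java-1.8.0-openjdk"),
}

# Map specific distribution IDs onto the families listed above.
FAMILIES = {
    "debian": "debian",
    "ubuntu": "debian",
    "rhel": "rhel",
    "centos": "rhel",
}


def parse_os_release(text):
    """Parse KEY=value lines in /etc/os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split strips surrounding quotes, e.g. ID="centos" -> centos
        fields[key] = " ".join(shlex.split(value))
    return fields


def java_install_command(os_release_text):
    """Return an argument list that installs OpenJDK 8 on this distribution."""
    fields = parse_os_release(os_release_text)
    # Check the exact ID first, then any parent distributions in ID_LIKE.
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for distro in candidates:
        family = FAMILIES.get(distro)
        if family:
            tool, package = JAVA_PACKAGES[family]
            return ["sudo", tool, "install", "-y", package]
    raise ValueError(f"no Java package known for distribution {fields.get('ID')!r}")


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        print(" ".join(java_install_command(f.read())))
```

Falling back to ID_LIKE lets derivatives such as Ubuntu (ID_LIKE=debian) resolve without listing every distribution. On an instance, the returned list could be passed to subprocess.run(cmd, check=True); on Debian or Ubuntu you would normally run apt-get update first so the package index is current.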
Instead of using a package available natively for your operating system, you can install an Oracle JDK by downloading an installation package directly from Oracle. Since you have root access to your instances, you are free to use whatever means you prefer to install Java.
After you have installed Java, make note of where the Java home directory is (i.e., what the JAVA_HOME environment variable should be set to). You will need to know this ...
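Package managers usually install java as a chain of symlinks, so one common way to find the Java home directory is to resolve the launcher on the PATH to its real location and strip the trailing bin/java, much like readlink -f "$(which java)" in a shell. The sketch below assumes that layout; the function name guess_java_home is hypothetical, and the result is a heuristic you should check before relying on it.

```python
import os
import shutil


def guess_java_home(java_path=None):
    """Guess JAVA_HOME by following the java launcher's symlinks (heuristic)."""
    if java_path is None:
        java_path = shutil.which("java")
        if java_path is None:
            raise FileNotFoundError("no java executable found on PATH")
    real = os.path.realpath(java_path)             # follow every symlink
    home = os.path.dirname(os.path.dirname(real))  # strip bin/java
    # A Java 8 JDK keeps a runtime under jre/; prefer the enclosing JDK
    # directory, but only if it actually has its own bin/java.
    parent = os.path.dirname(home)
    if os.path.basename(home) == "jre" and os.path.isfile(
        os.path.join(parent, "bin", "java")
    ):
        home = parent
    return home


if __name__ == "__main__":
    print(guess_java_home())
```

When only a JRE is installed, the function returns the jre directory itself, since that is the only directory containing bin/java.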