Appendix B. Hadoop Cluster Configuration Scripts

To speed up the creation of Hadoop clusters in the cloud, you can create an image for one or more representative cluster instances with software already installed and, for the most part, configured. Chapter 16 describes the process. Still, there are configuration steps that can only be done once the cluster instances are running, and the scripts here can automate the work.

A quick glance at the scripts should reveal that even this small slice of automation is not trivial. You might consider borrowing the techniques used here and implementing them using a different scripting language or framework. Central to them is the ability to establish SSH connections into and within the cluster, so look for frameworks that help in that regard, such as Fabric.
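As a rough illustration of why SSH connectivity is the core building block, the following minimal Bash sketch runs one command on several instances in turn. It is not taken from the book's scripts: the function name `run_on_hosts`, the variables `SSH_USER`, `SSH_KEY`, and `DRY_RUN`, and the default values are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: run one command on several cluster instances over SSH.
# SSH_USER, SSH_KEY, DRY_RUN, and their defaults are hypothetical placeholders.

SSH_USER="${SSH_USER:-ec2-user}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/cluster_key.pem}"
DRY_RUN="${DRY_RUN:-0}"

# Usage: run_on_hosts "<command>" host1 [host2 ...]
run_on_hosts() {
  local cmd="$1"
  shift
  local host
  for host in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      # Print what would be executed instead of connecting.
      echo "ssh -i $SSH_KEY $SSH_USER@$host $cmd"
    else
      # BatchMode fails fast instead of prompting for a password;
      # </dev/null keeps ssh from consuming the caller's stdin.
      ssh -i "$SSH_KEY" -o BatchMode=yes -o ConnectTimeout=10 \
        "$SSH_USER@$host" "$cmd" </dev/null
    fi
  done
}
```

Setting `DRY_RUN=1` before calling, for example, `run_on_hosts "hostname" manager worker1` prints the commands without opening any connections, which is a convenient way to check the host list first.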

Tip

Code is available at this book’s code repository.

SSH Key Creation and Distribution

The Hadoop installation process in Chapter 9 and elsewhere includes the creation of user accounts specific to services like HDFS, YARN, and ZooKeeper. Those accounts can already be in place in an image, but for greater security they should be configured with unique SSH keys, so that instances in one cluster cannot access instances in other clusters that were built from the same image.
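The key-generation step can be sketched as follows. This is a minimal example under stated assumptions, not the book's own script: the function name `generate_account_keys`, the `id_rsa_<account>` file naming, and the `<account>@cluster` key comment are hypothetical choices, and the sketch only creates the key pairs in a directory rather than installing them into each account's home.

```shell
#!/usr/bin/env bash
# Sketch: create one SSH key pair per Hadoop service account.
# File naming and the key comment are illustrative assumptions.

# Usage: generate_account_keys <key_dir> account1 [account2 ...]
generate_account_keys() {
  local key_dir="$1"
  shift
  local account key_file
  mkdir -p "$key_dir"
  chmod 700 "$key_dir"
  for account in "$@"; do
    key_file="$key_dir/id_rsa_$account"
    # Skip accounts that already have a key so reruns are harmless.
    if [ -f "$key_file" ]; then
      continue
    fi
    # -N '' sets an empty passphrase; -q suppresses progress output.
    ssh-keygen -q -t rsa -b 4096 -N '' -C "$account@cluster" \
      -f "$key_file" || return 1
  done
}
```

A call such as `generate_account_keys /tmp/cluster-keys hdfs yarn zookeeper` would leave a private key and a matching `.pub` file for each account in that directory. Because the keys are generated after the instances launch, they differ from cluster to cluster even when the image is shared.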

The Bash script in Example B-1 can be run from your local computer to orchestrate the creation of SSH key pairs on the manager instance of a Hadoop cluster for each Hadoop account, and the distribution of public ...
