In this chapter you’ll create a simple Hadoop cluster running in Google Compute Engine (GCE), the service in Google Cloud Platform that enables you to provision virtual machine instances. The cluster will consist of basic installations of HDFS and YARN, which form the foundation for running MapReduce and other analytic workloads.
This chapter assumes you are using a Unix-like operating system on your local computer, such as Linux or macOS. If you are using Windows, some of the steps will vary, particularly those for working with SSH.
If you just worked through the previous chapter on AWS, you’ll find that this chapter covers the same procedures, this time under Google Cloud Platform. If you’re more interested in using your AWS cluster, skip ahead to Chapter 9.
Before you start, you will need an account with Google Cloud Platform. You can use your current Google account, or register for a separate account for free.
Once you are registered, you will be able to log in to the Google Cloud Platform console, a web interface for using all of the services under the Google Cloud Platform umbrella. When you log in, you are presented with a dashboard providing a curated view of some of those services; a complete list is available from the “hamburger” menu in the top-left corner. For this chapter, you will focus on GCE, whose dashboard tile is shown in Figure 7-1.