Chapter 7. Provisioning Clusters

This chapter discusses the provisioning and configuration of Hadoop cluster nodes. If you are using a cloud environment, then Part III is the more suitable section to read, as far as provisioning is concerned. In any event, the vast majority of Hadoop nodes run on Linux, so the operating system (OS)–related topics in this chapter still apply.

Operating Systems

The first task after acquiring physical hardware in the form of rack-mountable servers (for example, 19-inch rack servers or blades) is to provision the OS. There are many options, some dating back decades, that allow you to automate that process considerably. Separate technologies are often used for each step of the process:

Server bootstrap

The initial phase of machine provisioning is to automatically assign the machine an IP address and install the OS bootstrap executable. The most common technology used for this is the Preboot Execution Environment (PXE), which was introduced as part of the larger open industry standard Wired for Management (WfM). WfM also included the familiar Wake-on-LAN (WoL) standard. WfM was replaced by the Intelligent Platform Management Interface (IPMI) in 1998.
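
To make the bootstrap step more concrete, the following is a minimal sketch that assumes the PXE boot loader in use is PXELINUX from the Syslinux project, which the text does not specify. After the node has received its IP address, PXELINUX asks the TFTP server for a per-machine configuration file, trying a fixed sequence of names under its pxelinux.cfg/ directory. The function name pxelinux_config_candidates is hypothetical, and the sketch leaves out the optional lookup by client UUID that PXELINUX performs first.

```python
import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """Return the config file names PXELINUX would try, in order.

    Hypothetical helper for illustration; the client-UUID lookup that
    PXELINUX attempts first is deliberately left out.
    """
    # Ethernet has ARP hardware type 1, written as the "01-" prefix,
    # followed by the MAC address in lowercase hex with dash separators.
    mac_name = "01-" + mac.lower().replace(":", "-")

    # The IPv4 address as eight uppercase hex digits,
    # for example 192.168.2.91 becomes C0A8025B.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"

    # PXELINUX drops one trailing hex digit at a time, so a shorter
    # name matches a progressively larger block of addresses.
    ip_names = [ip_hex[:n] for n in range(8, 0, -1)]

    # The file named "default" is the final fallback.
    return [mac_name, *ip_names, "default"]


if __name__ == "__main__":
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print("pxelinux.cfg/" + name)
```

In a setup like this, a provisioning tool could write one file per MAC address to pin a specific installer configuration to each node, and rely on the shorter hex prefixes or the default file to cover whole subnets of identical worker nodes.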
