Chapter 3. Compute and Storage

Although it is perfectly possible to use a machine without understanding its inner workings, a practitioner gets the best results by understanding at least a little of how it functions. Only then can the user operate the machine with what Jackie Stewart called “mechanical sympathy.” This principle is particularly true when the machine is itself composed of many smaller machines acting in concert, each of which is in turn made up of many subcomponents. This is exactly what a Hadoop cluster is: a set of distributed software libraries designed to run on many servers, where each server comprises an array of components for computation, storage, and communication. In this chapter, we take a brief look at how these smaller machines function.

We begin with important details of computer architecture and the Linux operating system. We then talk about different server sizes, before finally covering how these different server types can be combined into standard cluster configurations.

You probably won’t need to memorize all the details in this chapter, but understanding the fundamentals of how Hadoop interacts with the machines on which it runs is invaluable when building and operating rock-solid use cases on Hadoop clusters. Refer back to this chapter when you need to guide component selection decisions with infrastructure teams, procurement departments, or even your manager.

During the initial years of its enterprise adoption, the recommendation ...
