In This Chapter
Examining the components that make up a Hadoop cluster
Designing the Hadoop cluster components
Reviewing Hadoop deployment form factors
Sizing a Hadoop cluster
At its core, Hadoop is a system for storing and processing data at massive scale using a cluster of many individual compute nodes. In this chapter, we describe the tasks involved in building a Hadoop cluster, from the hardware components in individual compute nodes, through common cluster configuration patterns, to how to size a cluster appropriately. In at least one way, Hadoop is no different from many other IT systems: if you don’t design your cluster to match your business requirements, you’ll get poor performance, wasted capacity, or both.
While you’re getting your feet wet with Hadoop, you’re likely to limit yourself to a pseudo-distributed cluster running in a virtual machine on a personal computer. Though this environment is a good one for testing and learning, it’s obviously inappropriate for production-level performance and scalability. In this section, ...