In Part I, we covered the essentials of putting together an efficient and resilient physical and organizational infrastructure for your clusters. Upon this solid foundation, we can now build comprehensive distributed software platforms that can cope with the rigors of the large-scale data storage and processing demanded by the use cases of large enterprises.
In the following chapters, we explore the architectural aspects of modern data platforms, ranging from the basic operating system and supporting software to the provisioning of Hadoop and other distributed systems. Organizations require that these platforms fit into a preexisting ecosystem of users and applications, and enterprise policies demand that deployments meet certain standards of security, availability, and disaster recovery. We cover these concerns in detail.
By the end of this section, our hope is that the reader—be they an architect, application developer, or cluster operator—will feel confident in how and, crucially, why clusters are put together. This understanding will be of immense value in building and operating new clusters and in designing and running applications that work in sympathy with distributed enterprise data platforms.