Chapter 6. Datacenter Considerations

Hadoop architects and engineers are not primarily concerned with the datacenter, which offers rather mechanical and commoditized layers of enterprise IT. However, some key features of Hadoop only work as advertised if they are met by the correct layout of datacenter technology. You need to be aware of these effects when placing Hadoop into an existing enterprise IT environment, which has typically been optimized for virtualized host environments and remote storage solutions over the course of the last 10 years.

The content in this chapter, although not an exhaustive discussion on datacenters, may well be of crucial importance for certain key architectural decisions related to reliability and disaster tolerance.

We initially focus on some basic datacenter infrastructure concepts before we revisit some of the ways in which Hadoop differs from other commodity infrastructure setups. We provide a section that addresses common issues with data ingest in the context of datacenters, and finally we highlight common pitfalls that emerge around topics like multidatacenter disaster tolerance.

If you run Hadoop on a public cloud service, much of this chapter is not relevant to your situation, but “Quorum spanning with three datacenters” specifically covers an important subject around cluster spanning that you should observe.

Why Does It Matter ?

Intuition tells us that the distributed nature of Hadoop is likely to have ramifications for the datacenter. ...

Get Architecting Modern Data Platforms now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.