Architecting Modern Data Platforms by Lars George, Paul Wilkinson, Ian Buss, Jan Kunigk

Part II. Platform

In Part I, we covered the essentials of putting together an efficient and resilient physical and organizational infrastructure for your clusters. Upon this solid foundation, we can now build comprehensive distributed software platforms that can cope with the rigors of large-scale data storage and processing inherent to the requirements and use cases of large enterprises.

In the following chapters, we explore the architectural aspects of modern data platforms, ranging from the basic operating system and supporting software to the provisioning of Hadoop and other distributed systems. Organizations require that these platforms fit into a preexisting ecosystem of users and applications, and enterprise standards demand that deployments meet certain requirements for security, availability, and disaster recovery. We cover these concerns in detail.

By the end of this part, we hope that readers, whether architects, application developers, or cluster operators, will feel confident in how and, crucially, why clusters are put together. This understanding will be of immense value in building and operating new clusters and in designing and running applications that work in sympathy with distributed enterprise data platforms.
