The defining characteristics of big data—volume, variety, and velocity—don’t just apply to the information stored within a modern data platform; they also apply to the knowledge required to build and use one effectively.
The topics touched upon are varied and deep, ranging from hardware selection and datacenter management through to statistics and machine learning. Even from just a platform architecture perspective, which is the scope of this book, the body of knowledge required is considerable. With such a wide selection of topics to cover, we have decided to present the material in parts.
In this first part, our intention is to equip the reader with a foundational understanding of infrastructure, both physical and organizational. Some chapters are deep dives into subjects such as compute and storage technologies, while others provide a high-level overview of subjects such as datacenter considerations and organizational challenges.