Preface
Distributed computing is a fascinating topic. Looking back at the early days of computing, it is impressive how many companies today distribute their workloads across clusters of computers. We have figured out efficient ways to do so, but scaling out is also increasingly a necessity. Individual computers keep getting faster, and yet our need for large-scale computing keeps exceeding what single machines can do.
Recognizing that scaling is both a necessity and a challenge, Ray aims to make distributed computing simple for developers. It makes distributed computing accessible to nonexperts and makes it possible to scale your Python scripts across multiple nodes fairly easily. Ray is good at scaling both data-heavy and compute-heavy workloads, such as data preprocessing and model training, and it explicitly targets machine learning (ML) workloads that need to scale. While it is possible today to scale these two types of workloads without Ray, you would likely have to use different APIs and distributed systems for each. Managing several distributed systems can be messy and inefficient in many ways.
The addition of the Ray AI Runtime (AIR) with the release of Ray 2.0 in August 2022 increased the support for complex ML workloads in Ray even further. AIR is a collection of libraries and tools that make it easy to build and deploy end-to-end ML applications in a single distributed system. With AIR, even the most ...