Chapter 14. Running Machine Learning in Kubernetes

The age of microservices, distributed systems, and the cloud has created ideal conditions for the democratization of machine learning models and tooling. Infrastructure at scale has become commoditized, and the tooling around the machine learning ecosystem is maturing. Kubernetes has become increasingly popular among developers, data scientists, and the wider open source community as a platform for supporting the machine learning workflow and life cycle. Large machine learning models like GPT-4 and DALL·E have brought machine learning into the spotlight, and organizations like OpenAI have been very public about their use of Kubernetes to support these models. In this chapter, we cover why Kubernetes is a great platform for machine learning. We also provide best practices to help cluster administrators and data scientists get the most out of Kubernetes when running machine learning workloads. Specifically, we focus on deep learning rather than traditional machine learning, because deep learning has quickly become the focus of innovation on platforms like Kubernetes.

Why Is Kubernetes Great for Machine Learning?

Kubernetes has quickly become the home for rapid innovation in deep learning. The confluence of tooling and libraries such as TensorFlow makes deep learning more accessible to a large audience of data scientists. What makes Kubernetes such a great ...

From Kubernetes Best Practices, 2nd Edition (O’Reilly).
