AI and LLM Deployment with Kubernetes

Beginner

Get started building an infrastructure for hosting GenAI on Kubernetes

What you’ll learn and how you can apply it

Learn to set up and manage a Kubernetes infrastructure tailored for hosting AI applications
Discover best practices for configuring Kubernetes resources to improve the performance of AI applications
Gain practical skills by running an Ollama-based AI application on Kubernetes, using real-world scenarios

Course description

Companies today are increasingly reliant on LLMs for internal inference and chat applications. To effectively deploy these AI-driven solutions, a flexible and scalable infrastructure is essential, and Kubernetes is the premier choice for this. This two-day course is designed to equip you with the skills you need to host and manage AI applications using Kubernetes.

Through hands-on practice and real-world scenarios, Kubernetes expert and trainer Sander van Vugt walks you through the requirements for a Kubernetes infrastructure that hosts AI, and the Kubernetes components it needs. You’ll also practice running a simple Ollama-based AI application in Kubernetes. Along the way, you'll use the Kubernetes resources typically found in an AI infrastructure and configure them to support internal GenAI initiatives powered by LLMs.

This live event is for you because...

You’re looking for a scalable platform for running AI applications.
You want to integrate GPUs into Kubernetes.
You want to learn how to run an AI application on top of Kubernetes.
You’re a DevOps, data, or AI engineer, or an infrastructure architect who’s working with LLMs.

Prerequisites

A basic understanding of Kubernetes
If you want to follow along with the GPU-based parts of this course, you should have at least one virtual or physical server that has an NVIDIA GPU. Cloud instances with GPU resources offered by common cloud platforms are supported. Apart from that, two non-GPU-based servers are required.
If you want to follow along with building the Kubernetes cluster, you should have at least three Ubuntu Server LTS-based virtual machines.
The virtual machines should meet the following minimum requirements: 2 CPUs, 4 GB RAM, 40 GB disk space. Optionally, add one or more virtual machines with a GPU.

Recommended preparation:

Take Kubernetes in 4 Hours (live online course with Sander van Vugt)
Explore Getting Started with Kubernetes, third edition (on-demand course)

Recommended follow-up:

Take Create an On-Premises AI Solution (live online course with Sander van Vugt)
Take Deploying Scalable AI Workloads on Kubernetes (live online course with Vinit Jain)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Day 1

Requirements for setting up Kubernetes to service LLMs (60 minutes)

Presentation: Base Kubernetes node requirements; offering GPU access; considerations for on-premises or public cloud-based Kubernetes; exploring setup options for servicing LLMs on premises or in the public cloud
Q&A
Break

Building a Kubernetes cluster to service LLMs (70 minutes)

Presentation: Installing GPU drivers; configuring the container runtime for GPU usage; building the base Kubernetes cluster; installing the GPU operator
Hands-on exercise: Set up the base Kubernetes cluster
Q&A
Break

Understanding resources for servicing LLMs (60 minutes)

Presentation: Analyzing an application that services LLMs; running applications in Pods, Deployments, and DaemonSets; providing access to applications with Services, Ingress, and Gateway API; providing access to storage with PV, PVC, and StorageClass
Q&A
Break

Configuring the GPU operator (50 minutes)

Presentation: Exploring GPU operator options and functionality; monitoring GPU operator components; configuring the GPU operator for GPU time-slicing
Hands-on exercise: Explore GPU time-slicing
Q&A

Day 2

Presenting scalable storage to the application (70 minutes)

Presentation: Using Pod volumes or persistent volumes; setting up the application for using Pod volumes; setting up the application for using persistent volumes
Hands-on exercise: Provide persistent storage to the LLM-based application
Q&A
Break

Running inference workloads on Kubernetes (70 minutes)

Presentation: Understanding what is needed; fetching the LLM using Jobs or init containers; running a Deployment with the vLLM inference server; using resource requests and limits and node selectors; troubleshooting
Hands-on exercise: Run vLLM on Kubernetes
Q&A
Break

Providing access to the application (70 minutes)

Presentation: Setting up the Service resource; configuring Gateway API
Hands-on exercise: Configure everything needed to start a chat session with the application
Q&A
Break

AI workload scalability (10 minutes)

Presentation: Why the Horizontal Pod Autoscaler doesn’t work well for AI workloads; understanding inference job scalability requirements; planning for efficient resource usage

Kubeflow quick overview (10 minutes)

Presentation: Understanding Kubeflow use cases; Kubeflow component overview

Wrap-up and Q&A (10 minutes)

Your Instructor

Sander van Vugt

Skills covered

Kubernetes
