Generative AI on Kubernetes (Chinese Edition)

by Roland Huß, Daniele Zonca
February 2026
Intermediate to advanced
406 pages
4h 57m
Chinese
O'Reilly Media, Inc.

Chapter 1. Model Deployment

This work has been translated using artificial intelligence. Your feedback and comments are welcome: translation-feedback@oreilly.com

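When real data cannot leave your own cluster because of privacy regulations or compliance requirements, or when you need finer control over model deployment and performance, running models inside your own cluster becomes a necessity.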

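Many different models are available; many of them are open source and free for commercial use. Hugging Face is the largest community, where you can find not only models but also datasets and libraries. For a list of current open source large language models, see Chapter 2.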

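Whether or not a model is open source, deploying it on Kubernetes involves several general steps that do not depend on the model itself. Some steps, however, require a closer look at the model's characteristics to determine the best approach.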

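This chapter covers the different approaches and patterns for managing the model lifecycle at runtime, focusing on the runtimes most commonly used for LLMs. Before diving into deployment details, see the sidebar for background on the Transformer architecture that underpins modern LLMs.

"It Works on My Machine"

Before exploring how to deploy a model to a Kubernetes cluster, let's look at how to run a model on a local machine.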

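In short, deploying a model requires both the model itself and a runtime that can load and execute it. As mentioned earlier, Transformer-based LLMs are the most common large language models, so you can use Hugging Face's Transformers library to load a model and call it. That does not mean every laptop can handle this kind of workload, or load a model of any size. Some models can run on a CPU, but with very limited performance (generating a full sentence takes tens of seconds). In practice, a GPU is required. In addition, memory requirements scale directly with model size. A 7-billion-parameter model (7B for short) is considered a small language model (SLM) and needs a GPU with roughly 15 GB of memory to load. A 70-billion-parameter model needs roughly 140 GB.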
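The memory figures above are consistent with a simple rule of thumb: the number of parameters multiplied by the bytes used to store each one. The sketch below assumes 16-bit weights (2 bytes per parameter), which the text does not state explicitly, and counts the weights only. The helper name `weight_memory_gb` and the `BYTES_PER_PARAM` table are illustrative, not part of any library.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: 16-bit weights unless another precision is requested.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the weight footprint in GB (10^9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7e9), ("70B", 70e9)]:
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB for fp16 weights")
```

For a 7B model this gives about 14 GB for the weights alone; the remaining headroom up to the roughly 15 GB quoted above is plausibly taken by activations and other runtime buffers, though the exact overhead depends on the runtime and the workload.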

Example 1-1 shows how to load and run a model with the Transformers library.

Example 1-1. Loading and running the Llama 3.2 1B model with Transformers
import transformers
import torch
import os

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # (1)

pipeline ...

ISBN: 0642572344672