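Ray 分布式机器学习:利用Ray 进行大模型的数据处理、训练、推理和部署 (Distributed Machine Learning with Ray: Data Processing, Training, Inference, and Deployment of Large Models with Ray)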

by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
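resources
It is important to explicitly specify container CPU and memory requests and limits for each group spec. For GPU workloads, if you use the Nvidia GPU device plugin, you can also specify a GPU limit, for example nvidia.com/gpu: 1.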
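
As a rough illustration of the resources entry above, the following sketch builds a Kubernetes-style resources mapping as a plain Python dict. The helper name container_resources, the CPU and memory values, and the choice to set requests equal to limits are assumptions made for this example, not something the text prescribes.

```python
import json


def container_resources(cpu: str, memory: str, gpus: int = 0) -> dict:
    """Build a Kubernetes-style `resources` mapping for one container.

    Assumption for this sketch: requests and limits are set to the same values.
    """
    requests = {"cpu": cpu, "memory": memory}
    limits = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # Resource name from the text; it requires the Nvidia GPU device plugin.
        limits["nvidia.com/gpu"] = gpus
    return {"requests": requests, "limits": limits}


# Hypothetical sizes; choose values that fit your own workload.
worker_resources = container_resources(cpu="4", memory="8Gi", gpus=1)
cpu_only_resources = container_resources(cpu="2", memory="4Gi")

print(json.dumps(worker_resources, indent=2))
```

The resulting dict can be serialized to YAML or JSON and placed under a container's resources field in a group spec.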
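nodeSelector and tolerations
You can control the scheduling of a worker group's Ray Pods by setting the nodeSelector and tolerations fields of the Pod spec. Specifically, these fields determine which Kubernetes nodes the Pods can be scheduled on. Note that the KubeRay operator works at the Pod level and is agnostic to the setup of the underlying Kubernetes nodes. Kubernetes node configuration is handled by the Kubernetes cluster administrators.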
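
To make the scheduling fields more concrete, here is a minimal sketch of a Pod spec fragment with nodeSelector and tolerations, written as a Python dict. The label node-pool, the taint key dedicated, and their values are hypothetical; the real names depend on how your cluster administrators labeled and tainted the nodes. The helper selector_matches is a simplified stand-in for the label check Kubernetes performs and ignores taints entirely.

```python
# Hypothetical label and taint names; use the ones defined in your cluster.
pod_scheduling = {
    "nodeSelector": {"node-pool": "ray-workers"},
    "tolerations": [
        {
            "key": "dedicated",
            "operator": "Equal",
            "value": "ray",
            "effect": "NoSchedule",
        }
    ],
}


def selector_matches(node_labels: dict, node_selector: dict) -> bool:
    """Return True if the node carries every label the selector asks for."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Two made-up nodes, described only by their labels.
ray_node_labels = {"node-pool": "ray-workers", "zone": "a"}
general_node_labels = {"node-pool": "general"}

print(selector_matches(ray_node_labels, pod_scheduling["nodeSelector"]))
print(selector_matches(general_node_labels, pod_scheduling["nodeSelector"]))
```

A node qualifies only if all selector entries match its labels; extra labels on the node do not matter.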
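Ray container images
It is important to specify the images used by the cluster's Ray containers. The head node and worker nodes of a cluster should all use the same Ray version. In most cases, it makes sense to use exactly the same container image for the head node and all worker nodes of a given Ray cluster. To specify custom dependencies for the cluster, build your image on top of the official rayproject/ray image.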
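
The advice to use one image everywhere can be checked mechanically. The sketch below collects the container images from a dict shaped like a RayCluster spec, assuming the head group sits under headGroupSpec and the worker groups under workerGroupSpecs, each with a Pod template. The image tag, the container name, and the helpers pod_template and container_images are illustrative assumptions. The sketch also assumes every container in the templates is a Ray container, so a sidecar with a different image would show up as a mismatch.

```python
def pod_template(image: str) -> dict:
    """Build a minimal group spec holding one container with the given image."""
    return {"template": {"spec": {"containers": [{"name": "ray", "image": image}]}}}


def container_images(ray_cluster: dict) -> set:
    """Collect every container image used by the head and worker groups."""
    spec = ray_cluster["spec"]
    groups = [spec["headGroupSpec"], *spec.get("workerGroupSpecs", [])]
    return {
        container["image"]
        for group in groups
        for container in group["template"]["spec"]["containers"]
    }


IMAGE = "rayproject/ray:2.9.0"  # illustrative tag, not taken from the text

cluster = {
    "spec": {
        "headGroupSpec": pod_template(IMAGE),
        "workerGroupSpecs": [pod_template(IMAGE), pod_template(IMAGE)],
    }
}

images = container_images(cluster)
print("consistent" if len(images) == 1 else f"mixed images: {sorted(images)}")
```

A single element in the returned set means the head and all workers share one image, which also guarantees they share one Ray version.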
Volume mounts
Volume mounts can be used to preserve logs or other application data originating from the Ray containers (see Section 9.2.5).
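
One possible shape for such a mount is sketched below as a Python dict mirroring a Pod spec. The volume name ray-logs, the container name, and the use of an emptyDir volume are assumptions for this example. The mount path /tmp/ray is chosen because Section 9.2.5 places Ray's logs under that directory. An emptyDir volume lasts only as long as the Pod, so keeping data beyond that would need a persistent volume type instead.

```python
LOG_ROOT = "/tmp/ray"  # parent of the log directory named in Section 9.2.5

# Hypothetical names; only the overall structure follows the Kubernetes Pod spec.
pod_spec = {
    "containers": [
        {
            "name": "ray-head",
            "volumeMounts": [{"name": "ray-logs", "mountPath": LOG_ROOT}],
        }
    ],
    "volumes": [{"name": "ray-logs", "emptyDir": {}}],
}


def undeclared_mounts(spec: dict) -> list:
    """List volumeMounts that refer to a volume the Pod spec never declares."""
    declared = {volume["name"] for volume in spec.get("volumes", [])}
    return [
        mount["name"]
        for container in spec["containers"]
        for mount in container.get("volumeMounts", [])
        if mount["name"] not in declared
    ]


print(undeclared_mounts(pod_spec))
```

The small check catches a common mistake: a volumeMounts entry whose name has no matching entry under volumes.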
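Container environment variables
Container environment variables can be used to modify Ray's behavior. For example, RAY_LOG_TO_STDERR redirects logs to STDERR (standard error) instead of writing them to the container's file system.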
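
As a small sketch of how this variable might appear in a container definition, the dict below follows the Kubernetes env list format, in which every value must be a string. The container name and the helper env_value are hypothetical, and the value "1" is an assumption about how the variable is switched on.

```python
# Hypothetical container entry; `env` follows the Kubernetes name/value list format.
ray_container = {
    "name": "ray-worker",
    "env": [
        # Assumed value: "1" is used here to enable the redirect.
        {"name": "RAY_LOG_TO_STDERR", "value": "1"},
    ],
}


def env_value(container: dict, name: str, default=None):
    """Look up an environment variable in a container's `env` list."""
    for entry in container.get("env", []):
        if entry["name"] == name:
            return entry["value"]
    return default


print(env_value(ray_container, "RAY_LOG_TO_STDERR"))
```

With logs going to STDERR, standard Kubernetes tooling that reads container output can pick them up without access to the container's file system.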
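9.2.5 Configuring KubeRay Logging
Ray cluster processes typically write logs to the directory /tmp/ray/session_latest/logs in the Pod. These logs can also be viewed in the Ray Dashboard. If you want to ...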
Publisher Resources

ISBN: 9787111753384