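Ray 分布式机器学习:利用Ray 进行大模型的数据处理、训练、推理和部署 (Distributed Machine Learning with Ray: Data Processing, Training, Inference, and Deployment for Large Models with Ray)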
by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
Content preview
Getting Started with Ray AIR
… use in AIR. In summary, Ray uses a set of smart techniques to ensure that data and computation are distributed and scheduled correctly.
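When you load data with Ray Datasets, you already know that the datasets are internally partitioned into blocks across the cluster. A block is a collection of Ray objects. Choosing a suitable block size matters and involves a trade-off: managing too many small blocks carries high overhead, while blocks that are too large can cause out-of-memory (OOM) exceptions. AIR takes a pragmatic approach and tries to keep blocks at or below 512 MB. If it cannot guarantee this, it issues a warning. If a block does not fit in memory, AIR spills the data to local disk.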
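The trade-off can be made concrete with some back-of-the-envelope arithmetic. The following is a minimal sketch in plain Python, not Ray's actual partitioning logic: the helper names `plan_blocks` and `min_blocks_for` and the example sizes are hypothetical, and only the 512 MB ceiling comes from the text above.

```python
import math

# 512 MB ceiling mentioned in the text, treated here as 512 * 1024**2 bytes (an assumption).
MAX_BLOCK_BYTES = 512 * 1024**2


def plan_blocks(dataset_bytes: int, num_blocks: int) -> dict:
    """Report the average block size for a given partitioning (hypothetical helper)."""
    if dataset_bytes < 0 or num_blocks <= 0:
        raise ValueError("dataset_bytes must be >= 0 and num_blocks must be > 0")
    avg_block_bytes = math.ceil(dataset_bytes / num_blocks)
    return {
        "num_blocks": num_blocks,
        "avg_block_bytes": avg_block_bytes,
        "exceeds_ceiling": avg_block_bytes > MAX_BLOCK_BYTES,
    }


def min_blocks_for(dataset_bytes: int) -> int:
    """Smallest block count that keeps the average block at or under the ceiling."""
    return max(1, math.ceil(dataset_bytes / MAX_BLOCK_BYTES))


ten_gib = 10 * 1024**3

# Too few blocks: each block averages 2.5 GiB, which risks OOM.
too_few = plan_blocks(ten_gib, num_blocks=4)

# Just enough blocks to stay at or under the ceiling.
enough = plan_blocks(ten_gib, min_blocks_for(ten_gib))
```

Raising the block count far beyond `min_blocks_for` would keep every block small but increase the number of objects to track, which is the overhead side of the trade-off.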
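Stateful workloads use the Ray object store to varying degrees. For example, RLlib uses Ray objects to broadcast model weights to the individual rollout workers and to collect experience data. Tune uses Ray objects to set up Trials by sending and retrieving AIR checkpoints. For technical reasons, actors can run into OOM problems if the resources allocated to them require too much memory. If you know the memory requirements in advance, you can adjust the memory accordingly in the ScalingConfig, or simply request additional CPU resources.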
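When the memory footprint is known up front, the per-worker request can be computed before a trainer is built. The sketch below is plain Python and rests on several assumptions: the helper `worker_resources`, the 1.5x headroom factor, and the example sizes are all hypothetical. The resulting dictionary has the shape that `ScalingConfig` takes in its `resources_per_worker` argument in Ray 2.x; whether a `"memory"` entry is honored by your Ray version should be checked against the Ray documentation.

```python
def worker_resources(model_bytes: int, batch_bytes: int,
                     cpus: int = 1, headroom: float = 1.5) -> dict:
    """Build a per-worker resource request (hypothetical helper).

    The memory request is the known footprint multiplied by a headroom
    factor, so that a worker is not scheduled with less memory than it needs.
    """
    if model_bytes < 0 or batch_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    memory_bytes = int((model_bytes + batch_bytes) * headroom)
    return {"CPU": cpus, "memory": memory_bytes}


# Example: a 2 GiB model plus 512 MiB of batch data per worker.
resources = worker_resources(model_bytes=2 * 1024**3, batch_bytes=512 * 1024**2)

# The alternative mentioned in the text: simply ask for more CPUs per worker.
more_cpu = worker_resources(model_bytes=2 * 1024**3, batch_bytes=512 * 1024**2, cpus=2)

# With Ray installed, such a dict could be passed along these lines
# (assumed Ray 2.x API, not verified against a specific release):
#   ScalingConfig(num_workers=4, resources_per_worker=resources)
```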
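In compound workloads, stateful actors (for example, those used for training) must access data created by stateless tasks (for example, preprocessing tasks), which makes memory allocation more challenging. Let's look at two scenarios:

If the training actors have enough space in the object store to hold all of the training data in memory, things are simple. The preprocessing step runs first, then all blocks are downloaded to the respective nodes, and then the training actors iterate over the data held in memory.

Otherwise, data processing has to be pipelined, which means that data is processed on the fly by tasks and, in training …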
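The difference between the two scenarios can be sketched without Ray. The snippet below is an illustration in plain Python under stated assumptions: `choose_ingest_mode`, `bulk_ingest`, and `pipelined_ingest` are hypothetical names, blocks are modeled as lists of numbers, and a generator stands in for tasks that preprocess data on the fly.

```python
from typing import Callable, Iterable, Iterator, List

Block = List[float]


def choose_ingest_mode(dataset_bytes: int, object_store_bytes: int) -> str:
    """Pick bulk ingest only if the whole dataset fits in the object store."""
    return "bulk" if dataset_bytes <= object_store_bytes else "pipelined"


def bulk_ingest(blocks: Iterable[Block],
                preprocess: Callable[[Block], Block]) -> List[Block]:
    """Preprocess every block first, then hand the full result to training."""
    return [preprocess(block) for block in blocks]


def pipelined_ingest(blocks: Iterable[Block],
                     preprocess: Callable[[Block], Block]) -> Iterator[Block]:
    """Preprocess one block at a time, only when the consumer asks for it."""
    for block in blocks:
        yield preprocess(block)


def scale(block: Block) -> Block:
    """Toy preprocessing step: divide every value by 10."""
    return [x / 10 for x in block]


raw_blocks = [[10.0, 20.0], [30.0, 40.0]]

# An 8 GiB dataset does not fit in a 2 GiB object store, so pipeline it.
mode = choose_ingest_mode(dataset_bytes=8 * 1024**3, object_store_bytes=2 * 1024**3)

# Only the first block has been preprocessed after this call.
stream = pipelined_ingest(raw_blocks, scale)
first = next(stream)
```

In the bulk case all preprocessed blocks exist in memory at once; in the pipelined case only the block currently being consumed needs to be materialized.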
Publisher Resources

ISBN: 9787111753384