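Ray 分布式机器学习:利用Ray 进行大模型的数据处理、训练、推理和部署 (Distributed Machine Learning with Ray: Using Ray for Data Processing, Training, Inference, and Deployment of Large Models)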

by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
Data Processing with Ray
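.repeat lets us iterate over the same dataset multiple times (10 times in this example).
Randomly shuffle the data on each repeat.
We want each worker to have its own local shard of data, so we split the DatasetPipeline into several smaller datasets that can be passed to the workers.
Wait for all workers to finish training.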
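The repeat, shuffle, and split steps described above can be modeled in plain Python. This is a minimal sketch of the idea only, not Ray's implementation and not the book's code: the function `repeat_shuffle_split`, the 12-row toy dataset, and the choice of 4 shards are all assumptions made for illustration. It also produces shards epoch by epoch, whereas a split pipeline hands each worker its own stream.

```python
import random
from typing import Iterator, List, Sequence


def repeat_shuffle_split(
    rows: Sequence[int], epochs: int, num_shards: int, seed: int = 0
) -> Iterator[List[List[int]]]:
    """Yield, once per epoch, a freshly shuffled copy of rows cut into shards.

    Hypothetical helper: it mimics "repeat, shuffle on each repeat, split"
    without using Ray at all.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        shuffled = list(rows)
        rng.shuffle(shuffled)  # a new random order on every repeat
        # Round-robin slicing gives each worker a disjoint shard.
        yield [shuffled[i::num_shards] for i in range(num_shards)]


# Toy data: 12 rows, 10 epochs, 4 workers (all values are illustrative).
epoch_shards = list(repeat_shuffle_split(range(12), epochs=10, num_shards=4))
print(len(epoch_shards), "epochs,", len(epoch_shards[0]), "shards per epoch")
```

Within one epoch the shards are disjoint and together cover the whole dataset, so every row is seen exactly once per epoch, by exactly one worker.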
To train on the workers, we call their train method and pass a shard of the DatasetPipeline to each worker. We then block until all workers have finished training. To summarize this phase:
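1. In each epoch, every worker gets a random shard of the data.
2. Each worker trains its local model on the shard assigned to it.
3. Once a worker finishes training on its current shard, it blocks until the other workers have finished as well.
4. The three steps above repeat for the remaining epochs (10 epochs in total in this example).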
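The four steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the book's Ray code: `ToyWorker`, `run_epochs`, the alpha values, and the 100-row dataset are hypothetical, threads stand in for Ray actors, and "training" is reduced to counting rows.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


class ToyWorker:
    """Hypothetical stand-in for a training actor; it only counts rows seen."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.rows_seen = 0
        self.epochs_done = 0

    def train_on_shard(self, shard: List[int]) -> None:
        self.rows_seen += len(shard)  # placeholder for a real model update
        self.epochs_done += 1


def run_epochs(
    workers: Sequence[ToyWorker], rows: Sequence[int], epochs: int, seed: int = 0
) -> None:
    rng = random.Random(seed)
    n = len(workers)
    with ThreadPoolExecutor(max_workers=n) as pool:
        for _ in range(epochs):  # step 4: repeat for every epoch
            shuffled = list(rows)
            rng.shuffle(shuffled)
            shards = [shuffled[i::n] for i in range(n)]  # step 1: random shards
            futures = [
                pool.submit(w.train_on_shard, s)  # step 2: train on own shard
                for w, s in zip(workers, shards)
            ]
            for f in futures:  # step 3: wait until every worker is done
                f.result()


# Illustrative alpha values, one worker per value.
workers = [ToyWorker(alpha) for alpha in (0.1, 0.2, 0.3, 0.4)]
run_epochs(workers, rows=list(range(100)), epochs=10)
print([(w.alpha, w.rows_seen) for w in workers])
```

Waiting on every future before starting the next epoch is what makes faster workers idle until the slowest one finishes, which is the blocking behavior described in step 3.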
Finally, we evaluate the model trained by each worker on the test data to determine which alpha value yields the most accurate model:
# Get validation results from each worker.
print(ray.get([worker.test.remote(X_test, Y_test) for worker in workers]))
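Once the list of results is back, choosing the winning alpha is a small reduction. The sketch below assumes that each worker's test call returns a single score where higher is better, and that the alpha values are available in the same order as the workers. The helper `pick_best_alpha` and the numbers are placeholders for illustration, not results from the book.

```python
from typing import Sequence, Tuple


def pick_best_alpha(
    alphas: Sequence[float], scores: Sequence[float]
) -> Tuple[float, float]:
    """Return the (alpha, score) pair with the highest score."""
    if not alphas or len(alphas) != len(scores):
        raise ValueError("alphas and scores must be non-empty and equally long")
    return max(zip(alphas, scores), key=lambda pair: pair[1])


# Placeholder numbers for illustration only.
example_alphas = [0.1, 0.2, 0.3]
example_scores = [0.5, 0.9, 0.7]
best = pick_best_alpha(example_alphas, example_scores)
print(best)
```

If the metric were a loss rather than an accuracy-style score, `min` would replace `max`.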
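In practice, you should use Ray Tune or Ray Train for this type of task (see Chapter 7), but this example shows how powerful Ray Datasets are for machine learning workloads. With just a few lines of Python code, we implemented a complex distributed hyperparameter tuning and training workflow that scales easily to hundreds of machines and is not tied to any particular framework or machine learning task. ...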
ISBN: 9787111753384