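Ray 分布式机器学习:利用Ray 进行大模型的数据处理、训练、推理和部署 (Distributed Machine Learning with Ray: Data Processing, Training, Inference, and Deployment for Large Models)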

by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
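It is called "offline" because the data is not generated by the policy interacting with the environment online. Algorithms that do not depend on the output of their own policy for training are called off-policy algorithms; DQN, a form of Q-learning, is one of them. Their counterpart, algorithms that do depend on the output of their own policy for training, are called on-policy algorithms. In other words, off-policy algorithms can be trained on offline data.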
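To make the off-policy idea concrete, here is a minimal sketch of tabular Q-learning trained purely on a fixed batch of logged transitions. Everything in it is an assumption made for illustration: the five-state corridor, the random behavior policy that produces the log, and the helper names are hypothetical and are not taken from the book or from RLlib.

```python
import random

# Hypothetical toy problem: a 1-D corridor with states 0..4; state 4 is the goal.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic corridor dynamics, used only to generate the logged data."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def collect_random_transitions(num_episodes, seed=0):
    """Log (s, a, r, s', done) tuples with a uniformly random behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            data.append((state, action, reward, next_state, done))
            state = next_state
    return data


def q_learning_from_batch(data, sweeps=200, alpha=0.5, gamma=0.9):
    """Off-policy Q-learning: only the fixed batch is used, no new interaction."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in data:
            # The target uses the greedy value of the next state, regardless
            # of which action the behavior policy actually took there.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
    return q


offline_data = collect_random_transitions(num_episodes=50)
q_table = q_learning_from_batch(offline_data)
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

The learner never calls `step` itself. It only replays transitions recorded by a different (random) policy, which is what allows an off-policy method to work from offline data.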
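To use the data stored in the temp folder for further training, we create a new DQNConfig that takes the temp folder as input. We also set explore to False, because we only want to train on the previously collected data, so the algorithm does not explore according to its own policy. The RLlib algorithm is exactly the same as before. We train it for 10 iterations and then evaluate it: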
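# AdvancedEnv and temp (the folder holding the recorded data) are assumed
# to be defined earlier in the chapter.
from ray.rllib.algorithms.dqn import DQNConfig

imitation_algo = (
    DQNConfig()
    .environment(env=AdvancedEnv)
    .evaluation(off_policy_estimation_methods={})
    .offline_data(input_=temp)
    .exploration(explore=False)  # train only on the recorded data
    .build())

for i in range(10):
    imitation_algo.train()

imitation_algo.evaluate()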
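As a rough picture of what switching exploration off means, the following sketch contrasts epsilon-greedy action selection with purely greedy selection. It is a conceptual illustration only: `select_action`, the Q-value list, and the epsilon values are hypothetical and do not reproduce RLlib's internal exploration classes.

```python
import random


def select_action(q_values, explore, epsilon=0.1, rng=random):
    """Pick an action index from a list of Q-value estimates.

    With explore=True this is epsilon-greedy; with explore=False it is
    purely greedy, so no random exploration steps are taken.
    """
    if explore and rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.7, 0.2]  # made-up estimates for three actions
rng = random.Random(0)

# Without exploration, the highest-valued action is chosen every time.
greedy_choices = {
    select_action(q_values, explore=False, rng=rng) for _ in range(100)
}

# With exploration, some choices are random and may differ from the greedy one.
exploring_choices = [
    select_action(q_values, explore=True, epsilon=0.5, rng=rng)
    for _ in range(100)
]
```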
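Note that we call the algorithm imitation_algo, because this training procedure aims to imitate the behavior recorded in the previously collected data. This kind of learning from demonstrations in reinforcement learning is therefore commonly called imitation learning or behavior cloning. ...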
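In its simplest tabular form, behavior cloning can be pictured as supervised learning on recorded state-action pairs. The sketch below is one such illustration under stated assumptions: the demonstrations are made up, `fit_behavior_cloning` is a hypothetical helper, and this counting approach is not the DQN-based training shown above.

```python
from collections import Counter, defaultdict


def fit_behavior_cloning(demonstrations):
    """Map each state to the action demonstrated most often in that state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Made-up demonstrations: (state, action) pairs from a hypothetical expert.
demos = [
    (0, "right"), (0, "right"), (0, "up"),
    (1, "up"), (1, "up"),
    (2, "right"),
]
cloned_policy = fit_behavior_cloning(demos)
```

The cloned policy simply repeats what the demonstrator did most often in each state. It has no notion of reward, which is the main difference from a value-based method trained on the same data.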

ISBN: 9787111753384