Distributed Machine Learning with Ray: Data Processing, Training, Inference, and Deployment of Large Models with Ray

by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
10.2.1 Ray Datasets and Preprocessors
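The standard way to load data in Ray AIR is with Ray Datasets. AIR preprocessors transform input data into features for machine learning experiments. We briefly introduced preprocessors in Chapter 7, but have not yet discussed them in the context of AIR. Because Ray AIR preprocessors operate on Datasets and build on the Ray ecosystem, they let you scale preprocessing steps efficiently. During training, an AIR preprocessor is fitted to the specified training data and is then used for both training and deployment. AIR ships with many common preprocessors that cover a wide range of use cases. If you cannot find the one you need, you can easily define a custom preprocessor.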
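The fit-once, reuse-everywhere lifecycle described above can be sketched in plain Python. This is only an illustration of the pattern, not Ray's preprocessor API: the class `MinMaxSketch`, the column name `radius`, and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


class MinMaxSketch:
    """Hypothetical stand-in for a fitted preprocessor (not a Ray class)."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        # Per-column (min, max), filled in by fit().
        self.stats: Dict[str, Tuple[float, float]] = {}

    def fit(self, rows: List[Row]) -> "MinMaxSketch":
        # Learn statistics from the training rows only.
        for col in self.columns:
            values = [row[col] for row in rows]
            self.stats[col] = (min(values), max(values))
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        if not self.stats:
            raise RuntimeError("call fit() before transform()")
        out = []
        for row in rows:
            new_row = dict(row)  # leave untouched columns as they are
            for col in self.columns:
                lo, hi = self.stats[col]
                span = (hi - lo) or 1.0  # guard against constant columns
                new_row[col] = (row[col] - lo) / span
            out.append(new_row)
        return out


# Made-up sample data for illustration.
train_rows = [{"radius": 10.0, "target": 0.0}, {"radius": 20.0, "target": 1.0}]
serving_rows = [{"radius": 15.0}]

# Fit once on training data, then reuse the same statistics everywhere.
scaler = MinMaxSketch(columns=["radius"]).fit(train_rows)
train_features = scaler.transform(train_rows)
serving_features = scaler.transform(serving_rows)
```

The point of the design is that the serving-time row is scaled with the statistics learned from the training data, so features seen in deployment are consistent with those seen during training.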
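In our example, we first use the read_csv method to read a CSV file from an S3 bucket and convert it into a columnar dataset. We then split the dataset into a training dataset and a test dataset and define the AIR preprocessor StandardScaler, which normalizes all specified columns of the dataset to have mean 0 and variance 1. Note that merely specifying a preprocessor does not yet transform any data. Here is how this is implemented: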
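import ray
from ray.data.preprocessors import StandardScaler

# Read the CSV file from the S3 bucket into a columnar dataset.
dataset = ray.data.read_csv(
    "s3://anonymous@air-example-data/breast_cancer.csv"
)

# Hold out 20% of the rows for validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.2)
# The test dataset is the validation data without the "target" label column.
test_dataset = valid_dataset.drop_columns(cols=["target"])
...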
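To make "mean 0, variance 1" concrete, the following is a minimal standard-library sketch of the arithmetic behind standard scaling. It does not call Ray. The helper names and sample values are hypothetical, and the use of the population standard deviation is an assumption of this sketch that may differ from what the library computes.

```python
import statistics
from typing import List, Tuple


def fit_standard_scaler(values: List[float]) -> Tuple[float, float]:
    """Compute (mean, std) from training values; nothing is transformed yet."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return mean, std


def apply_standard_scaler(values: List[float], mean: float, std: float) -> List[float]:
    """Shift by the fitted mean and divide by the fitted std."""
    scale = std if std > 0 else 1.0  # guard against constant columns
    return [(v - mean) / scale for v in values]


# Made-up values standing in for one numeric column.
train_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
valid_values = [5.0, 11.0]

# Fitting only records statistics of the training column.
mean, std = fit_standard_scaler(train_values)

# Transformation is a separate step that reuses the training statistics.
scaled_train = apply_standard_scaler(train_values, mean, std)
scaled_valid = apply_standard_scaler(valid_values, mean, std)
```

After scaling, the training column has mean 0 and variance 1, while the validation column is scaled with the training statistics and therefore generally does not.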
ISBN: 9787111753384