Ray 分布式机器学习 (Ray Distributed Machine Learning: Using Ray for Data Processing, Training, Inference, and Deployment of Large Models)

by Max Pumperla, Edward Oakes, Richard Liaw
May 2024
Intermediate
252 pages
5h 31m
Chinese
China Machine Press
… learning training and inference tasks.
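This chapter introduces the core concepts of data processing with Ray and shows how to use its different components to accomplish common tasks. We assume you have a basic understanding of data processing operations such as map, filter, groupby, and partition, but we do not intend to go deep into data science or into how these operations are implemented internally. Readers with little data science background should have no trouble following along.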
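For readers who want a quick refresher on the four operations named above, here is a minimal sketch in plain Python. It deliberately does not use Ray; the toy records, the field names (`user`, `amount`), and the variable names are all hypothetical and chosen only for illustration. A distributed library would apply the same ideas to data spread across many machines.

```python
from collections import defaultdict

# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 12},
    {"user": "a", "amount": 7},
    {"user": "c", "amount": 20},
]

# map: apply a function to every record (here, add a derived field).
mapped = [{**r, "cents": r["amount"] * 100} for r in records]

# filter: keep only the records that satisfy a predicate.
filtered = [r for r in records if r["amount"] >= 10]

# groupby: collect records by key and aggregate each group (sum per user).
totals = defaultdict(int)
for r in records:
    totals[r["user"]] += r["amount"]

# partition: split the data into chunks that could be processed independently.
num_partitions = 2
partitions = [records[i::num_partitions] for i in range(num_partitions)]

print(mapped[0])
print(filtered)
print(dict(totals))
print([len(p) for p in partitions])
```

The round-robin slicing used for `partitions` is just one simple way to split a list; real systems typically partition by size, by key, or by file boundaries.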
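Next, we start with the core module: Ray Dataset. This covers its architecture, the basics of its API, and how Ray Dataset can be used to build complex data-intensive applications. We then briefly introduce Ray's external integration libraries, with a focus on Dask on Ray. Finally, we tie everything together by building a scalable end-to-end machine learning pipeline in a single Python script.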
The code notebook for this chapter is available on GitHub (https://oreil.ly/CjHSJ), along with the data for the end-to-end example (https://oreil.ly/5Ga8-).
6.1 Ray Dataset
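The main goal of Ray Dataset is to support a scalable, flexible data processing abstraction on Ray. Datasets are intended to be the standard way to read, write, and transfer data within the Ray ecosystem. One of the most powerful uses of Ray Dataset is as the data ingestion and preprocessing layer for machine learning tasks, which makes it easy to scale training efficiently with Ray Train and Ray Tune. Section 6.3 explores this in detail.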
If you have used other distributed data processing APIs before, such as Apache Spark's Resilient Distributed Dataset (RDD), then Ray Dataset …

ISBN: 9787111753384