Learning Ray

by Max Pumperla, Edward Oakes, Richard Liaw

February 2023

Beginner

271 pages

7h 15m

English

O'Reilly Media, Inc.

Foreword
Preface
Who Should Read This Book; Goals of This Book; Navigating This Book; How to Use the Code Examples; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. An Overview of Ray
What Is Ray?; What Led to Ray?; Ray’s Design Principles; Three Layers: Core, Libraries, and Ecosystem; A Distributed Computing Framework; A Suite of Data Science Libraries; Ray AIR and the Data Science Workflow; Data Processing with Ray Datasets; Model Training; Hyperparameter Tuning; Model Serving; A Growing Ecosystem; Summary
2. Getting Started with Ray Core
An Introduction to Ray Core; A First Example Using the Ray API; An Overview of the Ray Core API; Understanding Ray System Components; Scheduling and Executing Work on a Node; The Head Node; Distributed Scheduling and Execution; A Simple MapReduce Example with Ray; Mapping and Shuffling Document Data; Reducing Word Counts; Summary
3. Building Your First Distributed Application
Introducing Reinforcement Learning; Setting Up a Simple Maze Problem; Building a Simulation; Training a Reinforcement Learning Model; Building a Distributed Ray App; Recapping RL Terminology; Summary
4. Reinforcement Learning with Ray RLlib
An Overview of RLlib; Getting Started with RLlib; Building a Gym Environment; Running the RLlib CLI; Using the RLlib Python API; Configuring RLlib Experiments; Resource Configuration; Rollout Worker Configuration; Environment Configuration; Working with RLlib Environments; An Overview of RLlib Environments; Working with Multiple Agents; Working with Policy Servers and Clients; Advanced Concepts; Building an Advanced Environment; Applying Curriculum Learning; Working with Offline Data; Other Advanced Topics; Summary
5. Hyperparameter Optimization with Ray Tune
Tuning Hyperparameters; Building a Random Search Example with Ray; Why Is HPO Hard?; An Introduction to Tune; How Does Tune Work?; Configuring and Running Tune; Machine Learning with Tune; Using RLlib with Tune; Tuning Keras Models; Summary
6. Data Processing with Ray
Ray Datasets; Ray Datasets Basics; Computing Over Ray Datasets; Dataset Pipelines; Example: Training Copies of a Classifier in Parallel; External Library Integrations; Building an ML Pipeline; Summary
7. Distributed Training with Ray Train
The Basics of Distributed Model Training; Introduction to Ray Train by Example; Predicting Big Tips in NYC Taxi Rides; Loading, Preprocessing, and Featurization; Defining a Deep Learning Model; Distributed Training with Ray Train; Distributed Batch Inference; More on Trainers in Ray Train; Migrating to Ray Train with Minimal Code Changes; Scaling Out Trainers; Preprocessing with Ray Train; Integrating Trainers with Ray Tune; Using Callbacks to Monitor Training; Summary
8. Online Inference with Ray Serve
Key Characteristics of Online Inference; ML Models Are Compute Intensive; ML Models Aren’t Useful in Isolation; An Introduction to Ray Serve; Architectural Overview; Defining a Basic HTTP Endpoint; Scaling and Resource Allocation; Request Batching; Multimodel Inference Graphs; End-to-End Example: Building an NLP-Powered API; Fetching Content and Preprocessing; NLP Models; HTTP Handling and Driver Logic; Putting It All Together; Summary

9. Ray Clusters
Manually Creating a Ray Cluster; Deployment on Kubernetes; Setting Up Your First KubeRay Cluster; Interacting with the KubeRay Cluster; Exposing KubeRay; Configuring KubeRay; Configuring Logging for KubeRay; Using the Ray Cluster Launcher; Configuring Your Ray Cluster; Using the Cluster Launcher CLI; Interacting with a Ray Cluster; Working with Cloud Clusters; AWS; Using Other Cloud Providers; Autoscaling; Summary
10. Getting Started with the Ray AI Runtime
Why Use AIR?; Key AIR Concepts by Example; Ray Datasets and Preprocessors; Trainers; Tuners and Checkpoints; Batch Predictors; Deployments; Workloads That Are Suited for AIR; AIR Workload Execution; AIR Memory Management; AIR Failure Model; Autoscaling AIR Workloads; Summary
11. Ray’s Ecosystem and Beyond
A Growing Ecosystem; Data Loading and Processing; Model Training; Model Serving; Building Custom Integrations; An Overview of Ray’s Integrations; Ray and Other Systems; Distributed Python Frameworks; Ray AIR and the Broader ML Ecosystem; How to Integrate AIR into Your ML Platform; Where to Go from Here?; Summary
Index
About the Authors

Content preview from Learning Ray

Chapter 7. Distributed Training with Ray Train

Edward Oakes & Richard Liaw

In Chapter 6 we discussed how to train copies of a simple model on shards of data using Ray Datasets—but there’s much more to distributed training than that. As we indicated in Chapter 1, Ray has a dedicated library for distributed training called Ray Train. It comes with an extensive suite of machine learning training integrations and allows you to scale your experiments seamlessly on Ray Clusters.

We will start this chapter by showing you why you might need to scale your ML training and then introduce you to the different ways of doing so. After that, we’ll introduce Ray Train and walk through an extensive end-to-end example. We’ll also cover some key concepts you need to know to use Ray Train, such as preprocessors, trainers, and checkpoints. Finally, we’ll cover some of the more advanced functionality that Ray Train provides. As always, you can use the notebook for this chapter to follow along.

The Basics of Distributed Model Training

Machine learning often requires heavy computation. Whether you are training a gradient-boosted tree or a neural network, you may run into some common problems:

The time it takes to finish training is too long.
The data is too large to fit into one machine.
The model itself is too large to fit into a single machine.

For the first case, training can be accelerated by processing data with increased ...
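The preview text is cut off above, so the following is only a toy sketch of one common way to speed up training and to handle data that does not fit on one machine: data parallelism, where each worker computes a gradient on its own shard and the results are averaged. It is plain Python and does not use Ray Train's API. The model, the helper names (`gradient`, `split_into_shards`, `data_parallel_step`), and the data are hypothetical, and the "workers" run sequentially in one process.

```python
# Toy illustration of data-parallel training (assumption: not Ray Train code).
# Model: y = w * x with a single weight w, trained with mean squared error.

def gradient(w, shard):
    """Gradient of the mean squared error with respect to w on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def split_into_shards(data, num_workers):
    """Split data into num_workers contiguous shards of equal size."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, shards, lr):
    """One update: every simulated worker computes a shard gradient,
    then the gradients are averaged and applied to the shared weight."""
    grads = [gradient(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)

# Hypothetical data generated from y = 3x, eight samples in total.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = split_into_shards(data, num_workers=4)

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(f"learned weight: {w:.4f}")
```

Because the shards have equal size, the averaged shard gradient equals the gradient over the full dataset, so the update matches single-machine training while each worker only touches a fraction of the data.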


Publisher Resources

ISBN: 9781098117214
