Accelerate Model Training with PyTorch 2.X

by Maicon Melo Alves
April 2024
Intermediate to advanced content level
230 pages
5h 12m
English
Packt Publishing
Content preview from Accelerate Model Training with PyTorch 2.X

Chapter 11: Training with Multiple Machines

We’ve finally arrived at the last mile of our performance improvement journey. In this final stage, we will broaden our horizons and learn how to distribute the training process across multiple machines or servers. So, instead of using four or eight devices, we can use dozens or hundreds of computing resources to train our models.
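To give a feel for how those dozens or hundreds of devices are coordinated, here is a minimal sketch of the rank arithmetic commonly used in multi-node distributed training. The node counts, GPU counts, and the `global_rank` helper are illustrative assumptions, not code from the book:

```python
def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Derive a process's unique global rank across the cluster.

    Each machine (node) runs one process per GPU; the global rank
    identifies a process among all processes on all machines.
    """
    return node_rank * gpus_per_node + local_rank


# Hypothetical cluster: 4 nodes with 8 GPUs each.
nodes, gpus_per_node = 4, 8
world_size = nodes * gpus_per_node  # 32 processes in total

# First process on the first node gets rank 0 ...
assert global_rank(node_rank=0, local_rank=0, gpus_per_node=8) == 0
# ... and the last process on the last node gets rank world_size - 1.
assert global_rank(node_rank=3, local_rank=7, gpus_per_node=8) == 31
```

Frameworks such as `torch.distributed` rely on exactly this kind of mapping: every process learns its global rank and the world size, and the collective communication routines use them to route gradients between machines.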

An environment made up of multiple connected servers is usually called a computing cluster, or simply a cluster. Such environments are shared among multiple users and have technical particularities, such as high-bandwidth, low-latency networks.

In this chapter, we’ll describe the characteristics of computing clusters that are more relevant to the distributed training process. After ...



Publisher Resources

ISBN: 9781805120100