8 Scaling out with distributed training
This chapter covers
- Understanding distributed data parallel gradient descent
- Using gradient accumulation in gradient descent for data sets that do not fit in memory
- Evaluating parameter server versus ring-based approaches for distributed gradient descent
- Understanding reduce-scatter and all-gather phases of ring-based gradient descent
- Implementing a single-node version of ring-based gradient descent using Python (a minimal sketch of the ring all-reduce step follows this list)
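To preview the reduce-scatter and all-gather phases listed above, here is a minimal, single-process sketch of ring all-reduce that simulates the workers with plain Python lists of NumPy arrays. The function name `ring_allreduce` and its structure are illustrative assumptions, not the book's implementation, and the sketch assumes each gradient's length is evenly divisible by the number of workers.

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring all-reduce across len(grads) workers in one process.

    grads: list of equal-length 1-D NumPy arrays, one per worker.
    Returns a list in which every worker holds the element-wise sum.
    Assumes each gradient's length is divisible by the number of workers.
    """
    n = len(grads)
    # Split each worker's gradient into n chunks, one chunk per ring position.
    chunks = [np.split(g.astype(float).copy(), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker i owns the fully summed
    # chunk (i + 1) % n. Each step, worker i sends one chunk to worker
    # (i + 1) % n and adds the chunk it receives from worker (i - 1) % n.
    for step in range(n - 1):
        sends = [chunks[i][(i - step) % n].copy() for i in range(n)]
        for i in range(n):
            src = (i - 1) % n
            chunks[i][(i - step - 1) % n] += sends[src]

    # All-gather: circulate the reduced chunks around the ring for another
    # n-1 steps so every worker ends up with every summed chunk.
    for step in range(n - 1):
        sends = [chunks[i][(i + 1 - step) % n].copy() for i in range(n)]
        for i in range(n):
            src = (i - 1) % n
            chunks[i][(i - step) % n] = sends[src]

    return [np.concatenate(c) for c in chunks]

if __name__ == "__main__":
    # Two simulated workers, each holding a two-element gradient.
    result = ring_allreduce([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
    print(result)  # both workers hold [4.0, 6.0], the element-wise sum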
In chapter 7, you learned about scaling up your machine learning implementation to make the most of the compute resources available in a single compute node. For example, you saw how to take advantage of the more powerful processors in GPU devices. However, as you will discover by launching a machine ...