8 Scaling out with distributed training

This chapter covers

  • Understanding distributed data parallel gradient descent
  • Using gradient accumulation in gradient descent for out-of-memory data sets
  • Evaluating parameter server versus ring-based approaches for distributed gradient descent
  • Understanding reduce-scatter and all-gather phases of ring-based gradient descent
  • Implementing a single-node version of ring-based gradient descent using Python (previewed in the sketch after this list)
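
To give you a feel for the ring-based approach before the full walkthrough later in the chapter, the sketch below simulates the reduce-scatter and all-gather phases in a single Python process. It is only an illustration under simplifying assumptions, not the chapter's implementation: the per-node gradients are plain Python lists, there is exactly one chunk per simulated node, the gradient length divides evenly by the node count, and the function name ring_all_reduce is chosen here just for the example.

def ring_all_reduce(grads):
    """Average per-node gradient lists in place via reduce-scatter then all-gather."""
    nodes = len(grads)
    chunk = len(grads[0]) // nodes  # assumes the gradient length divides evenly

    def chunk_slice(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Reduce-scatter: at each step every node passes one chunk to its right-hand
    # neighbor, which adds it to its own copy of that chunk. After nodes - 1
    # steps, node i holds the complete sum for chunk (i + 1) % nodes.
    for step in range(nodes - 1):
        for i in range(nodes):
            c = chunk_slice((i - step) % nodes)
            dst = (i + 1) % nodes
            grads[dst][c] = [a + b for a, b in zip(grads[dst][c], grads[i][c])]

    # All-gather: each node forwards the fully reduced chunk it owns around the
    # ring until every node holds every reduced chunk.
    for step in range(nodes - 1):
        for i in range(nodes):
            c = chunk_slice((i + 1 - step) % nodes)
            grads[(i + 1) % nodes][c] = grads[i][c]

    # Turn the sums into averages so every node ends up with the mean gradient.
    for g in grads:
        for j in range(len(g)):
            g[j] /= nodes
    return grads

# Three simulated nodes, each holding a six-element gradient.
grads = [[1.0] * 6, [2.0] * 6, [3.0] * 6]
print(ring_all_reduce(grads))  # every node ends up with [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

Note that each node sends and receives only one chunk per step, which is what makes the ring-based approach bandwidth-efficient compared with a parameter server that must receive every node's full gradient.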

In chapter 7, you learned about scaling up your machine learning implementation to make the most of the compute resources available in a single compute node. For example, you saw how to take advantage of the more powerful processors in GPU devices. However, as you will discover by launching a machine ...
