Chapter 4: Bottlenecks and Solutions

Using the code we designed in Chapter 3, Building a Data Parallel Training and Serving Pipeline, we can build data parallel training and serving pipelines with either the parameter server or the All-Reduce paradigm. As in that chapter, we will focus here on the more widely used All-Reduce paradigm.
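
As a reminder of what an All-Reduce-based data parallel training step looks like in practice, the following is a minimal sketch using PyTorch's DistributedDataParallel (which performs All-Reduce on gradients during the backward pass). The model, batch shapes, and hyperparameters are placeholders, not code from Chapter 3:

```python
# Minimal sketch of an All-Reduce data parallel training step.
# The model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step(ddp_model, batch, optimizer, loss_fn):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(ddp_model(inputs), targets)
    # backward() triggers All-Reduce on the gradients across all ranks,
    # so every replica applies the same averaged update in step().
    loss.backward()
    optimizer.step()
    return loss.item()


def main():
    # Typically launched with torchrun, which sets RANK/WORLD_SIZE/MASTER_ADDR.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 10).cuda(local_rank)  # stand-in model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # One synthetic batch per rank, just to exercise the step.
    batch = (
        torch.randn(32, 1024).cuda(local_rank),
        torch.randint(0, 10, (32,)).cuda(local_rank),
    )
    train_step(ddp_model, batch, optimizer, loss_fn)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```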

In this chapter, we will discuss the shortcomings of the current data parallel training and serving pipelines. To keep the discussion of system bottlenecks practical, we will make the following assumptions:

  • We use homogeneous accelerators for all our model training nodes.
  • Compared to CPU memory (that is, main memory), the on-device memory of each accelerator is limited in size (a quick way to compare the two is shown in the sketch after this list).
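
To make the second assumption concrete, the following small sketch (not from the book) prints the host's main memory alongside each accelerator's on-device memory; it assumes PyTorch with CUDA devices and the psutil package:

```python
# Sketch: compare host (CPU) memory with per-device accelerator memory.
# Requires torch and psutil; assumes CUDA-capable accelerators.
import psutil
import torch

host_bytes = psutil.virtual_memory().total
print(f"Host (CPU) memory: {host_bytes / 1e9:.1f} GB")

if torch.cuda.is_available():
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        print(f"GPU {idx} ({props.name}): {props.total_memory / 1e9:.1f} GB")
```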
