Chapter 6: Pipeline Input and Layer Split

In this chapter, we continue our discussion of model parallelism. Compared to data parallelism, model parallelism training often requires more GPUs/accelerators, so system efficiency plays an important role in both model parallelism training and inference.

We limit our discussion to the following assumptions:

  • We assume all input data batches are of the same size.
  • For multi-layer perceptrons (MLPs), we assume the layers can be computed with general matrix multiply (GEMM) functions, as shown in the sketch after this list.
  • Each NLP job runs exclusively on a set of accelerators (for example, GPUs), so there is no interference from other jobs.
  • Each NLP job uses a single type of accelerator (for example, GPUs).
  • GPUs ...
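
To make the GEMM assumption concrete, here is a minimal sketch of one MLP layer expressed as a single general matrix multiply; the dimensions and the NumPy implementation are illustrative choices, not values from the chapter:

    import numpy as np

    # Minimal sketch: one MLP layer as a single GEMM. The sizes below are
    # illustrative placeholders, not values from the chapter.
    batch_size, d_in, d_out = 32, 512, 1024

    x = np.random.randn(batch_size, d_in).astype(np.float32)  # input batch
    w = np.random.randn(d_in, d_out).astype(np.float32)       # layer weights
    b = np.zeros(d_out, dtype=np.float32)                     # layer bias

    # Forward pass: the matrix product x @ w is the GEMM; the bias add and
    # ReLU are cheap elementwise operations on top of it.
    y = np.maximum(x @ w + b, 0.0)
    print(y.shape)  # (32, 1024)

Because the heavy work of each layer reduces to one GEMM, the per-layer compute cost is predictable, which is what makes the layer-split analysis in this chapter tractable.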
