4 Distributed training

This chapter covers

  • Understanding data parallelism, model parallelism, and pipeline parallelism (a minimal data parallel sketch follows this list)
  • Using a sample training service that supports data parallel training in Kubernetes
  • Training large models with multiple GPUs
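
Data parallelism is the simplest of the three strategies: every GPU keeps a full copy of the model, processes a different shard of each batch, and synchronizes gradients before each optimizer step. The following is a minimal sketch of that idea using PyTorch DistributedDataParallel; the tiny linear model, random dataset, and hyperparameters are placeholders for illustration only and are not the chapter's sample training service. It assumes a launch with torchrun on a machine with multiple GPUs.

# Minimal data parallel training sketch with PyTorch DistributedDataParallel.
# Placeholder model and dataset; launch with:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset; DistributedSampler gives each worker a distinct shard.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Each worker holds a full model replica; DDP all-reduces gradients.
    model = nn.Linear(32, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, labels in loader:
            features = features.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()   # gradients are synchronized across GPUs here
            optimizer.step()  # every replica applies the same update

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Because every replica sees the same averaged gradients, the model stays identical on all GPUs while the batch is effectively split across them, which is what makes this strategy easy to scale out.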

One obvious trend in deep learning research is to improve model performance with larger datasets and bigger models with increasingly complex architectures. But more data and bulkier models have consequences: they slow down model training and, with it, model development. As is often the case in computing, performance is pitted against speed. For example, it can take several months to train a BERT (Bidirectional Encoder Representations from Transformers) natural language ...
