Chapter 11: Distributed Training

Before we can serve pre-trained machine learning models, which we discussed extensively in the previous chapter, we first need to train them. In Chapter 3, Deep CNN Architectures; Chapter 4, Deep Recurrent Model Architectures; and Chapter 5, Hybrid Advanced Models, we saw the vast expanse of increasingly complex deep learning model architectures.

Such gigantic models often have millions and even billions of parameters. The recent (at the time of writing) Generative Pre-trained Transformer 3 (GPT-3) language model has 175 billion parameters. Using backpropagation to tune so many parameters requires enormous amounts of memory and compute power. Even then, model training can take days to finish. ...
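To make the remedy concrete before we dive in, here is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel, the kind of setup this chapter builds toward. The tiny nn.Linear model, the random data, the choice of the gloo backend, and the local address and port are placeholder assumptions for illustration, not the book's worked example:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # Each spawned process joins the same process group.
    dist.init_process_group(
        backend="gloo",  # use "nccl" when each rank owns a GPU
        init_method="tcp://127.0.0.1:29500",  # placeholder address/port
        rank=rank,
        world_size=world_size,
    )
    model = nn.Linear(10, 1)   # stand-in for a much larger model
    ddp_model = DDP(model)     # wraps the model; gradients are averaged across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(5):         # toy training loop on random data
        inputs = torch.randn(32, 10)
        targets = torch.randn(32, 1)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()        # backward pass triggers the gradient all-reduce
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2             # two worker processes on one machine
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```

Each process runs the same loop on its own slice of data, and DistributedDataParallel synchronizes gradients during the backward pass, which is what lets wall-clock training time shrink roughly in proportion to the number of workers.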
