Chapter 11: Distributed Training

Before we can serve pre-trained machine learning models, as discussed extensively in the previous chapter, we first need to train them. In Chapter 3, Deep CNN Architectures; Chapter 4, Deep Recurrent Model Architectures; and Chapter 5, Hybrid Advanced Models, we saw the vast expanse of increasingly complex deep learning model architectures.

Such gigantic models often have millions or even billions of parameters. The recent (at the time of writing) Generative Pre-trained Transformer 3 (GPT-3) language model has 175 billion parameters. Using backpropagation to tune this many parameters requires enormous amounts of memory and compute power, and even then, model training can take days to finish. ...
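To get a feel for the memory claim, the following back-of-the-envelope sketch estimates how much memory a 175-billion-parameter model would need during training. It rests on assumptions that are not from the text: 32-bit (4-byte) floating-point values, one gradient per parameter, and an Adam-style optimizer that keeps two extra values per parameter. Activations, buffers, and framework overhead are ignored, so the result is only a rough lower bound. The function name `training_memory_gib` is hypothetical.

```python
def training_memory_gib(num_params, bytes_per_param=4, optimizer_states_per_param=2):
    """Rough lower bound on training memory in GiB.

    Counts weights, gradients, and optimizer states only.
    Assumes every value is stored with the same width (fp32 by default)
    and ignores activations and framework overhead.
    """
    copies = 1 + 1 + optimizer_states_per_param  # weights + gradients + optimizer states
    return num_params * bytes_per_param * copies / 2**30


gpt3_params = 175_000_000_000  # 175 billion parameters, as quoted in the text

# Memory for the fp32 weights alone (no gradients, no optimizer states).
weights_only_gib = gpt3_params * 4 / 2**30

# Weights + gradients + two optimizer states per parameter (assumed Adam-style).
training_gib = training_memory_gib(gpt3_params)

print(f"weights only: {weights_only_gib:,.0f} GiB")
print(f"weights + gradients + optimizer states: {training_gib:,.0f} GiB")
```

Under these assumptions the weights alone come to roughly 650 GiB, far more than the memory of any single accelerator, which is why such models have to be split or replicated across many devices.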
