4 Distributed training

This chapter covers

  • Understanding data parallelism, model parallelism, and pipeline parallelism (a minimal data parallel sketch follows this list)
  • Using a sample training service that supports data parallel training in Kubernetes
  • Training large models with multiple GPUs
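
Data parallelism is the simplest of the three strategies: every GPU keeps a full copy of the model, processes a different shard of each batch, and synchronizes gradients before each optimizer step. The following is a minimal sketch of that idea using PyTorch DistributedDataParallel; the tiny linear model, random dataset, and hyperparameters are placeholders for illustration only and are not the chapter's sample training service. It assumes a launch with torchrun on a machine with multiple GPUs.

# Minimal data parallel training sketch with PyTorch DistributedDataParallel.
# Placeholder model and dataset; launch with:
#   torchrun --nproc_per_node=<num_gpus> this_script.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset; DistributedSampler gives each worker a distinct shard.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Each worker holds a full model replica; DDP all-reduces gradients.
    model = nn.Linear(32, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for features, labels in loader:
            features = features.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()   # gradients are synchronized across GPUs here
            optimizer.step()  # every replica applies the same update

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Because every replica sees the same averaged gradients, the model stays identical on all GPUs while the batch is effectively split across them, which is what makes this strategy easy to scale out.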

One obvious trend in deep learning research is to improve model performance with larger datasets and bigger models with increasingly complex architectures. But more data and bulkier models have consequences: they slow down model training and, with it, model development. As is often the case in computing, performance is pitted against speed. For example, it can take several months to train a BERT (Bidirectional Encoder Representations from Transformers) natural language ...
