TensorFlow 2 Pocket Reference


by KC Tung
July 2021
Intermediate to advanced
253 pages
5h 1m
English
O'Reilly Media, Inc.

Chapter 8. Distributed Training

Training a machine learning model can take a long time, especially if your training dataset is huge or you train on a single machine. Even with a GPU at your disposal, training a complex model can still take weeks. ResNet50 is one example: a computer vision model 50 layers deep, trained to classify objects into a thousand categories.

Reducing model training time calls for a different approach. You have already seen some of the available options. In Chapter 5, for example, you learned to feed your data to the model through a data pipeline. There are also more powerful accelerators, such as GPUs and TPUs (the latter are available exclusively in Google Cloud).

This chapter covers a different way to train your model, known as distributed training. Distributed training runs the model training process in parallel on a cluster of devices, such as CPUs, GPUs, and TPUs, to speed up training. (In this chapter, for the sake of concision, I will refer to such hardware, whether CPUs, GPUs, or TPUs, as workers or devices.) After reading this chapter, you will know how to refactor your single-node training routine for distributed training. (Every example you have seen in this book up to this point has been single-node: each one has used a single machine with one CPU to train the model.)
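To make the idea concrete before any TensorFlow code appears, here is a minimal sketch, in plain Python, of synchronous data-parallel training, the most common form of distributed training. Each worker holds an identical copy of the model, computes gradients on its own slice of the global batch, and the gradients are averaged (an "all-reduce") so that every worker applies the same update. In TensorFlow, this role is played by `tf.distribute` strategies such as `MirroredStrategy`. The linear model, the helper names (`grad_mse`, `shard`, `all_reduce_mean`, `train`), and the hyperparameters below are hypothetical illustrations, not TensorFlow APIs, and the "workers" run one after another rather than on separate devices.

```python
# Toy sketch of synchronous data-parallel training (illustrative only).
# Assumptions: a 1-D linear model y = w * x + b, mean-squared-error loss,
# equal-sized shards, and an all-reduce that simply averages gradients.

def grad_mse(w, b, xs, ys):
    """Return (dL/dw, dL/db) of the MSE loss over one shard of data."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def shard(xs, ys, num_workers):
    """Split one global batch into num_workers equal, contiguous shards.

    Any remainder that does not divide evenly is dropped in this sketch.
    """
    size = len(xs) // num_workers
    return [(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
            for i in range(num_workers)]


def all_reduce_mean(grads):
    """Average per-worker gradients so every replica applies the same update."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


def train(xs, ys, num_workers, lr=0.05, steps=2000):
    """Run gradient descent; lr and steps are hand-picked for the toy data."""
    w, b = 0.0, 0.0  # every replica starts from identical weights
    for _ in range(steps):
        # Each worker computes gradients on its own shard. Here the loop is
        # sequential; on real hardware the shards would run concurrently.
        local = [grad_mse(w, b, sx, sy) for sx, sy in shard(xs, ys, num_workers)]
        dw, db = all_reduce_mean(local)
        w -= lr * dw
        b -= lr * db
    return w, b


xs = [float(i) for i in range(8)]
ys = [3.0 * x + 1.0 for x in xs]
w1, b1 = train(xs, ys, num_workers=1)  # single-node baseline
w4, b4 = train(xs, ys, num_workers=4)  # four simulated workers
# Both runs should converge close to w = 3, b = 1.
print(w1, b1, w4, b4)
```

Because the shards are the same size, the average of the per-shard gradients equals the full-batch gradient, so four workers reach the same weights as one. On real hardware, the speedup comes from computing those shards at the same time; the price is the communication needed to average the gradients at every step.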

In distributed training, your model is trained by multiple independent processes. You can think of each process as an independent training ...



Publisher Resources

ISBN: 9781492089179