In this chapter we discuss the use of TensorFlow for distributed computing. We start by briefly surveying the different approaches to distributing model training in machine learning in general, and specifically for deep learning. We then introduce the elements of TensorFlow designed to support distributed computing, and finally put everything together with an end-to-end example.
Distributed computing, in the most general terms, entails the utilization of more than one component in order to perform the desired computation or achieve a goal. In our case, this means using multiple machines in order to speed up the training of a deep learning model.
The basic idea behind this is that by using more computing power, we should be able to train the same model faster. This is indeed often the case, although just how much faster depends on many factors (for example, if you expect to use 10× the resources and get a 10× speedup, you are most likely going to be disappointed!).
There are many ways to distribute computations in a machine learning setting. You may want to utilize multiple devices, either on the same machine or across a cluster. When training a single model, you may want to compute gradients across a cluster to speed up training, either synchronously or asynchronously. A cluster may also be used to train multiple models at the same time, or in order to search for the optimal parameters for a single model.
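To see why computing gradients across a cluster can work at all, consider the synchronous case: each worker computes the gradient of the loss on its own shard of the data, and the combined per-shard gradients equal the gradient over the full batch. The following is a minimal plain-Python sketch of this idea, with the workers simulated in-process (the squared loss, the data, and the two-worker split are all hypothetical choices for illustration, not TensorFlow's actual machinery):

```python
def grad(w, xs, ys):
    # dL/dw for the loss L(w) = sum over the shard of (w*x - y)^2
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys))

# A toy dataset and an initial parameter value (made up for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Split the batch across two simulated workers.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

# Synchronous step: wait for every worker's gradient, then combine them.
worker_grads = [grad(w, shard_x, shard_y) for shard_x, shard_y in shards]
combined = sum(worker_grads)

# The combined gradient matches the one computed on the full batch.
full = grad(w, xs, ys)
print(combined == full)  # → True
```

In the asynchronous variant, the coordinator would apply each worker's gradient as soon as it arrives rather than waiting for all of them; the individual updates are then computed against slightly stale parameter values, trading exactness for throughput.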
In the following subsections ...