Chapter 5. Parallel Neural Network Training
If you’ve already worked a bit with deep learning, you’ve probably noticed that training takes a substantial amount of time. This is why running TensorFlow in multiple nodes on multiple CPUs or GPUs became a standard task in deep learning. In TensorFlow 2.x, this task has become so simple that everybody with the necessary hardware at hand can use it. And in the era of cloud computing, everyone with a credit card can do it. So let’s get started.
Although TensorFlow abstracts away all complex details for you, it still makes sense to understand how parallel neural network training and scoring is done. If you are familiar with the topic, you’ll know those complex details include splitting of training data, distributed computation of gradients, and gradient synchronization—but more on this in the next section.
Parallel Neural Network Training Explained
We need to distinguish between four types of parallelization.
The terms we’re using (with Google’s wording in parentheses) are:
Inter-model parallelism (parallel hyperparameter search)
Data parallelism (data parallelism)
Intra-model parallelism (model parallelism)
Pipelined parallelism (model parallelism)
Let us explain why we’re deviating from Google’s definitions. First, you can see that our definitions are a little more granular, since we have four terms where Google uses only three. Second, we believe that our terms have clearer semantics. We’ll explain this for each term in ...