March 2018
Intermediate to advanced
484 pages
10h 31m
English
Deep learning (DL) models must be trained on large amounts of data to perform well. However, training a deep network with millions of parameters may take days or even weeks. In "Large Scale Distributed Deep Networks", Dean et al. proposed two paradigms, model parallelism and data parallelism, which allow us to train and serve a network model on multiple physical machines. In the following section, we introduce these paradigms with a focus on distributed TensorFlow capabilities.
Model parallelism gives every processor the same data but has each one apply a different part of the model to it. If the network model is too big to fit into one machine's memory, different parts of the model can be assigned to different machines. ...
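To make the idea of model parallelism concrete, here is a minimal pure-Python sketch, not actual distributed TensorFlow code. It assumes a tiny two-layer network whose layers are held by two stand-in "machines". The `Worker` class, the machine names, and the weight values are all hypothetical and exist only for illustration. In a real deployment the hand-off between workers would be a network transfer, and in TensorFlow the placement would typically be expressed with `tf.device` scopes.

```python
# Minimal pure-Python sketch of model parallelism (no real devices involved).
# Worker, the machine names, and the weights are hypothetical illustrations.

def matvec(weights, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


class Worker:
    """Stands in for one machine that stores only its own part of the model."""

    def __init__(self, name, weights, activation=None):
        self.name = name
        self.weights = weights        # only this layer's parameters live here
        self.activation = activation

    def forward(self, x):
        out = matvec(self.weights, x)
        return self.activation(out) if self.activation else out


# The full model: two layers, each too "large" (by assumption) to share a machine.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]   # layer 1: 2 inputs -> 3 hidden units
W2 = [[1.0, 1.0, 1.0]]                        # layer 2: 3 hidden units -> 1 output

workers = [
    Worker("machine_0", W1, activation=relu),  # holds layer 1 only
    Worker("machine_1", W2),                   # holds layer 2 only
]


def model_parallel_forward(x):
    """Pass the activations from one worker to the next, layer by layer."""
    activations = x
    for worker in workers:
        # In a real system this hand-off would cross the network.
        activations = worker.forward(activations)
    return activations


def single_machine_forward(x):
    """Reference computation with the whole model in one place."""
    return matvec(W2, relu(matvec(W1, x)))


x = [2.0, 1.0]
result = model_parallel_forward(x)
print(result)
```

The split version produces the same output as the single-machine reference. What changes is where the parameters are stored: no single worker ever holds both `W1` and `W2`, at the cost of sending intermediate activations between machines.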