Distributed deep learning across multiple GPUs

As stated earlier, we will work through a systematic example of classifying a large collection of video clips from the UCF101 dataset using a convolutional-LSTM network. First, however, we need to know how to distribute training across multiple GPUs. In previous chapters, we discussed several advanced techniques, such as network weight initialization, batch normalization, faster optimizers, and proper activation functions. These certainly help the network converge faster. Even so, training a large neural network on a single machine can take days or even weeks, which makes it impractical for large-scale datasets.

Theoretically, there are two main methods for the distributed ...

