Small changes in a layer's parameters affect the inputs of all following layers, and the effect is amplified with each subsequent layer. This is especially problematic for deep networks: the distribution of inputs to each layer changes during training because the parameters of the previous layers are being adjusted. This problem is known as internal covariate shift. The batch normalization technique was proposed in 2015 by Sergey Ioffe and Christian Szegedy from Google [1] to address it. It normalizes layer inputs for each mini-batch as part of the network architecture; a batch normalization layer is usually inserted between the dot product and the nonlinearity.
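To make the mechanics concrete, the following is a minimal NumPy sketch of the training-time batch normalization transform, placed between a dot product and a ReLU nonlinearity as the paragraph describes. The function name batch_norm and the parameters gamma, beta, and eps are illustrative assumptions rather than code from the source.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)

# Illustrative placement inside one layer: dot product -> BN -> nonlinearity.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 64))                # mini-batch of 32 examples, 64 inputs
W = rng.normal(size=(64, 128)) * 0.01        # weight matrix of the layer
gamma = np.ones(128)                         # initialized so BN starts as a pure normalization
beta = np.zeros(128)

h = batch_norm(x @ W, gamma, beta)           # normalize the pre-activations
a = np.maximum(h, 0)                         # ReLU nonlinearity
```

At inference time, the original paper replaces the mini-batch mean and variance with averages accumulated during training, so single examples can be processed deterministically.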
The benefits are as follows: