During training, the distribution of each layer's inputs changes as the parameters of the previous layers change, which slows training down because it requires lower learning rates and careful parameter initialization. Sergey Ioffe and Christian Szegedy called this phenomenon internal covariate shift in their paper titled Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. For details, refer to: https://arxiv.org/abs/1502.03167.
Batch normalization addresses the internal covariate shift by subtracting the batch mean from the layer's input and dividing it by the batch standard deviation. This normalized input is then scaled and shifted using two parameters, gamma and beta, which are learned along with the rest of the network's weights during training.
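To make this concrete, the following is a minimal NumPy sketch of the normalization step described above. The function name batch_norm, the epsilon term added for numerical stability, and the tensor shapes are illustrative assumptions rather than code from the book; in practice you would use a framework layer such as tf.keras.layers.BatchNormalization or torch.nn.BatchNorm1d.

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations, shape (batch_size, num_features)
    batch_mean = x.mean(axis=0)                            # per-feature mean over the batch
    batch_var = x.var(axis=0)                              # per-feature variance over the batch
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)    # normalize to zero mean, unit variance
    return gamma * x_hat + beta                            # scale and shift with learnable parameters

# Example usage: a batch whose features have a non-zero mean and large variance
x = np.random.randn(32, 4) * 3.0 + 5.0
gamma = np.ones(4)   # initialized to 1, learned during training
beta = np.zeros(4)   # initialized to 0, learned during training
out = batch_norm(x, gamma, beta)
print(out.mean(axis=0))  # approximately 0 for each feature
print(out.std(axis=0))   # approximately 1 for each feature

Because gamma and beta are trainable, the network can still recover the original activation distribution if that turns out to be optimal; normalization therefore constrains the statistics of each layer's input without limiting what the layer can represent.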