Batch normalization
One popular technique that speeds up training and also helps prevent overfitting is batch normalization, which normalizes the inputs of each layer and learns a scale and a shift for the normalized values. During training, the distribution of each layer's inputs changes as the parameters of the previous layers change. This slows down training because it requires lower learning rates and careful parameter initialization (recall the discussion of weight initialization earlier in this chapter). Batch normalization tackles this problem, known as internal covariate shift, by normalizing each layer's inputs over every mini-batch. This allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need ...
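To make the per-mini-batch normalization concrete, here is a minimal sketch of the training-time forward pass in plain Python. The function name `batch_norm_forward` and the parameter names `gamma`, `beta`, and `eps` are illustrative assumptions, not taken from the chapter. The sketch omits the running averages that are typically used at inference time, as well as the backward pass.

```python
import math


def batch_norm_forward(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of samples, each a list of feature values.
    gamma, beta: per-feature learnable scale and shift (illustrative names).
    eps: small constant that avoids division by zero (assumed value).
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]

    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        # Biased (population) variance over the mini-batch.
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            normalized = (column[i] - mean) * inv_std
            out[i][j] = gamma[j] * normalized + beta[j]
    return out


# Two features on very different scales end up on a comparable scale.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm_forward(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` set to 0, each feature column of `normalized` has a mean close to 0 and a variance close to 1. In a real network, `gamma` and `beta` would be updated by gradient descent along with the other weights, which is what lets the layer undo the normalization where that helps.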