June 2020
Intermediate to advanced
364 pages
13h 56m
English
As we have seen, if our weights are too small, the signals and gradients passing through the network vanish, which results in dead neurons; conversely, if our weights are too big, we get exploding gradients. We want to avoid both scenarios, so the weights need to be initialized at just the right scale for our network to learn what it needs to.
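To make the two failure modes concrete, here is a minimal sketch that pushes random inputs through a stack of purely linear layers and measures how large the signal is at the end. It uses the forward signal as a proxy for the gradients, on the assumption that gradients are multiplied by the same weight matrices on the backward pass. The function name `final_signal_std`, the depth, the width, and the two weight scales are illustrative choices, not values from the text.

```python
import numpy as np

def final_signal_std(weight_std, n_layers=10, width=256, seed=0):
    """Standard deviation of the signal after n_layers linear layers."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((512, width))  # batch of unit-variance inputs
    for _ in range(n_layers):
        # Zero-mean Gaussian weights with the chosen (hypothetical) scale.
        W = rng.normal(0.0, weight_std, size=(width, width))
        x = x @ W  # linear layer, no activation, to isolate the scale effect
    return x.std()

small = final_signal_std(0.01)  # weights too small: signal shrinks each layer
large = final_signal_std(1.0)   # weights too big: signal grows each layer
print(f"small weights: {small:.3e}, large weights: {large:.3e}")
```

Each layer rescales the signal by roughly `weight_std * sqrt(width)`, so after ten layers `small` ends up many orders of magnitude below 1 and `large` many orders of magnitude above it.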
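To tackle this problem, Xavier Glorot and Yoshua Bengio created a normalized initialization method (generally referred to as Xavier initialization). It is as follows:

$$W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_k + n_{k+1}}},\ \frac{\sqrt{6}}{\sqrt{n_k + n_{k+1}}}\right]$$

Here, $n_k$ is the number of neurons in layer $k$, and $U[a, b]$ denotes the uniform distribution on $[a, b]$.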
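As a minimal sketch, assuming the uniform form of the normalized initialization from Glorot and Bengio's paper, the weights between a layer with n_k neurons and the next layer with n_{k+1} neurons can be drawn from U[-limit, limit] with limit = sqrt(6 / (n_k + n_{k+1})). The function name `xavier_uniform` and the layer sizes below are illustrative, not taken from the text.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    """Sample an (n_in, n_out) weight matrix from U[-limit, limit]."""
    rng = np.random.default_rng() if rng is None else rng
    # limit = sqrt(6) / sqrt(n_k + n_{k+1})
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Hypothetical layer sizes: 256 neurons feeding into 128 neurons.
rng = np.random.default_rng(0)
W = xavier_uniform(256, 128, rng)

limit = np.sqrt(6.0 / (256 + 128))
print(W.shape, limit, W.var())
```

A uniform distribution on [-a, a] has variance a²/3, so the sampled weights have variance close to 2 / (n_k + n_{k+1}); the bound shrinks as the layers get wider.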
But why does this work better than randomly initializing our network? The idea is that we want to maintain ...