August 2018
Intermediate to advanced
272 pages
7h 2m
English
In their paper, Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio showed that if the weights at each layer are initialized from the uniform distribution $U\left[-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right]$, where $n$ is the size of the previous layer, then with the sigmoid activation function, the neurons of the top layers (closer to the output) quickly saturate to 0. Because of the shape of the sigmoid function, an activation value of 0 means very large weights and a backpropagated gradient approaching ...
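As a minimal sketch of the two ingredients discussed above, the following NumPy snippet samples a weight matrix from a uniform distribution bounded by the inverse square root of the previous layer's size, and evaluates the sigmoid's derivative at increasingly negative inputs. The helper names (`init_uniform`, `sigmoid`, `sigmoid_grad`) and the layer sizes are illustrative assumptions, not taken from the book or the paper, and the snippet does not reproduce the training experiments in which the saturation was observed.

```python
import numpy as np


def init_uniform(n_in, n_out, rng):
    """Sample an (n_in, n_out) weight matrix from U[-1/sqrt(n_in), 1/sqrt(n_in)].

    n_in plays the role of the previous layer's size (hypothetical helper).
    """
    limit = 1.0 / np.sqrt(n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid with respect to its input: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


rng = np.random.default_rng(0)
W = init_uniform(256, 128, rng)  # layer sizes chosen only for illustration

# Every sampled weight lies within the bound 1/sqrt(256) = 0.0625.
print("largest |w|:", np.abs(W).max())

# As the sigmoid output moves toward 0, its local gradient shrinks as well.
for z in (0.0, -2.0, -5.0, -10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.6f}  gradient={sigmoid_grad(z):.6f}")
```

The derivative peaks at 0.25 when the input is 0 and falls off quickly as the output nears 0, which is why a saturated sigmoid unit passes very little gradient back to earlier layers.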