The sigmoid asymptotically approaches 0 at one end and 1 at the other. On those tails, the derivative of the function is very small. This is bad news for the backpropagation algorithm, because these almost-zero values kill the signal as it propagates back through the network to update the weights.
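To make this concrete, here is a minimal sketch (using NumPy; not code from the book) of the sigmoid and its derivative, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), showing how quickly the gradient shrinks on the tails:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  sigmoid = {sigmoid(x):.6f}  gradient = {sigmoid_grad(x):.6f}")

# x =   0.0  sigmoid = 0.500000  gradient = 0.250000
# x =   2.0  sigmoid = 0.880797  gradient = 0.104994
# x =   5.0  sigmoid = 0.993307  gradient = 0.006648
# x =  10.0  sigmoid = 0.999955  gradient = 0.000045
```

Already at x = 10 the gradient is about 0.000045, and backpropagation multiplies such factors layer by layer, so the error signal reaching earlier layers can become vanishingly small.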
The problem of dead neurons: if you initialize the network's weights at random, sigmoidal neurons with large weights will be dead (barely transmitting any signal) from the very beginning.
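A hedged illustration of that claim (the layer size and weight scale here are arbitrary, chosen only to make the effect visible): with a large random initialization, most pre-activations land deep in the sigmoid's tails, where each neuron's local gradient is almost zero.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in exp for extreme pre-activations.
    z = np.clip(z, -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = rng.normal(size=100)                    # one random 100-dimensional input
W = rng.normal(scale=5.0, size=(50, 100))   # deliberately large random weights

z = W @ x                                   # pre-activations of 50 sigmoid neurons
grad = sigmoid(z) * (1.0 - sigmoid(z))      # each neuron's local gradient

# Count neurons whose gradient is effectively zero: "dead" from the start.
print(f"dead neurons: {np.mean(grad < 1e-3):.0%}")
```

With a weight scale this large, nearly all of the 50 neurons start out saturated, which is why careful weight initialization (small initial weights) matters for sigmoidal networks.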