March 2018
Intermediate to advanced
484 pages
10h 31m
English
Gradients for the deeper layers are computed as products of many activation-function derivatives across the layers of the network. When those per-layer factors are small or zero, the product quickly vanishes; when they are larger than 1, the product can explode. In either case, the gradients become very hard to compute and use for weight updates.
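A minimal sketch of this multiplicative effect (an illustration, not code from the book): treating each layer as contributing one gradient factor, the overall gradient magnitude is that factor raised to the number of layers. Sigmoid derivatives, for instance, are at most 0.25, so their product shrinks exponentially with depth, while factors above 1 grow exponentially.

```python
def gradient_magnitude(per_layer_factor, n_layers):
    """Magnitude of a product of n_layers identical per-layer gradient factors."""
    return per_layer_factor ** n_layers

# Sigmoid derivatives never exceed 0.25, so deep products vanish rapidly.
vanishing = gradient_magnitude(0.25, 50)

# Per-layer factors larger than 1 instead grow without bound (exploding).
exploding = gradient_magnitude(1.5, 50)

print(f"50 layers, factor 0.25: {vanishing:.3e}")  # astronomically small
print(f"50 layers, factor 1.50: {exploding:.3e}")  # astronomically large
```

Real networks have different factors per layer, but the exponential trend in depth is the same, which is why very deep plain networks are hard to train with saturating activations.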
Let's explain each of these problems in more detail: