November 2017
Intermediate to advanced
274 pages
6h 16m
English
We will now look at several methods for optimizing gradient descent: calculating a different learning rate for each parameter, calculating momentum, and preventing learning rates from decaying.
To address the high-variance oscillations of SGD, a method called momentum was introduced. It accelerates SGD by moving along the relevant direction and dampening oscillations in irrelevant directions. It does this by adding a fraction of the previous step's update vector to the current update vector. The momentum value is usually set to 0.9. Momentum leads to faster and more stable convergence with reduced oscillations.
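The momentum update described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the book's own implementation: the function name `sgd_momentum`, the quadratic test objective, and the learning rate of 0.02 are all assumptions chosen for the example. It uses the full gradient of a simple function instead of mini-batches, so it shows only the update rule itself.

```python
def sgd_momentum(grad_fn, theta, lr=0.01, momentum=0.9, steps=100):
    """Gradient descent with classical momentum (illustrative sketch)."""
    theta = list(theta)
    velocity = [0.0] * len(theta)  # update vector from the previous step
    for _ in range(steps):
        grad = grad_fn(theta)
        # Current update = fraction of the previous update + gradient step
        velocity = [momentum * v + lr * g for v, g in zip(velocity, grad)]
        # Move the parameters against the accumulated update
        theta = [t - v for t, v in zip(theta, velocity)]
    return theta


# Hypothetical objective: f(x, y) = 0.5 * (x**2 + 25 * y**2).
# Its gradient is much steeper along y than along x, which is the kind
# of surface where plain gradient descent tends to oscillate.
def quadratic_grad(theta):
    x, y = theta
    return [x, 25.0 * y]


result = sgd_momentum(quadratic_grad, [5.0, 2.0], lr=0.02, momentum=0.9, steps=300)
print(result)  # both coordinates should end up close to the minimum at (0, 0)
```

Setting `momentum=0.0` reduces the loop to ordinary gradient descent, which makes it easy to compare the two behaviors on the same objective.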
Nesterov accelerated gradient explains that as we reach the minima, ...