August 2019
Intermediate to advanced
242 pages
5h 45m
English
In Nesterov momentum, we change where the gradient is computed. We first make a big jump in the direction of the previously accumulated gradient. Then we measure the gradient at this new position and correct the update accordingly.
This correction keeps the update from overshooting the way ordinary momentum can, which produces fewer oscillations as gradient descent converges.
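
The jump-then-correct idea can be illustrated with a minimal sketch. It assumes a one-dimensional parameter and a hand-written gradient function. The names `nesterov_minimize` and `grad_quadratic`, and the default values for `lr`, `momentum`, and `steps`, are hypothetical choices made for this illustration and are not taken from the text.

```python
def nesterov_minimize(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a 1-D function from its gradient using Nesterov momentum.

    All parameter names and defaults are illustrative assumptions.
    """
    x = float(x0)
    v = 0.0  # velocity: the accumulated, decayed sum of past gradient steps
    for _ in range(steps):
        # 1. Big jump: move along the previously accumulated velocity.
        lookahead = x + momentum * v
        # 2. Measure the gradient at the lookahead point, not at x.
        g = grad(lookahead)
        # 3. Correction: fold the lookahead gradient into the velocity.
        v = momentum * v - lr * g
        # 4. Apply the corrected step to the parameter.
        x = x + v
    return x


def grad_quadratic(x):
    """Gradient of the example objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


x_min = nesterov_minimize(grad_quadratic, x0=0.0)
print(x_min)
```

Ordinary momentum would differ only in step 2, where it would evaluate `grad(x)` at the current position instead of at `lookahead`. On this quadratic the sketch should settle close to the minimum at 3.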