December 2018
Beginner to intermediate
684 pages
21h 9m
English
A popular refinement of basic gradient descent adds momentum to accelerate the convergence to a local minimum.
While, in practice, the dimensionality would be much higher, the image of a local optimum at the center of an elongated ravine is often used. It implies a minimum inside a deep and narrow gorge or canyon with very steep walls that have a large gradient but a much gentler slope toward a local minimum at the bottom of this region. Gradient descent naturally follows the steep gradient and will make repeated adjustments up and downs the walls of the canyons with much slower movements toward the minimum.
Momentum aims to address such situation by tracking past directions and adjusting the parameters by a weighted average of the ...