June 2020
Intermediate to advanced
364 pages
13h 56m
English
As we have seen, gradient descent takes some time to find its way to a relatively flat surface. An improvement to the preceding example is gradient descent with momentum, which smoothes the gradient updates so that it is less erratic. Consider a tennis ball and a boulder both rolling down a mountain. The tennis ball would bounce around more and likely get stuck, but the boulder would gain momentum as it goes and maintain a relatively straight path toward the bottom. That is the key idea behind this improvement. It does so by remembering the previous updates and each update is a combination of the previous and current gradients, as follows:
Here, and .
In this method, as you will notice, we not only have to ...