Hands-On Mathematics for Deep Learning

by Jay Dawani

June 2020

Intermediate to advanced

364 pages

13h 56m

English

Packt Publishing

Content preview from Hands-On Mathematics for Deep Learning

Gradient descent with momentum

As we have seen, gradient descent takes some time to find its way to a relatively flat surface. An improvement on the preceding example is gradient descent with momentum, which smooths the gradient updates so that they are less erratic. Consider a tennis ball and a boulder, both rolling down a mountain. The tennis ball would bounce around more and would likely get stuck, but the boulder would gain momentum as it goes and maintain a relatively straight path toward the bottom. That is the key idea behind this improvement. The method remembers the previous updates, so each update is a combination of the previous and current gradients, as follows:

Here, and .
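To make the idea concrete, here is a minimal sketch of one common formulation of the momentum update, in which a velocity term accumulates past gradients and the parameters move along that velocity. The notation and names are assumptions for illustration and may differ from the symbols used in this book: `lr` is the step size, `beta` is the momentum coefficient, and the test function is a hypothetical elongated quadratic bowl chosen because plain gradient descent crawls along its flat direction.

```python
def loss(p):
    """Illustrative objective (an assumption): f(x, y) = 0.5 * (x^2 + 25 * y^2)."""
    x, y = p
    return 0.5 * (x ** 2 + 25.0 * y ** 2)


def grad(p):
    """Gradient of the objective above."""
    x, y = p
    return (x, 25.0 * y)


def descend(start, lr, beta, steps):
    """Gradient descent with momentum; beta = 0 reduces to plain gradient descent."""
    p = list(start)
    v = [0.0, 0.0]  # velocity: a decaying sum of previous updates
    for _ in range(steps):
        g = grad(p)
        for i in range(len(p)):
            # Blend the previous update with the current gradient step.
            v[i] = beta * v[i] - lr * g[i]
            p[i] += v[i]
    return tuple(p)


start = (10.0, 1.0)
plain = descend(start, lr=0.03, beta=0.0, steps=100)
momentum = descend(start, lr=0.03, beta=0.8, steps=100)

print("plain gradient descent loss:", loss(plain))
print("momentum loss:              ", loss(momentum))
```

With these particular settings, the momentum run should finish with a noticeably lower loss than the plain run after the same number of steps, because the velocity keeps building along the shallow `x` direction. The values of `lr` and `beta` here are illustrative rather than recommended; in practice both usually need tuning.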

In this method, as you will notice, we not only have to ...