Momentum
When thinking about how to optimize gradient descent, we can certainly draw on intuition from real life to inform our methods. One example of this is momentum. Imagine that most error surfaces are shaped like a bowl, with the desired point at the bottom. If we start from the highest point of the bowl, it could take us a long time to reach the bottom.
If we think about real-life physics, the steeper the side of the bowl, the faster a ball would roll down it as it gained momentum. Taking this as inspiration, we get what we can consider the momentum variant of SGD: we try to accelerate the descent down the gradient by considering that, if the gradient continues to go down the same direction, ...
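To make the ball-rolling intuition concrete, here is a minimal sketch that assumes one common formulation of momentum: a velocity term v = gamma * v + lr * gradient, followed by theta = theta - v. The bowl f(x, y) = 0.5 * (x**2 + 10 * y**2), the function names `grad` and `steps_to_converge`, and the values lr = 0.01 and gamma = 0.9 are all hypothetical choices for illustration. The sketch also uses the exact gradient of this toy bowl rather than noisy minibatch estimates, so strictly speaking it is full-batch gradient descent with and without momentum, not SGD.

```python
import math


def grad(theta, curvature=(1.0, 10.0)):
    """Gradient of a hypothetical elongated bowl f(x, y) = 0.5 * (a*x**2 + b*y**2)."""
    return [c * t for c, t in zip(curvature, theta)]


def steps_to_converge(theta0, lr, gamma=0.0, tol=1e-3, max_steps=10_000):
    """Run gradient descent with momentum until ||theta|| < tol; return the step count.

    gamma=0.0 reduces the update to plain gradient descent (velocity = lr * gradient).
    Returns max_steps if the tolerance is never reached.
    """
    theta = list(theta0)
    velocity = [0.0] * len(theta)
    for step in range(1, max_steps + 1):
        g = grad(theta)
        # The velocity accumulates past gradients, decayed by gamma on every step.
        velocity = [gamma * v + lr * gi for v, gi in zip(velocity, g)]
        # Move against the accumulated velocity rather than the raw gradient.
        theta = [t - v for t, v in zip(theta, velocity)]
        # The minimum of this bowl is at the origin, so distance to it is ||theta||.
        if math.hypot(*theta) < tol:
            return step
    return max_steps


start = (1.0, 1.0)  # a point high up on the side of the bowl
plain = steps_to_converge(start, lr=0.01)                 # plain gradient descent
momentum = steps_to_converge(start, lr=0.01, gamma=0.9)   # momentum variant
print(f"plain: {plain} steps, momentum: {momentum} steps")
```

On this toy bowl the momentum run reaches the tolerance in far fewer steps than the plain run. The reason matches the intuition above: while the gradient keeps pointing the same way, the velocity grows toward lr / (1 - gamma) times the gradient, so with gamma = 0.9 the effective step is roughly ten times larger than a plain step in that direction.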