Adam optimizer
In the hill example, you first walked down the hill with big strides using momentum (larger strides because you are heading in the right direction). Then you had to take smaller steps to find the object. You are adapting your estimate of your momentum to your needs; hence the name adaptive moment estimation (Adam).
Adam constantly compares a running average of past gradients with the present gradient. In the hill example, it compares how fast you were going with how fast you are going now.
The Adam optimizer is an alternative to the classical gradient descent and stochastic gradient descent methods (see Chapter 5, Manage the Power of Machine Learning and Deep Learning). Adam goes further by applying its optimization to random (stochastic) mini-batches of the dataset. ...
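The idea above, adapting the step size using estimates of the first moment (mean of past gradients) and second moment (mean of past squared gradients), can be sketched in a few lines. This is a minimal illustration of the standard Adam update rule, not the book's code; the function name `adam_step` and the hyperparameter defaults (the commonly used lr=0.001, beta1=0.9, beta2=0.999) are assumptions for this sketch.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update (illustrative sketch, hypothetical helper).

    theta : current parameters
    grad  : gradient of the loss at theta
    m, v  : running first and second moment estimates
    t     : timestep, starting at 1 (needed for bias correction)
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive step
    return theta, m, v

# Usage: minimize f(x) = x**2 starting from x = 5, like walking down the hill.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                          # gradient of x**2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)                                  # ends close to the minimum at 0
```

The bias-correction terms matter early on: because `m` and `v` start at zero, the raw averages underestimate the true moments for small `t`, and dividing by `1 - beta**t` compensates for that.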