Adaptive moment estimation

Adaptive moment estimation, known as Adam for short, is one of the most widely used algorithms for optimizing a neural network. When we studied RMSProp, we learned that we compute the running average of the squared gradients to avoid the diminishing learning rate problem.
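Writing the gradient at step $t$ as $g_t$, the decay rate as $\beta$, and the running average of the squared gradients as $v_t$ (these symbols are one standard choice of notation), the running average is:

$$v_t = \beta v_{t-1} + (1 - \beta) g_t^2$$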

The final update equation of RMSProp is given below.
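Denoting the parameter vector by $\theta_t$, the learning rate by $\eta$, and a small constant added for numerical stability by $\epsilon$ (again, one standard choice of notation), the update can be written as:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \, g_t$$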

Similar to this, in Adam, we also compute the running average of the squared gradients. However, along with the running average of the squared gradients, we also compute the running average of the gradients themselves, and both averages are used in the parameter update.
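As a rough sketch of this idea, the following NumPy snippet applies an Adam-style update to a parameter vector; the function name, variable names, and default hyperparameter values here are illustrative assumptions rather than the book's code:

import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative Adam-style update step (a sketch, not the book's implementation)."""
    # Running average of the gradients (first moment)
    m = beta1 * m + (1 - beta1) * grad
    # Running average of the squared gradients (second moment), as in RMSProp
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates, since m and v are initialized to zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with an adaptively scaled learning rate
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimal usage example on f(theta) = sum(theta ** 2), whose gradient is 2 * theta
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 101):
    grad = 2 * theta
    theta, m, v = adam_update(theta, grad, m, v, t)
print(theta)  # the values move toward the minimum at [0.0, 0.0]

The bias-correction terms compensate for m and v starting at zero, which would otherwise bias both running averages toward zero during the early steps.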
