July 2019
Intermediate to advanced
512 pages
19h 39m
English
Adaptive moment estimation, known as Adam for short, is one of the most popularly used algorithms for optimizing a neural network. While reading about RMSProp, we learned that we compute the running average of squared gradients to avoid the diminishing learning rate problem:

The final updated equation of RMSprop is given as follows:

Similar to this, in Adam, we also compute the running average of the squared gradients. However, along with computing the running average of the squared gradients, we also compute the ...
Read now
Unlock full access