Adam is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum.
When in doubt, just use Adam!
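To make the update concrete, here is a minimal sketch of a single Adam step for one parameter, assuming NumPy and the commonly used default hyperparameters (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the function name `adam_update` and its signature are illustrative, not a reference implementation.

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. m and v are the running first and second moment
    estimates for this parameter; t is the (1-based) step count."""
    # Exponentially decaying average of past gradients (first moment, like momentum)
    m = beta1 * m + (1 - beta1) * grad
    # Exponentially decaying average of past squared gradients (second moment,
    # as in Adadelta / RMSprop)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: both averages start at zero and are biased low early on
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive update
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

In practice the moment estimates `m` and `v` are kept per parameter across steps, so each weight effectively gets its own adaptive learning rate.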