Statistics for Machine Learning

by Pratap Dangeti
July 2017
Beginner to intermediate
442 pages
10h 8m
English
Packt Publishing
Content preview from Statistics for Machine Learning

Adagrad

Adagrad is an algorithm for gradient-based optimization that adapts the learning rate to each parameter, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones.
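To make the per-parameter behaviour concrete, here is a minimal NumPy sketch of a common formulation of the Adagrad update. It is an illustration under stated assumptions, not the book's own code: the helper name `adagrad_update`, the stabilising constant `eps`, and the toy gradient pattern (one parameter receiving a gradient every step, the other only every tenth step) are all hypothetical.

```python
import numpy as np

def adagrad_update(params, grads, accum, lr=0.01, eps=1e-8):
    """One Adagrad step (sketch): returns updated parameters and accumulator."""
    # Accumulate the squared gradient separately for each parameter.
    accum = accum + grads ** 2
    # Scale each parameter's step by the inverse root of its own accumulated sum.
    params = params - lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Hypothetical toy setup: parameter 0 gets a gradient at every step (frequent),
# parameter 1 only at every tenth step (infrequent).
params = np.zeros(2)
accum = np.zeros(2)
step_sizes = []
for t in range(100):
    grads = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    new_params, accum = adagrad_update(params, grads, accum)
    step_sizes.append(np.abs(new_params - params))
    params = new_params

# At step 90 both parameters receive the same gradient, but the infrequently
# updated one has accumulated less and therefore takes the larger step.
frequent_step, infrequent_step = step_sizes[90]
print(frequent_step, infrequent_step)
```

In this toy run the two parameters see an identical gradient at step 90, yet the infrequently updated parameter moves further because its accumulated sum of squared gradients is smaller.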

Adagrad greatly improves the robustness of SGD and has been used to train large-scale neural nets. One of Adagrad's main benefits is that it largely eliminates the need to manually tune the learning rate; most implementations use a default value of 0.01 and leave it at that.

Adagrad's main weakness is its accumulation of the squared gradients in the denominator: since every added term is positive, the accumulated sum keeps growing during training. This, in turn, causes the learning rate to shrink and eventually become infinitesimally small, ...
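The shrinking step size described above can be seen in a small sketch. It assumes a hypothetical constant gradient of 1.0 and the 0.01 default mentioned earlier; the names `effective_lr` and `eps` are illustrative and not taken from the book.

```python
import math

lr = 0.01     # default learning rate mentioned in the text
eps = 1e-8    # assumed small constant to avoid division by zero
grad = 1.0    # hypothetical constant gradient, for illustration only

accum = 0.0
effective_lr = []
for t in range(10_000):
    # Every squared gradient is non-negative, so the sum never decreases.
    accum += grad ** 2
    # The step actually applied is lr divided by the root of the running sum.
    effective_lr.append(lr / (math.sqrt(accum) + eps))

print(effective_lr[0], effective_lr[99], effective_lr[-1])
```

Under these assumptions the effective learning rate falls from about 0.01 at the first step to about 0.0001 after 10,000 steps, and it would keep falling with further training.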

ISBN: 9781788295758