RMSprop
We can also think about optimization in a different way: what if we adjust the learning rate for each parameter, based on how often its associated feature shows up? We could decrease the learning rate when we are updating parameters tied to common features and increase it for parameters tied to uncommon ones. This also means that we can spend less time tuning the learning rate by hand. Several variations of this idea have been proposed, but the most popular by far is called RMSprop.
RMSprop is a modified form of SGD that, while unpublished, is described in Geoffrey Hinton's course Neural Networks for Machine Learning. RMSprop sounds fancy, but it could just as easily be called adaptive gradient descent. The basic idea is that you modify your learning rate based on certain ...
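To make the per-parameter learning rate idea concrete, here is a minimal sketch of a single RMSprop update in plain Python. It follows the commonly cited form of the rule: keep a decaying average of each parameter's squared gradient, then divide that parameter's step by the square root of the average. The function name `rmsprop_step`, the default values for `lr`, `decay`, and `eps`, and the placement of `eps` outside the square root are assumptions made for illustration, not details taken from the text above.

```python
import math


def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """Apply one RMSprop update to a list of float parameters.

    Hypothetical helper for illustration. `cache` holds the running
    average of squared gradients, one entry per parameter.
    Returns (new_params, new_cache) and leaves the inputs unmodified.
    """
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        # Exponentially decaying average of this parameter's squared gradient.
        c = decay * c + (1.0 - decay) * g * g
        # Scale the step by the root mean square of recent gradients:
        # large recent gradients shrink the step, small ones enlarge it.
        p = p - lr * g / (math.sqrt(c) + eps)
        new_params.append(p)
        new_cache.append(c)
    return new_params, new_cache


# Two parameters whose gradients differ by a factor of 100.
params, cache = [1.0, 1.0], [0.0, 0.0]
grads = [100.0, 1.0]
params, cache = rmsprop_step(params, grads, cache, lr=0.01)
```

Because each gradient is divided by its own running magnitude, both parameters in the usage lines move by nearly the same amount on this first step (about `lr / sqrt(1 - decay)`), even though their raw gradients differ by two orders of magnitude. Plain SGD with a single learning rate would instead move the first parameter 100 times further than the second.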