December 2018
Beginner to intermediate
684 pages
21h 9m
English
The choice of appropriate learning rates is very challenging as highlighted in the preceding subsection on SGD. At the same time, it is one of the most important parameters and strongly impacts training time and generalization performance. While momentum addresses some of the issues with learning rates, it does so at the expense of introducing the momentum rate, another hyperparameter.
Several algorithms aim to adapt the learning rate throughout the training process based on gradient information (see references on GitHb for more detail).