As we mentioned before, the optimization algorithm is sensitive to the choice of learning rate. The table below summarizes the effect of this choice, with a small demonstration after it:
| Learning rate size | Advantages/disadvantages | Uses |
| --- | --- | --- |
| Smaller learning rate | Converges more slowly, but gives more accurate results | If the solution is unstable, try lowering the learning rate first |
| Larger learning rate | Converges faster, but gives less accurate results | For some problems, helps keep the solution from stagnating |
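To make this trade-off concrete, here is a minimal sketch of plain gradient descent on the quadratic f(x) = x², comparing step sizes. The `gradient_descent` helper and all parameter values are illustrative, not taken from the text above:

```python
def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# f(x) = x^2 has its minimum at x = 0; its gradient is 2x.
grad = lambda x: 2.0 * x

# A smaller learning rate is stable but slow: after 100 steps the iterate
# is still noticeably away from 0; after 2000 it is essentially there.
print(gradient_descent(grad, x0=5.0, lr=0.01, steps=100))   # ~0.66
print(gradient_descent(grad, x0=5.0, lr=0.01, steps=2000))  # ~0

# A larger learning rate reaches the minimum far sooner, but push it too
# far (here lr > 1 for this function) and the iterate overshoots and diverges.
print(gradient_descent(grad, x0=5.0, lr=0.4, steps=100))    # ~0
print(gradient_descent(grad, x0=5.0, lr=1.1, steps=100))    # blows up
```

The last call illustrates the "unstable solution" case from the table: each step overshoots the minimum by more than the previous step, so lowering the learning rate is the first thing to try.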
Sometimes the standard gradient descent algorithm can get stuck or slow down significantly. This can happen when the optimization lands in a flat region around a saddle point, where the gradient is close to zero. To combat this, there is a variant of the algorithm that takes into account a decaying average of past gradients, commonly known as momentum.
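Below is a minimal sketch of this idea using the standard heavy-ball momentum update; the `momentum` helper, the quadratic valley used as a stand-in for a flat region, and all parameter values are illustrative assumptions:

```python
import numpy as np

def gd(grad, x, lr, steps):
    """Plain gradient descent, for comparison."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def momentum(grad, x, lr, beta, steps):
    """Heavy-ball momentum: the velocity v keeps a decaying memory of past
    gradients, so progress continues even where the current gradient is tiny."""
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # accumulate past steps, add the new one
        x = x + v
    return x

# A valley that is steep in y but nearly flat in x, a simple stand-in for
# the flat region near a saddle: f(x, y) = 0.005 x^2 + 0.5 y^2.
grad = lambda p: np.array([0.01 * p[0], 1.0 * p[1]])

x0 = np.array([10.0, 1.0])
print(gd(grad, x0, lr=0.5, steps=200))                  # x is still far from 0
print(momentum(grad, x0, lr=0.5, beta=0.9, steps=200))  # both coordinates near 0
```

In the flat direction the raw gradient is almost zero, so plain gradient descent crawls; the accumulated velocity lets the momentum variant keep moving through that region with the same learning rate.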