18
Optimizing Neural Networks
In this chapter, we're going to discuss the most important optimization algorithms derived from the basic Stochastic Gradient Descent (SGD) approach. Plain SGD can be quite ineffective when working with very high-dimensional functions, leaving models stuck in sub-optimal solutions. The optimizers discussed in this chapter aim to speed up convergence and to reduce the risk of ending in such sub-optimal solutions. We'll also discuss how to apply L1 and L2 regularization to a layer of a deep neural network, and how to limit overfitting using these approaches.
In particular, the topics covered in the chapter are as follows:
- Optimized SGD algorithms (Momentum, RMSProp, Adam, AdaGrad, ...
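As a rough illustration of why variants of the basic update rule can help, the following is a minimal sketch (not the book's own code) comparing a plain gradient step with a momentum step. Everything in it is an assumption made for illustration: the toy loss `f(x, y) = 0.5 * (x**2 + 25 * y**2)`, the function names, the learning rate, and the momentum coefficient. For simplicity it uses the exact gradient rather than a noisy mini-batch estimate.

```python
def loss(theta):
    """Toy ill-conditioned quadratic: f(x, y) = 0.5 * (x**2 + 25 * y**2)."""
    x, y = theta
    return 0.5 * (x * x + 25.0 * y * y)


def grad(theta):
    """Exact gradient of the toy loss (stands in for a mini-batch gradient)."""
    x, y = theta
    return (x, 25.0 * y)


def plain_descent(theta, lr=0.03, steps=100):
    """Basic update: theta <- theta - lr * gradient."""
    for _ in range(steps):
        g = grad(theta)
        theta = tuple(t - lr * gi for t, gi in zip(theta, g))
    return theta


def momentum_descent(theta, lr=0.03, mu=0.9, steps=100):
    """Momentum update: v <- mu * v - lr * gradient; theta <- theta + v."""
    v = (0.0,) * len(theta)
    for _ in range(steps):
        g = grad(theta)
        v = tuple(mu * vi - lr * gi for vi, gi in zip(v, g))
        theta = tuple(t + vi for t, vi in zip(theta, v))
    return theta


start = (10.0, 1.0)
plain = plain_descent(start)
with_momentum = momentum_descent(start)

print("plain   :", plain, loss(plain))
print("momentum:", with_momentum, loss(with_momentum))
```

With these hypothetical settings, the plain update makes slow progress along the flat `x` direction, while the accumulated velocity lets the momentum variant reach a lower loss in the same number of steps. Other step sizes or loss surfaces may behave differently.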