An essential step in building a deep learning model is solving the underlying optimization problem, as defined by the loss function. This chapter covers Stochastic Gradient Descent (SGD), the most commonly used algorithm for solving such optimization problems. We also cover a breadth of algorithmic variations from the academic literature that improve on the performance of SGD, and present a bag of largely undocumented tricks that will allow the user to go the extra mile. Lastly, we cover some ground on parallel/distributed SGD and touch on second-order methods.
8. Stochastic Gradient Descent
Nikhil Ketkar (Bangalore, Karnataka, India)
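To make the core idea concrete before the chapter proper begins, here is a minimal sketch of the SGD update, w ← w − η∇L(w), applied to a toy least-squares problem. This is an illustrative sketch, not code from the book: NumPy, the function names (sgd_step, lsq_grad), and the synthetic data are all assumptions of this example.

import numpy as np

def sgd_step(w, grad_fn, x_batch, y_batch, lr=0.01):
    # One SGD update: move the weights against the gradient of the
    # loss, computed on a small random mini-batch rather than the
    # full data set. (Hypothetical helper, for illustration only.)
    g = grad_fn(w, x_batch, y_batch)   # gradient of loss w.r.t. w
    return w - lr * g                  # step of size lr downhill

def lsq_grad(w, X, y):
    # Gradient of the least-squares loss L(w) = ||Xw - y||^2 / (2n).
    return X.T @ (X @ w - y) / len(y)

# Synthetic regression data (assumed for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

w = np.zeros(3)
for _ in range(500):
    idx = rng.choice(100, size=10)     # sample a random mini-batch
    w = sgd_step(w, lsq_grad, X[idx], y[idx], lr=0.1)
print(w)                               # approaches w_true

The essential point, which the rest of the chapter elaborates, is that each step uses a noisy but cheap gradient estimate from a mini-batch, trading per-step accuracy for far lower cost per update.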