February 2018
Intermediate to advanced
378 pages
Stochastic gradient descent (SGD) is an effective way of training deep neural networks. SGD seeks the network parameters Θ that minimize the loss function ℒ:

Θ = argmin_Θ (1/N) ∑_{i=1}^{N} ℒ(x_i, Θ),

where x_{1…N} is the training dataset.

Training proceeds in steps. At each step, we select a mini-batch, a subset of the training set of size m, and use it to approximate the gradient of the loss function with respect to the parameters Θ:

(1/m) ∑_{i=1}^{m} ∂ℒ(x_i, Θ)/∂Θ
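The update step described above can be sketched in code. This is a minimal illustrative example, not from the book: it fits a one-parameter linear model y = θ·x by computing the mini-batch gradient of the mean squared loss and taking an SGD step. The data, learning rate, and variable names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, lr = 1000, 32, 0.1

# Synthetic training set with ground-truth parameter theta* = 3.0
x = rng.normal(size=N)
y = 3.0 * x + 0.1 * rng.normal(size=N)

theta = 0.0
for step in range(200):
    # Sample a mini-batch of size m from the training set
    idx = rng.choice(N, size=m, replace=False)
    xb, yb = x[idx], y[idx]
    # Mini-batch gradient of (1/m) * sum_i (theta*x_i - y_i)^2 w.r.t. theta,
    # an estimate of the gradient over the whole training set
    grad = (2.0 / m) * np.sum((theta * xb - yb) * xb)
    # SGD update
    theta -= lr * grad
```

After a few hundred steps θ settles near the ground-truth value, showing that the noisy mini-batch gradient is a usable estimate of the full-dataset gradient.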
The advantages of mini-batch training are as follows: ...