Learning rates
Because stochastic gradient descent estimates the gradient from a randomly sampled example (or small batch) of the training data rather than the whole dataset, our location on the surface of the loss function jumps all over the place! We can mitigate this behavior by decreasing a parameter known as the learning rate. The learning rate is a hyperparameter: a parameter that controls the training process of the network rather than being learned during it. The learning rate controls how much we adjust the weights of our network in response to the gradient of the loss function. In other words, it determines how quickly or slowly we descend while trying to reach the global minimum. The lower the value, the slower we descend, as shown on the right of the following diagram. Think of slowly rolling a tire down a hill - the ...
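To make the effect of the learning rate concrete, here is a minimal sketch, assuming a one-parameter toy loss L(w) = mean((w - x)^2) whose minimum sits at the mean of the data. Each step samples a single data point and applies the update w ← w - lr * gradient. The function name `sgd`, the toy dataset, and the two learning rates compared are hypothetical choices for illustration, not taken from the text.

```python
import random

def sgd(data, lr, steps, w0=0.0, seed=0):
    """Plain SGD on the toy loss L(w) = mean((w - x)^2), one sampled point per step.

    Hypothetical helper for illustration; returns every value w takes.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    w = w0
    history = [w]
    for _ in range(steps):
        x = rng.choice(data)        # stochastic part: sample one data point
        grad = 2.0 * (w - x)        # gradient of (w - x)^2 with respect to w
        w = w - lr * grad           # step against the gradient, scaled by lr
        history.append(w)
    return history

# Hypothetical toy dataset; the true minimum of the full loss is mean(data) = 3.0
data = [1.0, 2.0, 3.0, 4.0, 5.0]

for lr in (0.5, 0.01):
    hist = sgd(data, lr, steps=2000)
    tail = hist[-200:]              # look only at the last 200 steps
    spread = max(tail) - min(tail)  # how much w is still bouncing around
    print(f"lr={lr}: final w={hist[-1]:.3f}, spread over last 200 steps={spread:.3f}")
```

With lr = 0.5 in this toy setup, each update lands exactly on whichever point was sampled, so w keeps jumping between 1 and 5. With lr = 0.01, each sample nudges w only slightly, so it settles near 3.0 and wanders much less, at the cost of needing many more steps to get there.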