Chapter 16. The Training Process
You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we’re done, right? Not quite yet. We still have to explore a little bit more of the training process.
We explained in Chapter 4 the basis of stochastic gradient descent: pass a mini-batch to the model, compare its predictions to our targets with the loss function, then compute the gradients of this loss with respect to each weight before updating the weights with the formula:
new_weight = weight - lr * weight.grad
We implemented this from scratch in a training loop, and saw that PyTorch provides a simple torch.optim.SGD class that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that’s not all we might want to change in the training process. For any tweak of the training loop, we will need a way to insert some code into the basic SGD loop. The fastai library has a system of callbacks to do this, and we will teach you all about it.
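As a reminder of what that update rule does in practice, here is a minimal, library-free sketch of mini-batch SGD. Everything in it is an assumption made for illustration: the toy model y = w*x + b, the hand-derived mean squared error gradients, and the helper names sgd_step, mse_grads, and train are hypothetical and are not part of PyTorch or fastai. In real code, autograd computes the gradients and an optimizer object applies the step.

```python
import random

def sgd_step(params, grads, lr):
    """Apply new_weight = weight - lr * weight.grad to every parameter."""
    return [p - lr * g for p, g in zip(params, grads)]

def mse_grads(params, xs, ys):
    """Gradients of mean squared error for the toy model y = w*x + b.

    Derived by hand here; a framework such as PyTorch would get them
    from autograd instead.
    """
    w, b = params
    n = len(xs)
    errs = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(2 * e * x for e, x in zip(errs, xs)) / n
    grad_b = sum(2 * e for e in errs) / n
    return [grad_w, grad_b]

def train(data, lr=0.1, epochs=200, bs=4, seed=0):
    """Plain mini-batch SGD: shuffle, slice into batches, step on each batch."""
    rng = random.Random(seed)
    params = [0.0, 0.0]          # start from w = 0, b = 0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), bs):
            batch = data[i:i + bs]
            xs = [x for x, _ in batch]
            ys = [y for _, y in batch]
            params = sgd_step(params, mse_grads(params, xs, ys), lr)
    return params

# Noise-free toy data drawn from y = 3x + 1, with x in [-1, 1]
data = [(x / 10, 3 * (x / 10) + 1) for x in range(-10, 11)]
w, b = train(data)
print(f"w = {w:.4f}, b = {b:.4f}")  # should land close to w = 3, b = 1
```

Note that the only line that is specific to SGD is the one inside sgd_step. Swapping in a different optimizer means changing how the gradients are turned into an update, while the rest of the loop stays the same.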
Let’s start with standard SGD to get a baseline; then we will introduce the most commonly used optimizers.
Establishing a Baseline
First we’ll create a baseline using plain SGD and compare it to fastai’s default optimizer. We’ll start by grabbing Imagenette with the same get_data we used in Chapter 14:
dls = get_data(URLs.IMAGENETTE_160, 160, 128)
We’ll create ...