Chapter 16. The Training Process

You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we’re done, right? Not quite yet. We still have to explore a little bit more of the training process.

We explained in Chapter 4 the basis of stochastic gradient descent: pass a mini-batch to the model, compare its predictions to our targets with the loss function, then compute the gradients of this loss with respect to each weight before updating the weights with the formula:

new_weight = weight - lr * weight.grad
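To make the update rule concrete, here is a minimal sketch in plain Python rather than PyTorch, so the gradient is computed by hand instead of by autograd. The names mse_grad and sgd_step and the toy data are illustrative assumptions, not code from this book:

```python
# Plain-Python sketch of SGD on a single-weight linear model.
# All names (mse_grad, sgd_step, the toy data) are illustrative.

def mse_grad(weight, xs, ys):
    """Gradient of mean((weight * x - y) ** 2) with respect to weight."""
    n = len(xs)
    return sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / n

def sgd_step(weight, grad, lr):
    """Apply the update rule: new_weight = weight - lr * grad."""
    return weight - lr * grad

# Toy mini-batch generated by y = 3 * x, so the ideal weight is 3.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

weight, lr = 0.0, 0.05
for _ in range(100):
    grad = mse_grad(weight, xs, ys)      # gradient of the loss for this batch
    weight = sgd_step(weight, grad, lr)  # move against the gradient
```

After the loop, weight should sit very close to 3. In PyTorch, the hand-written gradient would come from calling backward on the loss and reading weight.grad instead.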

We implemented this from scratch in a training loop, and saw that PyTorch provides a simple optim.SGD class that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that’s not all we might want to change in the training process. For any tweak of the training loop, we will need a way to add some code to the basic SGD loop. The fastai library has a system of callbacks to do this, and we will teach you all about it.
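As a rough illustration of the idea behind callbacks, the sketch below lets extra code run before and after each update without editing the loop itself. It is a hypothetical toy, not fastai's callback API: the class names, the hook names before_step and after_step, and the state dictionary are all assumptions made for this example.

```python
# Hypothetical minimal callback mechanism; this is NOT fastai's API.

class Callback:
    """Base class: each hook does nothing by default."""
    def before_step(self, state): pass
    def after_step(self, state): pass

class RecordWeights(Callback):
    """Example callback that logs the weight after every update."""
    def __init__(self):
        self.history = []
    def after_step(self, state):
        self.history.append(state["weight"])

def train(weight, grads, lr, callbacks=()):
    """Run SGD updates over precomputed gradients, calling hooks around each step."""
    state = {"weight": weight, "lr": lr}
    for grad in grads:
        state["grad"] = grad
        for cb in callbacks:
            cb.before_step(state)
        # The plain SGD update; callbacks may have modified state beforehand.
        state["weight"] = state["weight"] - state["lr"] * state["grad"]
        for cb in callbacks:
            cb.after_step(state)
    return state["weight"]

recorder = RecordWeights()
final = train(1.0, grads=[1.0, 1.0], lr=0.5, callbacks=[recorder])
```

Because the hooks receive the shared state, a callback can observe or alter the learning rate, gradient, or weight, while the loop itself stays unchanged.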

Let’s start with standard SGD to get a baseline; then we will introduce the most commonly used optimizers.

Establishing a Baseline

First we’ll create a baseline using plain SGD and compare it to fastai’s default optimizer. We’ll start by grabbing Imagenette with the same get_data we used in Chapter 14:

dls = get_data(URLs.IMAGENETTE_160, 160, 128)

We’ll create ...
