Chapter 16. The Training Process
You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we’re done, right? Not quite yet. We still have to explore a little bit more of the training process.
We explained in Chapter 4 the basics of stochastic gradient descent: pass a mini-batch of data to the model, compare the model’s output to our targets with the loss function, then compute the gradients of that loss with respect to each weight before updating the weights with this formula:
new_weight = weight - lr * weight.grad
We implemented this from scratch in a training loop, and saw that PyTorch provides a simple optim.SGD class (in torch.optim) that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that’s not all we might want to change in the training process: for any tweak of the training loop, we will need a way to add some code to the basis of SGD. The fastai library has a system of callbacks to do this, and we will teach you all about it.
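To make the update concrete, here is a minimal sketch of one training step written two ways: first applying the formula above to each parameter by hand, then letting PyTorch’s built-in optimizer do the same bookkeeping. The tiny model, batch, and learning rate are placeholder assumptions chosen only so the snippet runs on its own.

import torch
from torch import nn, optim

# Placeholder model, batch, and hyperparameters for this sketch
model = nn.Linear(10, 2)
xb, yb = torch.randn(16, 10), torch.randint(0, 2, (16,))
loss_func = nn.CrossEntropyLoss()
lr = 0.03

# Manual version: apply new_weight = weight - lr * weight.grad to each parameter
loss = loss_func(model(xb), yb)
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad   # the update formula from above
        p.grad.zero_()     # reset gradients for the next batch

# Equivalent version using PyTorch's built-in optimizer
opt = optim.SGD(model.parameters(), lr=lr)
loss = loss_func(model(xb), yb)
loss.backward()
opt.step()        # performs the same per-parameter update
opt.zero_grad()   # resets the gradients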
Let’s start with standard SGD to get a baseline; then we will introduce the most commonly used optimizers.
Establishing a Baseline
First we’ll create a baseline using plain SGD and compare it to fastai’s default optimizer. We’ll start by grabbing Imagenette with the same get_data we used in Chapter 14:
dls = get_data(URLs.IMAGENETTE_160, 160, 128)
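The get_data function itself was defined back in Chapter 14 and is not reproduced here; a helper along these lines could be sketched with the fastai DataBlock API, assuming Imagenette’s standard train/val folder layout (the exact Chapter 14 definition may differ in details such as the augmentations and batch size):

from fastai.vision.all import *

def get_data(url, presize, resize):
    path = untar_data(url)
    return DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        splitter=GrandparentSplitter(valid_name='val'),
        get_y=parent_label,
        item_tfms=Resize(presize),                  # resize each item on the CPU
        batch_tfms=[*aug_transforms(size=resize),   # augment each batch on the GPU
                    Normalize.from_stats(*imagenet_stats)],
    ).dataloaders(path, bs=128)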
We’ll create ...
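As a rough sketch of the baseline comparison described above, a Learner can be handed plain SGD through its opt_func argument and trained alongside the same Learner using fastai’s default optimizer; the architecture, epoch count, and learning rate below are illustrative assumptions, not values from the text.

# Assumes the fastai imports and dls from above
def get_learner(**kwargs):
    return vision_learner(dls, resnet34, pretrained=False,
                          metrics=accuracy, **kwargs).to_fp16()

# Baseline: plain SGD
learn = get_learner(opt_func=SGD)
learn.fit_one_cycle(3, 3e-3)

# Comparison: fastai's default optimizer
learn = get_learner()
learn.fit_one_cycle(3, 3e-3)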