Deep Learning for Coders with fastai and PyTorch

by Jeremy Howard, Sylvain Gugger
July 2020
Intermediate to advanced
621 pages
16h 47m
English
O'Reilly Media, Inc.

Chapter 16. The Training Process

You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we’re done, right? Not quite yet. We still have to explore a little bit more of the training process.

We explained in Chapter 4 the basics of stochastic gradient descent: pass a mini-batch to the model, compare its predictions to our targets with the loss function, then compute the gradients of this loss with respect to each weight before updating the weights with the formula:

new_weight = weight - lr * weight.grad
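To make the update rule concrete, here is a minimal pure-Python sketch of repeated SGD steps on a one-weight linear model. It is an illustration under stated assumptions, not the book's code: the function name sgd_step, the toy data, and the choice of mean squared error as the loss are all hypothetical, and the gradient is written out by hand rather than computed by PyTorch.

```python
# Minimal sketch of SGD for a 1-D linear model y = weight * x.
# sgd_step, xs, ys, and the MSE loss are illustrative assumptions.

def sgd_step(weight, xs, ys, lr):
    """One mini-batch step: forward pass, loss, gradient, weight update."""
    n = len(xs)
    preds = [weight * x for x in xs]                          # forward pass
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n   # mean squared error
    # Gradient of the MSE with respect to weight: (2/n) * sum((pred - y) * x)
    grad = 2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    new_weight = weight - lr * grad                           # the update rule
    return new_weight, loss

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # targets generated by a true weight of 2.0
weight = 0.0
losses = []
for _ in range(50):
    weight, loss = sgd_step(weight, xs, ys, lr=0.05)
    losses.append(loss)
```

With this learning rate the weight moves steadily toward the true value and the recorded loss shrinks at every step; a learning rate that is too large would make the same loop diverge.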

We implemented this from scratch in a training loop, and saw that PyTorch provides a simple torch.optim.SGD class that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that’s not all we might want to change in the training process. For any tweak of the training loop, we will need a way to add some code to the basic SGD loop. The fastai library has a system of callbacks to do this, and we will teach you all about it.
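The general idea of a callback can be sketched in a few lines of plain Python: the training loop calls hook methods at fixed points, and each tweak lives in its own small class. This is only a hypothetical illustration of the pattern, not fastai's callback API; the hook names on_step_start and on_step_end, the state dictionary, and the fit function are all invented for this sketch.

```python
# Hypothetical sketch of a callback-driven training loop (NOT fastai's API).

class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""
    def on_step_start(self, state): pass
    def on_step_end(self, state): pass

class RecordLoss(Callback):
    """Collects the loss observed at every step."""
    def __init__(self):
        self.losses = []
    def on_step_end(self, state):
        self.losses.append(state["loss"])

class HalveLR(Callback):
    """Halves the learning rate every `every` steps."""
    def __init__(self, every):
        self.every = every
    def on_step_start(self, state):
        if state["step"] > 0 and state["step"] % self.every == 0:
            state["lr"] /= 2

def fit(weight, xs, ys, lr, n_steps, callbacks=()):
    """Plain SGD on y = weight * x, with hooks before and after each step."""
    state = {"weight": weight, "lr": lr, "step": 0, "loss": None}
    n = len(xs)
    for step in range(n_steps):
        state["step"] = step
        for cb in callbacks:
            cb.on_step_start(state)
        preds = [state["weight"] * x for x in xs]
        state["loss"] = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        grad = 2 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        state["weight"] -= state["lr"] * grad   # same SGD update as before
        for cb in callbacks:
            cb.on_step_end(state)
    return state

recorder = RecordLoss()
final = fit(0.0, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], lr=0.05, n_steps=20,
            callbacks=[recorder, HalveLR(every=5)])
```

The loop itself never changes: recording losses and scheduling the learning rate are added or removed just by editing the callbacks list.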

Let’s start with standard SGD to get a baseline; then we will introduce the most commonly used optimizers.

Establishing a Baseline

First we’ll create a baseline using plain SGD and compare it to fastai’s default optimizer. We’ll start by grabbing Imagenette with the same get_data we used in Chapter 14:

dls = get_data(URLs.IMAGENETTE_160, 160, 128)

We’ll create ...

ISBN: 9781492045519