Deep Learning for Coders with fastai and PyTorch

by Jeremy Howard, Sylvain Gugger
July 2020
Intermediate to advanced
621 pages
16h 47m
English
O'Reilly Media, Inc.

Chapter 11. Data Munging with fastai’s Mid-Level API

We have seen what Tokenizer and Numericalize do to a collection of texts, and how they’re used inside the data block API, which applies those transforms for us directly through TextBlock. But what if we want to apply only one of those transforms, either to see intermediate results or because our texts are already tokenized? More generally, what can we do when the data block API is not flexible enough for our particular use case? For this, we need fastai’s mid-level API for processing data. The data block API is built on top of that layer, so the mid-level API lets you do everything the data block API does, and much, much more.
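To make the "apply only one step" idea concrete, here is a minimal sketch in plain Python that assumes nothing about fastai's internals. It writes a toy tokenizer and a toy numericalizer as two separate callables, so either can be run on its own. The names toy_tokenize, ToyNumericalize, and encode are hypothetical. The unknown-token name xxunk borrows fastai's naming convention, but fastai's real Tokenizer and Numericalize do considerably more than this.

```python
import re
from collections import Counter

def toy_tokenize(text):
    """Toy stand-in for a tokenizer: lowercase, then split into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

class ToyNumericalize:
    """Toy stand-in for a numericalizer: map tokens to ids using a vocab built from tokenized texts."""
    def __init__(self, tokenized_texts):
        counts = Counter(tok for toks in tokenized_texts for tok in toks)
        # Id 0 is reserved for unknown tokens; the rest are ordered by frequency, then alphabetically
        self.vocab = ["xxunk"] + sorted(counts, key=lambda t: (-counts[t], t))
        self.o2i = {tok: i for i, tok in enumerate(self.vocab)}

    def __call__(self, tokens):
        # Tokens not seen when building the vocab map to the unknown id 0
        return [self.o2i.get(tok, 0) for tok in tokens]

texts = ["This movie was great!", "This movie was bad."]

# Tokenization only: inspect the intermediate result
toks = [toy_tokenize(t) for t in texts]
print(toks[0])          # ['this', 'movie', 'was', 'great', '!']

# Numericalization only: works on texts that are already tokenized
num = ToyNumericalize(toks)
print(num(toks[0]))     # [2, 1, 3, 7, 4]

# Both steps composed, which is roughly what a text block does for you
def encode(text):
    return num(toy_tokenize(text))

print(encode("This movie was bad."))  # [2, 1, 3, 6, 5]
```

Keeping the two steps as independent callables is the design choice that matters here. Composing them is a one-liner, but keeping them separate is what makes the intermediate tokens visible and lets pre-tokenized data skip the first step.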

Going Deeper into fastai’s Layered API

The fastai library is built on a layered API. In the very top layer are applications that allow us to train a model in five lines of code, as we saw in Chapter 1. In the case of creating DataLoaders for a text classifier, for instance, we used this line:
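The layering idea itself can be illustrated with a toy sketch in plain Python, assuming each layer is just a thin wrapper over the one below. ToyPipeline, default_text_processor, lowercase, split_words, and drop_the are hypothetical names, not fastai's classes. The top layer is a one-call factory with its choices already made, while the middle layer lets you assemble your own sequence of steps from low-level pieces.

```python
# Hypothetical three-layer sketch; these names are illustrative, not fastai classes.

# Low level: small single-purpose transforms
def lowercase(text):
    return text.lower()

def split_words(text):
    return text.split()

# Mid level: compose whichever transforms you choose, applied in order
class ToyPipeline:
    def __init__(self, *transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# High level: a one-call factory with fixed choices
def default_text_processor():
    return ToyPipeline(lowercase, split_words)

# The high-level call covers the common case...
print(default_text_processor()("The Movie Was Great"))  # ['the', 'movie', 'was', 'great']

# ...while the mid level lets you add a step the factory does not offer
def drop_the(words):
    return [w for w in words if w != "the"]

custom = ToyPipeline(lowercase, split_words, drop_the)
print(custom("The Movie Was Great"))  # ['movie', 'was', 'great']
```

The trade-off mirrors the one described in the text. The factory is shortest when your data matches its assumptions, and the lower layer is where you go when it doesn't.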

from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')

The factory method TextDataLoaders.from_folder is very convenient when your data is arranged the exact same way as the IMDb dataset, but in practice, that often won’t be the case. The data block API offers more flexibility. As we saw in the preceding chapter, we can get the same result with the following:
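The folder convention that such a factory method depends on can be sketched in plain Python. The sketch assumes an IMDb-style layout: train/ and test/ folders, each holding one subfolder per label (pos, neg) of .txt reviews. A file's label comes from its parent folder, and its train/validation split comes from its grandparent folder. The helper collect_labeled_files is hypothetical and is not a fastai function. In fastai, these two roles are filled by functions such as parent_label and GrandparentSplitter.

```python
import tempfile
from pathlib import Path

def collect_labeled_files(root, folders=("train", "test"), valid_name="test"):
    """Hypothetical helper: gather root/<folder>/<label>/*.txt as (path, label) pairs,
    splitting them into train and valid lists by grandparent folder name."""
    train, valid = [], []
    for folder in folders:
        for f in sorted((Path(root) / folder).glob("*/*.txt")):
            item = (f, f.parent.name)  # label = parent folder name, e.g. "pos" or "neg"
            target = valid if f.parent.parent.name == valid_name else train
            target.append(item)
    return train, valid

# Build a tiny fake dataset with the same shape as the IMDb train/ and test/ folders
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        for label in ("pos", "neg"):
            d = Path(tmp) / split / label
            d.mkdir(parents=True)
            (d / "0.txt").write_text(f"a {label} review")
    train, valid = collect_labeled_files(tmp)

print(len(train), len(valid))              # 2 2
print(sorted(lbl for _, lbl in train))     # ['neg', 'pos']
```

A dataset arranged any other way (labels stored in a CSV file, for example) would need a different helper. That is why a more flexible API than a fixed factory method is useful.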

path = untar_data(URLs.IMDB)
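dls = DataBlock(
    blocks=(TextBlock.from_folder(path), CategoryBlock),
    get_y=parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')).dataloaders(path)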
ISBN: 9781492045519