Chapter 4. Using Data with PyTorch
In the first three chapters of this book, you trained models using a variety of data, from the Fashion MNIST dataset that was conveniently bundled via an API to the image-based “Horses or Humans” and “Dogs vs. Cats” datasets, which were available as ZIP files that you had to download and preprocess. So by now, you’ve probably realized that there are lots of different ways of getting the data with which to train a model.
However, many public datasets require you to learn lots of different domain-specific skills before you can even begin to consider your model architecture. The goal behind the PyTorch domain libraries (such as torchvision) and the tools available in the torch.utils.data namespace is to expose datasets in a way that's easy to consume, with all the preprocessing steps of acquiring the data and getting it into PyTorch-friendly structures done for you.
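To make the torch.utils.data idea concrete, here is a minimal sketch of the pattern those tools are built on: a map-style Dataset only has to report its length (`__len__`) and return one sample by index (`__getitem__`), and a DataLoader then handles batching. The `SquaresDataset` class is hypothetical and exists only for illustration; it is not part of PyTorch.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: each item is (x, x**2) for x in 0..n-1."""

    def __init__(self, n):
        # Precompute inputs and labels as float tensors of shape (n, 1)
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        self.y = self.x ** 2

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, label) pair; DataLoader stacks these into batches
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

for inputs, labels in loader:
    # Batches of 4, 4, and then the remaining 2 samples
    print(inputs.shape, labels.shape)
```

Built-in datasets such as Fashion MNIST follow the same contract, which is why they can be handed straight to a DataLoader.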
You’ve already seen a little of this idea in how PyTorch handled Fashion MNIST back in Chapter 2. As a recap, all you had to do to get the data was this:
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
In the case of this dataset, we also imported the datasets module from the torchvision library, which contains the reference to Fashion MNIST:
from torchvision import datasets
Given that it’s a computer vision–oriented dataset, it makes sense that it would be in the torchvision library.
PyTorch has many other datasets of different data types that can be loaded ...