Chapter 1. An Introduction to PyTorch
PyTorch is one of the most popular deep learning Python libraries, and it is widely used by the AI research community. Many developers and researchers use PyTorch to accelerate deep learning research experimentation and prototyping.
In this chapter, I will give you a brief introduction to what PyTorch is and some of the features that make it popular. I’ll also show you how to install and set up your PyTorch development environment on your local machine and in the cloud. By the end of this chapter, you will be able to verify that PyTorch is properly installed and run a simple PyTorch program.
What Is PyTorch?
The PyTorch library is primarily developed by Facebook's AI Research Lab (FAIR) and is free and open source software with over 1,700 contributors. It allows you to easily run array-based calculations, build dynamic neural networks, and perform automatic differentiation in Python with strong graphics processing unit (GPU) acceleration: all important features required for deep learning research. Although some use it for accelerated tensor computing, most use it for deep learning development.
PyTorch’s simple and flexible interface enables fast experimentation. You can load data, apply transforms, and build models with a few lines of code. Then, you have the flexibility to write customized training, validation, and test loops and deploy trained models with ease.
It has a strong ecosystem and a large user community, including universities like Stanford and companies such as Uber, NVIDIA, and Salesforce. In 2019, PyTorch dominated machine learning and deep learning conference proceedings: 69% of the Conference on Computer Vision and Pattern Recognition (CVPR) proceedings used PyTorch, over 75% of both the Association for Computational Linguistics (ACL) and the North American Chapter of the ACL (NAACL) used it, and over 50% of the International Conference on Learning Representations (ICLR) and the International Conference on Machine Learning (ICML) used it as well. There are also over 60,000 repositories on GitHub related to PyTorch.
Its simple Python API, GPU support, and flexibility make PyTorch a popular choice among academic and commercial research organizations. Since being open sourced, PyTorch has reached a stable release and can be easily installed on Windows, macOS, and Linux operating systems. The framework continues to expand rapidly and now facilitates deployment to production environments in the cloud and on mobile platforms.
Why Use PyTorch?
If you’re studying machine learning, conducting deep learning research, or building AI systems, you’ll probably need to use a deep learning framework. A deep learning framework makes it easy to perform common tasks such data loading, preprocessing, model design, training, and deployment. PyTorch has become very popular with the academic and research communities due to its simplicity, flexibility, and Python interface. Here are some reasons to learn and use PyTorch:
- PyTorch is popular. Many companies and research organizations use PyTorch as their main deep learning framework. In fact, some companies have built their custom machine learning tools on top of PyTorch. As a result, PyTorch skills are in demand.
- PyTorch is supported by all major cloud platforms, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, and Alibaba Cloud. You can spin up a virtual machine with PyTorch preloaded for frictionless development. You can use prebuilt Docker images, perform large-scale training on cloud GPU platforms, and run models at production scale.
- PyTorch is supported by Google Colaboratory and Kaggle Kernels. You can run PyTorch code in a browser with no installation or configuration needed, and you can compete in Kaggle competitions by running PyTorch directly in your kernel.
- PyTorch is mature and stable. It is regularly maintained and is now beyond release 1.8.
- PyTorch supports CPU, GPU, TPU, and parallel processing. You can accelerate training and inference using GPUs and TPUs. Tensor processing units (TPUs) are application-specific integrated circuit (ASIC) chips developed by Google as an alternative to GPUs for neural network hardware acceleration. With parallel processing, you can preprocess data on the CPU while training a model on the GPU or TPU.
- PyTorch supports distributed training. You can train neural networks over multiple GPUs on multiple machines.
- PyTorch supports deployment to production. With the newer TorchScript and TorchServe features, you can easily deploy models to production environments, including cloud servers.
- PyTorch is beginning to support mobile deployment. Although it's currently experimental, you can now deploy models to iOS and Android devices.
- PyTorch has a vast ecosystem and set of open source libraries. Libraries such as Torchvision, fastai, and PyTorch Lightning extend its capabilities and support specific fields like natural language processing (NLP) and computer vision.
- PyTorch also has a C++ frontend. Although I will focus on the Python interface in this book, PyTorch also supports a C++ interface. If you need to build high-performance, low-latency, or bare-metal applications, you can write them in C++ using the same design and architecture as the Python API.
- PyTorch supports the Open Neural Network Exchange (ONNX) format natively. You can easily export your models to ONNX format and use them with ONNX-compatible platforms, runtimes, or visualizers (see the short sketch after this list).
- PyTorch has a large community of developers and user forums. There are more than 38,000 users on the PyTorch forum, and it's easy to get support or post questions to the community by visiting the PyTorch Discussion Forum.
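For example, the ONNX export mentioned in the list takes only a few lines. Here's a minimal sketch (the filename and the dummy input are illustrative): torch.onnx.export() traces the model with an example input and writes an ONNX file.

import torch
from torchvision import models

model = models.alexnet(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # example input used to trace the model
torch.onnx.export(model, dummy_input, "alexnet.onnx")  # writes alexnet.onnx to disk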
Getting Started
If you are familiar with PyTorch, you may already have installed it and set up your development environment. If not, I will show you some options to do so in this section. The fastest way to get started is to use Google Colaboratory (or Colab). Google Colab is a free cloud-based development environment similar to Jupyter Notebook and comes with PyTorch already installed. Colab comes with free limited GPU support and interfaces nicely with Google Drive for saving and sharing notebooks.
If you don’t have internet access, or you want to run the PyTorch code on your own hardware, then I will show you how to install PyTorch on a local machine. You can install PyTorch on Windows, Linux, and macOS operating systems. I recommend that you have an NVIDIA GPU for acceleration, but it is not required.
Lastly, you may want to develop PyTorch code using a cloud platform like AWS, Azure, or GCP. If you would like to use a cloud platform, I will show you the options to quickly get started on each platform.
Running in Google Colaboratory
With Google Colab, you can write and execute Python and PyTorch code in your browser. You can save files directly to your Google Drive account and easily share your work with others. To get started, visit the Google Colab website, as shown in Figure 1-1.
If you are already signed into your Google account, you will get a pop-up window. Click New Notebook in the bottom-right corner. If the pop-up window does not appear, click File and select New Notebook from the menu. If you are not signed in, you will be prompted to sign in or create a Google account, as shown in Figure 1-2.
To verify your configuration, import the PyTorch library, print the installed version, and check if you are using a GPU, as shown in Figure 1-3.
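In case the figure is hard to read, the cell runs code along these lines (the same snippet appears again in "Verifying Your PyTorch Environment" later in this chapter):

import torch
print(torch.__version__)
print(torch.cuda.is_available())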
By default, our Colab notebook does not use a GPU. You will need to select Change Runtime Type from the Runtime menu, then select GPU from the “Hardware accelerator” drop-down menu and click Save, as shown in Figure 1-4.
Now run the cell again by selecting the cell and pressing Shift-Enter. You should see True as the output of is_available(), as shown in Figure 1-5.
Note
Google offers a paid version called Colab Pro that provides faster GPUs, longer runtimes, and more memory. For the examples in this book, the free version of Colab should be sufficient.
Now you have verified that PyTorch is installed, and you also know the version. You have also verified that you have a GPU available and that the proper drivers are installed and operating correctly. Next, I will show you how to verify your PyTorch installation on a local machine.
Running on a Local Computer
You may want to install PyTorch on a local machine or your own server under certain conditions. For example, you may want to work with local storage, use your own or faster GPU hardware, or you may not have internet access. Running PyTorch does not require a GPU, but you will need one for GPU acceleration. I recommend an NVIDIA GPU, as PyTorch is closely tied to NVIDIA's Compute Unified Device Architecture (CUDA) drivers for GPU support.
Warning
Check your GPU and CUDA version first! PyTorch only supports specific GPU and CUDA versions, and many Mac computers use non-NVIDIA GPUs. If you are using a Mac, verify that you have an NVIDIA GPU by clicking the Apple icon on the menu bar, selecting "About This Mac," and clicking the Displays tab. If your Mac has an NVIDIA GPU and you want to use it, you'll have to build PyTorch from source. If it does not, you should use the CPU-only version of PyTorch or choose another computer with a different OS.
The PyTorch website offers a convenient browser tool for installation, as shown in Figure 1-6. Select the latest stable build, your OS, your preferred Python package manager (Conda is recommended), the Python language, and your CUDA version. Run the generated command and follow the instructions for your configuration. Note the prerequisites, installation instructions, and verification methods.
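As an illustration only, since the exact command depends on your selections and changes between releases, the tool produces Conda commands along these lines (the CUDA 11.1 variant matches the PyTorch 1.8 era):

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
# or, for a CPU-only install:
conda install pytorch torchvision torchaudio cpuonly -c pytorch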
You should be able to run the verification code snippet in your favorite IDE (Jupyter Notebook, Microsoft Visual Studio Code, PyCharm, Spyder, etc.) or from the terminal. Figure 1-7 shows how to verify that the correct version of PyTorch is installed from a terminal on a Mac. The same commands can be used to verify this in a Windows or Linux terminal as well.
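Beyond the version check, you may also want to confirm which CUDA build PyTorch is using and which GPU it can see. A minimal sketch, assuming PyTorch is already installed:

import torch

print(torch.__version__)   # installed PyTorch version
print(torch.version.cuda)  # CUDA version PyTorch was built against (None on CPU-only builds)
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU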
Running on Cloud Platforms
If you’re familiar with cloud platforms like AWS, GCP, or Azure, you can run PyTorch in the cloud. Cloud platforms provide powerful hardware and infrastructure for training and deploying deep learning models. Remember that using cloud services, especially GPU instances, incurs additional costs. To get started, follow the instructions in the online PyTorch cloud setup guide for your platform of interest.
Setting up your cloud environment is beyond the scope of this book, but I’ll summarize the available options. Each platform offers a virtual machine instance as well as managed services to support PyTorch development.
Running on AWS
AWS offers multiple options to run PyTorch in the cloud. If you prefer a fully managed service, you can use Amazon SageMaker, or if you'd rather manage your own infrastructure, you can use AWS Deep Learning Amazon Machine Images (AMIs) or Containers:
- Amazon SageMaker: This is a fully managed service to train and deploy models. You can run Jupyter Notebooks from the dashboard and use the SageMaker Python SDK to train and deploy models in the cloud. You can run your notebooks on a dedicated GPU instance.
- AWS Deep Learning AMIs: These are preconfigured virtual machine environments. You can choose the Conda AMI, which has many libraries (including PyTorch) preinstalled, or you can use the base AMI if you'd prefer a clean environment to set up private repositories or custom builds.
- AWS Deep Learning Containers: These are Docker images that come preinstalled with PyTorch. They enable you to skip the process of building and optimizing your environment from scratch and are mainly used for deployment.
For more detailed information on how to get started, review the “Getting Started with PyTorch on AWS” instructions.
Running on Microsoft Azure
Azure also offers multiple options to run PyTorch in the cloud. You can develop PyTorch models using a fully managed service called Azure Machine Learning, or you can run Data Science Virtual Machines (DSVMs) if you prefer to manage your own infrastructure:
- Azure Machine Learning: This is an enterprise-grade machine learning service for building and deploying models. It includes a drag-and-drop designer and MLOps capabilities that integrate with existing DevOps processes.
- DSVMs: These are preconfigured virtual machine environments. They come preinstalled with PyTorch and other deep learning frameworks as well as development tools like Jupyter Notebook and VS Code.
For more detailed information on how to get started, review the Azure Machine Learning documentation.
Running on Google Cloud Platform
GCP also offers multiple options to run PyTorch in the cloud. You can develop PyTorch models using a managed service called AI Platform Notebooks, or run Deep Learning VM images if you prefer to manage your own infrastructure:
- AI Platform Notebooks: This is a managed service whose integrated JupyterLab environment allows you to create preconfigured GPU instances.
- Deep Learning VM images: These are preconfigured virtual machine environments. They come preinstalled with PyTorch and other deep learning frameworks as well as development tools.
For more detailed information on how to get started, review the instructions at Google Cloud “AI and Machine Learning Products”.
Verifying Your PyTorch Environment
Whether you use Colab, your local machine, or your favorite cloud platform, you should verify that PyTorch is properly installed and check to see if you have a GPU available. You’ve already seen how to do this in Colab. To verify that PyTorch is properly installed, use the following code snippet. The code imports the PyTorch library, prints the version, and checks to see if a GPU is available:
import torch

print(torch.__version__)
print(torch.cuda.is_available())
Warning
You import the library using import torch, not import pytorch. PyTorch is originally based on the torch library, an open source machine learning framework based on the C and Lua programming languages. Keeping the library named torch allows Torch code to be reused with a more efficient PyTorch implementation.
A Fun Example
Now that you have verified that your environment is configured properly, let’s code up a fun example to show some of the features of PyTorch and demonstrate best practices in machine learning. In this example, we’ll build a classic image classifier that will attempt to identify an image’s content based on 1,000 possible classes or choices.
You can access this example from the book’s GitHub repository and follow along. Try running the code in Google Colab, on your local machine, or on a cloud platform like AWS, Azure, or GCP. Don’t worry about understanding all of the concepts of machine learning. We’ll cover them in more detail throughout the book.
Note
In practice, you will import all the necessary libraries at the beginning of your code. However, in this example, we will import the libraries as they are used so you can see which libraries are needed for each task.
First, let’s select an image we’d like to classify. In this example, we’ll choose a nice fresh, hot cup of coffee. Use the following code to download the coffee image to your local environment:
import urllib.request

url = 'https://pytorch.tips/coffee'
fpath = 'coffee.jpg'
urllib.request.urlretrieve(url, fpath)
Notice that the code uses the urllib library's urlretrieve() function to get an image from the web. We save the file as coffee.jpg by specifying fpath.
Next, we read our local image using the Pillow library (PIL):
import matplotlib.pyplot as plt
from PIL import Image

img = Image.open('coffee.jpg')
plt.imshow(img)
Figure 1-8 shows what our image looks like. We use matplotlib's imshow() function to display the image on our system, as shown in the preceding code.
Notice we haven’t used PyTorch yet. Here’s where things get exciting. Next, we are going to pass our image into a pretrained image classification neural network (NN)—but before we do so, we’ll need to preprocess our image. Preprocessing data is very common in machine learning since the NN expects the input to meet certain requirements.
In our example, the image data is an RGB 1600 × 1200-pixel JPEG-formatted image. We need to apply a series of preprocessing steps, called transforms, to convert the image into the proper format for the NN. We do this using Torchvision in the following code:
import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])])

img_tensor = transform(img)
print(type(img_tensor), img_tensor.shape)
# out:
# <class 'torch.Tensor'> torch.Size([3, 224, 224])
We use the Compose() transform to define a series of transforms that preprocess our image. First, we resize and crop the image to match the input size the NN expects. The image is currently in PIL format, since that's how we read it earlier, but our NN requires a tensor input, so we convert the PIL image to a tensor.
Tensors are the fundamental data objects in PyTorch, and we'll spend the entire next chapter exploring them. You can think of tensors as numerical arrays, much like NumPy arrays, with extra features such as GPU support. For now, we'll just convert our image to a tensor array of numbers to get it ready.
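As a quick taste, here's a minimal sketch, separate from our classifier, showing how tensors interoperate with NumPy:

import numpy as np
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # a 2 x 2 tensor
print(t.shape)                               # torch.Size([2, 2])
a = t.numpy()                                # convert to a NumPy array
t2 = torch.from_numpy(a)                     # and back to a tensor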
We apply one more transform, called Normalize(), to standardize each color channel. The ToTensor() transform has already scaled the pixel values to the range 0 to 1; Normalize() then shifts and rescales them using a mean and standard deviation (std) precomputed from the data used to train the model. Normalizing the image improves the accuracy of the classifier.
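If you're curious, you can check these ranges yourself. A rough sketch, reusing img and img_tensor from above (exact values depend on the image):

from torchvision import transforms

raw = transforms.ToTensor()(img)           # ToTensor() alone scales pixels to [0, 1]
print(raw.min().item(), raw.max().item())  # roughly 0.0 and 1.0
print(img_tensor.min().item(), img_tensor.max().item())  # shifted by Normalize()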
Finally, we call transform(img) to apply all the transforms to the image. As you can see, img_tensor is a 3 × 224 × 224 torch.Tensor, representing a 3-channel image of 224 × 224 pixels.
Efficient machine learning processes data in batches, and our model will expect a batch of data. However, we only have one image, so we’ll need to create a batch of size 1, as shown in the following code:
batch = img_tensor.unsqueeze(0)
print(batch.shape)
# out: torch.Size([1, 3, 224, 224])
We use PyTorch's unsqueeze() function to add a dimension to our tensor and create a batch of size 1. Now we have a tensor of size 1 × 3 × 224 × 224, which represents a batch size of 1 and 3 channels (RGB) of 224 × 224 pixels. PyTorch provides a lot of useful functions like unsqueeze() to manipulate tensors, and we'll explore many of them in the next chapter.
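To see what unsqueeze() does in isolation, here's a tiny sketch:

import torch

t = torch.zeros(3, 224, 224)
print(t.unsqueeze(0).shape)             # torch.Size([1, 3, 224, 224]); adds a new dim at position 0
print(t.unsqueeze(0).squeeze(0).shape)  # squeeze(0) removes it again: torch.Size([3, 224, 224])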
Now our image is ready for our classifier NN! We'll use a famous image classifier called AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge in 2012. It's easy to load this model using Torchvision, as shown in the following code:
from torchvision import models

model = models.alexnet(pretrained=True)
We’re going to use a pretrained model here, so we don’t need to train it. The AlexNet model has been pretrained with millions of images and does a pretty good job at classifying images. Let’s pass in our image and see how it does:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
# out (results will vary): cpu

model.eval()
model.to(device)
y = model(batch.to(device))
print(y.shape)
# out: torch.Size([1, 1000])
GPU acceleration is a key benefit of PyTorch. In the first line, we use PyTorch's cuda.is_available() function to see if our machine has a GPU. This is a very common line of PyTorch code, and we'll explore GPUs further in Chapters 2 and 6. We're only classifying one image, so we don't need a GPU here, but with a huge batch a GPU might help speed things up.
The model.eval() function configures our AlexNet model for inference or prediction (as opposed to training). Certain components of the model, such as dropout, are only used during training, and we don't want them active here. The calls model.to(device) and batch.to(device) send our model and input data to the GPU if one is available, and executing model(batch.to(device)) runs our classifier.
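One practice worth noting: the forward pass above still tracks gradients (you'll see grad_fn in the output below). For pure inference, you can disable gradient tracking, which saves memory and computation. A minimal sketch:

with torch.no_grad():            # disable gradient tracking during inference
    y = model(batch.to(device))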
The output, y, consists of a batch of 1,000 outputs. Since our batch contains only one image, the first dimension is 1 while the number of classes is 1000, one value for each class. The higher the value, the more likely it is that the image contains that class. The following code finds the winning class:
y_max, index = torch.max(y, 1)
print(index, y_max)
# out: tensor([967]) tensor([22.3059],
#      grad_fn=<MaxBackward0>)
Using PyTorch's max() function, we see that the class with index 967 has the highest value, 22.3059, and thus is the winner. However, we don't know what class 967 represents. Let's load the file with class names and find out:
url = 'https://pytorch.tips/imagenet-labels'
fpath = 'imagenet_class_labels.txt'
urllib.request.urlretrieve(url, fpath)

with open('imagenet_class_labels.txt') as f:
    classes = [line.strip() for line in f.readlines()]

print(classes[967])
# out: 967: 'espresso',
As we did earlier, we use urlretrieve() to download the text file containing descriptions of each class. Then, we read the file using readlines() and create a list containing class names. When we print(classes[967]), it shows us that class 967 is espresso!
Using PyTorch's softmax() function, we can convert the output values to probabilities:
prob = torch.nn.functional.softmax(y, dim=1)[0] * 100
print(classes[index[0]], prob[index[0]].item())
# out: 967: 'espresso', 87.85208892822266
To print the probability at an index, we use PyTorch's tensor.item() method. The item() method is frequently used; it returns the numeric value contained in a single-element tensor. The results show that the model is 87.85% sure that this image is an image of an espresso.
We can use PyTorch's sort() function to sort the outputs in descending order and look at the top five predictions:
_, indices = torch.sort(y, descending=True)
for idx in indices[0][:5]:
    print(classes[idx], prob[idx].item())
# out:
# 967: 'espresso', 87.85208892822266
# 968: 'cup', 7.28359317779541
# 504: 'coffee mug', 4.33521032333374
# 925: 'consomme', 0.36686763167381287
# 960: 'chocolate sauce, chocolate syrup',
#      0.09037172049283981
We see that the model predicts that the image is espresso with 87.85% probability. It also predicts cup with 7.28% and coffee mug with 4.3% probability, but it seems pretty confident that the image is an espresso.
You may feel like you need an espresso right now. We covered a lot in that example! The core code to accomplish everything is actually much shorter. Assuming you have downloaded the files already, you only need to run the following code to classify an image using AlexNet:
import torch
from torchvision import transforms, models

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])])

img_tensor = transform(img)
batch = img_tensor.unsqueeze(0)
model = models.alexnet(pretrained=True)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval()
model.to(device)
y = model(batch.to(device))
prob = torch.nn.functional.softmax(y, dim=1)[0] * 100
_, indices = torch.sort(y, descending=True)
for idx in indices[0][:5]:
    print(classes[idx], prob[idx].item())
And that’s how you build an image classifier with PyTorch. Try running your own images through the model and see how it classifies them. Also, try completing the example on another platform. For example, if you used Colab to run the code, try running it locally or in the cloud.
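If you'd like to try your own images, one convenient way to package the recap code is a small helper function. The classify() function below is an illustrative sketch, not part of the book's code, and assumes transform, model, device, and classes are already defined as above:

from PIL import Image

def classify(img_path, topk=5):
    """Print the top-k predicted classes for the image at img_path."""
    img = Image.open(img_path).convert('RGB')  # ensure 3 channels
    batch = transform(img).unsqueeze(0)        # preprocess and create a batch of 1
    with torch.no_grad():                      # inference only; no gradients needed
        y = model(batch.to(device))
    prob = torch.nn.functional.softmax(y, dim=1)[0] * 100
    _, indices = torch.sort(y, descending=True)
    for idx in indices[0][:topk]:
        print(classes[idx], prob[idx].item())

classify('coffee.jpg')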
Congratulations, you’ve verified that your environment is configured properly and that you can execute PyTorch code! We’ll explore each topic more deeply throughout the remainder of the book. In the next chapter, we’ll explore the fundamentals of PyTorch and provide a quick reference to tensors and their operations.