Preface

Deep Learning in the World Today

Hello and welcome! This book will introduce you to deep learning via PyTorch, an open source library released by Facebook in 2017. Unless you’ve had your head stuck in the ground in a very good impression of an ostrich the past few years, you can’t have helped but notice that neural networks are everywhere these days. They’ve gone from being the really cool bit of computer science that people learn about and then do nothing with to being carried around with us in our phones every day to improve our pictures or listen to our voice commands. Our email software reads our email and produces context-sensitive replies, our speakers listen out for us, cars drive by themselves, and the computer has finally bested humans at Go. We’re also seeing the technology being used for more nefarious ends in authoritarian countries, where neural network–backed sentinels can pick faces out of crowds and make a decision on whether they should be apprehended.

And yet, despite the feeling that this has all happened so fast, the concepts of neural networks and deep learning go back a long way. The proof that such a network can approximate any mathematical function, which underpins the idea that neural networks can be trained for many different tasks, dates back to 1989,1 and convolutional neural networks were being used to recognize digits on checks in the late ’90s. There’s been a solid foundation building up all this time, so why does it feel like an explosion occurred in the last 10 years?

There are many reasons, but prime among them has to be the surge in graphics processing unit (GPU) performance and GPUs’ increasing affordability. Designed originally for gaming, GPUs need to perform countless millions of matrix operations per second in order to render all the polygons for the driving or shooting game you’re playing on your console or PC, operations that a standard CPU just isn’t optimized for. A 2009 paper, “Large-Scale Deep Unsupervised Learning Using Graphics Processors” by Rajat Raina et al., pointed out that training neural networks was also based on performing lots of matrix operations, and so these add-on graphics cards could be used to speed up training, as well as make larger, deeper neural network architectures feasible for the first time. Other important techniques such as Dropout (which we will look at in Chapter 3) were also introduced in the last decade as ways to not just speed up training but make training more generalized (so that the network doesn’t just learn to recognize the training data, a problem called overfitting that we’ll encounter in the next chapter). In the last couple of years, companies have taken this GPU-based approach to the next level, with Google creating what it describes as tensor processing units (TPUs), devices custom-built for performing deep learning as fast as possible and even available to the general public as part of its Google Cloud ecosystem.

Another way to chart deep learning’s progress over the past decade is through the ImageNet competition. A massive database of over 14 million pictures, manually labeled into 20,000 categories, ImageNet is a treasure trove of labeled data for machine learning purposes. Since 2010, the yearly ImageNet Large Scale Visual Recognition Challenge has sought to test all comers against a 1,000-category subset of the database, and until 2012, error rates for tackling the challenge rested around 25%. That year, however, a deep convolutional neural network won the competition with an error of 16%, massively outperforming all other entrants. In the years that followed, that error rate got pushed down further and further, to the point that in 2015, the ResNet architecture obtained a result of 3.6%, which beat the average human performance on ImageNet (5%). We had been outclassed.

But What Is Deep Learning Exactly, and Do I Need a PhD to Understand It?

Deep learning’s definition is often more confusing than enlightening. One way of defining it is to say that deep learning is a machine learning technique that uses multiple layers of nonlinear transforms to progressively extract features from raw input. That’s true, but it doesn’t really help, does it? I prefer to describe it as a technique for solving problems by providing the inputs and desired outputs and letting the computer find the solution, normally using a neural network.
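That phrase “multiple layers of nonlinear transforms” can be sketched in a few lines of PyTorch. This is just an illustration, not an architecture from the book; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Each Linear layer is a learned transform; each ReLU supplies the
# nonlinearity that lets the stack extract progressively more abstract
# features from the raw input.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x = torch.randn(1, 10)  # a single raw input with 10 features
out = model(x)          # a (1, 2) tensor: one output per class, say
```

Training then consists of nudging those layers’ weights until the outputs match the desired ones, which is exactly the “provide inputs and desired outputs” framing above.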

One thing about deep learning that scares off a lot of people is the mathematics. Look at just about any paper in the field and you’ll be subjected to almost impenetrable amounts of notation with Greek letters all over the place, and you’ll likely run screaming for the hills. Here’s the thing: for the most part, you don’t need to be a math genius to use deep learning techniques. In fact, for most day-to-day basic uses of the technology, you don’t need to know much at all, and to really understand what’s going on (as you’ll see in Chapter 2), you only have to stretch a little to understand concepts that you probably learned in high school. So don’t be too scared about the math. By the end of Chapter 3, you’ll be able to put together, in just a few lines of code, an image classifier that rivals what the best minds of 2015 could offer.

PyTorch

As I mentioned back at the start, PyTorch is an open source offering from Facebook that facilitates writing deep learning code in Python. It has two lineages. First, and perhaps not entirely surprisingly given its name, it derives many features and concepts from Torch, which was a Lua-based neural network library that dates back to 2002. Its other major parent is Chainer, created in Japan in 2015. Chainer was one of the first neural network libraries to offer an eager approach to differentiation instead of defining static graphs, allowing for greater flexibility in the way networks are created, trained, and operated. The combination of the Torch legacy plus the ideas from Chainer has made PyTorch popular over the past couple of years.2
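To see what that eager approach to differentiation looks like in practice, here’s a toy example. There is no separate graph-building step: the computation runs as ordinary Python, and the derivative falls out immediately:

```python
import torch

# Eager differentiation: PyTorch records operations as they execute.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x  # y = x^2 + 2x, computed right away
y.backward()        # differentiate through the recorded operations

# dy/dx = 2x + 2, which is 8.0 at x = 3
print(x.grad)
```

Compare this with a static-graph library, where you would first declare the expression symbolically and only later feed values through a session; here the value and the gradient are available the moment the line runs.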

The library also comes with modules that help with manipulating text, images, and audio (torchtext, torchvision, and torchaudio), along with built-in variants of popular architectures such as ResNet (with weights that can be downloaded to provide assistance with techniques like transfer learning, which you’ll see in Chapter 4).

Aside from Facebook, PyTorch has seen quick acceptance by industry, with companies such as Twitter, Salesforce, Uber, and NVIDIA using it in various ways for their deep learning work. Ah, but I sense a question coming….

What About TensorFlow?

Yes, let’s address the rather large, Google-branded elephant in the corner. What does PyTorch offer that TensorFlow doesn’t? Why should you learn PyTorch instead?

The answer is that traditional TensorFlow works in a fundamentally different way from PyTorch, and that difference has major implications for code and debugging. In TensorFlow, you use the library to build up a graph representation of the neural network architecture and then you execute operations on that graph, which happens within the TensorFlow library. This method of declarative programming is somewhat at odds with Python’s more imperative paradigm, meaning that Python TensorFlow programs can look and feel somewhat odd and difficult to understand. The other issue is that the static graph declaration can make dynamically altering the architecture during training and inference time a lot more complicated and stuffed with boilerplate than with PyTorch’s approach.
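To make the “dynamically altering the architecture” point concrete, here is a small sketch (the module and its condition are invented purely for illustration). Because PyTorch builds its graph as the code runs, plain Python control flow can change which layers execute on every single forward pass:

```python
import torch
import torch.nn as nn

class SkipSometimes(nn.Module):
    """A toy module whose architecture varies per input."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x):
        # An ordinary Python `if`: the extra layer runs only when the
        # input's mean is positive. No graph rebuilding required.
        if x.mean() > 0:
            x = self.layer(x)
        return x

net = SkipSometimes()
out = net(torch.ones(1, 4))  # mean > 0, so the layer is applied
```

In a statically declared graph, this kind of per-example branching typically requires special conditional operators and extra boilerplate; here it is just Python.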

For these reasons, PyTorch has become popular in research-oriented communities. The number of papers submitted to the International Conference on Learning Representations that mention PyTorch has jumped 200% in the past year, while the number of papers mentioning TensorFlow has increased almost as much. PyTorch is definitely here to stay.

However, things are changing in more recent versions of TensorFlow. A new feature called eager execution has recently been added to the library; it allows TensorFlow to work similarly to PyTorch, and it will be the paradigm promoted in TensorFlow 2.0. But because it’s new, resources outside of Google that help you learn this way of working with TensorFlow are thin on the ground, and to get the most out of the library you’d still need to understand the years of work built on the static-graph paradigm.

But none of this should make you think poorly of TensorFlow; it remains an industry-proven library with support from one of the biggest companies on the planet. PyTorch (backed, of course, by a different biggest company on the planet) is, I would say, a more streamlined and focused approach to deep learning and differential programming. Because it doesn’t have to continue supporting older, crustier APIs, it is easier to teach and become productive in PyTorch than in TensorFlow.

Where does Keras fit in with this? So many good questions! Keras is a high-level deep learning library that originally supported Theano and TensorFlow, and now also supports other frameworks such as Apache MXNet. It provides features such as training, validation, and test loops that the lower-level frameworks leave as an exercise for the developer, as well as simple methods of building up neural network architectures. It has contributed hugely to the take-up of TensorFlow, and is now part of TensorFlow itself (as tf.keras) as well as continuing to be a separate project. PyTorch, in comparison, is something of a middle ground between the low level of raw TensorFlow and Keras; we will have to write our own training and inference routines, but creating neural networks is almost as straightforward (and I would say that PyTorch’s approach to making and reusing architectures is much more logical to a Python developer than some of Keras’s magic).
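As a taste of what “write our own training routines” means, here is a minimal, hypothetical sketch; the model, data, and hyperparameters are invented for illustration. Keras would wrap all of this in a single fit call, whereas in PyTorch the loop is yours to write (and to customize):

```python
import torch
import torch.nn as nn

# A tiny regression model and some random stand-in data.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

for epoch in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = loss_fn(model(inputs), targets)     # forward pass + loss
    loss.backward()                            # compute gradients
    optimizer.step()                           # update the weights
```

Owning this loop is extra work, but it is also exactly what makes custom training schemes straightforward later in the book.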

As you’ll see in this book, although PyTorch is common in more research-oriented positions, with the advent of PyTorch 1.0, it’s perfectly suited to production use cases.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

Supplemental material (including code examples and exercises) is available for download at https://oreil.ly/pytorch-github.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Programming PyTorch for Deep Learning by Ian Pointer (O’Reilly). Copyright 2019 Ian Pointer, 978-1-492-04535-9.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us.

O’Reilly Online Learning

Note

For almost 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/prgrming-pytorch-for-dl.

Email us to comment or ask technical questions about this book.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

A big thank you to my editor, Melissa Potter, my family, and Tammy Edlund for all their help in making this book possible. Thank you, also, to the technical reviewers who provided valuable feedback throughout the writing process, including Phil Rhodes, David Mertz, Charles Givre, Dominic Monn, Ankur Patel, and Sarah Nagy.

1 See “Approximation by Superpositions of a Sigmoidal Function”, by George Cybenko (1989).

2 Note that PyTorch borrows ideas from Chainer, but not actual code.
