Chapter 6. PyTorch Acceleration and Optimization

In the previous chapters, you learned how to use the built-in capabilities of PyTorch and extend those capabilities by creating your own custom components for deep learning. Doing so enables you to quickly design new models and algorithms to train them.

However, when dealing with very large datasets or more complex models, training your models on a single CPU or GPU can take an extremely long time—it may take days or even weeks to get preliminary results. Longer training times can become frustrating, especially when you want to conduct many experiments using different hyperparameter configurations.

In this chapter, we’ll explore state-of-the-art techniques to accelerate and optimize your model development with PyTorch. First, we’ll look at using tensor processing units (TPUs) instead of GPUs and consider the cases in which TPUs can improve performance. Next, I’ll show you how to use PyTorch’s built-in capabilities for parallel processing and distributed training. This will serve as a quick reference for training models across multiple GPUs and multiple machines, so you can scale your training as soon as more hardware resources become available. After exploring ways to accelerate training, we’ll look at how to optimize your models using advanced techniques such as hyperparameter tuning, quantization, and pruning.
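
To give a flavor of what’s ahead, here is a minimal, illustrative sketch of two of these techniques using PyTorch’s built-in APIs: nn.DataParallel for splitting batches across multiple GPUs on a single machine, and torch.quantization.quantize_dynamic for post-training quantization. The toy model and its layer sizes are assumptions chosen purely for illustration; each technique is covered in detail later in the chapter.

import torch
import torch.nn as nn

# Toy model used only to illustrate the APIs previewed in this chapter.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Parallel processing: if more than one GPU is visible, nn.DataParallel
# replicates the model and splits each input batch across the devices.
if torch.cuda.device_count() > 1:
    parallel_model = nn.DataParallel(model).to("cuda")

# Quantization: dynamic quantization converts the Linear layers to
# int8 weights, shrinking the model and speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

nn.DataParallel is the simplest single-machine option; training across multiple machines relies on PyTorch’s distributed capabilities, which we’ll get to shortly.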

The chapter will also provide reference code to make getting started easy, and references to the key packages and ...
