Accelerate Model Training with PyTorch 2.X

Book description

Dramatically accelerate the process of building complex models with PyTorch, extracting the best performance from any computing environment

Key Features

  • Reduce model-building time by applying optimization techniques
  • Harness the computing power of multiple devices and machines to boost the training process
  • Focus on model quality by quickly evaluating different model configurations
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

This book, written by an HPC expert with over 25 years of experience, guides you through enhancing model training performance with PyTorch. You’ll learn how model complexity impacts training time and discover the performance tuning levels that expedite the process. You’ll also learn to utilize PyTorch features, specialized libraries, and efficient data pipelines to optimize training on CPUs and accelerators; reduce model complexity; adopt mixed precision; and harness the power of multicore systems and multi-GPU environments for distributed training. By the end, you'll be equipped with techniques and strategies to speed up training so you can focus on building stunning models.

What you will learn

  • Compile the model to train it faster (see the sketch after this list)
  • Use specialized libraries to optimize training on the CPU
  • Build a data pipeline to boost GPU execution
  • Simplify the model through pruning and compression techniques
  • Adopt automatic mixed precision without penalizing the model's accuracy
  • Distribute the training step across multiple machines and devices
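
As a taste of what's ahead, here is a minimal sketch (not taken from the book) combining two of the techniques listed above: compiling the model with torch.compile and training under automatic mixed precision (AMP). The toy model, synthetic data, and hyperparameters are illustrative placeholders rather than the book's own examples.

    # Minimal sketch: torch.compile + automatic mixed precision (PyTorch 2.x).
    # The model, data, and hyperparameters are illustrative placeholders.
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"
    use_amp = device == "cuda"

    # Toy model and synthetic data standing in for a real workload.
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
    inputs = torch.randn(256, 64, device=device)
    targets = torch.randint(0, 10, (256,), device=device)

    # torch.compile captures the model into an optimized graph (Chapter 3's topic).
    compiled_model = torch.compile(model)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    # GradScaler rescales the loss to avoid float16 gradient underflow.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    for step in range(10):
        optimizer.zero_grad()
        # autocast runs eligible ops in lower precision on the GPU (Chapter 7's topic).
        with torch.autocast(device_type=device, enabled=use_amp):
            loss = loss_fn(compiled_model(inputs), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

Chapters 3 and 7 cover these APIs in depth, and the later chapters extend the same training loop to multiple CPUs, GPUs, and machines.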

Who this book is for

This book is for intermediate-level data scientists who want to leverage PyTorch to speed up the training of their machine learning models by employing a set of optimization strategies and techniques. To make the most of this book, familiarity with the basic concepts of machine learning, PyTorch, and Python is essential. However, no prior understanding of distributed computing, accelerators, or multicore processors is required.

Table of contents

  1. Accelerate Model Training with PyTorch 2.X
  2. Foreword
  3. Contributors
  4. About the author
  5. About the reviewer
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  7. Part 1: Paving the Way
  8. Chapter 1: Deconstructing the Training Process
    1. Technical requirements
    2. Remembering the training process
      1. Dataset
      2. The training algorithm
    3. Understanding the computational burden of the model training phase
      1. Hyperparameters
      2. Operations
      3. Parameters
    4. Quiz time!
    5. Summary
  9. Chapter 2: Training Models Faster
    1. Technical requirements
    2. What options do we have?
      1. Modifying the software stack
      2. Increasing computing resources
    3. Modifying the application layer
      1. What can we change in the application layer?
      2. Getting hands-on
      3. What if we change the batch size?
    4. Modifying the environment layer
      1. What can we change in the environment layer?
      2. Getting hands-on
    5. Quiz time!
    6. Summary
  10. Part 2: Going Faster
  11. Chapter 3: Compiling the Model
    1. Technical requirements
    2. What do you mean by compiling?
      1. Execution modes
      2. Model compiling
    3. Using the Compile API
      1. Basic usage
      2. Give me a real fight – training a heavier model!
    4. How does the Compile API work under the hood?
      1. Compiling workflow and components
      2. Backends
    5. Quiz time!
    6. Summary
  12. Chapter 4: Using Specialized Libraries
    1. Technical requirements
    2. Multithreading with OpenMP
      1. What is multithreading?
      2. Using and configuring OpenMP
      3. Using and configuring Intel OpenMP
    3. Optimizing Intel CPU with IPEX
      1. Using IPEX
      2. How does IPEX work under the hood?
    4. Quiz time!
    5. Summary
  13. Chapter 5: Building an Efficient Data Pipeline
    1. Technical requirements
    2. Why do we need an efficient data pipeline?
      1. What is a data pipeline?
      2. How to build a data pipeline
      3. Data pipeline bottleneck
    3. Accelerating data loading
      1. Optimizing a data transfer to the GPU
      2. Configuring data pipeline workers
      3. Reaping the rewards
    4. Quiz time!
    5. Summary
  14. Chapter 6: Simplifying the Model
    1. Technical requirements
    2. Knowing the model simplification process
      1. Why simplify a model? (reason)
      2. How to simplify a model? (process)
      3. When do we simplify a model? (moment)
    3. Using Microsoft NNI to simplify a model
      1. Overview of NNI
      2. NNI in action!
    4. Quiz time!
    5. Summary
  15. Chapter 7: Adopting Mixed Precision
    1. Technical requirements
    2. Remembering numeric precision
      1. How do computers represent numbers?
      2. Floating-point representation
      3. Novel data types
      4. A summary, please!
    3. Understanding the mixed precision strategy
      1. What is mixed precision?
      2. Why use mixed precision?
      3. How to use mixed precision
      4. How about Tensor Cores?
    4. Enabling AMP
      1. Activating AMP on GPU
      2. AMP, show us what you are capable of!
    5. Quiz time!
    6. Summary
  16. Part 3: Going Distributed
  17. Chapter 8: Distributed Training at a Glance
    1. Technical requirements
    2. A first look at distributed training
      1. When do we need to distribute the training process?
      2. Where do we execute distributed training?
    3. Learning the fundamentals of parallelism strategies
      1. Model parallelism
      2. Data parallelism
    4. Distributed training on PyTorch
      1. Basic workflow
      2. Communication backend and program launcher
    5. Quiz time!
    6. Summary
  18. Chapter 9: Training with Multiple CPUs
    1. Technical requirements
    2. Why distribute the training on multiple CPUs?
      1. Why not increase the number of threads?
      2. Distributed training to the rescue
    3. Implementing distributed training on multiple CPUs
      1. The Gloo communication backend
      2. Coding distributed training to run on multiple CPUs
      3. Launching distributed training on multiple CPUs
    4. Getting faster with Intel oneCCL
      1. What is Intel oneCCL?
      2. Code implementation and launching
      3. Is oneCCL really better?
    5. Quiz time!
    6. Summary
  19. Chapter 10: Training with Multiple GPUs
    1. Technical requirements
    2. Demystifying the multi-GPU environment
      1. The popularity of multi-GPU environments
      2. Understanding multi-GPU interconnection
      3. How does interconnection topology affect performance?
      4. Discovering the interconnection topology
      5. Setting GPU affinity
    3. Implementing distributed training on multiple GPUs
      1. The NCCL communication backend
      2. Coding and launching distributed training with multiple GPUs
      3. Experimental evaluation
    4. Quiz time!
    5. Summary
  20. Chapter 11: Training with Multiple Machines
    1. Technical requirements
    2. What is a computing cluster?
      1. Workload manager
      2. Understanding the high-performance network
    3. Implementing distributed training on multiple machines
      1. Getting introduced to Open MPI
      2. Why use Open MPI and NCCL?
      3. Coding and launching the distributed training on multiple machines
      4. Experimental evaluation
    4. Quiz time!
    5. Summary
  21. Index
    1. Why subscribe?
  22. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Accelerate Model Training with PyTorch 2.X
  • Author(s): Maicon Melo Alves
  • Release date: April 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781805120100