PyTorch 1.x Reinforcement Learning Cookbook

Book description

Implement reinforcement learning techniques and algorithms with the help of real-world examples and recipes

Key Features

  • Use PyTorch 1.x to design and build self-learning artificial intelligence (AI) models
  • Implement RL algorithms to solve control and optimization challenges faced by data scientists today
  • Apply modern RL libraries to simulate a controlled environment for your projects

Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent years. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as a preferred tool for building RL models because of its efficiency and ease of use.

With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master RL techniques such as dynamic programming, Monte Carlo methods, temporal difference learning, and Q-learning. You'll also gain insight into industry-specific applications of these techniques. Later chapters guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using multi-armed bandit algorithms and function approximation. You'll also learn how to use deep Q-networks to play Atari games and how to implement policy gradient methods effectively. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game.

By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.

What you will learn

  • Use Q-learning and the state-action-reward-state-action (SARSA) algorithm to solve various Gridworld problems
  • Develop a multi-armed bandit algorithm to optimize display advertising
  • Scale up learning and control processes using Deep Q-Networks
  • Simulate Markov Decision Processes, OpenAI Gym environments, and other common control problems
  • Select and build RL models, evaluate their performance, and optimize and deploy them
  • Use policy gradient methods to solve continuous RL problems
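To give a flavor of the tabular Q-learning and epsilon-greedy techniques listed above, here is a minimal, self-contained sketch in plain NumPy. It is illustrative only and not taken from the book's recipes: the toy chain environment, function name, and all hyperparameter values are assumptions chosen for brevity.

```python
import numpy as np

def q_learning_chain(n_states=5, episodes=500, alpha=0.1,
                     gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 1D chain.

    The agent starts at state 0; action 0 moves left, action 1 moves
    right, and entering the last state yields a reward of 1 and ends
    the episode.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection, breaking ties randomly
            if rng.random() < epsilon:
                a = int(rng.integers(2))
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy next-state value
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```

After training, the greedy policy derived from `Q` (take `np.argmax(Q[s])` in each state) should move right toward the rewarding terminal state. The book's recipes develop the same ideas against OpenAI Gym environments and, later, with function approximation instead of a table.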

Who this book is for

Machine learning engineers, data scientists, and AI researchers looking for quick solutions to different reinforcement learning problems will find this book useful. Prior knowledge of machine learning concepts is required; experience with PyTorch is useful but not necessary.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. PyTorch 1.x Reinforcement Learning Cookbook
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Sections
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
      5. See also
    5. Get in touch
      1. Reviews
  6. Getting Started with Reinforcement Learning and PyTorch
    1. Setting up the working environment
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    2. Installing OpenAI Gym
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    3. Simulating Atari environments
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Simulating the CartPole environment
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Reviewing the fundamentals of PyTorch
      1. How to do it...
      2. There's more...
      3. See also
    6. Implementing and evaluating a random search policy
      1. How to do it...
      2. How it works...
      3. There's more...
    7. Developing the hill-climbing algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    8. Developing a policy gradient algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
  7. Markov Decision Processes and Dynamic Programming
    1. Technical requirements
    2. Creating a Markov chain
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    3. Creating an MDP
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Performing policy evaluation
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Simulating the FrozenLake environment
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Solving an MDP with a value iteration algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
    7. Solving an MDP with a policy iteration algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    8. Solving the coin-flipping gamble problem
      1. How to do it...
      2. How it works...
      3. There's more...
  8. Monte Carlo Methods for Making Numerical Estimations
    1. Calculating Pi using the Monte Carlo method
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    2. Performing Monte Carlo policy evaluation
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Playing Blackjack with Monte Carlo prediction
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    4. Performing on-policy Monte Carlo control
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Developing MC control with epsilon-greedy policy
      1. How to do it...
      2. How it works...
    6. Performing off-policy Monte Carlo control
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    7. Developing MC control with weighted importance sampling
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
  9. Temporal Difference and Q-Learning
    1. Setting up the Cliff Walking environment playground
      1. Getting ready
      2. How to do it...
      3. How it works...
    2. Developing the Q-learning algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Setting up the Windy Gridworld environment playground
      1. How to do it...
      2. How it works...
    4. Developing the SARSA algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
    5. Solving the Taxi problem with Q-learning
      1. Getting ready
      2. How to do it...
      3. How it works...
    6. Solving the Taxi problem with SARSA
      1. How to do it...
      2. How it works...
      3. There's more...
    7. Developing the Double Q-learning algorithm
      1. How to do it...
      2. How it works...
      3. See also
  10. Solving Multi-armed Bandit Problems
    1. Creating a multi-armed bandit environment
      1. How to do it...
      2. How it works...
    2. Solving multi-armed bandit problems with the epsilon-greedy policy
      1. How to do it...
      2. How it works...
      3. There's more...
    3. Solving multi-armed bandit problems with the softmax exploration
      1. How to do it...
      2. How it works...
    4. Solving multi-armed bandit problems with the upper confidence bound algorithm
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    5. Solving internet advertising problems with a multi-armed bandit
      1. How to do it...
      2. How it works...
    6. Solving multi-armed bandit problems with the Thompson sampling algorithm
      1. How to do it...
      2. How it works...
      3. See also
    7. Solving internet advertising problems with contextual bandits
      1. How to do it...
      2. How it works...
  11. Scaling Up Learning with Function Approximation
    1. Setting up the Mountain Car environment playground
      1. Getting ready
      2. How to do it...
      3. How it works...
    2. Estimating Q-functions with gradient descent approximation
      1. How to do it...
      2. How it works...
      3. See also
    3. Developing Q-learning with linear function approximation
      1. How to do it...
      2. How it works...
    4. Developing SARSA with linear function approximation
      1. How to do it...
      2. How it works...
    5. Incorporating batching using experience replay
      1. How to do it...
      2. How it works...
    6. Developing Q-learning with neural network function approximation
      1. How to do it...
      2. How it works...
      3. See also
    7. Solving the CartPole problem with function approximation
      1. How to do it...
      2. How it works...
  12. Deep Q-Networks in Action
    1. Developing deep Q-networks
      1. How to do it...
      2. How it works...
      3. See also
    2. Improving DQNs with experience replay
      1. How to do it...
      2. How it works...
    3. Developing double deep Q-Networks
      1. How to do it...
      2. How it works...
    4. Tuning double DQN hyperparameters for CartPole
      1. How to do it...
      2. How it works...
    5. Developing Dueling deep Q-Networks
      1. How to do it...
      2. How it works...
    6. Applying Deep Q-Networks to Atari games
      1. How to do it...
      2. How it works...
    7. Using convolutional neural networks for Atari games
      1. How to do it...
      2. How it works...
      3. See also
  13. Implementing Policy Gradients and Policy Optimization
    1. Implementing the REINFORCE algorithm
      1. How to do it...
      2. How it works...
      3. See also
    2. Developing the REINFORCE algorithm with baseline
      1. How to do it...
      2. How it works...
    3. Implementing the actor-critic algorithm
      1. How to do it...
      2. How it works...
    4. Solving Cliff Walking with the actor-critic algorithm
      1. How to do it...
      2. How it works...
    5. Setting up the continuous Mountain Car environment
      1. How to do it...
      2. How it works...
    6. Solving the continuous Mountain Car environment with the advantage actor-critic network
      1. How to do it...
      2. How it works...
      3. There's more...
      4. See also
    7. Playing CartPole through the cross-entropy method
      1. How to do it...
      2. How it works...
  14. Capstone Project – Playing Flappy Bird with DQN
    1. Setting up the game environment
      1. Getting ready
      2. How to do it...
      3. How it works...
    2. Building a Deep Q-Network to play Flappy Bird
      1. How to do it...
      2. How it works...
    3. Training and tuning the network
      1. How to do it...
      2. How it works...
    4. Deploying the model and playing the game
      1. How to do it...
      2. How it works...
  15. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: PyTorch 1.x Reinforcement Learning Cookbook
  • Author(s): Yuxi Liu
  • Release date: October 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781838551964