The Reinforcement Learning Workshop

Book description

Start with the basics of reinforcement learning and explore deep reinforcement learning concepts such as deep Q-learning, deep recurrent Q-networks, and policy-based methods with this practical guide

Key Features

  • Use TensorFlow to write reinforcement learning agents for performing challenging tasks
  • Learn how to solve finite Markov decision problems
  • Train agents to play popular video games such as Breakout

Book Description

Various intelligent applications, such as video games, inventory management software, warehouse robots, and translation tools, use reinforcement learning (RL) to make decisions and perform actions that maximize the probability of the desired outcome. This book will help you get to grips with the techniques and algorithms used to implement RL in your machine learning models.

Starting with an introduction to RL, you'll be guided through different RL environments and frameworks. You'll learn how to implement your own custom environments and use OpenAI Baselines to run RL algorithms. Once you've explored classic RL techniques such as Dynamic Programming, Monte Carlo, and TD Learning, you'll understand when to apply the different deep learning methods in RL and advance to deep Q-learning. The book will also help you understand the different stages of machine-based problem-solving by using a DARQN on the popular video game Breakout. Finally, you'll find out when to use a policy-based method to tackle an RL problem.

By the end of The Reinforcement Learning Workshop, you'll be equipped with the knowledge and skills needed to solve challenging problems using reinforcement learning.

What you will learn

  • Use OpenAI Gym as a framework to implement RL environments (see the sketch after this list)
  • Find out how to define and implement a reward function
  • Explore Markov chains, Markov decision processes, and the Bellman equation
  • Distinguish between Dynamic Programming, Monte Carlo, and Temporal Difference Learning
  • Understand the multi-armed bandit problem and explore various strategies to solve it
  • Build a deep Q-network (DQN) to play the video game Breakout
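
For example, the Gym interaction loop referenced above boils down to a reset-step cycle. The following is a minimal sketch, not code taken from the book, assuming the CartPole-v1 environment, a random agent, and the classic Gym step API (pre-v0.26), in which step() returns four values:

    import gym

    # Minimal sketch (assumption): classic Gym API where reset() returns an
    # observation and step() returns (observation, reward, done, info).
    env = gym.make("CartPole-v1")

    observation = env.reset()   # initial observation of the environment state
    total_reward = 0.0
    done = False

    while not done:
        action = env.action_space.sample()                    # random agent: sample any valid action
        observation, reward, done, info = env.step(action)    # apply the action, observe the outcome
        total_reward += reward                                 # accumulate the reward signal

    print("Episode return:", total_reward)
    env.close()

Chapter 1 develops this loop in Exercise 1.04 and measures a random agent's performance in Activity 1.01, before later chapters replace the random agent with learned policies.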

Who this book is for

If you are a data scientist, machine learning enthusiast, or Python developer who wants to learn basic to advanced deep reinforcement learning algorithms, this workshop is for you. A basic understanding of the Python language is necessary.

Table of contents

  1. The Reinforcement Learning Workshop
  2. Preface
    1. About the Book
      1. Audience
      2. About the Chapters
      3. Conventions
      4. Code Presentation
      5. Setting up Your Environment
      6. Installing Anaconda for Jupyter Notebook
      7. Installing a Virtual Environment
      8. Installing Gym
      9. Installing TensorFlow 2
      10. Installing PyTorch
      11. Installing OpenAI Baselines
      12. Installing Pillow
      13. Installing Torch
      14. Installing Other Libraries
      15. Accessing the Code Files
  3. 1. Introduction to Reinforcement Learning
    1. Introduction
    2. Learning Paradigms
      1. Introduction to Learning Paradigms
      2. Supervised versus Unsupervised versus RL
      3. Classifying Common Problems into Learning Scenarios
        1. Predicting Whether an Image Contains a Dog or a Cat
        2. Detecting and Classifying All Dogs and Cats in an Image
        3. Playing Chess
    3. Fundamentals of Reinforcement Learning
      1. Elements of RL
        1. Agent
        2. Actions
        3. Environment
        4. Policy
        5. An Example of an Autonomous Driving Environment
      2. Exercise 1.01: Implementing a Toy Environment Using Python
      3. The Agent-Environment Interface
        1. What's the Agent? What's in the Environment?
      4. Environment Types
        1. Finite versus Continuous
        2. Deterministic versus Stochastic
        3. Fully Observable versus Partially Observable
        4. POMDP versus MDP
        5. Single Agents versus Multiple Agents
      5. An Action and Its Types
      6. Policy
        1. Stochastic Policies
        2. Policy Parameterizations
      7. Exercise 1.02: Implementing a Linear Policy
      8. Goals and Rewards
        1. Why Discount?
    4. Reinforcement Learning Frameworks
      1. OpenAI Gym
        1. Getting Started with Gym – CartPole
        2. Gym Spaces
      2. Exercise 1.03: Creating a Space for Image Observations
        1. Rendering an Environment
        2. Rendering CartPole
        3. A Reinforcement Learning Loop with Gym
      3. Exercise 1.04: Implementing the Reinforcement Learning Loop with Gym
      4. Activity 1.01: Measuring the Performance of a Random Agent
      5. OpenAI Baselines
        1. Getting Started with Baselines – DQN on CartPole
    5. Applications of Reinforcement Learning
      1. Games
      2. Go
      3. Dota 2
      4. StarCraft
      5. Robot Control
      6. Autonomous Driving
    6. Summary
  4. 2. Markov Decision Processes and Bellman Equations
    1. Introduction
    2. Markov Processes
      1. The Markov Property
      2. Markov Chains
      3. Markov Reward Processes
        1. Value Functions and Bellman Equations for MRPs
        2. Solving a Linear System of Equations Using SciPy
      4. Exercise 2.01: Finding the Value Function in an MRP
      5. Markov Decision Processes
        1. The State-Value Function and the Action-Value Function
        2. Bellman Optimality Equation
        3. Solving the Bellman Optimality Equation
      6. Solving MDPs
        1. Algorithm Categorization
        2. Value-Based Algorithms
        3. Policy Search Algorithms
        4. Linear Programming
      7. Exercise 2.02: Determining the Best Policy for an MDP Using Linear Programming
      8. Gridworld
      9. Activity 2.01: Solving Gridworld
    3. Summary
  5. 3. Deep Learning in Practice with TensorFlow 2
    1. Introduction
    2. An Introduction to TensorFlow and Keras
      1. TensorFlow
      2. Keras
      3. Exercise 3.01: Building a Sequential Model with the Keras High-Level API
    3. How to Implement a Neural Network Using TensorFlow
      1. Model Creation
      2. Model Training
      3. Loss Function Definition
      4. Optimizer Choice
      5. Learning Rate Scheduling
      6. Feature Normalization
      7. Model Validation
      8. Performance Metrics
      9. Model Improvement
        1. Overfitting
        2. Regularization
        3. Early Stopping
        4. Dropout
        5. Data Augmentation
        6. Batch Normalization
        7. Model Testing and Inference
      10. Standard Fully Connected Neural Networks
      11. Exercise 3.02: Building a Fully Connected Neural Network Model with the Keras High-Level API
      12. Convolutional Neural Networks
      13. Exercise 3.03: Building a Convolutional Neural Network Model with the Keras High-Level API
      14. Recurrent Neural Networks
      15. Exercise 3.04: Building a Recurrent Neural Network Model with the Keras High-Level API
    4. Simple Regression Using TensorFlow
      1. Exercise 3.05: Creating a Deep Neural Network to Predict the Fuel Efficiency of Cars
    5. Simple Classification Using TensorFlow
      1. Exercise 3.06: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson
    6. TensorBoard – How to Visualize Data Using TensorBoard
      1. Exercise 3.07: Creating a Deep Neural Network to Classify Events Generated by the ATLAS Experiment in the Quest for the Higgs Boson Using TensorBoard for Visualization
      2. Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
    7. Summary
  6. 4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
    1. Introduction
    2. OpenAI Gym
      1. How to Interact with a Gym Environment
      2. Exercise 4.01: Interacting with the Gym Environment
      3. Action and Observation Spaces
      4. How to Implement a Custom Gym Environment
    3. OpenAI Universe – Complex Environment
      1. OpenAI Universe Infrastructure
      2. Environments
        1. Atari Games
        2. Flash Games
        3. Browser Tasks
      3. Running an OpenAI Universe Environment
      4. Validating the Universe Infrastructure
    4. TensorFlow for Reinforcement Learning
      1. Implementing a Policy Network Using TensorFlow
      2. Exercise 4.02: Building a Policy Network with TensorFlow
      3. Exercise 4.03: Feeding the Policy Network with Environment State Representation
      4. How to Save a Policy Network
    5. OpenAI Baselines
      1. Proximal Policy Optimization
      2. Command-Line Usage
      3. Methods in OpenAI Baselines
      4. Custom Policy Network Architecture
    6. Training an RL Agent to Solve a Classic Control Problem
      1. Exercise 4.04: Solving a CartPole Environment with the PPO Algorithm
      2. Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
    7. Summary
  7. 5. Dynamic Programming
    1. Introduction
    2. Solving Dynamic Programming Problems
      1. Memoization
      2. The Tabular Method
      3. Exercise 5.01: Memoization in Practice
      4. Exercise 5.02: The Tabular Method in Practice
    3. Identifying Dynamic Programming Problems
      1. Optimal Substructures
      2. Overlapping Subproblems
      3. The Coin-Change Problem
      4. Exercise 5.03: Solving the Coin-Change Problem
    4. Dynamic Programming in RL
      1. Policy and Value Iteration
      2. State-Value Functions
      3. Action-Value Functions
      4. OpenAI Gym: Taxi-v3 Environment
        1. Policy Iteration
        2. Value Iteration
      5. The FrozenLake-v0 Environment
      6. Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
    5. Summary
  8. 6. Monte Carlo Methods
    1. Introduction
    2. The Workings of Monte Carlo Methods
    3. Understanding Monte Carlo with Blackjack
      1. Exercise 6.01: Implementing Monte Carlo in Blackjack
    4. Types of Monte Carlo Methods
      1. First Visit Monte Carlo Prediction for Estimating the Value Function
      2. Exercise 6.02: First Visit Monte Carlo Prediction for Estimating the Value Function in Blackjack
      3. Every Visit Monte Carlo Prediction for Estimating the Value Function
      4. Exercise 6.03: Every Visit Monte Carlo Prediction for Estimating the Value Function
    5. Exploration versus Exploitation Trade-Off
    6. Importance Sampling
      1. The Pseudocode for Monte Carlo Off-Policy Evaluation
      2. Exercise 6.04: Importance Sampling with Monte Carlo
    7. Solving Frozen Lake Using Monte Carlo
      1. Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
      2. The Pseudocode for Every Visit Monte Carlo Control for Epsilon Soft
      3. Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
    8. Summary
  9. 7. Temporal Difference Learning
    1. Introduction to TD Learning
    2. TD(0) – SARSA and Q-Learning
      1. SARSA – On-Policy Control
      2. Exercise 7.01: Using TD(0) SARSA to Solve FrozenLake-v0 Deterministic Transitions
      3. The Stochasticity Test
      4. Exercise 7.02: Using TD(0) SARSA to Solve FrozenLake-v0 Stochastic Transitions
      5. Q-Learning – Off-Policy Control
      6. Exercise 7.03: Using TD(0) Q-Learning to Solve FrozenLake-v0 Deterministic Transitions
      7. Expected SARSA
    3. N-Step TD and TD(λ) Algorithms
      1. N-Step TD
        1. N-step SARSA
        2. N-Step Off-Policy Learning
      2. TD(λ)
        1. SARSA(λ)
      3. Exercise 7.04: Using TD(λ) SARSA to Solve FrozenLake-v0 Deterministic Transitions
      4. Exercise 7.05: Using TD(λ) SARSA to Solve FrozenLake-v0 Stochastic Transitions
    4. The Relationship between DP, Monte Carlo, and TD Learning
      1. Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
    5. Summary
  10. 8. The Multi-Armed Bandit Problem
    1. Introduction
    2. Formulation of the MAB Problem
      1. Applications of the MAB Problem
      2. Background and Terminology
      3. MAB Reward Distributions
    3. The Python Interface
    4. The Greedy Algorithm
      1. Implementing the Greedy Algorithm
    5. The Explore-then-Commit Algorithm
    6. The ε-Greedy Algorithm
      1. Exercise 8.01: Implementing the ε-Greedy Algorithm
      2. The Softmax Algorithm
    7. The UCB Algorithm
      1. Optimism in the Face of Uncertainty
      2. Other Properties of UCB
      3. Exercise 8.02: Implementing the UCB Algorithm
    8. Thompson Sampling
      1. Introduction to Bayesian Probability
      2. The Thompson Sampling Algorithm
      3. Exercise 8.03: Implementing the Thompson Sampling Algorithm
    9. Contextual Bandits
      1. Context That Defines a Bandit Problem
      2. Queueing Bandits
      3. Working with the Queueing API
      4. Activity 8.01: Queueing Bandits
    10. Summary
  11. 9. What Is Deep Q-Learning?
    1. Introduction
    2. Basics of Deep Learning
    3. Basics of PyTorch
      1. Exercise 9.01: Building a Simple Deep Learning Model in PyTorch
      2. PyTorch Utilities
        1. The view Function
        2. The squeeze Function
        3. The unsqueeze Function
        4. The max Function
        5. The gather Function
      3. The State-Value Function and the Bellman Equation
        1. Expected Value
        2. The Value Function
        3. The Value Function for a Deterministic Environment
        4. The Value Function for a Stochastic Environment
    4. The Action-Value Function (Q Value Function)
      1. Implementing Q Learning to Find Optimal Actions
        1. Advantages of Q Learning
      2. OpenAI Gym Review
      3. Exercise 9.02: Implementing the Q Learning Tabular Method
    5. Deep Q Learning
      1. Exercise 9.03: Implementing a Working DQN Network with PyTorch in a CartPole-v0 Environment
    6. Challenges in DQN
      1. Correlation between Steps and the Convergence Issue
      2. Experience Replay
      3. The Challenge of a Non-Stationary Target
      4. The Concept of a Target Network
      5. Exercise 9.04: Implementing a Working DQN Network with Experience Replay and a Target Network in PyTorch
      6. The Challenge of Overestimation in a DQN
      7. Double Deep Q Network (DDQN)
      8. Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
    7. Summary
  12. 10. Playing an Atari Game with Deep Recurrent Q-Networks
    1. Introduction
    2. Understanding the Breakout Environment
      1. Exercise 10.01: Playing Breakout with a Random Agent
    3. CNNs in TensorFlow
      1. Exercise 10.02: Designing a CNN Model with TensorFlow
    4. Combining a DQN with a CNN
      1. Activity 10.01: Training a DQN with CNNs to Play Breakout
    5. RNNs in TensorFlow
      1. Exercise 10.03: Designing a Combination of CNN and RNN Models with TensorFlow
    6. Building a DRQN
      1. Activity 10.02: Training a DRQN to Play Breakout
    7. Introduction to the Attention Mechanism and DARQN
      1. Activity 10.03: Training a DARQN to Play Breakout
    8. Summary
  13. 11. Policy-Based Methods for Reinforcement Learning
    1. Introduction
      1. Introduction to Value-Based and Model-Based RL
      2. Introduction to Actor-Critic Model
    2. Policy Gradients
      1. Exercise 11.01: Landing a Spacecraft on the Lunar Surface Using Policy Gradients and the Actor-Critic Method
    3. Deep Deterministic Policy Gradients
      1. Ornstein-Uhlenbeck Noise
      2. The ReplayBuffer Class
      3. The Actor-Critic Model
      4. Exercise 11.02: Creating a Learning Agent
      5. Activity 11.01: Creating an Agent That Learns a Model Using DDPG
    4. Improving Policy Gradients
      1. Trust Region Policy Optimization
      2. Proximal Policy Optimization
      3. Exercise 11.03: Improving the Lunar Lander Example Using PPO
      4. The Advantage Actor-Critic Method
      5. Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
    5. Summary
  14. 12. Evolutionary Strategies for RL
    1. Introduction
    2. Problems with Gradient-Based Methods
      1. Exercise 12.01: Optimization Using Stochastic Gradient Descent
    3. Introduction to Genetic Algorithms
      1. Exercise 12.02: Implementing Fixed-Value and Uniform Distribution Optimization Using GAs
      2. Components: Population Creation
      3. Exercise 12.03: Population Creation
      4. Components: Parent Selection
      5. Exercise 12.04: Implementing the Tournament and Roulette Wheel Techniques
      6. Components: Crossover Application
      7. Exercise 12.05: Crossover for a New Generation
      8. Components: Population Mutation
      9. Exercise 12.06: New Generation Development Using Mutation
      10. Application to Hyperparameter Selection
      11. Exercise 12.07: Implementing GA Hyperparameter Optimization for RNN Training
      12. NEAT and Other Formulations
      13. Exercise 12.08: XNOR Gate Functionality Using NEAT
      14. Activity 12.01: Cart-Pole Activity
    4. Summary
  15. Appendix
    1. 1. Introduction to Reinforcement Learning
      1. Activity 1.01: Measuring the Performance of a Random Agent
    2. 2. Markov Decision Processes and Bellman Equations
      1. Activity 2.01: Solving Gridworld
    3. 3. Deep Learning in Practice with TensorFlow 2
      1. Activity 3.01: Classifying Fashion Clothes Using a TensorFlow Dataset and TensorFlow 2
    4. 4. Getting Started with OpenAI and TensorFlow for Reinforcement Learning
      1. Activity 4.01: Training a Reinforcement Learning Agent to Play a Classic Video Game
    5. 5. Dynamic Programming
      1. Activity 5.01: Implementing Policy and Value Iteration on the FrozenLake-v0 Environment
    6. 6. Monte Carlo Methods
      1. Activity 6.01: Exploring the Frozen Lake Problem – the Reward Function
      2. Activity 6.02: Solving Frozen Lake Using Monte Carlo Control Every Visit Epsilon Soft
    7. 7. Temporal Difference Learning
      1. Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
    8. 8. The Multi-Armed Bandit Problem
      1. Activity 8.01: Queueing Bandits
    9. 9. What Is Deep Q-Learning?
      1. Activity 9.01: Implementing a Double Deep Q Network in PyTorch for the CartPole Environment
    10. 10. Playing an Atari Game with Deep Recurrent Q-Networks
      1. Activity 10.01: Training a DQN with CNNs to Play Breakout
      2. Activity 10.02: Training a DRQN to Play Breakout
      3. Activity 10.03: Training a DARQN to Play Breakout
    11. 11. Policy-Based Methods for Reinforcement Learning
      1. Activity 11.01: Creating an Agent That Learns a Model Using DDPG
      2. Activity 11.02: Loading the Saved Policy to Run the Lunar Lander Simulation
    12. 12. Evolutionary Strategies for RL
      1. Activity 12.01: Cart-Pole Activity

Product information

  • Title: The Reinforcement Learning Workshop
  • Author(s): Alessandro Palmas, Emanuele Ghelfi, Dr. Alexandra Galina Petre, Mayur Kulkarni, Anand N.S., Quan Nguyen, Aritra Sen, Anthony So, Saikat Basak
  • Release date: August 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781800200456