Deep Reinforcement Learning Hands-On

Book Description

This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

About This Book
  • Explore deep reinforcement learning (RL), from the first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Who This Book Is For

Some fluency in Python is assumed. Readers should be familiar with basic deep learning (DL) approaches, and some practical DL experience will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no prior background in RL.

What You Will Learn
  • Understand the DL context of RL and implement complex DL models
  • Learn the foundation of RL: Markov decision processes
  • Evaluate RL methods including Cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using deep Q-networks, which extend the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using the AlphaGo Zero method
  • Explore the very latest deep RL research on topics including AI-driven chatbots

In Detail

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have led to unprecedented progress in training agents to solve complex problems in a human-like way. Google DeepMind's use of deep RL algorithms to play and beat the well-known Atari arcade games propelled the field to prominence, and researchers are now generating new ideas at a rapid pace.

Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest deep RL tools and their limitations. You will evaluate methods including cross-entropy and policy gradients before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents that can take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and sell stocks, and find out how natural language models are driving the boom in chatbots.

Style and approach

Deep Reinforcement Learning Hands-On explains the art of building self-learning agents through algorithms and practical examples. You will experiment with famous problems, such as the well-known Atari arcade games that Google's agents learned to defeat.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Deep Reinforcement Learning Hands-On
    1. Table of Contents
    2. Deep Reinforcement Learning Hands-On
      1. Why subscribe?
      2. PacktPub.com
    3. Contributors
      1. About the author
      2. About the reviewers
      3. Packt is Searching for Authors Like You
    4. Preface
      1. Who this book is for
      2. What this book covers
      3. To get the most out of this book
        1. Download the example code files
        2. Download the color images
        3. Conventions used
      4. Get in touch
        1. Reviews
    5. Chapter 1: What is Reinforcement Learning?
      1. Learning – supervised, unsupervised, and reinforcement
      2. RL formalisms and relations
        1. Reward
        2. The agent
        3. The environment
        4. Actions
        5. Observations
      3. Markov decision processes
        1. Markov process
        2. Markov reward process
        3. Markov decision process
      4. Summary
    6. Chapter 2: OpenAI Gym
      1. The anatomy of the agent
      2. Hardware and software requirements
      3. OpenAI Gym API
        1. Action space
        2. Observation space
        3. The environment
        4. Creation of the environment
        5. The CartPole session
      4. The random CartPole agent
      5. The extra Gym functionality – wrappers and monitors
        1. Wrappers
        2. Monitor
      6. Summary
    7. Chapter 3: Deep Learning with PyTorch
      1. Tensors
        1. Creation of tensors
        2. Scalar tensors
        3. Tensor operations
        4. GPU tensors
      2. Gradients
        1. Tensors and gradients
      3. NN building blocks
      4. Custom layers
      5. Final glue – loss functions and optimizers
        1. Loss functions
        2. Optimizers
      6. Monitoring with TensorBoard
        1. TensorBoard 101
        2. Plotting stuff
      7. Example – GAN on Atari images
      8. Summary
    8. Chapter 4: The Cross-Entropy Method
      1. Taxonomy of RL methods
      2. Practical cross-entropy
      3. Cross-entropy on CartPole
      4. Cross-entropy on FrozenLake
      5. Theoretical background of the cross-entropy method
      6. Summary
    9. Chapter 5: Tabular Learning and the Bellman Equation
      1. Value, state, and optimality
      2. The Bellman equation of optimality
      3. Value of action
      4. The value iteration method
      5. Value iteration in practice
      6. Q-learning for FrozenLake
      7. Summary
    10. Chapter 6: Deep Q-Networks
      1. Real-life value iteration
      2. Tabular Q-learning
      3. Deep Q-learning
        1. Interaction with the environment
        2. SGD optimization
        3. Correlation between steps
        4. The Markov property
        5. The final form of DQN training
      4. DQN on Pong
        1. Wrappers
        2. DQN model
        3. Training
        4. Running and performance
        5. Your model in action
      5. Summary
    11. Chapter 7: DQN Extensions
      1. The PyTorch Agent Net library
        1. Agent
        2. Agent's experience
        3. Experience buffer
        4. Gym env wrappers
      2. Basic DQN
      3. N-step DQN
        1. Implementation
      4. Double DQN
        1. Implementation
        2. Results
      5. Noisy networks
        1. Implementation
        2. Results
      6. Prioritized replay buffer
        1. Implementation
        2. Results
      7. Dueling DQN
        1. Implementation
        2. Results
      8. Categorical DQN
        1. Implementation
        2. Results
      9. Combining everything
        1. Implementation
        2. Results
      10. Summary
      11. References
    12. Chapter 8: Stocks Trading Using RL
      1. Trading
      2. Data
      3. Problem statements and key decisions
      4. The trading environment
      5. Models
      6. Training code
      7. Results
        1. The feed-forward model
        2. The convolution model
      8. Things to try
      9. Summary
    13. Chapter 9: Policy Gradients – An Alternative
      1. Values and policy
        1. Why policy?
        2. Policy representation
        3. Policy gradients
      2. The REINFORCE method
        1. The CartPole example
        2. Results
        3. Policy-based versus value-based methods
      3. REINFORCE issues
        1. Full episodes are required
        2. High gradients variance
        3. Exploration
        4. Correlation between samples
      4. PG on CartPole
        1. Results
      5. PG on Pong
        1. Results
      6. Summary
    14. Chapter 10: The Actor-Critic Method
      1. Variance reduction
      2. CartPole variance
      3. Actor-critic
      4. A2C on Pong
      5. A2C on Pong results
      6. Tuning hyperparameters
        1. Learning rate
        2. Entropy beta
        3. Count of environments
        4. Batch size
      7. Summary
    15. Chapter 11: Asynchronous Advantage Actor-Critic
      1. Correlation and sample efficiency
      2. Adding an extra A to A2C
      3. Multiprocessing in Python
      4. A3C – data parallelism
        1. Results
      5. A3C – gradients parallelism
        1. Results
      6. Summary
    16. Chapter 12: Chatbots Training with RL
      1. Chatbots overview
      2. Deep NLP basics
        1. Recurrent Neural Networks
        2. Embeddings
        3. Encoder-Decoder
      3. Training of seq2seq
        1. Log-likelihood training
        2. Bilingual evaluation understudy (BLEU) score
        3. RL in seq2seq
        4. Self-critical sequence training
      4. The chatbot example
        1. The example structure
        2. Modules: cornell.py and data.py
        3. BLEU score and utils.py
        4. Model
        5. Training: cross-entropy
        6. Running the training
        7. Checking the data
        8. Testing the trained model
        9. Training: SCST
        10. Running the SCST training
        11. Results
        12. Telegram bot
      5. Summary
    17. Chapter 13: Web Navigation
      1. Web navigation
        1. Browser automation and RL
        2. Mini World of Bits benchmark
      2. OpenAI Universe
        1. Installation
        2. Actions and observations
        3. Environment creation
        4. MiniWoB stability
      3. Simple clicking approach
        1. Grid actions
        2. Example overview
        3. Model
        4. Training code
        5. Starting containers
        6. Training process
        7. Checking the learned policy
        8. Issues with simple clicking
      4. Human demonstrations
        1. Recording the demonstrations
        2. Recording format
        3. Training using demonstrations
        4. Results
        5. TicTacToe problem
      5. Adding text description
        1. Results
      6. Things to try
      7. Summary
    18. Chapter 14: Continuous Action Space
      1. Why a continuous space?
      2. Action space
      3. Environments
      4. The Actor-Critic (A2C) method
        1. Implementation
        2. Results
        3. Using models and recording videos
      5. Deterministic policy gradients
        1. Exploration
        2. Implementation
        3. Results
        4. Recording videos
      6. Distributional policy gradients
        1. Architecture
        2. Implementation
        3. Results
      7. Things to try
      8. Summary
    19. Chapter 15: Trust Regions – TRPO, PPO, and ACKTR
      1. Introduction
      2. Roboschool
      3. A2C baseline
        1. Results
        2. Videos recording
      4. Proximal Policy Optimization
        1. Implementation
        2. Results
      5. Trust Region Policy Optimization
        1. Implementation
        2. Results
      6. A2C using ACKTR
        1. Implementation
        2. Results
      7. Summary
    20. Chapter 16: Black-Box Optimization in RL
      1. Black-box methods
      2. Evolution strategies
      3. ES on CartPole
        1. Results
      4. ES on HalfCheetah
        1. Results
      5. Genetic algorithms
      6. GA on CartPole
        1. Results
      7. GA tweaks
        1. Deep GA
        2. Novelty search
      8. GA on Cheetah
        1. Results
      9. Summary
      10. References
    21. Chapter 17: Beyond Model-Free – Imagination
      1. Model-based versus model-free
      2. Model imperfections
      3. Imagination-augmented agent
        1. The environment model
        2. The rollout policy
        3. The rollout encoder
        4. Paper results
      4. I2A on Atari Breakout
        1. The baseline A2C agent
        2. EM training
        3. The imagination agent
          1. The I2A model
          2. The Rollout encoder
          3. Training of I2A
      5. Experiment results
        1. The baseline agent
        2. Training EM weights
        3. Training with the I2A model
      6. Summary
      7. References
    22. Chapter 18: AlphaGo Zero
      1. Board games
      2. The AlphaGo Zero method
        1. Overview
        2. Monte-Carlo Tree Search
        3. Self-play
        4. Training and evaluation
      3. Connect4 bot
        1. Game model
        2. Implementing MCTS
        3. Model
        4. Training
        5. Testing and comparison
      4. Connect4 results
      5. Summary
      6. References
      7. Book summary
    23. Other Books You May Enjoy
      1. Leave a review - let other readers know what you think
    24. Index