
Foundations of Deep Reinforcement Learning: Theory and Practice in Python

Book Description

The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning, in which artificial agents learn to solve sequential decision-making problems. In the past decade, deep RL has achieved remarkable results on a range of problems, from single-player and multiplayer games, such as Go, Atari games, and Dota 2, to robotics.

Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines both theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library SLM Lab, and finishes with the practical details of getting deep RL to work.

  • Understand each key aspect of a deep RL problem
  • Explore policy- and value-based algorithms, including REINFORCE, SARSA, DQN, Double DQN, and Prioritized Experience Replay (PER)
  • Delve into combined algorithms, including Actor-Critic and Proximal Policy Optimization (PPO)
  • Understand how algorithms can be parallelized synchronously and asynchronously
  • Run algorithms in SLM Lab and learn the practical implementation details for getting deep RL to work
  • Explore algorithm benchmark results with tuned hyperparameters
  • Understand how deep RL environments are designed

This guide is ideal for both computer science students and software engineers who are familiar with basic machine learning concepts and have a working knowledge of Python.


Register your book for convenient access to downloads, updates, and corrections as they become available. See inside the book for details.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Contents
  4. Preface
  5. Acknowledgements
  6. About the Authors
  7. Chapter 1. Introduction
    1. 1.1 Reinforcement Learning
    2. 1.2 Reinforcement Learning as MDP
    3. 1.3 Learnable Functions in Reinforcement Learning
    4. 1.4 Deep Reinforcement Learning Algorithms
    5. 1.5 Deep Learning for Reinforcement Learning
    6. 1.6 Reinforcement Learning and Supervised Learning
    7. 1.7 Summary
  8. Part I: Policy-Based and Value-Based Algorithms
    1. Chapter 2. REINFORCE
      1. 2.1 Policy
      2. 2.2 The Objective Function
      3. 2.3 The Policy Gradient
      4. 2.4 Monte Carlo Sampling
      5. 2.5 REINFORCE Algorithm
      6. 2.6 Implementing REINFORCE
      7. 2.7 Training a REINFORCE Agent
      8. 2.8 Experimental Results
      9. 2.9 Summary
      10. 2.10 Further Reading
      11. 2.11 History
    2. Chapter 3. SARSA
      1. 3.1 The Q and V Functions
      2. 3.2 Temporal Difference Learning
      3. 3.3 Action Selection in SARSA
      4. 3.4 SARSA Algorithm
      5. 3.5 Implementing SARSA
      6. 3.6 Training a SARSA Agent
      7. 3.7 Experimental Results
      8. 3.8 Summary
      9. 3.9 Further Reading
      10. 3.10 History
    3. Chapter 4. Deep Q-Networks (DQN)
      1. 4.1 Learning the Q-function in DQN
      2. 4.2 Action Selection in DQN
      3. 4.3 Experience Replay
      4. 4.4 DQN Algorithm
      5. 4.5 Implementing DQN
      6. 4.6 Training a DQN Agent
      7. 4.7 Experimental Results
      8. 4.8 Summary
      9. 4.9 Further Reading
      10. 4.10 History
    4. Chapter 5. Improving DQN
      1. 5.1 Target Networks
      2. 5.2 Double DQN
      3. 5.3 Prioritized Experience Replay (PER)
      4. 5.4 Modified DQN Implementation
      5. 5.5 Training a DQN Agent to Play Atari Games
      6. 5.6 Experimental Results
      7. 5.7 Summary
      8. 5.8 Further Reading
  9. Part II: Combined Methods
    1. Chapter 6. Advantage Actor-Critic (A2C)
      1. 6.1 The Actor
      2. 6.2 The Critic
      3. 6.3 A2C Algorithm
      4. 6.4 Implementing A2C
      5. 6.5 Network Architecture
      6. 6.6 Training an A2C Agent
      7. 6.7 Experimental Results
      8. 6.8 Summary
      9. 6.9 Further Reading
      10. 6.10 History
    2. Chapter 7. Proximal Policy Optimization (PPO)
      1. 7.1 Surrogate Objective
      2. 7.2 Proximal Policy Optimization (PPO)
      3. 7.3 PPO Algorithm
      4. 7.4 Implementing PPO
      5. 7.5 Training a PPO Agent
      6. 7.6 Experimental Results
      7. 7.7 Summary
      8. 7.8 Further Reading
    3. Chapter 8. Parallelization Methods
      1. 8.1 Synchronous Parallelization
      2. 8.2 Asynchronous Parallelization
      3. 8.3 Training an A3C Agent
      4. 8.4 Summary
      5. 8.5 Further Reading
    4. Chapter 9. Algorithm Summary
  10. Part III: Practical Tips
    1. Chapter 10. Getting Deep RL to Work
      1. 10.1 Software Engineering Practices
      2. 10.2 Debugging Tips
      3. 10.3 Atari Tricks
      4. 10.4 Deep RL Almanac
      5. 10.5 Summary
    2. Chapter 11. SLM Lab
      1. 11.1 Implemented Algorithms in SLM Lab
      2. 11.2 Spec File
      3. 11.3 Running SLM Lab
      4. 11.4 Analyzing Experiment Results
      5. 11.5 Summary
    3. Chapter 12. Network Architectures
      1. 12.1 Types of Neural Networks
      2. 12.2 Guidelines for Choosing a Network Family
      3. 12.3 The Net API
      4. 12.4 Summary
      5. 12.5 Further Reading
    4. Chapter 13. Hardware
      1. 13.1 Computer
      2. 13.2 Information in Hardware
      3. 13.3 Choosing Hardware
      4. 13.4 Summary
    5. Chapter 14. Environment Design
      1. 14.1 States
      2. 14.2 Actions
      3. 14.3 Rewards
      4. 14.4 Transition Function
      5. 14.5 Summary
      6. 14.6 Further Reading: Action Design in Everyday Things
  11. Epilogue
  12. Appendix A. Deep Reinforcement Learning Timeline
  13. Appendix B. Example Environments
    1. B.1 Discrete Environments
    2. B.2 Continuous Environments
  14. Bibliography