Hands-On Reinforcement Learning with Python

Book Description

A hands-on guide enriched with examples to master deep reinforcement learning algorithms with Python

About This Book
  • Your entry point into the world of artificial intelligence using the power of Python
  • An example-rich guide to master various RL and DRL algorithms
  • Explore various state-of-the-art architectures along with the math behind them
Who This Book Is For

If you're a machine learning developer or deep learning enthusiast interested in artificial intelligence and want to learn about reinforcement learning from scratch, this book is for you. Some knowledge of linear algebra, calculus, and the Python programming language will help you understand the concepts covered in this book.

What You Will Learn
  • Understand the basics of reinforcement learning methods, algorithms, and elements
  • Train an agent to walk using OpenAI Gym and TensorFlow
  • Understand the Markov Decision Process, Bellman optimality, and TD learning
  • Solve multi-armed-bandit problems using various algorithms
  • Master deep learning architectures, such as RNNs, LSTMs, and CNNs, with applications
  • Build intelligent agents using the DRQN algorithm to play the Doom game
  • Teach agents to play the Lunar Lander game using policy gradients
  • Train an agent to win a car racing game using dueling DQN
In Detail

Reinforcement Learning (RL) is one of the fastest-growing and most promising branches of artificial intelligence. Hands-On Reinforcement Learning with Python will help you master not only the basic reinforcement learning algorithms but also the advanced deep reinforcement learning algorithms.

The book starts with an introduction to reinforcement learning, followed by OpenAI Gym and TensorFlow. You will then explore various RL algorithms and concepts, such as the Markov Decision Process, Monte Carlo methods, and dynamic programming, including value and policy iteration. This example-rich guide will introduce you to deep reinforcement learning algorithms, such as Dueling DQN, DRQN, A3C, PPO, and TRPO. You will also learn about imagination-augmented agents, learning from human preference, DQfD, HER, and other recent advancements in reinforcement learning.

By the end of the book, you will have all the knowledge and experience needed to implement reinforcement learning and deep reinforcement learning in your projects, and you will be all set to enter the world of artificial intelligence.

Style and approach

This is a hands-on book designed to expand your machine learning skills by covering algorithms from basic reinforcement learning to deep reinforcement learning, with applications in Python.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Reinforcement Learning with Python
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Introduction to Reinforcement Learning
    1. What is RL?
    2. RL algorithm
    3. How RL differs from other ML paradigms
    4. Elements of RL
      1. Agent
      2. Policy function
      3. Value function
      4. Model
    5. Agent environment interface
    6. Types of RL environment
      1. Deterministic environment
      2. Stochastic environment
      3. Fully observable environment
      4. Partially observable environment
      5. Discrete environment
      6. Continuous environment
      7. Episodic and non-episodic environment
      8. Single and multi-agent environment
    7. RL platforms
      1. OpenAI Gym and Universe
      2. DeepMind Lab
      3. RL-Glue
      4. Project Malmo
      5. ViZDoom
    8. Applications of RL
      1. Education
      2. Medicine and healthcare
      3. Manufacturing
      4. Inventory management
      5. Finance
      6. Natural Language Processing and Computer Vision
    9. Summary
    10. Questions
    11. Further reading
  8. Getting Started with OpenAI and TensorFlow
    1. Setting up your machine
      1. Installing Anaconda
      2. Installing Docker
      3. Installing OpenAI Gym and Universe
        1. Common error fixes
    2. OpenAI Gym
      1. Basic simulations
      2. Training a robot to walk
    3. OpenAI Universe
      1. Building a video game bot
    4. TensorFlow
      1. Variables, constants, and placeholders
        1. Variables
        2. Constants
        3. Placeholders
      2. Computation graph
      3. Sessions
      4. TensorBoard
        1. Adding scope
    5. Summary
    6. Questions
    7. Further reading
  9. The Markov Decision Process and Dynamic Programming
    1. The Markov chain and Markov process
    2. Markov Decision Process
      1. Rewards and returns
      2. Episodic and continuous tasks
      3. Discount factor
      4. The policy function
      5. State value function
      6. State-action value function (Q function)
    3. The Bellman equation and optimality
      1. Deriving the Bellman equation for value and Q functions
    4. Solving the Bellman equation
      1. Dynamic programming
        1. Value iteration
        2. Policy iteration
    5. Solving the frozen lake problem
      1. Value iteration
      2. Policy iteration
    6. Summary
    7. Questions
    8. Further reading
  10. Gaming with Monte Carlo Methods
    1. Monte Carlo methods
      1. Estimating the value of pi using Monte Carlo
    2. Monte Carlo prediction
      1. First visit Monte Carlo
      2. Every visit Monte Carlo
      3. Let's play Blackjack with Monte Carlo
    3. Monte Carlo control
      1. Monte Carlo exploration starts
      2. On-policy Monte Carlo control
      3. Off-policy Monte Carlo control
    4. Summary
    5. Questions
    6. Further reading
  11. Temporal Difference Learning
    1. TD learning
    2. TD prediction
    3. TD control
      1. Q learning
        1. Solving the taxi problem using Q learning
      2. SARSA
        1. Solving the taxi problem using SARSA
    4. The difference between Q learning and SARSA
    5. Summary
    6. Questions
    7. Further reading
  12. Multi-Armed Bandit Problem
    1. The MAB problem
      1. The epsilon-greedy policy
      2. The softmax exploration algorithm
      3. The upper confidence bound algorithm
      4. The Thompson sampling algorithm
    2. Applications of MAB
    3. Identifying the right advertisement banner using MAB
    4. Contextual bandits
    5. Summary
    6. Questions
    7. Further reading
  13. Deep Learning Fundamentals
    1. Artificial neurons
    2. ANNs
      1. Input layer
      2. Hidden layer
      3. Output layer
      4. Activation functions
    3. Deep diving into ANN
      1. Gradient descent
    4. Neural networks in TensorFlow
    5. RNN
      1. Backpropagation through time
    6. Long Short-Term Memory RNN
      1. Generating song lyrics using LSTM RNN
    7. Convolutional neural networks
      1. Convolutional layer
      2. Pooling layer
      3. Fully connected layer
      4. CNN architecture
    8. Classifying fashion products using CNN
    9. Summary
    10. Questions
    11. Further reading
  14. Atari Games with Deep Q Network
    1. What is a Deep Q Network?
    2. Architecture of DQN
      1. Convolutional network
      2. Experience replay
      3. Target network
      4. Clipping rewards
      5. Understanding the algorithm
    3. Building an agent to play Atari games
    4. Double DQN
    5. Prioritized experience replay
    6. Dueling network architecture
    7. Summary
    8. Questions
    9. Further reading
  15. Playing Doom with a Deep Recurrent Q Network
    1. DRQN
      1. Architecture of DRQN
    2. Training an agent to play Doom
      1. Basic Doom game
      2. Doom with DRQN
    3. DARQN
      1. Architecture of DARQN
    4. Summary
    5. Questions
    6. Further reading
  16. The Asynchronous Advantage Actor Critic Network
    1. The Asynchronous Advantage Actor Critic
      1. The three As
      2. The architecture of A3C
      3. How A3C works
    2. Driving up a mountain with A3C
      1. Visualization in TensorBoard
    3. Summary
    4. Questions
    5. Further reading
  17. Policy Gradients and Optimization
    1. Policy gradient
      1. Lunar Lander using policy gradients
    2. Deep deterministic policy gradient
      1. Swinging a pendulum
    3. Trust Region Policy Optimization
    4. Proximal Policy Optimization
    5. Summary
    6. Questions
    7. Further reading
  18. Capstone Project – Car Racing Using DQN
    1. Environment wrapper functions
    2. Dueling network
    3. Replay memory
    4. Training the network
    5. Car racing
    6. Summary
    7. Questions
    8. Further reading
  19. Recent Advancements and Next Steps
    1. Imagination-augmented agents
    2. Learning from human preference
    3. Deep Q learning from demonstrations
    4. Hindsight experience replay
    5. Hierarchical reinforcement learning
      1. MAXQ Value Function Decomposition
    6. Inverse reinforcement learning
    7. Summary
    8. Questions
    9. Further reading
  20. Assessments
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 9
    10. Chapter 10
    11. Chapter 11
    12. Chapter 12
    13. Chapter 13
  21. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think