Reinforcement Learning Algorithms with Python

Book description

A hands-on guide to reinforcement learning with Python. Starting from the fundamentals of RL and the OpenAI Gym toolkit, the book works through dynamic programming, temporal difference learning with Q-learning and SARSA, deep Q-networks and their variations, and policy gradient methods (REINFORCE, actor-critic, TRPO, PPO, DDPG, and TD3). It then moves beyond model-free methods to cover model-based RL, imitation learning with the DAgger algorithm, black-box optimization with evolution strategies, algorithm selection with ESBAS, and practical techniques for applying deep RL to real-world problems.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Reinforcement Learning Algorithms with Python
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Section 1: Algorithms and Environments
  8. The Landscape of Reinforcement Learning
    1. An introduction to RL
      1. Comparing RL and supervised learning
      2. History of RL
      3. Deep RL
    2. Elements of RL
      1. Policy
      2. The value function
      3. Reward
      4. Model
    3. Applications of RL
      1. Games
      2. Robotics and Industry 4.0
      3. Machine learning
      4. Economics and finance
      5. Healthcare
      6. Intelligent transportation systems
      7. Energy optimization and smart grid
    4. Summary
    5. Questions
    6. Further reading
  9. Implementing RL Cycle and OpenAI Gym
    1. Setting up the environment
      1. Installing OpenAI Gym
      2. Installing Roboschool
    2. OpenAI Gym and RL cycles
      1. Developing an RL cycle
      2. Getting used to spaces
    3. Development of ML models using TensorFlow
      1. Tensor
        1. Constant
        2. Placeholder
        3. Variable
      2. Creating a graph
      3. Simple linear regression example
    4. Introducing TensorBoard
    5. Types of RL environments
      1. Why different environments?
      2. Open source environments
    6. Summary
    7. Questions
    8. Further reading
  10. Solving Problems with Dynamic Programming
    1. MDP
      1. Policy
      2. Return
      3. Value functions
      4. Bellman equation
    2. Categorizing RL algorithms
      1. Model-free algorithms
        1. Value-based algorithms
        2. Policy gradient algorithms
          1. Actor-Critic algorithms
        3. Hybrid algorithms
      2. Model-based RL
      3. Algorithm diversity
    3. Dynamic programming
      1. Policy evaluation and policy improvement
      2. Policy iteration
        1. Policy iteration applied to FrozenLake
      3. Value iteration
        1. Value iteration applied to FrozenLake
    4. Summary
    5. Questions
    6. Further reading
  11. Section 2: Model-Free RL Algorithms
  12. Q-Learning and SARSA Applications
    1. Learning without a model
      1. User experience
      2. Policy evaluation
      3. The exploration problem
        1. Why explore?
        2. How to explore
    2. TD learning
      1. TD update
      2. Policy improvement
      3. Comparing Monte Carlo and TD
    3. SARSA
      1. The algorithm
    4. Applying SARSA to Taxi-v2
    5. Q-learning
      1. Theory
      2. The algorithm
    6. Applying Q-learning to Taxi-v2
      1. Comparing SARSA and Q-learning
    7. Summary
    8. Questions
  13. Deep Q-Network
    1. Deep neural networks and Q-learning
      1. Function approximation
      2. Q-learning with neural networks
      3. Deep Q-learning instabilities
    2. DQN
      1. The solution
        1. Replay memory
        2. The target network
      2. The DQN algorithm
        1. The loss function
        2. Pseudocode
      3. Model architecture
    3. DQN applied to Pong
      1. Atari games
      2. Preprocessing
      3. DQN implementation
        1. DNNs
        2. The experience buffer
        3. The computational graph and training loop
      4. Results
    4. DQN variations
      1. Double DQN
        1. DDQN implementation
        2. Results
      2. Dueling DQN
        1. Dueling DQN implementation
        2. Results
      3. N-step DQN
        1. Implementation
        2. Results
    5. Summary
    6. Questions
    7. Further reading
  14. Learning Stochastic and PG Optimization
    1. Policy gradient methods
      1. The gradient of the policy
      2. Policy gradient theorem
      3. Computing the gradient
      4. The policy
      5. On-policy PG
    2. Understanding the REINFORCE algorithm
      1. Implementing REINFORCE
      2. Landing a spacecraft using REINFORCE
        1. Analyzing the results
    3. REINFORCE with baseline
      1. Implementing REINFORCE with baseline
    4. Learning the AC algorithm
      1. Using a critic to help an actor learn
      2. The n-step AC model
      3. The AC implementation
      4. Landing a spacecraft using AC
      5. Advanced AC, and tips and tricks
    5. Summary
    6. Questions
    7. Further reading
  15. TRPO and PPO Implementation
    1. Roboschool
      1. Controlling a continuous system
    2. Natural policy gradient
      1. Intuition behind NPG
      2. A bit of math
        1. FIM and KL divergence
      3. Natural gradient complications
    3. Trust region policy optimization
      1. The TRPO algorithm
      2. Implementation of the TRPO algorithm
      3. Application of TRPO
    4. Proximal Policy Optimization
      1. A quick overview
      2. The PPO algorithm
      3. Implementation of PPO
      4. PPO application
    5. Summary
    6. Questions
    7. Further reading
  16. DDPG and TD3 Applications
    1. Combining policy gradient optimization with Q-learning
      1. Deterministic policy gradient
    2. Deep deterministic policy gradient
      1. The DDPG algorithm
      2. DDPG implementation
      3. Applying DDPG to BipedalWalker-v2
    3. Twin delayed deep deterministic policy gradient (TD3)
      1. Addressing overestimation bias
        1. Implementation of TD3
      2. Addressing variance reduction
        1. Delayed policy updates
        2. Target regularization
      3. Applying TD3 to BipedalWalker
    4. Summary
    5. Questions
    6. Further reading
  17. Section 3: Beyond Model-Free Algorithms and Improvements
  18. Model-Based RL
    1. Model-based methods
      1. A broad perspective on model-based learning
        1. A known model
        2. Unknown model
      2. Advantages and disadvantages
    2. Combining model-based with model-free learning
      1. A useful combination
      2. Building a model from images
    3. ME-TRPO applied to an inverted pendulum
      1. Understanding ME-TRPO
      2. Implementing ME-TRPO
      3. Experimenting with Roboschool
        1. Results on RoboschoolInvertedPendulum
    4. Summary
    5. Questions
    6. Further reading
  19. Imitation Learning with the DAgger Algorithm
    1. Technical requirements
      1. Installation of Flappy Bird
    2. The imitation approach
      1. The driving assistant example
      2. Comparing IL and RL
      3. The role of the expert in imitation learning
      4. The IL structure
        1. Comparing active with passive imitation
    3. Playing Flappy Bird
      1. How to use the environment
    4. Understanding the dataset aggregation algorithm
      1. The DAgger algorithm
      2. Implementation of DAgger
        1. Loading the expert inference model
        2. Creating the learner's computational graph
        3. Creating a DAgger loop
      3. Analyzing the results on Flappy Bird
    5. IRL
    6. Summary
    7. Questions
    8. Further reading
  20. Understanding Black-Box Optimization Algorithms
    1. Beyond RL
      1. A brief recap of RL
      2. The alternative
        1. EAs
    2. The core of EAs
      1. Genetic algorithms
      2. Evolution strategies
        1. CMA-ES
        2. ES versus RL
    3. Scalable evolution strategies
      1. The core
        1. Parallelizing ES
        2. Other tricks
        3. Pseudocode
      2. Scalable implementation
        1. The main function
        2. Workers
    4. Applying scalable ES to LunarLander
    5. Summary
    6. Questions
    7. Further reading
  21. Developing the ESBAS Algorithm
    1. Exploration versus exploitation
      1. Multi-armed bandit
    2. Approaches to exploration
      1. The ε-greedy strategy
      2. The UCB algorithm
        1. UCB1
      3. Exploration complexity
    3. Epochal stochastic bandit algorithm selection
      1. Unboxing algorithm selection
      2. Under the hood of ESBAS
      3. Implementation
      4. Solving Acrobot
        1. Results
    4. Summary
    5. Questions
    6. Further reading
  22. Practical Implementation for Resolving RL Challenges
    1. Best practices of deep RL
      1. Choosing the appropriate algorithm
      2. From zero to one
    2. Challenges in deep RL
      1. Stability and reproducibility
      2. Efficiency
      3. Generalization
    3. Advanced techniques
      1. Unsupervised RL
        1. Intrinsic reward
      2. Transfer learning
        1. Types of transfer learning
          1. 1-task learning
          2. Multi-task learning
    4. RL in the real world
      1. Facing real-world challenges
      2. Bridging the gap between simulation and the real world
      3. Creating your own environment
    5. Future of RL and its impact on society
    6. Summary
    7. Questions
    8. Further reading
  23. Assessments
  24. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Reinforcement Learning Algorithms with Python
  • Publisher(s): Packt Publishing