Mastering Reinforcement Learning with Python

Book description

Get hands-on experience in creating state-of-the-art reinforcement learning agents using TensorFlow and RLlib to solve complex real-world business and industry problems with the help of expert tips and best practices

Key Features

  • Understand how large-scale state-of-the-art RL algorithms and approaches work
  • Apply RL to solve complex problems in marketing, robotics, supply chain, finance, cybersecurity, and more
  • Explore tips and best practices from experts that will enable you to overcome real-world RL challenges

Book Description

Reinforcement learning (RL) is a field of artificial intelligence (AI) used for creating self-learning autonomous agents. Building on a strong theoretical foundation, this book takes a practical approach and uses examples inspired by real-world industry problems to teach you about state-of-the-art RL.

Starting with bandit problems, Markov decision processes, and dynamic programming, the book provides an in-depth review of the classical RL techniques, such as Monte Carlo methods and temporal-difference learning. After that, you will learn about deep Q-learning, policy gradient algorithms, actor-critic methods, model-based methods, and multi-agent reinforcement learning. Then, you'll be introduced to some of the key approaches behind the most successful RL implementations, such as domain randomization and curiosity-driven learning.

As you advance, you'll explore many novel algorithms with advanced implementations using modern Python libraries such as TensorFlow and Ray's RLlib package. You'll also find out how to implement RL in areas such as robotics, supply chain management, marketing, finance, smart cities, and cybersecurity while assessing the trade-offs between different approaches and avoiding common pitfalls.

By the end of this book, you'll have mastered how to train and deploy your own RL agents to solve complex, real-world problems.

What you will learn

  • Model and solve complex sequential decision-making problems using RL
  • Develop a solid understanding of how state-of-the-art RL methods work
  • Use Python and TensorFlow to code RL algorithms from scratch
  • Parallelize and scale up your RL implementations using Ray's RLlib package
  • Get in-depth knowledge of a wide variety of RL topics
  • Understand the trade-offs between different RL approaches
  • Discover and address the challenges of implementing RL in the real world

Who this book is for

This book is for expert machine learning practitioners and researchers who want to take a hands-on approach to reinforcement learning with Python by implementing advanced deep reinforcement learning concepts in real-world projects. Reinforcement learning experts who want to advance their knowledge to tackle large-scale and complex sequential decision-making problems will also find this book useful. Working knowledge of Python programming and deep learning, along with prior experience in reinforcement learning, is required.

Table of contents

  1. Mastering Reinforcement Learning with Python
  2. Why subscribe?
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  8. Section 1: Reinforcement Learning Foundations
  9. Chapter 1: Introduction to Reinforcement Learning
    1. Why reinforcement learning?
    2. The three paradigms of ML
      1. Supervised learning
      2. Unsupervised learning
      3. Reinforcement learning
    3. RL application areas and success stories
      1. Games
      2. Robotics and autonomous systems
      3. Supply chain
      4. Manufacturing
      5. Personalization and recommender systems
      6. Smart cities
    4. Elements of an RL problem
      1. RL concepts
      2. Casting Tic-Tac-Toe as an RL problem
    5. Setting up your RL environment
      1. Hardware requirements
      2. Operating system
      3. Software toolbox
    6. Summary
    7. References
  10. Chapter 2: Multi-Armed Bandits
    1. Exploration-exploitation trade-off
    2. What is a MAB?
      1. Problem definition
      2. Experimenting with a simple MAB problem
    3. Case study: Online advertising
    4. A/B/n testing
      1. Notation
      2. Application to the online advertising scenario
      3. Advantages and disadvantages of A/B/n testing
    5. ε-greedy actions
      1. Application to the online advertising scenario
      2. Advantages and disadvantages of ε-greedy actions
    6. Action selection using upper confidence bounds
      1. Application to the online advertising scenario
      2. Advantages and disadvantages of using UCBs
    7. Thompson (Posterior) sampling
      1. Application to the online advertising scenario
      2. Advantages and disadvantages of Thompson sampling
    8. Summary
    9. References
  11. Chapter 3: Contextual Bandits
    1. Why we need function approximations
    2. Using function approximation for context
      1. Case study: Contextual online advertising with synthetic user data
      2. Function approximation with regularized logistic regression
      3. Objective: Regret minimization
      4. Solving the online advertising problem
    3. Using function approximation for action
      1. Case study: Contextual online advertising with user data from the U.S. Census
      2. Function approximation using a neural network
      3. Calculating the regret
      4. Solving the online advertising problem
    4. Other applications of multi-armed and contextual bandits
      1. Recommender systems
      2. Webpage/app feature design
      3. Healthcare
      4. Dynamic pricing
      5. Finance
      6. Control systems tuning
    5. Summary
    6. References
  12. Chapter 4: Makings of a Markov Decision Process
    1. Starting with Markov chains
      1. Stochastic processes with Markov property
      2. Classification of states in a Markov chain
      3. Example: n-step behavior in the grid world
      4. Example: Sample path in an ergodic Markov chain
      5. Semi-Markov processes and continuous-time Markov chains
    2. Introducing the reward: Markov reward process
      1. Attaching rewards to the grid world example
      2. Relations between average rewards with different initializations
      3. Return, discount and state values
      4. Analytically calculating the state values
      5. Estimating the state values iteratively
    3. Bringing the action in: Markov decision process
      1. Definition
      2. Grid world as a Markov decision process
      3. State-value function
      4. Action-value function
      5. Optimal state-value and action-value functions
      6. Bellman optimality
    4. Partially observable Markov decision process
    5. Summary
    6. Exercises
    7. References
  13. Chapter 5: Solving the Reinforcement Learning Problem
    1. Exploring dynamic programming
      1. Example use case: Inventory replenishment of a food truck
      2. Policy evaluation
      3. Policy iteration
      4. Value iteration
      5. Drawbacks of dynamic programming
    2. Training your agent with Monte Carlo methods
      1. Monte Carlo prediction
      2. Monte Carlo control
    3. Temporal-difference learning
      1. One-step TD learning: TD(0)
      2. n-step TD learning
    4. Understanding the importance of simulation in reinforcement learning
    5. Summary
    6. References
  14. Section 2: Deep Reinforcement Learning
  15. Chapter 6: Deep Q-Learning at Scale
    1. From tabular Q-learning to deep Q-learning
      1. Neural Fitted Q-iteration
      2. Online Q-learning
    2. Deep Q-networks
      1. Key concepts in deep Q-networks
      2. The DQN algorithm
    3. Extensions to DQN: Rainbow
      1. The extensions
      2. The performance of the integrated agent
      3. How to choose which extensions to use: Ablations to Rainbow
      4. Overcoming the deadly triad
    4. Distributed deep Q-learning
      1. Components of a distributed deep Q-learning architecture
      2. Gorila: General reinforcement learning architecture
      3. Ape-X: Distributed prioritized experience replay
    5. Implementing scalable deep Q-learning algorithms using Ray
      1. A primer on Ray
      2. Ray implementation of a DQN variant
    6. RLlib: Production-grade deep reinforcement learning
    7. Summary
    8. References
  16. Chapter 7: Policy-Based Methods
    1. Need for policy-based methods
      1. A more principled approach
      2. Ability to work with continuous action spaces
      3. Ability to learn truly random stochastic policies
    2. Vanilla policy gradient
      1. Objective in the policy gradient methods
      2. Figuring out the gradient
      3. REINFORCE
      4. The problem with REINFORCE and all policy gradient methods
      5. Vanilla policy gradient using RLlib
    3. Actor-critic methods
      1. Further reducing the variance in policy-based methods
      2. Advantage Actor-Critic: A2C
      3. Asynchronous Advantage Actor-Critic: A3C
      4. Generalized Advantage Estimators
    4. Trust-region methods
      1. Policy gradient as policy iteration
      2. TRPO: Trust Region Policy Optimization
      3. PPO: Proximal Policy Optimization
    5. Revisiting off-policy methods
      1. DDPG: Deep Deterministic Policy Gradient
      2. TD3: Twin Delayed Deep Deterministic Policy Gradient
      3. SAC: Soft actor-critic
      4. IMPALA: Importance Weighted Actor-Learner Architecture
    6. Comparison of the policy-based methods in Lunar Lander
    7. How to pick the right algorithm?
    8. Open source implementations of policy-gradient methods
    9. Summary
    10. References
  17. Chapter 8: Model-Based Methods
    1. Introducing model-based methods
    2. Planning through a model
      1. Defining the optimal control problem
      2. Random shooting
      3. Cross-entropy method
      4. Covariance matrix adaptation evolution strategy
      5. Monte Carlo tree search
    3. Learning a world model
      1. Understanding what model means
      2. Identifying when to learn a model
      3. Introducing a general procedure to learn a model
      4. Understanding and mitigating the impact of model uncertainty
      5. Learning a model from complex observations
    4. Unifying model-based and model-free approaches
      1. Refresher on Q-learning
      2. Dyna-style acceleration of model-free methods using world models
    5. Summary
    6. References
  18. Chapter 9: Multi-Agent Reinforcement Learning
    1. Introducing multi-agent reinforcement learning
      1. Collaboration and competition between MARL agents
    2. Exploring the challenges in multi-agent reinforcement learning
      1. Non-stationarity
      2. Scalability
      3. Unclear reinforcement learning objective
      4. Information sharing
    3. Training policies in multi-agent settings
      1. RLlib multi-agent environment
      2. Competitive self-play
    4. Training tic-tac-toe agents through self-play
      1. Designing the multi-agent tic-tac-toe environment
      2. Configuring the trainer
      3. Observing the results
    5. Summary
    6. References
  19. Section 3: Advanced Topics in RL
  20. Chapter 10: Introducing Machine Teaching
    1. Introduction to machine teaching
      1. Understanding the need for machine teaching
      2. Exploring the elements of machine teaching
    2. Engineering the reward function
      1. When to engineer the reward function
      2. Reward shaping
      3. Example: Reward shaping for mountain car
      4. Challenges with engineering the reward function
    3. Curriculum learning
    4. Warm starts with demonstrations
    5. Action masking
    6. Summary
    7. References
  21. Chapter 11: Achieving Generalization and Overcoming Partial Observability
    1. Focusing on generalization in reinforcement learning
      1. Generalization and overfitting in supervised learning
      2. Generalization and overfitting in reinforcement learning
      3. Connection between generalization and partial observability
      4. Achieving generalization with domain randomization
      5. Overcoming partial observability with memory
      6. Recipe for generalization
    2. Enriching agent experience via domain randomization
      1. Dimensions of randomization
      2. Curriculum learning for generalization
    3. Using memory to overcome partial observability
      1. Stacking observations
      2. Using RNNs
      3. Transformer architecture
    4. Quantifying generalization via CoinRun
      1. CoinRun environment
      2. Installing the CoinRun environment
      3. The effect of regularization and network architecture on the generalization of RL policies
      4. Network randomization and feature matching
      5. Sunblaze environment
    5. Summary
    6. References
  22. Chapter 12: Meta-Reinforcement Learning
    1. Introducing meta-reinforcement learning
      1. Learning to learn
      2. Defining meta-reinforcement learning
      3. Relation to animal learning
      4. Relation to partial observability and domain randomization
    2. Meta-reinforcement learning with recurrent policies
      1. Grid world example
      2. RLlib implementation
    3. Gradient-based meta-reinforcement learning
      1. RLlib implementation
    4. Meta-reinforcement learning as partially observed reinforcement learning
    5. Challenges in meta-reinforcement learning
    6. Conclusion
    7. References
  23. Chapter 13: Exploring Advanced Topics
    1. Diving deeper into distributed reinforcement learning
      1. Scalable, efficient deep reinforcement learning: SEED RL
      2. Recurrent experience replay in distributed reinforcement learning
      3. Experimenting with SEED RL and R2D2
    2. Exploring curiosity-driven reinforcement learning
      1. Curiosity-driven learning for hard-exploration problems
      2. Challenges in curiosity-driven reinforcement learning
      3. Never Give Up
      4. Agent57 improvements
    3. Offline reinforcement learning
      1. An overview of how offline reinforcement learning works
      2. Why we need special algorithms for offline learning
      3. Why offline reinforcement learning is crucial
      4. Advantage weighted actor-critic
      5. Offline reinforcement learning benchmarks
    4. Summary
    5. References
  24. Section 4: Applications of RL
  25. Chapter 14: Solving Robot Learning
    1. Introducing PyBullet
      1. Setting up PyBullet
    2. Getting familiar with the Kuka environment
      1. Grasping a rectangle block using a Kuka robot
      2. Kuka Gym environment
    3. Developing strategies to solve the Kuka environment
      1. Parametrizing the difficulty of the problem
    4. Using curriculum learning to train the Kuka robot
      1. Customizing the environment for curriculum learning
      2. Designing the lessons in the curriculum
      3. Training the agent using a manually designed curriculum
      4. Curriculum learning using absolute learning progress
      5. Comparing the experiment results
    5. Going beyond PyBullet into autonomous driving
    6. Summary
    7. References
  26. Chapter 15: Supply Chain Management
    1. Optimizing inventory procurement decisions
      1. The need for inventory and the trade-off in its management
      2. Components of an inventory optimization problem
      3. Single-step inventory optimization: The newsvendor problem
      4. Simulating multi-step inventory dynamics
      5. Developing a near-optimal benchmark policy
      6. Reinforcement learning solution to inventory management
    2. Modeling routing problems
      1. Pick-up and delivery of online meal orders
      2. Pointer networks for dynamic combinatorial optimization
    3. Summary
    4. References
  27. Chapter 16: Personalization, Marketing, and Finance
    1. Going beyond bandits for personalization
      1. Shortcomings of bandit models
      2. Deep reinforcement learning for news recommendation
    2. Developing effective marketing strategies using reinforcement learning
      1. Personalized marketing content
      2. Marketing resource allocation for customer acquisition
      3. Reducing customer churn rate
      4. Winning back lost customers
    3. Applying reinforcement learning in finance
      1. Challenges with using reinforcement learning in finance
      2. Introducing TensorTrade
      3. Developing equity trading strategies
    4. Summary
    5. References
  28. Chapter 17: Smart City and Cybersecurity
    1. Controlling traffic lights to optimize vehicle flow
      1. Introducing Flow
      2. Creating an experiment in Flow
      3. Modeling the traffic light control problem
      4. Solving the traffic control problem using RLlib
      5. Further reading
    2. Providing ancillary services to the power grid
      1. Power grid operations and ancillary services
      2. Describing the environment and the decision-making problem
      3. Reinforcement learning model
    3. Detecting cyberattacks in a smart grid
      1. The problem of early detection of cyberattacks in a power grid
      2. Partial observability of the grid state
    4. Summary
    5. References
  29. Chapter 18: Challenges and Future Directions in Reinforcement Learning
    1. What you have achieved with this book
    2. Challenges and future directions
      1. Sample efficiency
      2. Need for high-fidelity and fast simulation models
      3. High-dimensional action spaces
      4. Reward function fidelity
      5. Safety, behavior guarantees, and explainability
      6. Reproducibility and sensitivity to hyper-parameter choices
      7. Robustness and adversarial agents
    3. Suggestions for aspiring reinforcement learning experts
      1. Go deeper into the theory
      2. Follow good practitioners and research labs
      3. Learn from papers and from their good explanations
      4. Stay up to date with trends in other fields of deep learning
      5. Read open source repositories
      6. Practice!
    4. Final words
    5. References
  30. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Mastering Reinforcement Learning with Python
  • Author(s): Enes Bilgin
  • Release date: December 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781838644147