Hands-On Q-Learning with Python

Book description

Leverage the power of reward-based training for your deep learning models with Python

Key Features

  • Understand Q-learning algorithms to train neural networks using the Markov decision process (MDP) framework
  • Study practical deep reinforcement learning using Q-Networks
  • Explore state-based learning and how it differs from supervised and unsupervised machine learning

Book Description

Q-learning is a reinforcement learning algorithm used to solve optimization problems in artificial intelligence (AI). Reinforcement learning is one of the most popular fields of study among AI researchers.

This book starts by introducing you to reinforcement learning and Q-learning, and helps you become familiar with OpenAI Gym, as well as libraries such as Keras and TensorFlow. A few chapters in, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. This book will guide you through use cases such as self-driving vehicles and OpenAI Gym’s CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems. You will also explore how to use Q-learning and related algorithms in scientific research. Toward the end, you’ll gain insight into what’s in store for reinforcement learning.

By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.

What you will learn

  • Explore the fundamentals of reinforcement learning and the state-action-reward process
  • Understand Markov decision processes
  • Get well-versed with libraries such as Keras and TensorFlow
  • Create and deploy model-free learning and deep Q-learning agents with TensorFlow, Keras, and OpenAI Gym
  • Choose and optimize a Q-network’s learning parameters and fine-tune its performance
  • Discover real-world applications and use cases of Q-learning
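
The state-action-reward process and epsilon-greedy Q-value updates listed above can be sketched in a few lines of plain Python. This is a minimal illustration on a toy five-cell corridor; the environment, constants, and function names are our own, not code from the book:

```python
import random

# Minimal tabular Q-learning on a toy corridor: cells 0..4, the agent
# starts at cell 0 and earns a reward of 1 for reaching cell 4.
N_STATES = 5
ACTIONS = [-1, +1]                       # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning, discount, exploration rates

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy transition: a bounded corridor with reward 1 at the right end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def choose_action(state):
    """Epsilon-greedy: explore randomly, otherwise pick a best-valued action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Off-policy Q-learning update: bootstrap on the best next-state value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

print(Q[(0, +1)] > Q[(0, -1)])   # the learned values favor moving right
```

The same update rule drives the deep Q-networks covered later in the book; there, the Q-table is replaced by a neural network that approximates Q-values for large state spaces.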

Who this book is for

If you are a machine learning developer, engineer, or professional who wants to explore a deep learning approach to complex environments, then this is the book for you. Proficiency in Python programming and a basic understanding of decision-making in reinforcement learning are assumed.

Publisher resources

Download Example Code

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Q-Learning with Python
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Q-Learning: A Roadmap
  7. Brushing Up on Reinforcement Learning Concepts
    1. What is RL? 
      1. States and actions
      2. The decision-making process 
      3. RL, supervised learning, and unsupervised learning
    2. States, actions, and rewards
      1. States
      2. Actions and rewards
      3. Bellman equations
    3. Key concepts in RL
      1. Value-based versus policy-based iteration
      2. Q-learning hyperparameters – alpha, gamma, and epsilon
      3. Alpha – deterministic versus stochastic environments
      4. Gamma – current versus future rewards
      5. Epsilon – exploration versus exploitation
      6. Decaying epsilon
    4. SARSA versus Q-learning – on-policy or off?
      1. SARSA and the cliff-walking problem
      2. When to choose SARSA over Q-learning
    5. Summary
    6. Questions
  8. Getting Started with the Q-Learning Algorithm
    1. Technical requirements
    2. Demystifying MDPs
      1. Control processes
      2. Markov chains
      3. The Markov property
      4. MDPs and state-action diagrams
      5. Solving MDPs with RL
    3. Your Q-learning agent in its environment
      1. Solving the optimization problem
      2. States and actions in Taxi-v2
    4. Fine-tuning your model – learning, discount, and exploration rates
      1. Decaying epsilon
      2. Decaying alpha 
      3. Decaying gamma
    5. MABP – a classic exploration versus exploitation problem
      1. Setting up a bandit problem
      2. Bandit optimization strategies
      3. Other applications for bandit problems
    6. Optimal versus safe paths – revisiting SARSA
    7. Summary
    8. Questions
  9. Setting Up Your First Environment with OpenAI Gym
    1. Technical requirements
    2. Getting started with OpenAI Gym
      1. What is Gym?
      2. Setting up Gym
      3. Gym environments
      4. Setting up an environment
    3. Exploring the Taxi-v2 environment
      1. The state space and valid actions
      2. Choosing an action manually
      3. Setting a state manually
    4. Creating a baseline agent
      1. Stepping through actions
      2. Creating a task loop
      3. Baseline models in Q-learning and machine learning research
    5. Summary
    6. Questions
  10. Teaching a Smartcab to Drive Using Q-Learning
    1. Technical requirements
    2. Getting to know your learning agent
    3. Implementing your agent
      1. The value function – calculating the Q-value of a state-action pair
      2. Implementing Bellman equations
    4. The learning parameters – alpha, gamma, and epsilon 
      1. Adding an updated alpha value
      2. Adding an updated epsilon value
    5. Model-tuning and tracking your agent's long-term performance
      1. Comparing your models and statistical performance measures
      2. Training your models
      3. Decaying epsilon
      4. Hyperparameter tuning
    6. Summary
    7. Questions
  11. Section 2: Building and Optimizing Q-Learning Agents
  12. Building Q-Networks with TensorFlow
    1. Technical requirements
    2. A brief overview of neural networks
      1. Extensional versus intensional definitions
    3. Taking a closer look
      1. Input, hidden, and output layers
      2. Perceptron functions
      3. ReLU functions
    4. Implementing a neural network with NumPy
      1. Feedforward
      2. Backpropagation
    5. Neural networks and Q-learning
      1. Policy agents versus value agents
    6. Building your first Q-network
      1. Defining the network
      2. Training the network
    7. Summary
    8. Questions
    9. Further reading
  13. Digging Deeper into Deep Q-Networks with Keras and TensorFlow
    1. Technical requirements
    2. Introducing CartPole-v1
      1. More about CartPole states and actions
    3. Getting started with the CartPole task
    4. Building a DQN to solve the CartPole problem
      1. Gamma
      2. Alpha
      3. Epsilon
      4. Building a DQN class
      5. Choosing actions with epsilon-greedy
      6. Updating the Q-values
      7. Running the task loop
    5. Testing and results
    6. Adding in experience replay
      1. About experience replay
      2. Implementation
      3. Experience replay results
    7. Building further on DQNs
      1. Calculating DQN loss
      2. Fixed Q-targets
      3. Double-deep Q-networks
      4. Dueling deep Q-networks
    8. Summary
    9. Questions
    10. Further reading
  14. Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym
  15. Decoupling Exploration and Exploitation in Multi-Armed Bandits
    1. Technical requirements
    2. Probability distributions and ongoing knowledge
      1. Iterative probability distributions
    3. Revisiting a simple bandit problem
      1. A sample two-armed bandit iteration
    4. Multi-armed bandit strategy overview
      1. Greedy strategy
      2. Epsilon-greedy strategy
      3. Upper confidence bound
      4. Bandit regret
      5. Utility functions and optimal decisions
    5. Contextual bandits and state diagrams
    6. Thompson sampling and the Bayesian control rule
      1. Thompson sampling
      2. Bayesian control rule
    7. Solving a multi-armed bandit problem in Python – user advertisement clicks
      1. Epsilon-greedy selection
    8. Multi-armed bandits in experimental design
      1. The testing process
      2. Bandits with knapsacks – more multi-armed bandit applications
    9. Summary
    10. Questions
    11. Further reading
  16. Further Q-Learning Research and Future Projects
    1. Google's DeepMind and the future of Q-learning
    2. OpenAI Gym and RL research
      1. The standardization of RL research practice with Gym
      2. Tracking your scores with the Gym leaderboard
    3. More OpenAI Gym environments
      1. Pendulum
      2. Acrobot
      3. MountainCar
      4. Continuous control tasks – MuJoCo
      5. Continuous control tasks – Box2D
      6. Robotics research and development
      7. Algorithms
      8. Toy text
    4. Contextual bandits and probability distributions
      1. Probability and intelligence
      2. Updating probability distributions
      3. State spaces
      4. A/B testing versus multi-armed bandit testing
      5. Testing methodologies
    5. Summary
    6. Questions
    7. Further reading
  17. Assessments
    1. Chapter 1, Brushing Up on Reinforcement Learning Concepts
    2. Chapter 2, Getting Started with the Q-Learning Algorithm
    3. Chapter 3, Setting Up Your First Environment with OpenAI Gym
    4. Chapter 4, Teaching a Smartcab to Drive Using Q-Learning
    5. Chapter 5, Building Q-Networks with TensorFlow
    6. Chapter 6, Digging Deeper into Deep Q-Networks with Keras and TensorFlow
    7. Chapter 7, Decoupling Exploration and Exploitation in Multi-Armed Bandits
    8. Chapter 8, Further Q-Learning Research and Future Projects
  18. Other Books You May Enjoy
    1. Leave a review – let other readers know what you think

Product information

  • Title: Hands-On Q-Learning with Python
  • Author(s): Nazia Habib
  • Release date: April 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789345803