Reinforcement Learning with TensorFlow

Book Description

Leverage the power of reinforcement learning techniques to develop self-learning systems using TensorFlow

About This Book
  • Learn reinforcement learning concepts and their implementation using TensorFlow
  • Discover different problem-solving methods for Reinforcement Learning
  • Apply reinforcement learning to autonomous driving, robobrokers, and more
Who This Book Is For

If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of machine learning and neural network programming concepts, as well as some understanding of the TensorFlow framework. No previous experience with Reinforcement Learning is required.

What You Will Learn
  • Implement state-of-the-art Reinforcement Learning algorithms from the basics
  • Discover various techniques of Reinforcement Learning such as MDPs, Q-learning, and more
  • Learn the applications of Reinforcement Learning in advertisement, image processing, and NLP
  • Teach a Reinforcement Learning model to play a game using TensorFlow and OpenAI Gym
  • Understand how Reinforcement Learning Applications are used in robotics
In Detail

Reinforcement Learning (RL) allows you to develop smart, quick, and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in Artificial Intelligence, from games, self-driving cars, and robots to enterprise applications that range from data center energy saving (cooling data centers) to smart warehousing solutions.

The book covers the major advancements and successes achieved in deep reinforcement learning by combining deep neural network architectures with reinforcement learning. It also introduces readers to the concept of reinforcement learning, its advantages, and why it is gaining so much popularity. It discusses MDPs, Monte Carlo tree search, dynamic programming methods such as policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will use TensorFlow and OpenAI Gym to build simple neural network models that learn from their own actions. You will also see how reinforcement learning algorithms play a role in games, image processing, and NLP.

By the end of this book, you will have a firm understanding of what reinforcement learning is and how to put your knowledge to practical use by leveraging the power of TensorFlow and OpenAI Gym.

Style and approach

An easy-to-follow, step-by-step guide to help you get to grips with real-world applications of Reinforcement Learning with TensorFlow.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Reinforcement Learning with TensorFlow
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Deep Learning – Architectures and Frameworks
    1. Deep learning
      1. Activation functions for deep learning
        1. The sigmoid function
        2. The tanh function
        3. The softmax function
        4. The rectified linear unit function
        5. How to choose the right activation function
      2. Logistic regression as a neural network
        1. Notation
        2. Objective
        3. The cost function
        4. The gradient descent algorithm
        5. The computational graph
        6. Steps to solve logistic regression using gradient descent
          1. What is Xavier initialization?
          2. Why do we use Xavier initialization?
      3. The neural network model
        1. Recurrent neural networks
        2. Long Short Term Memory Networks
        3. Convolutional neural networks
          1. The LeNet-5 convolutional neural network
          2. The AlexNet model
          3. The VGG-Net model
          4. The Inception model
      4. Limitations of deep learning
        1. The vanishing gradient problem
        2. The exploding gradient problem
        3. Overcoming the limitations of deep learning
    2. Reinforcement learning
      1. Basic terminologies and conventions
      2. Optimality criteria
        1. The value function for optimality
        2. The policy model for optimality
      3. The Q-learning approach to reinforcement learning
      4. Asynchronous advantage actor-critic
    3. Introduction to TensorFlow and OpenAI Gym
      1. Basic computations in TensorFlow
      2. An introduction to OpenAI Gym
    4. The pioneers and breakthroughs in reinforcement learning
      1. David Silver
      2. Pieter Abbeel
      3. Google DeepMind
      4. The AlphaGo program
      5. Libratus
    5. Summary
  7. Training Reinforcement Learning Agents Using OpenAI Gym
    1. The OpenAI Gym
      1. Understanding an OpenAI Gym environment
    2. Programming an agent using an OpenAI Gym environment
      1. Q-Learning
        1. The Epsilon-Greedy approach
      2. Using the Q-Network for real-world applications
    3. Summary
  8. Markov Decision Process
    1. Markov decision processes
      1. The Markov property
      2. The S state set
      3. Actions
      4. Transition model
      5. Rewards
      6. Policy
      7. The sequence of rewards - assumptions
        1. The infinite horizons
        2. Utility of sequences
      8. The Bellman equations
        1. Solving the Bellman equation to find policies
          1. An example of value iteration using the Bellman equation
          2. Policy iteration
    2. Partially observable Markov decision processes
      1. State estimation
      2. Value iteration in POMDPs
    3. Training the FrozenLake-v0 environment using MDP
    4. Summary
  9. Policy Gradients
    1. The policy optimization method
    2. Why policy optimization methods?
      1. Why stochastic policy?
        1. Example 1 - rock, paper, scissors
        2. Example 2 - state aliased grid-world
    3. Policy objective functions
      1. Policy Gradient Theorem
    4. Temporal difference rule
      1. TD(1) rule
      2. TD(0) rule
      3. TD(λ) rule
    5. Policy gradients
      1. The Monte Carlo policy gradient
      2. Actor-critic algorithms
      3. Using a baseline to reduce variance
      4. Vanilla policy gradient
    6. Agent learning pong using policy gradients
    7. Summary
  10. Q-Learning and Deep Q-Networks
    1. Why reinforcement learning?
    2. Model based learning and model free learning
      1. Monte Carlo learning
      2. Temporal difference learning
      3. On-policy and off-policy learning
    3. Q-learning
      1. The exploration exploitation dilemma
      2. Q-learning for the mountain car problem in OpenAI gym
    4. Deep Q-networks
      1. Using a convolution neural network instead of a single layer neural network
      2. Use of experience replay
      3. Separate target network to compute the target Q-values
      4. Advancements in deep Q-networks and beyond
        1. Double DQN
        2. Dueling DQN
      5. Deep Q-network for mountain car problem in OpenAI gym
      6. Deep Q-network for Cartpole problem in OpenAI gym
      7. Deep Q-network for Atari Breakout in OpenAI gym
    5. The Monte Carlo tree search algorithm
      1. Minimax and game trees
      2. The Monte Carlo Tree Search
    6. The SARSA algorithm
      1. SARSA algorithm for mountain car problem in OpenAI gym
    7. Summary
  11. Asynchronous Methods
    1. Why asynchronous methods?
    2. Asynchronous one-step Q-learning
    3. Asynchronous one-step SARSA
    4. Asynchronous n-step Q-learning
    5. Asynchronous advantage actor critic
    6. A3C for Pong-v0 in OpenAI gym
    7. Summary
  12. Robo Everything – Real Strategy Gaming
    1. Real-time strategy games
    2. Reinforcement learning and other approaches
      1. Online case-based planning
        1. Drawbacks to real-time strategy games
      2. Why reinforcement learning?
    3. Reinforcement learning in RTS gaming
      1. Deep autoencoder
      2. How is reinforcement learning better?
    4. Summary
  13. AlphaGo – Reinforcement Learning at Its Best
    1. What is Go?
      1. Go versus chess
        1. How did Deep Blue defeat Garry Kasparov?
          1. Why is the game tree approach no good for Go?
    2. AlphaGo – mastering Go
      1. Monte Carlo Tree Search
      2. Architecture and properties of AlphaGo 
      3. Energy consumption analysis – Lee Sedol versus AlphaGo
    3. AlphaGo Zero
      1. Architecture and properties of AlphaGo Zero
        1. Training process in AlphaGo Zero 
    4. Summary
  14. Reinforcement Learning in Autonomous Driving
    1. Machine learning for autonomous driving
    2. Reinforcement learning for autonomous driving
      1. Creating autonomous driving agents
      2. Why reinforcement learning?
    3. Proposed frameworks for autonomous driving
      1. Spatial aggregation
        1. Sensor fusion
        2. Spatial features
      2. Recurrent temporal aggregation
      3. Planning
    4. DeepTraffic – MIT simulator for autonomous driving 
    5. Summary
  15. Financial Portfolio Management
    1. Introduction
    2. Problem definition
    3. Data preparation
    4. Reinforcement learning
    5. Further improvements
    6. Summary
  16. Reinforcement Learning in Robotics
    1. Reinforcement learning in robotics
      1. Evolution of reinforcement learning
    2. Challenges in robot reinforcement learning
      1. High dimensionality problem
      2. Real-world challenges
      3. Issues due to model uncertainty
      4. What's the final objective a robot wants to achieve?
    3. Open questions and practical challenges
      1. Open questions
      2. Practical challenges for robotic reinforcement learning
    4. Key takeaways
    5. Summary
  17. Deep Reinforcement Learning in Ad Tech
    1. Computational advertising challenges and bidding strategies
      1. Business models used in advertising
      2. Sponsored-search advertisements
        1. Search-advertisement management
        2. Adwords
      3. Bidding strategies of advertisers
    2. Real-time bidding by reinforcement learning in display advertising
    3. Summary
  18. Reinforcement Learning in Image Processing
    1. Hierarchical object detection with deep reinforcement learning
      1. Related works
        1. Region-based convolution neural networks
        2. Spatial pyramid pooling networks
        3. Fast R-CNN
        4. Faster R-CNN
        5. You Only Look Once
        6. Single Shot Detector
      2. Hierarchical object detection model
        1. State
        2. Actions
        3. Reward
        4. Model and training
          1. Training specifics
    2. Summary
  19. Deep Reinforcement Learning in NLP
    1. Text summarization
      1. Deep reinforced model for Abstractive Summarization
        1. Neural intra-attention model
          1. Intra-temporal attention on input sequence while decoding
          2. Intra-decoder attention
          3. Token generation and pointer
        2. Hybrid learning objective
          1. Supervised learning with teacher forcing
          2. Policy learning
          3. Mixed training objective function
    2. Text question answering
      1. Mixed objective and deep residual coattention for Question Answering
        1. Deep residual coattention encoder
        2. Mixed objective using self-critical policy learning
    3. Summary
  20. Further topics in Reinforcement Learning
    1. Continuous action space algorithms
      1. Trust region policy optimization
      2. Deterministic policy gradients
    2. Scoring mechanism in sequential models in NLP
      1. BLEU
        1. What is BLEU score and what does it do?
      2. ROUGE
    3. Summary
  21. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think