
Deep Reinforcement Learning Hands-On

Name: Deep Reinforcement Learning Hands-On
ISBN: 9781788834247
Authors: Oleg Vasilev, Maxim Lapan, Martijn van Otterlo, Mikhail Yurushkin, Basem O. F. Alijla
Published: June 2018
Level: Intermediate to advanced
Length: 546 pages
Estimated reading time: 13h 30m
Language: English
Publisher: Packt Publishing

Table of Contents
Deep Reinforcement Learning Hands-On
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewers
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. What is Reinforcement Learning?
Learning – supervised, unsupervised, and reinforcement
RL formalisms and relations
Reward
The agent
The environment
Actions
Observations
Markov decision processes
Markov process
Markov reward process
Markov decision process
Summary
2. OpenAI Gym
The anatomy of the agent
Hardware and software requirements
OpenAI Gym API
Action space
Observation space
The environment
Creation of the environment
The CartPole session
The random CartPole agent
The extra Gym functionality – wrappers and monitors
Wrappers
Monitor
Summary
3. Deep Learning with PyTorch
Tensors
Creation of tensors
Scalar tensors
Tensor operations
GPU tensors
Gradients
Tensors and gradients
NN building blocks
Custom layers
Final glue – loss functions and optimizers
Loss functions
Optimizers
Monitoring with TensorBoard
TensorBoard 101
Plotting stuff
Example – GAN on Atari images
Summary
4. The Cross-Entropy Method
Taxonomy of RL methods
Practical cross-entropy
Cross-entropy on CartPole
Cross-entropy on FrozenLake
Theoretical background of the cross-entropy method
Summary
5. Tabular Learning and the Bellman Equation
Value, state, and optimality
The Bellman equation of optimality
Value of action
The value iteration method
Value iteration in practice
Q-learning for FrozenLake
Summary
6. Deep Q-Networks
Real-life value iteration
Tabular Q-learning
Deep Q-learning
Interaction with the environment
SGD optimization
Correlation between steps
The Markov property
The final form of DQN training
DQN on Pong
Wrappers
DQN model
Training
Running and performance
Your model in action
Summary
7. DQN Extensions
The PyTorch Agent Net library
Agent
Agent's experience
Experience buffer
Gym env wrappers
Basic DQN
N-step DQN
Implementation
Double DQN
Implementation
Results
Noisy networks
Implementation
Results
Prioritized replay buffer
Implementation
Results
Dueling DQN
Implementation
Results
Categorical DQN
Implementation
Results
Combining everything
Implementation
Results
Summary
References
8. Stocks Trading Using RL
Trading
Data
Problem statements and key decisions
The trading environment
Models
Training code
Results
The feed-forward model
The convolution model
Things to try
Summary
9. Policy Gradients – An Alternative
Values and policy
Why policy?
Policy representation
Policy gradients
The REINFORCE method
The CartPole example
Results
Policy-based versus value-based methods
REINFORCE issues
Full episodes are required
High gradients variance
Exploration
Correlation between samples
PG on CartPole
Results
PG on Pong
Results
Summary
10. The Actor-Critic Method
Variance reduction
CartPole variance
Actor-critic
A2C on Pong
A2C on Pong results
Tuning hyperparameters
Learning rate
Entropy beta
Count of environments
Batch size
Summary
11. Asynchronous Advantage Actor-Critic
Correlation and sample efficiency
Adding an extra A to A2C
Multiprocessing in Python
A3C – data parallelism
Results
A3C – gradients parallelism
Results
Summary
12. Chatbots Training with RL
Chatbots overview
Deep NLP basics
Recurrent Neural Networks
Embeddings
Encoder-Decoder
Training of seq2seq
Log-likelihood training
Bilingual evaluation understudy (BLEU) score
RL in seq2seq
Self-critical sequence training
The chatbot example
The example structure
Modules: cornell.py and data.py
BLEU score and utils.py
Model
Training: cross-entropy
Running the training
Checking the data
Testing the trained model
Training: SCST
Running the SCST training
Results
Telegram bot
Summary
13. Web Navigation
Web navigation
Browser automation and RL
Mini World of Bits benchmark
OpenAI Universe
Installation
Actions and observations
Environment creation
MiniWoB stability
Simple clicking approach
Grid actions
Example overview
Model
Training code
Starting containers
Training process
Checking the learned policy
Issues with simple clicking
Human demonstrations
Recording the demonstrations
Recording format
Training using demonstrations
Results
TicTacToe problem
Adding text description
Results
Things to try
Summary
14. Continuous Action Space
Why a continuous space?
Action space
Environments
The Actor-Critic (A2C) method
Implementation
Results
Using models and recording videos
Deterministic policy gradients
Exploration
Implementation
Results
Recording videos
Distributional policy gradients
Architecture
Implementation
Results
Things to try
Summary
15. Trust Regions – TRPO, PPO, and ACKTR
Introduction
Roboschool
A2C baseline
Results
Videos recording
Proximal Policy Optimization
Implementation
Results
Trust Region Policy Optimization
Implementation
Results
A2C using ACKTR
Implementation
Results
Summary
16. Black-Box Optimization in RL
Black-box methods
Evolution strategies
ES on CartPole
Results
ES on HalfCheetah
Results
Genetic algorithms
GA on CartPole
Results
GA tweaks
Deep GA
Novelty search
GA on Cheetah
Results
Summary
References
17. Beyond Model-Free – Imagination
Model-based versus model-free
Model imperfections
Imagination-augmented agent
The environment model
The rollout policy
The rollout encoder
Paper results
I2A on Atari Breakout
The baseline A2C agent
EM training
The imagination agent
The I2A model
The Rollout encoder
Training of I2A
Experiment results
The baseline agent
Training EM weights
Training with the I2A model
Summary
References
18. AlphaGo Zero
Board games
The AlphaGo Zero method
Overview
Monte-Carlo Tree Search
Self-play
Training and evaluation
Connect4 bot
Game model
Implementing MCTS
Model
Training
Testing and comparison
Connect4 results
Summary
References
Book summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index

Overview

If you want to learn how to create intelligent agents capable of mastering complex environments, Deep Reinforcement Learning Hands-On is an essential resource. This book provides a developer-oriented introduction to reinforcement learning (RL), covering fundamental concepts and advanced algorithms. You'll explore practical implementation techniques and gain the skills to solve real-world problems using RL.

What this book will help me do

Understand and implement foundational RL concepts, such as Markov decision processes and Q-learning.
Apply deep RL algorithms like DQN, TRPO, PPO, and AlphaGo Zero in practical contexts.
Develop agents that learn to play Atari arcade games and board games such as Connect4, the latter through AlphaGo Zero-style self-play.
Explore real-world applications, from designing stock trading agents to building conversational chatbots.
Utilize frameworks like PyTorch and OpenAI Gym for building and testing RL models.

Author(s)

Maxim Lapan is an author and practitioner in artificial intelligence and machine learning, with extensive experience in deep learning and reinforcement learning. He combines academic understanding with practical insight, giving readers a hands-on approach to complex RL topics.

Who is it for?

This book is ideal for developers and machine learning practitioners interested in mastering reinforcement learning. Readers should have some familiarity with Python and deep learning fundamentals. It is particularly suitable for those aiming to apply RL to practical projects and real-world problems. The content caters to both beginners in reinforcement learning and those looking to advance their skills in the domain.
