Hands-On Reinforcement Learning with Python

by Sudharsan Ravichandiran
June 2018
Content level: Intermediate to advanced
318 pages
9h 24m
English
Packt Publishing
TD prediction

As in Monte Carlo prediction, in TD prediction we try to predict the state values. In Monte Carlo prediction, we estimate the value function by taking the mean return, which means waiting until the end of an episode. In TD learning, we instead update the value of the previous state using the reward just received and the value of the current state. How can we do this? TD learning uses something called the TD update rule for updating the value of a state, as follows:

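Value of previous state = value of previous state + learning_rate * (reward + discount_factor * (value of current state) - value of previous state)

In symbols, with learning rate α and discount factor γ: V(s) ← V(s) + α (r + γ V(s') - V(s)), where s is the previous state, s' is the current state, and r is the reward.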
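To make the update rule concrete, here is a minimal Python sketch of a single TD update. It is an illustration only, not the book's code: the function name td_update, the dictionary-based value table, and the two-state example with its numbers are all assumptions chosen for this example. Here state plays the role of the "previous state" and next_state the role of the "current state" in the rule above.

```python
def td_update(value, state, next_state, reward, alpha, gamma):
    """Apply one TD update to value[state] in place and return the TD error.

    value      -- dict mapping each state to its current value estimate
    state      -- the previous state (the one whose value is updated)
    next_state -- the current state (the one we moved into)
    reward     -- reward received on the transition
    alpha      -- learning rate
    gamma      -- discount factor
    """
    # Target: reward plus the discounted value of the current state.
    td_target = reward + gamma * value[next_state]
    # Error: how far the old estimate is from that target.
    td_error = td_target - value[state]
    # Move the old estimate a fraction alpha toward the target.
    value[state] += alpha * td_error
    return td_error


# Hypothetical two-state example (values chosen only for illustration).
value = {"A": 0.0, "B": 2.0}
td_error = td_update(value, "A", "B", reward=1.0, alpha=0.5, gamma=0.5)
print(value["A"], td_error)
```

In this example the target is 1.0 + 0.5 * 2.0 = 2.0, so the error is 2.0 and the value of A moves halfway from 0.0 toward the target. Unlike a Monte Carlo estimate, the update needs only a single transition, not a complete episode.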

What does this equation actually mean?

If you think of this equation intuitively, it is actually the difference ...


ISBN: 9781788836524