Temporal Difference Learning
In our earlier discussion of the history of reinforcement learning, we covered its two main threads, trial-and-error learning and Dynamic Programming (DP), which came together to form modern Reinforcement Learning (RL). As mentioned in earlier chapters, a third thread arrived later: Temporal Difference Learning (TDL). In this chapter, we will explore TDL and how it addresses the Temporal Credit Assignment (TCA) problem. From there, we will look at how TD learning differs from Monte Carlo (MC) methods and how it evolves into full Q-learning. After that, we will explore the differences between on-policy and off-policy learning and, finally, work through a new example RL environment.
For this chapter, we will ...