5

Understanding Temporal Difference Learning

Temporal difference (TD) learning is one of the most popular and widely used model-free methods. Its popularity stems from the fact that TD learning combines the advantages of both the dynamic programming (DP) method and the Monte Carlo (MC) method covered in the previous chapters.

We will begin the chapter by understanding exactly how TD learning improves on the DP and MC methods. Next, we will learn how to perform the prediction task using TD learning. We will then move on to TD control, first with an on-policy TD control method called SARSA and then with an off-policy TD control method called Q-learning.

We will also learn how to find the optimal policy in the Frozen Lake environment ...
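Before diving in, here is a brief preview of what the TD prediction task looks like in code. This is only an illustrative sketch: the tiny two-state environment, the step function, and the values of alpha and gamma below are assumptions for demonstration, not the book's Frozen Lake code. The point is the TD(0) update rule, V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], which the chapter develops in detail.

    # Minimal TD(0) prediction sketch (illustrative assumptions, not the book's code).
    # A hypothetical two-state chain: state 0 -> state 1 -> terminal, reward 1.0 at the end.
    def step(state):
        if state == 0:
            return 1, 0.0, False      # next_state, reward, done
        return None, 1.0, True

    alpha, gamma = 0.1, 0.9           # learning rate and discount factor (assumed values)
    V = {0: 0.0, 1: 0.0}              # value estimates for the non-terminal states

    for episode in range(1000):
        state, done = 0, False
        while not done:
            next_state, reward, done = step(state)
            # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s')
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state

    print(V)  # V[1] approaches 1.0 and V[0] approaches gamma * 1.0 = 0.9

Notice how the update bootstraps from the current estimate of the next state's value (as DP does) while learning from sampled experience rather than a known model (as MC does); this is the combination of advantages mentioned above.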
