April 2018
Intermediate to advanced
334 pages
Unlike Monte Carlo learning, which looks ahead all the way to the end of the episode, temporal difference learning looks ahead only one step; that is, we observe only the next step in the episode.
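
To make the contrast concrete, here is a minimal sketch of the two kinds of look ahead for a tabular state-value estimate. The function names, the dictionary-based value table, the state labels, and the values of the step size alpha and discount factor gamma are all assumptions chosen for illustration; they do not come from the text.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """One-step look ahead: nudge V(state) toward reward + gamma * V(next_state)."""
    # At the end of an episode there is no next state to bootstrap from.
    bootstrap = 0.0 if done else values.get(next_state, 0.0)
    td_target = reward + gamma * bootstrap
    td_error = td_target - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


def monte_carlo_return(rewards, gamma=0.9):
    """Full look ahead: discounted sum of every reward until the episode ends."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical two-state example: only the next step is needed for the TD update.
values = {"s0": 0.0, "s1": 1.0}
td_error = td0_update(values, "s0", reward=0.5, next_state="s1")

# The Monte Carlo estimate needs the whole reward sequence of the episode.
full_return = monte_carlo_return([1.0, 0.0, 2.0], gamma=0.5)
```

The temporal difference update can run after every single transition, whereas the Monte Carlo return can only be computed once the episode has finished.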

Temporal difference learning is used to learn the value function in value and policy iteration methods, and the Q-function in Q-learning.
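
For the Q-learning case, the same one-step idea can be sketched as a tabular update. This is an illustrative sketch only: the function name, the (state, action) keyed table, the action labels, and the alpha and gamma values are assumptions, not the book's implementation.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """One-step TD update of Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    # Look ahead one step only: take the best estimated value in the next state.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical table: unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0

new_value = q_learning_update(q, "s0", "right", reward=1.0,
                              next_state="s1", actions=actions)
```

Taking the maximum over the next state's actions is what ties the update to the best action available at the next step, rather than to the action actually taken there.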
If we want our AI agent to always choose an action that maximizes the discounted future rewards, then we need some sort of temporal difference learning. For that, we need to define a function Q that represents the maximum discounted future rewards when we take an action a at state ...