January 2019
Intermediate to advanced
386 pages
11h 13m
English
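Temporal difference (TD) learning is a class of model-free RL methods. Like Monte Carlo (MC) methods, TD methods learn directly from the agent's experience, without a model of the environment. Like dynamic programming (DP), they bootstrap: they update the value estimate of a state using the current value estimates of other states, so they don't have to wait until the end of an episode to learn. As usual, we'll explore the policy evaluation and improvement tasks.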
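To make the "experience plus bootstrapping" idea concrete, here is a minimal sketch of tabular TD(0) policy evaluation. After every single transition it applies the update V(s) ← V(s) + α[r + γV(s') − V(s)]. The update uses a real sampled reward r, as MC does, and the current estimate V(s'), as DP does. The function names (`td0_evaluate`, `random_walk_step`, `uniform_policy`), the 7-state random-walk task, and the values of α, γ, and the episode count are illustrative assumptions. They are not taken from the text.

```python
import random
from collections import defaultdict


def td0_evaluate(policy, step, start, terminals, episodes, alpha, gamma=1.0, seed=0):
    """Tabular TD(0) prediction: estimate V^pi from sampled transitions (hypothetical helper)."""
    rng = random.Random(seed)
    V = defaultdict(float)  # unseen states start at 0; terminal states are never updated, so stay 0
    for _ in range(episodes):
        s = start
        while s not in terminals:
            a = policy(s, rng)
            s_next, r = step(s, a)
            # TD target: one real reward plus the *current estimate* of the next state (bootstrapping)
            target = r + gamma * V[s_next]
            # Move V(s) a fraction alpha toward the target, right away (no waiting for episode end)
            V[s] += alpha * (target - V[s])
            s = s_next
    return V


# Hypothetical toy task: a 7-state random walk. States 0 and 6 are terminal,
# and entering state 6 yields reward +1 (every other transition yields 0).
def random_walk_step(s, a):
    s_next = s + a
    return s_next, (1.0 if s_next == 6 else 0.0)


def uniform_policy(s, rng):
    return rng.choice((-1, 1))  # move left or right with equal probability


V = td0_evaluate(uniform_policy, random_walk_step, start=3, terminals={0, 6},
                 episodes=5000, alpha=0.02)
for s in range(1, 6):
    # Under the uniform policy the true value of state s is s / 6
    print(s, round(V[s], 2), "true:", round(s / 6, 2))
```

A small constant step size such as α = 0.02 keeps the estimates close to the true values s/6. A larger α adapts faster but leaves more noise in the final estimates.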