January 2020
Intermediate to advanced
432 pages
10h 18m
English
Throughout this book, we will explore methods that allow an algorithm to predict and control how an agent completes a task. Prediction and control are at the heart of RL, and in the methods covered previously the two were kept separate: the update ran either before the agent acted, as in dynamic programming (DP), or after an episode had finished, as in Monte Carlo (MC). For an agent to learn in real time, we need an online update rule that updates the value function after a designated time step. In temporal difference learning (TDL), this is called the TD update rule.
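The rule is shown here in equation form:
$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$$
In the previous equation, we have the following:
$V(S_t)$: the current estimate of the value of state $S_t$
$\alpha$: the learning rate (step size)
$R_{t+1}$: the reward received on moving to the next state
$\gamma$: the discount factor
$V(S_{t+1})$: the current estimate of the value of the next state $S_{t+1}$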
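To make the online nature of the update concrete, here is a minimal Python sketch of a single tabular TD(0) step, assuming the standard form V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]. The function name `td0_update`, the dictionary-based value table, the `done` flag, and the sample states and parameter values are illustrative assumptions, not code from the book.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular TD(0) update to the value table V in place.

    Returns the TD error so the caller can inspect or log it.
    """
    # Bootstrapped target: the observed reward plus the discounted estimate
    # of the next state's value. A terminal next state adds no future value.
    target = reward + (0.0 if done else gamma * V[next_state])
    td_error = target - V[state]
    # Move the current estimate a fraction alpha toward the target.
    V[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: every value starts at 0.0 except V["B"].
V = defaultdict(float)
V["B"] = 1.0
err = td0_update(V, "A", reward=0.5, next_state="B", alpha=0.1, gamma=0.9)
print(V["A"], err)
```

Because the update needs only the current state, the reward, and the next state, it can run after every step the agent takes, rather than waiting for a full model sweep or for an episode to finish.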