October 2019
Intermediate to advanced
366 pages
12h 4m
English
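To find the optimal policy, you first need a way to compute the value function of a given policy. An iterative procedure that does this is called policy evaluation. Starting from an arbitrary initial estimate $V_0$, it creates a sequence $\{V_0, V_1, \dots, V_k\}$ that iteratively improves the estimate of the value function for a policy, $\pi$, using the state transition probabilities of the model, the expected value of the next state, and the immediate reward. Each element of the sequence is obtained from the previous one using the Bellman equation:

$$V_{k+1}(s) = \mathbb{E}_\pi\left[r_t + \gamma V_k(s_{t+1}) \mid s_t = s\right] = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma V_k(s')\right]$$

Here, $\gamma$ is the discount factor, $\pi(a \mid s)$ is the probability of taking action $a$ in state $s$, and $p(s', r \mid s, a)$ is the probability of moving to state $s'$ with reward $r$.

This sequence converges to the value function of the policy, $V_\pi$, as $k \to \infty$. Figure 3.3 shows the update ...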
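The iterative update described here can be sketched in a few lines of NumPy. This is a minimal illustration rather than the book's own code: the function name `policy_evaluation`, the array layout (`P[s, a, s']` for transition probabilities, `R[s, a, s']` for rewards, `policy[s, a]` for action probabilities), the stopping tolerance, and the two-state toy MDP are all assumptions made for the example.

```python
import numpy as np


def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iteratively estimate the value function of `policy` (hypothetical helper).

    P[s, a, t]   : probability of moving from state s to state t under action a
    R[s, a, t]   : reward received on that transition
    policy[s, a] : probability of choosing action a in state s
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)  # V_0: arbitrary initial estimate
    for _ in range(max_iters):
        # Expected return of each (state, action) pair under the current estimate:
        # Q[s, a] = sum_t P[s, a, t] * (R[s, a, t] + gamma * V[t])
        Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
        # Average over the actions chosen by the policy to get V_{k+1}
        V_new = np.sum(policy * Q, axis=1)
        # Stop once successive estimates barely change
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Toy MDP (assumed for illustration): 2 states, 2 actions, deterministic moves.
# Action 0 always leads to state 0, action 1 always leads to state 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0

# Reward of 1 for landing in state 1, 0 otherwise.
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0

# Uniform random policy: each action is chosen with probability 0.5.
policy = np.full((2, 2), 0.5)

V = policy_evaluation(P, R, policy, gamma=0.9)
print(V)  # both entries approach 0.5 / (1 - 0.9) = 5
```

For this toy problem the fixed point can be checked by hand: under the random policy every state yields an expected immediate reward of 0.5 and the same expected next-state value, so $V = 0.5 + 0.9V$, which gives $V = 5$ for both states. The loop stops on a tolerance rather than running forever, since in practice the limit $k \to \infty$ is only approximated.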