January 2020
Intermediate to advanced
432 pages
10h 18m
English
To determine the best policy, we first need a way to evaluate a given policy. We can do this by sweeping through all of the states of an MDP and, for each state, evaluating every action the policy might take. This gives us a value for each state, which we then use to compute a new value function, repeating the update iteratively until the values settle. Mathematically, we can take the previous Bellman optimality equation and derive a new update to the state value function, as shown here:
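As a point of reference, the iterative policy evaluation update is commonly written as follows. This is a sketch in the standard textbook notation, which is an assumption here: the symbols $\pi(a \mid s)$ (policy), $p(s', r \mid s, a)$ (transition dynamics), $\gamma$ (discount factor), and $v_k$ (the value estimate after $k$ sweeps) follow Sutton and Barto's convention and may differ from the symbols this text uses.

$$
v_{k+1}(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, v_k(S_{t+1}) \mid S_t = s \right] = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma\, v_k(s') \right]
$$

The right-hand side uses only the previous estimate $v_k$, so each sweep over the states produces the next estimate $v_{k+1}$.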

In the preceding equation, the symbol $\mathbb{E}_\pi$ represents an expectation and denotes the expected state value ...
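The sweep-and-update procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's own code: the function name `policy_evaluation`, the array layout (`P`, `R`, `policy`), the stopping threshold `theta`, and the two-state toy MDP are all hypothetical choices made for the example.

```python
import numpy as np


def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8, max_sweeps=10_000):
    """Iteratively evaluate a fixed policy on a small tabular MDP.

    Assumed (hypothetical) array layout:
      P[s, a, s2]  : probability of moving from state s to s2 under action a
      R[s, a]      : expected immediate reward for taking action a in state s
      policy[s, a] : probability that the policy picks action a in state s
    """
    v = np.zeros(P.shape[0])
    for _ in range(max_sweeps):
        # Expected return of each (state, action) pair under the current estimate.
        q = R + gamma * (P @ v)
        # Average over actions using the policy's action probabilities.
        v_new = (policy * q).sum(axis=1)
        # Stop once a full sweep changes no state value by more than theta.
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new
    return v


# Toy MDP (illustrative only): state 1 is absorbing with zero reward.
# In state 0, action 0 stays put (reward 0) and action 1 moves to state 1 (reward 1).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
policy = np.full((2, 2), 0.5)  # uniform random policy

v = policy_evaluation(P, R, policy)
print(v)
```

For this toy problem the value of state 0 can be checked by hand: it satisfies v(0) = 0.5 * (0.9 * v(0)) + 0.5 * 1, so v(0) = 0.5 / 0.55, while the absorbing state has value 0. The sketch keeps two arrays (`v` and `v_new`) so that each sweep uses only the previous estimate; updating values in place is a common alternative that often converges in fewer sweeps.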