January 2020
Intermediate to advanced
432 pages
10h 18m
English
Iterating over values may seem like a step back from the policy iteration we covered in the last section, but it is better thought of as a side step, or a companion method. In value iteration, we sweep through every state in the MDP and update each state's value to the best value reachable from it, repeating the sweep until the values stop changing by more than a small threshold, at which point we break out of the loop. We don't stop there, however: we then look ahead one step from every state and build a deterministic policy that assigns a probability of 100% to the best action in each state. This yields a new policy that may perform better than the one from the previous policy iteration demonstration. The differences between the two methods are subtle and best understood with a code example. Open up Chapter_2_7.py and follow the next exercise:
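Before working through the file, it may help to see the two phases described above on a toy problem. The following is a minimal sketch and not the contents of Chapter_2_7.py: the three-state MDP `P`, the helper `q_values`, and the parameters `gamma` and `theta` are all assumptions made for illustration. The transition table uses a `(probability, next_state, reward, done)` layout, which is assumed here for convenience.

```python
# Hypothetical toy MDP for illustration only (not the chapter's environment).
# P[state][action] -> list of (probability, next_state, reward, done)
P = {
    0: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)],
        1: [(1.0, 2, 0.0, True)]},  # terminal state
}


def q_values(P, V, state, gamma):
    """One-step lookahead: expected return of each action taken from `state`."""
    return {
        action: sum(
            prob * (reward + gamma * V[next_state] * (not done))
            for prob, next_state, reward, done in outcomes
        )
        for action, outcomes in P[state].items()
    }


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Return (V, policy) for the MDP `P`; gamma and theta are assumed defaults."""
    V = {s: 0.0 for s in P}

    # Phase 1: sweep all states, keeping the best value found for each,
    # and break once no value changes by more than theta.
    while True:
        delta = 0.0
        for s in P:
            best = max(q_values(P, V, s, gamma).values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Phase 2: look ahead from every state and pick the best action
    # deterministically (probability 1 for that action).
    policy = {}
    for s in P:
        q = q_values(P, V, s, gamma)
        policy[s] = max(q, key=q.get)
    return V, policy


V, policy = value_iteration(P)
print(V)
print(policy)
```

Note that the policy only appears at the very end, extracted from the converged values. Policy iteration, by contrast, alternates between evaluating and improving an explicit policy on every pass.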