October 2019
Intermediate to advanced
366 pages
12h 4m
English
Value iteration is the other dynamic programming algorithm for finding optimal values in an MDP. Unlike policy iteration, which alternates policy evaluation and policy improvement in a loop, value iteration combines the two steps into a single update. In particular, it updates the value of a state by immediately selecting the best action:
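In the notation commonly used for MDPs, this update is usually written as follows. The symbols are an assumption rather than taken from the text: $p(s', r \mid s, a)$ for the transition dynamics, $\gamma$ for the discount factor, and $V_k$ for the value estimate after sweep $k$.

```latex
V_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma V_k(s') \right]
```

The maximum over actions plays the role of policy improvement, while the expected one-step return inside it corresponds to a single sweep of policy evaluation.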

The code for value iteration is even simpler than the policy iteration code, summarized in the following pseudocode:
Initialize V(s) for every state s
while V is not stable:
    > value iteration
    for each ...
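Since the pseudocode above is cut off, the following is a minimal sketch of how such a loop might look in Python, not the book's own implementation. The function name `value_iteration`, the array layout (`P[s, a, s']` for transition probabilities, `R[s, a]` for expected immediate rewards), the stopping threshold `eps`, and the two-state toy MDP are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, eps=1e-8):
    """Hypothetical helper: in-place value iteration on a tabular MDP.

    P[s, a, s2] is the probability of moving from s to s2 under action a.
    R[s, a] is the expected immediate reward for taking action a in s.
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)  # initialize V(s) for every state

    while True:  # loop until V is stable
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: value of each action from state s.
            q = R[s] + gamma * (P[s] @ V)
            best = q.max()  # pick the best action immediately
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break

    # Read off a greedy policy from the converged values.
    policy = np.array(
        [int(np.argmax(R[s] + gamma * (P[s] @ V))) for s in range(n_states)]
    )
    return V, policy


# Toy MDP (assumed for illustration): action 0 stays, action 1 switches state.
# Staying in state 1 yields reward 1; every other transition yields 0.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0], [1.0, 0.0]])

V, policy = value_iteration(P, R, gamma=0.5)
print(V, policy)
```

With a discount factor of 0.5, the values should approach 1 for state 0 and 2 for state 1, and the greedy policy switches out of state 0 and stays in state 1. Updating `V[s]` in place means later states in a sweep already see the new values of earlier ones; a synchronous variant would compute every update from a copy of `V` instead.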