October 2019
Intermediate to advanced
366 pages
12h 4m
English
We can now apply value iteration to the FrozenLake game in order to compare the two DP algorithms and to see whether they converge to the same policy and value function.
Let's define eval_state_action as before to estimate the action value of a state-action pair:
def eval_state_action(V, s, a, gamma=0.99):
    return np.sum([p * (rew + gamma*V[next_s]) for p, next_s, rew, _ in env.P[s][a]])
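To make the one-step backup concrete, here is a minimal sketch that evaluates a single state-action pair by hand. It assumes a made-up, two-state transition table named P (hypothetical, not FrozenLake) that uses the same layout as env.P[s][a], that is, a list of (probability, next_state, reward, done) tuples. Unlike the function above, this variant takes P as an argument and uses plain Python instead of NumPy so that it runs on its own.

```python
# Hypothetical transition table (an assumption for illustration only).
# Layout mirrors env.P[s][a]: a list of (probability, next_state, reward, done).
P = {
    0: {
        0: [
            (0.5, 0, 0.0, False),  # stay in state 0, no reward
            (0.5, 1, 1.0, True),   # reach state 1, reward 1
        ]
    }
}

# Assumed current value estimates for the two states.
V = [2.0, 0.0]


def eval_state_action(P, V, s, a, gamma=0.99):
    """Expected one-step return of taking action a in state s."""
    return sum(p * (rew + gamma * V[next_s]) for p, next_s, rew, _ in P[s][a])


# 0.5 * (0 + 0.99 * 2.0) + 0.5 * (1 + 0.99 * 0.0) = 0.99 + 0.5
q = eval_state_action(P, V, 0, 0)
print(q)
```

Each tuple contributes its probability times the immediate reward plus the discounted value of the next state, so the result is an expectation over the possible outcomes of the action.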
Then, we create the main body of the value iteration algorithm:
def value_iteration(eps=0.0001):
    V = np.zeros(nS)
    it = 0
    while True:
        delta = 0
        # update the value for each state
        for s in range(nS):
            old_v = V[s]
            V[s] = np.max([eval_state_action(V, s, a) for a in range(nA)]) # equation 3.10
            delta = max(delta, np.abs(old_v ...
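The listing above is cut off in this excerpt, so the following is only a sketch of how such a loop is commonly completed, not the book's exact code. It assumes the usual stopping rule (stop once the largest change in a sweep falls below eps) and runs on a hypothetical three-state chain rather than FrozenLake, so it needs no Gym installation. The names P, n_states, n_actions, and greedy_policy are introduced here for illustration.

```python
# Hypothetical deterministic chain: 0 <-> 1 -> 2 (terminal).
# Action 0 moves left, action 1 moves right; reaching state 2 pays reward 1.
# Layout mirrors env.P[s][a]: a list of (probability, next_state, reward, done).
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
n_states, n_actions = 3, 2


def eval_state_action(V, s, a, gamma=0.9):
    """Expected one-step return of taking action a in state s."""
    return sum(p * (rew + gamma * V[next_s]) for p, next_s, rew, _ in P[s][a])


def value_iteration(eps=0.0001):
    V = [0.0] * n_states
    while True:
        delta = 0.0
        # Sweep over all states, updating V in place.
        for s in range(n_states):
            old_v = V[s]
            V[s] = max(eval_state_action(V, s, a) for a in range(n_actions))
            # Assumed stopping statistic: largest absolute change in this sweep.
            delta = max(delta, abs(old_v - V[s]))
        if delta < eps:
            return V


def greedy_policy(V):
    """Pick, in every state, the action with the highest one-step return."""
    return [
        max(range(n_actions), key=lambda a: eval_state_action(V, s, a))
        for s in range(n_states)
    ]


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

On this toy chain the values settle after three sweeps, and the greedy policy moves right in both non-terminal states. The same structure applies to FrozenLake once P, n_states, and n_actions are replaced with env.P, nS, and nA.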