January 2020
Intermediate to advanced
432 pages
10h 18m
English
Policy iteration and value iteration are closely related and are often treated as companion methods. As a result, deciding which one to use often means applying both to the problem in question. In the next exercise, we will evaluate policy and value iteration side by side in the FrozenLake environment:
def play(env, episodes, policy):
    wins = 0
    total_reward = 0
    for episode in range(episodes):
        term = False
        state = env.reset()
        while not term:
            action = np.argmax(policy[state])
            next_state, reward, term, info = env.step(action)
            total_reward += reward
            state = ...
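To make the side-by-side comparison concrete, here is a minimal sketch of both methods in plain Python. It is not the book's exercise code. It assumes a hand-built, deterministic 4x4 grid (no slipping) in place of the Gym FrozenLake environment, and the names `build_model`, `q_values`, `value_iteration`, `policy_iteration`, and `rollout` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical stand-in for FrozenLake: a deterministic 4x4 grid (no slipping),
# written without gym or NumPy so the sketch is self-contained.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = len(MAP), len(MAP[0])
N_STATES, N_ACTIONS = N_ROWS * N_COLS, 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (d_row, d_col)

def build_model():
    """Return P[s][a] = (next_state, reward, done) for every state and action."""
    P = []
    for s in range(N_STATES):
        row, col = divmod(s, N_COLS)
        P.append([])
        for d_row, d_col in MOVES:
            if MAP[row][col] in "HG":  # holes and the goal are terminal
                P[s].append((s, 0.0, True))
                continue
            r = min(max(row + d_row, 0), N_ROWS - 1)  # clamp at the grid edge
            c = min(max(col + d_col, 0), N_COLS - 1)
            tile = MAP[r][c]
            P[s].append((r * N_COLS + c, 1.0 if tile == "G" else 0.0, tile in "HG"))
    return P

def q_values(P, V, s, gamma):
    """One-step lookahead: the value of each action in state s under V."""
    return [r + (0.0 if done else gamma * V[ns]) for ns, r, done in P[s]]

def value_iteration(P, gamma=0.9, theta=1e-10):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q_values(P, V, s, gamma))  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged values.
    policy = []
    for s in range(N_STATES):
        q = q_values(P, V, s, gamma)
        policy.append(max(range(N_ACTIONS), key=q.__getitem__))
    return V, policy

def policy_iteration(P, gamma=0.9, theta=1e-10):
    V = [0.0] * N_STATES
    policy = [0] * N_STATES  # arbitrary starting policy: always move left
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        while True:
            delta = 0.0
            for s in range(N_STATES):
                v = q_values(P, V, s, gamma)[policy[s]]
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: switch only when an action is strictly better.
        stable = True
        for s in range(N_STATES):
            q = q_values(P, V, s, gamma)
            best = max(range(N_ACTIONS), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy

def rollout(P, policy, max_steps=100):
    """Follow the policy from the start state and return the total reward."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        state, reward, done = P[state][policy[state]]
        total_reward += reward
        if done:
            break
    return total_reward

P = build_model()
v_vi, pi_vi = value_iteration(P)
v_pi, pi_pi = policy_iteration(P)
print("value iteration  V(start):", v_vi[0], "reward:", rollout(P, pi_vi))
print("policy iteration V(start):", v_pi[0], "reward:", rollout(P, pi_pi))
```

On this small deterministic grid both methods should arrive at the same optimal values, so any difference shows up mainly in how much work each one does to get there. The slippery version of FrozenLake has stochastic transitions, which this sketch does not model.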