The agent code now plays, or explores, the environment, and it helps to understand how this code runs. Open Chapter_3_3.py again and follow the exercise:
- All we need to focus on for this section is how the agent plays the game. Scroll down to the play_game function, as shown in the following:
def play_game(env, policy, display=True):
    env.reset()
    episode = []
    finished = False
    while not finished:
        s = env.env.s
        if display:
            clear_output(True)
            env.render()
            sleep(1)
        timestep = []
        timestep.append(s)
        n = random.uniform(0, sum(policy[s].values()))
        top_range = 0
        action = 0
        for prob in policy[s].items():
            top_range += prob[1]
            if n < top_range:
                action = prob[0]
                break
        state, reward, finished, info = env.step(action)
        ...
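The action-selection step in the middle of play_game can be tried on its own, without an environment. The following is a minimal sketch, assuming that policy[s] is a dictionary mapping each action to its probability, as the calls to .values() and .items() in the listing suggest. The helper name sample_action and the example probabilities are hypothetical and do not come from Chapter_3_3.py.

```python
import random
from collections import Counter


def sample_action(action_probs, rng=random):
    """Pick an action with probability proportional to its weight.

    Mirrors the loop in play_game: draw a point n between 0 and the
    total weight, then walk the actions, accumulating their weights,
    until the running total passes n.
    """
    n = rng.uniform(0, sum(action_probs.values()))
    top_range = 0
    action = 0  # same default as play_game; used only if no range contains n
    for candidate, prob in action_probs.items():
        top_range += prob
        if n < top_range:
            action = candidate
            break
    return action


# Hypothetical policy entry for one state: four actions, unequal weights.
action_probs = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

# Seeded generator so repeated runs draw the same sequence.
rng = random.Random(0)
counts = Counter(sample_action(action_probs, rng) for _ in range(10_000))

# Observed frequencies should land close to the weights above.
for action in sorted(counts):
    print(action, counts[action] / 10_000)
```

Over many draws, each action should come up roughly in proportion to its weight. The standard library call random.choices(list(action_probs), weights=list(action_probs.values())) performs the same kind of weighted draw; the explicit loop is kept here because it matches the listing.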