October 2019
Intermediate to advanced
340 pages
8h 39m
English
Finally, why don't we simulate some episodes and see what the chances of winning and losing will be under the resulting optimal policy?
We reuse the simulate_episode function we developed in the Performing on-policy Monte Carlo control recipe and simulate 100,000 episodes:
>>> n_episode = 100000
>>> n_win_optimal = 0
>>> n_lose_optimal = 0
>>> for _ in range(n_episode):
...     reward = simulate_episode(env, optimal_policy)
...     if reward == 1:
...         n_win_optimal += 1
...     elif reward == -1:
...         n_lose_optimal += 1
...
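The tally in the loop can also be written with collections.Counter, which also records draws (reward 0) without a third counter variable. The following is a minimal, self-contained sketch. Because env, optimal_policy and simulate_episode come from earlier recipes, it substitutes a hypothetical stand-in, fake_simulate_episode, which returns +1, -1 or 0 with made-up, illustrative probabilities. These are not the Blackjack results. estimate_outcome_probabilities is likewise a hypothetical helper name.

```python
import random
from collections import Counter


def fake_simulate_episode(rng):
    # Hypothetical stand-in for simulate_episode(env, optimal_policy):
    # returns +1 (win), -1 (loss) or 0 (draw) with made-up weights.
    return rng.choices([1, -1, 0], weights=[0.4, 0.5, 0.1])[0]


def estimate_outcome_probabilities(simulate, n_episode):
    """Run n_episode episodes and return empirical win/lose/draw rates."""
    # Counter maps each final reward to how many episodes ended with it;
    # missing keys (e.g. no draws at all) count as 0.
    counts = Counter(simulate() for _ in range(n_episode))
    return {
        'win': counts[1] / n_episode,
        'lose': counts[-1] / n_episode,
        'draw': counts[0] / n_episode,
    }


rng = random.Random(0)
probs = estimate_outcome_probabilities(lambda: fake_simulate_episode(rng), 100000)
print(probs)
```

With the real setup, one would pass something like `lambda: simulate_episode(env, optimal_policy)` as the simulate argument.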
Then, we print out the results we get:
>>> print('Winning probability under the optimal policy: {}'.format(n_win_optimal/n_episode))
Winning probability under the optimal policy: 0.43072
>>> print('Losing probability under the optimal ...
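A probability estimated from simulated episodes is itself a random quantity. If each episode is assumed to be an independent Bernoulli trial, the standard error of an estimated proportion p from n episodes is sqrt(p(1 - p) / n). The sketch below applies this to the winning probability of 0.43072 from 100,000 episodes. The result is roughly 0.0016, giving an approximate 95% interval of about 0.428 to 0.434. The helper name binomial_standard_error and the normal-approximation interval are illustrative choices, not part of the original recipe.

```python
import math


def binomial_standard_error(p_hat, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


p_win = 0.43072      # winning probability reported for the optimal policy
n_episode = 100000   # number of simulated episodes
se = binomial_standard_error(p_win, n_episode)
# Approximate 95% interval using the normal approximation (z = 1.96).
lower, upper = p_win - 1.96 * se, p_win + 1.96 * se
print('std. error: {:.5f}, 95% CI: [{:.4f}, {:.4f}]'.format(se, lower, upper))
```

The same calculation applies to the losing probability by substituting its estimate for p_win.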