October 2019
Intermediate to advanced
340 pages
8h 39m
English
We obtained a success rate of 75% with a discount factor of 0.99. How does the discount factor affect the performance? Let's do some experiments with different factors, including 0, 0.2, 0.4, 0.6, 0.8, 0.99, and 1.:
>>> gammas = [0, 0.2, 0.4, 0.6, 0.8, .99, 1.]
For each discount factor, we compute the average success rate over 10,000 episodes:
>>> avg_reward_gamma = []>>> for gamma in gammas:... V_optimal = value_iteration(env, gamma, threshold)... optimal_policy = extract_optimal_policy(env, V_optimal, gamma)... total_rewards = []... for episode in range(n_episode):... total_reward = run_episode(env, optimal_policy)... total_rewards.append(total_reward)... avg_reward_gamma.append(sum(total_rewards) / n_episode)
We draw a ...
Read now
Unlock full access