January 2020
Intermediate to advanced
432 pages
10h 18m
English
When we previously had a model, our algorithm could learn to plan and improve a policy offline. Now, with no model, our algorithm needs to become an agent and learn to explore and, while doing that, also learn and improve. This allows our agent to now learn effectively by trial and error. Let's jump back into the Chapter_3_3.py code example and follow the exercise:
episode = play_game(env=env, policy=policy, display=False)evaluate_policy_check(env, e, policy, test_policy_freq)
Read now
Unlock full access