October 2019
Intermediate to advanced
340 pages
8h 39m
English
We perform SARSA to solve the Windy Gridworld environment as follows:
>>> import torch
>>> from windy_gridworld import WindyGridworldEnv
>>> env = WindyGridworldEnv()
>>> def gen_epsilon_greedy_policy(n_action, epsilon):
...     def policy_function(state, Q):
...         probs = torch.ones(n_action) * epsilon / n_action
...         best_action = torch.argmax(Q[state]).item()
...         probs[best_action] += 1.0 - epsilon
...         action = torch.multinomial(probs, 1).item()
...         return action
...     return policy_function
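To see what distribution the epsilon-greedy policy samples from, we can work through the probabilities by hand. The following is a minimal sketch that uses only the Python standard library instead of torch. The helper name `epsilon_greedy_probs` and the Q-values are hypothetical, chosen for illustration, and are not part of the recipe. Each action receives epsilon / n_action, and the greedy action additionally receives 1 - epsilon, so the probabilities sum to 1.

```python
import random
from collections import Counter


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities of an epsilon-greedy policy.

    Hypothetical helper for illustration; it mirrors the arithmetic of
    policy_function without depending on torch.
    """
    n_action = len(q_values)
    # Every action gets an equal share of the exploration mass epsilon.
    probs = [epsilon / n_action] * n_action
    # The greedy action (the first maximum, if there is a tie in this
    # sketch) receives the remaining 1 - epsilon.
    best_action = max(range(n_action), key=lambda a: q_values[a])
    probs[best_action] += 1.0 - epsilon
    return probs


# Made-up Q-values for one state with four actions (assumption).
q_values = [0.0, 2.5, -1.0, 0.3]
probs = epsilon_greedy_probs(q_values, epsilon=0.1)
print(probs)

# Sample actions from the distribution with a fixed seed and count them.
rng = random.Random(0)
samples = rng.choices(range(len(probs)), weights=probs, k=10_000)
counts = Counter(samples)
print(counts)
```

With epsilon set to 0.1 and four actions, the greedy action (index 1 here) is chosen with probability 0.925 and each of the other actions with probability 0.025, so the sampled counts should be dominated by action 1.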