October 2019
Intermediate to advanced
340 pages
8h 39m
English
Let's develop Q-learning with function approximation (FA) using the linear estimator, Estimator, from linear_estimator.py, which we developed in the previous recipe, Estimating Q-functions with gradient descent approximation:
>>> import gym
>>> import torch
>>> from linear_estimator import Estimator
>>> env = gym.envs.make("MountainCar-v0")
>>> def gen_epsilon_greedy_policy(estimator, epsilon, n_action):
...     def policy_function(state):
...         probs = torch.ones(n_action) * epsilon / n_action
...         q_values = estimator.predict(state)
...         best_action = torch.argmax(q_values).item()
...         probs[best_action] += 1.0 - epsilon
...         action = torch.multinomial(probs, ...
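The probability construction in the policy above can be checked in isolation. The following is a minimal sketch, not the book's code: it uses plain Python lists and the standard library's random module in place of torch tensors, and the helper names epsilon_greedy_probs and sample_action, as well as the Q-values, are made up for illustration. It assumes three actions, which matches the action count of MountainCar-v0.

```python
import random


def epsilon_greedy_probs(q_values, epsilon):
    """Return the action probabilities of an epsilon-greedy policy.

    Hypothetical helper for illustration; q_values is a plain list of
    floats standing in for the output of an estimator's predict call.
    """
    n_action = len(q_values)
    # Every action receives an equal share of the exploration mass epsilon.
    probs = [epsilon / n_action] * n_action
    # The greedy action additionally receives the remaining 1 - epsilon.
    best_action = max(range(n_action), key=lambda a: q_values[a])
    probs[best_action] += 1.0 - epsilon
    return probs


def sample_action(q_values, epsilon, rng=random):
    """Draw one action index according to the epsilon-greedy probabilities."""
    probs = epsilon_greedy_probs(q_values, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up Q-values for three actions; action 1 is the greedy choice.
q_values = [-1.2, -0.4, -0.9]
probs = epsilon_greedy_probs(q_values, epsilon=0.1)
print(probs)
print(sample_action(q_values, epsilon=0.1))
```

With epsilon set to 0, the sketch always returns the greedy action; with epsilon set to 1, every action is equally likely. The probabilities sum to 1 for any epsilon in between, which is the property a sampling call such as torch.multinomial relies on.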