Let’s apply experience replay to Q-learning with function approximation (FA), using the linear estimator, Estimator, from linear_estimator.py, which we developed in the previous recipe, Estimating Q-functions with gradient descent approximation:
- Import the necessary modules and create a Mountain Car environment:
>>> import gym
>>> import torch
>>> from linear_estimator import Estimator
>>> from collections import deque
>>> import random
>>> env = gym.envs.make("MountainCar-v0")
- We will reuse the epsilon-greedy policy function developed in the previous recipe, Developing Q-learning with linear function approximation.
- Then, specify the number of features as 200, the learning rate as 0.03, and create an estimator accordingly:
>>> n_state = env.observation_space.shape[0] ...
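The two helpers these steps rely on, the reused epsilon-greedy policy and the replay memory built from `deque` and `random`, are not reproduced in this excerpt. Below is a minimal sketch under stated assumptions. It assumes the estimator exposes a `predict(state)` method returning one Q-value per action. `DummyEstimator`, `store`, `sample_batch`, the buffer capacity of 400, and the batch size are hypothetical illustrations, not the recipe's actual code.

```python
import random
from collections import deque

import torch


class DummyEstimator:
    """Hypothetical stand-in for Estimator (assumed to expose predict(state))."""

    def __init__(self, n_action):
        self.n_action = n_action

    def predict(self, state):
        # Fixed Q-values 0, 1, 2, ...: the last action is always the greedy one
        return torch.arange(self.n_action, dtype=torch.float32)


def gen_epsilon_greedy_policy(estimator, epsilon, n_action):
    """Build a policy that explores with probability epsilon."""
    def policy_function(state):
        # Spread epsilon uniformly over all actions...
        probs = torch.ones(n_action) * epsilon / n_action
        q_values = estimator.predict(state)
        best_action = torch.argmax(q_values).item()
        # ...and give the remaining 1 - epsilon mass to the greedy action
        probs[best_action] += 1.0 - epsilon
        return torch.multinomial(probs, 1).item()
    return policy_function


# Hypothetical replay memory: a bounded FIFO buffer of transitions.
# Once it holds 400 items, appending drops the oldest transition.
memory = deque(maxlen=400)


def store(state, action, next_state, reward, is_done):
    """Record one transition in the replay memory."""
    memory.append((state, action, next_state, reward, is_done))


def sample_batch(replay_size):
    """Draw a uniformly random mini-batch, or nothing if memory is too small."""
    if len(memory) < replay_size:
        return []
    return random.sample(memory, replay_size)
```

Bounding the `deque` with `maxlen` lets old transitions fall out automatically, and sampling the mini-batch at random, rather than replaying the most recent steps in order, breaks the correlation between consecutive transitions that experience replay is meant to remove.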