We could, of course, implement TD(λ) with the tabular online method (which we haven't covered yet) or with Q-learning. However, since this is a chapter on SARSA, it only makes sense to continue with that theme throughout. Open Chapter_5_4.py and follow the exercise:
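For reference, the SARSA(λ) update we will be implementing combines the one-step SARSA TD error with an eligibility trace per state–action pair (this is the standard accumulating-trace form):

$$\delta_t = r_{t+1} + \gamma\,Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$$

$$E_t(s, a) = \gamma\lambda\,E_{t-1}(s, a) + \mathbf{1}\,[s = s_t,\ a = a_t]$$

$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\delta_t\,E_t(s, a) \qquad \text{for all } (s, a)$$

Every state–action pair visited during the episode keeps a decaying trace, so a single TD error updates not just the current pair but the whole recent trajectory.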
- The code is quite similar to our previous examples, but let's review the full source code, as follows:
```python
import gym
import math
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

env = gym.make('MountainCar-v0')
Q_table = np.zeros((65, 65, 3))
alpha = 0.3
buckets = [65, 65]
gamma = 0.99
rewards = []
episodes = 2000
lambdaa = 0.8

def to_discrete_states(observation):
    interval = [0 for i in range(len(observation))]
    max_range = [1.2, 0.07]
    ...
```
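The listing above truncates `to_discrete_states` and omits the training loop, so here is a minimal sketch of how a SARSA(λ) loop over this Q-table could look. The `discretize` and `choose_action` helpers, the `epsilon` value, and the exact bucketing arithmetic are illustrative assumptions rather than the book's exact code, and the sketch assumes the classic Gym step API (four return values):

```python
# Minimal SARSA(lambda) sketch for the setup above -- an assumption, not the
# book's exact code. Uses the classic Gym API (env.step returns 4 values).
epsilon = 0.1  # illustrative exploration rate (assumption)

def discretize(obs):
    # Illustrative bucketing (assumption): map the continuous observation
    # onto the 65x65 grid that Q_table was sized for.
    low = env.observation_space.low
    high = env.observation_space.high
    ratios = (np.asarray(obs) - low) / (high - low)
    idx = (ratios * (np.array(buckets) - 1)).astype(int)
    return tuple(np.clip(idx, 0, np.array(buckets) - 1))

def choose_action(state):
    # Epsilon-greedy over the discrete Q-table (illustrative helper).
    if np.random.random() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q_table[state]))

for episode in range(episodes):
    E_table = np.zeros_like(Q_table)  # eligibility traces, reset each episode
    state = discretize(env.reset())
    action = choose_action(state)
    total_reward, done = 0.0, False
    while not done:
        next_obs, reward, done, _ = env.step(action)
        next_state = discretize(next_obs)
        next_action = choose_action(next_state)
        # One-step SARSA TD error (on-policy target; no bootstrap at terminal).
        target = reward if done else \
            reward + gamma * Q_table[next_state][next_action]
        delta = target - Q_table[state][action]
        E_table[state][action] += 1.0  # accumulating trace
        # Propagate the error to every (state, action) by its trace weight.
        Q_table += alpha * delta * E_table
        E_table *= gamma * lambdaa     # decay all traces by gamma * lambda
        state, action = next_state, next_action
        total_reward += reward
    rewards.append(total_reward)
```

Note the key difference from one-step SARSA: the single `Q_table += alpha * delta * E_table` line updates every traced pair at once, which is what lets credit flow back along the episode.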