Understanding SARSA (λ)

We could, of course, implement TD (λ) with the tabular online method, which we haven't covered yet, or with Q-learning. However, since this is a chapter on SARSA, it makes sense to continue with that theme throughout. Open Chapter_5_4.py and follow the exercise:

  1. The code is quite similar to our previous examples, but let's review the full source code, as follows:
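import gym
import math
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

env = gym.make('MountainCar-v0')
Q_table = np.zeros((65, 65, 3))  # 65 x 65 (position, velocity) grid x 3 actions
alpha = 0.3          # learning rate
buckets = [65, 65]   # number of bins per observation dimension
gamma = 0.99         # discount factor
rewards = []
episodes = 2000
lambdaa = 0.8        # trace-decay parameter λ ("lambda" is a reserved word in Python)

def to_discrete_states(observation):
    interval = [0 for i in range(len(observation))]
    max_range = [1.2, 0.07]
    ...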
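The body of to_discrete_states is elided in this excerpt. To illustrate what mapping MountainCar-v0's continuous observations onto the 65 x 65 grid involves, here is a minimal sketch of one common approach: uniform binning between the environment's observation bounds. The helper name discretize and the constants LOW, HIGH, and BUCKETS are hypothetical, and the bounds (position in [-1.2, 0.6], velocity in [-0.07, 0.07]) are assumed from MountainCar-v0's observation space; the book's own function may compute its intervals differently.

```python
import numpy as np

# Assumed MountainCar-v0 observation bounds: (position, velocity).
LOW = np.array([-1.2, -0.07])
HIGH = np.array([0.6, 0.07])
BUCKETS = np.array([65, 65])  # same grid size as `buckets` in the listing

def discretize(observation):
    """Hypothetical helper (not the book's to_discrete_states).

    Maps a continuous (position, velocity) pair to a tuple of integer
    bin indices that can be used to index a (65, 65, n_actions) Q-table.
    """
    obs = np.asarray(observation, dtype=float)
    # Scale each component into [0, 1] relative to its bounds.
    ratios = (obs - LOW) / (HIGH - LOW)
    # Convert to bin indices; floor keeps each bin half-open.
    indices = np.floor(ratios * BUCKETS).astype(int)
    # Clip so values on (or just past) the upper bound stay in the last bin.
    return tuple(int(i) for i in np.clip(indices, 0, BUCKETS - 1))

print(discretize([-0.5, 0.0]))  # a typical starting state
```

Returning a tuple lets the result index a NumPy Q-table directly, for example Q_table[discretize(obs)] for all three action values of that cell.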

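The excerpt stops before the learning loop, so as a rough guide to what SARSA (λ) does on each step, here is a minimal sketch of the standard backward-view update with eligibility traces. It assumes accumulating traces (the book's code may use replacing traces instead), and the function name sarsa_lambda_step is hypothetical; its default alpha, gamma, and lam mirror alpha, gamma, and lambdaa from the listing.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.3, gamma=0.99, lam=0.8):
    """Hypothetical one-step SARSA(lambda) update with accumulating traces.

    Q and E have shape (*state_dims, n_actions); s and s_next are tuples of
    discrete state indices. Q and E are modified in place; the TD error
    is returned.
    """
    # On-policy target: bootstrap from the action actually chosen next,
    # or use the reward alone if the episode has ended.
    target = r if done else r + gamma * Q[s_next + (a_next,)]
    delta = target - Q[s + (a,)]
    # Accumulating trace: mark the visited state-action pair as eligible.
    E[s + (a,)] += 1.0
    # Every pair is updated in proportion to its eligibility...
    Q += alpha * delta * E
    # ...and then all traces decay by gamma * lambda.
    E *= gamma * lam
    return delta

# Toy run: 3 states, 2 actions, a reward on each of two steps.
Q = np.zeros((3, 2))
E = np.zeros_like(Q)
sarsa_lambda_step(Q, E, (0,), 1, 1.0, (1,), 0, False)
sarsa_lambda_step(Q, E, (1,), 0, 1.0, (2,), 0, True)
print(Q)
```

In the toy run, the second update also raises Q[0, 1] (from 0.3 to 0.5376) because the first pair is still eligible; this is how λ spreads credit back along the trajectory. The trace array E would normally be reset to zeros at the start of every episode.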