January 2020 | Intermediate to advanced | 432 pages | 10h 18m | English
Before going further, note that the method covered here is one-step TD, also called TD(0). The 0 is the value of the parameter λ in the more general TD(λ) family of methods. With λ = 0, each update bootstraps only from the value of the very next state, so TD(0) is simply one-step TD. We will look at multiple-step TD in Chapter 5, Exploring SARSA.
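For reference, the standard TD(0) update moves the value estimate of the current state a fraction α of the way toward a one-step target: the reward just received plus the discounted value of the next state. Here α is the step size and γ the discount factor. Both appear as `alpha` and `gamma` in the code that follows.

```latex
V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]
```

As a worked example, take α = 0.5, γ = 0.5, a reward of -1, and both V(S_t) and V(S_{t+1}) still at 0. The update gives V(S_t) = 0 + 0.5 × (-1 + 0.5 × 0 - 0) = -0.5.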
For now, though, we will look at an example of using one-step TD in the next exercise:
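```python
import numpy as np
from tqdm import tqdm
import random

gamma = 0.5        # discount factor
rewardSize = -1    # reward given for each step
gridSize = 4       # the grid is gridSize x gridSize cells
alpha = 0.5        # step size (learning rate) for the TD update
terminations = [[0,0], [gridSize-1, gridSize-1]]  # terminal states: two opposite corners
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]      # four moves as [row, col] offsets
episodes = 10000
V = np.zeros((gridSize, gridSize))  # state-value estimates, one per grid cell
returns = {(i, j):list() ...
```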
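The excerpt stops before the learning loop. The following is a minimal sketch, not the book's code, of how TD(0) prediction could run on this 4x4 gridworld with the same parameter values. It makes several assumptions. The agent follows an equiprobable random policy. Every step earns `rewardSize`. A move that would leave the grid keeps the agent where it is. Each episode starts in a uniformly random non-terminal cell. `take_action` and `run_td0` are hypothetical names, and `tqdm` is omitted to keep the sketch dependency-light.

```python
import random

import numpy as np

# Parameter values copied from the excerpt; the dynamics below are assumptions.
gamma = 0.5
rewardSize = -1
gridSize = 4
alpha = 0.5
terminations = [[0, 0], [gridSize - 1, gridSize - 1]]
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]
episodes = 10000


def take_action(state, action):
    """Hypothetical helper: move one cell; moves off the grid leave the agent in place."""
    row = state[0] + action[0]
    col = state[1] + action[1]
    if not (0 <= row < gridSize and 0 <= col < gridSize):
        row, col = state
    return rewardSize, [row, col]


def run_td0(num_episodes=episodes, seed=0):
    """Estimate V under a random policy with one-step TD (TD(0)) updates."""
    random.seed(seed)
    V = np.zeros((gridSize, gridSize))
    non_terminal = [[i, j] for i in range(gridSize) for j in range(gridSize)
                    if [i, j] not in terminations]
    for _ in range(num_episodes):
        state = random.choice(non_terminal)
        while state not in terminations:
            action = random.choice(actions)  # equiprobable random policy (assumed)
            reward, next_state = take_action(state, action)
            # One-step target: reward plus discounted value of the next state
            td_target = reward + gamma * V[next_state[0], next_state[1]]
            # Move V(s) a fraction alpha of the way toward the target
            V[state[0], state[1]] += alpha * (td_target - V[state[0], state[1]])
            state = next_state
    return V


if __name__ == "__main__":
    print(np.round(run_td0(), 2))
```

Calling `run_td0()` returns a 4x4 array. The two terminal corners stay at 0 because they are never updated. With γ = 0.5 and a reward of -1 per step, every other value lies between -2 and 0. Because α is a constant 0.5, the estimates keep fluctuating from episode to episode rather than settling on exact values. A decaying step size would be needed for them to converge.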