January 2020
Intermediate to advanced
432 pages
10h 18m
English
Although the Q-learning equation may look considerably more complex, implementing it is not unlike building the agent that simply learned values earlier. To keep things simple, we will reuse the same code base and turn it into a Q-learning example. Open the code example, Chapter_1_4.py, and follow the exercise here:
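    import random

    arms = 7
    bandits = 7
    learning_rate = .1
    gamma = .9
    episodes = 10000

    # Hidden reward for every (bandit, arm) pair, drawn uniformly from [-1, 1]
    reward = []
    for i in range(bandits):
        reward.append([])
        for j in range(arms):
            reward[i].append(random.uniform(-1,1))
    print(reward)

    # Q table, optimistically initialized to 10.0 for every (bandit, arm) pair
    Q = []
    for i in range(bandits):
        Q.append([])
        for j in range(arms):
            Q[i].append(10.0)
    print(Q)

    def greedy(values):
        # Index of the largest value (first one wins on ties)
        return values.index(max(values))
    ...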
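The listing above is cut off before the update step, so the following is only a minimal sketch of how the rest might look, not the book's actual code. It assumes the standard Q-learning update, Q[s][a] += learning_rate * (r + gamma * max(Q[s']) - Q[s][a]), treats each bandit as a state, and assumes the next bandit is picked uniformly at random. The names `q_update` and `run`, and the random-transition rule, are hypothetical and do not come from the excerpt.

```python
import random

def greedy(values):
    # Index of the largest value (first one wins on ties)
    return values.index(max(values))

def q_update(Q, state, action, reward, next_state, learning_rate=0.1, gamma=0.9):
    # Standard Q-learning update (assumed; the excerpt elides this step)
    best_next = max(Q[next_state])            # value of the best arm in the next state
    td_target = reward + gamma * best_next    # reward plus discounted future value
    Q[state][action] += learning_rate * (td_target - Q[state][action])
    return Q[state][action]

def run(bandits=7, arms=7, episodes=10000, seed=0):
    # Hypothetical driver loop: each bandit is a state, each arm an action
    rng = random.Random(seed)
    reward = [[rng.uniform(-1, 1) for _ in range(arms)] for _ in range(bandits)]
    Q = [[10.0] * arms for _ in range(bandits)]  # optimistic initial values
    state = rng.randrange(bandits)
    for _ in range(episodes):
        action = greedy(Q[state])
        next_state = rng.randrange(bandits)  # assumption: random transition
        q_update(Q, state, action, reward[state][action], next_state)
        state = next_state
    return reward, Q

if __name__ == "__main__":
    reward, Q = run()
    for row in Q:
        print(row)
```

Because every entry starts at an optimistic 10.0, the greedy choice tends to visit arms it has not tried yet before their estimates decay, which is why this sketch gets by without a separate exploration step.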