In this section, we will demonstrate how to build a Q-learning agent using the 3 x 4 grid of states from the previous section. We will train the agent for 2,500 episodes, using a learning rate of α = 0.1 and ε = 0.05 for the ε-greedy policy (see the gridworld_q_learning notebook for details):
max_episodes = 2500
alpha = .1
epsilon = .05
Then, we will randomly initialize the state-action value function as a NumPy array with dimensions number of states x number of actions, setting the values of the absorbing and blocked states to zero:
Q = np.random.rand(num_states, num_actions)
skip_states = list(absorbing_states.keys()) + [blocked_state]
Q[skip_states] = 0
The algorithm generates 2,500 episodes that start at a random location and proceed according to the ε-greedy ...
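The core of each episode combines ε-greedy action selection with the Q-learning update rule. The following is a minimal, self-contained sketch of how such a loop can look; the environment used here (a deterministic step function, a per-step reward of -0.02, terminal rewards taken from absorbing_states, and a discount factor gamma = 0.99) is an illustrative simplification rather than the notebook's actual implementation, so refer to gridworld_q_learning for the exact version:

import numpy as np

# Illustrative sketch: a simplified, deterministic 3 x 4 gridworld.
# The notebook's environment (move dynamics, rewards) may differ.
n_rows, n_cols = 3, 4
num_states, num_actions = n_rows * n_cols, 4       # actions: up, right, down, left
absorbing_states = {3: 1.0, 7: -1.0}               # assumed terminal rewards
blocked_state = 5                                  # assumed wall cell
step_reward = -0.02                                # assumed per-step cost
gamma = 0.99                                       # assumed discount factor

max_episodes, alpha, epsilon = 2500, .1, .05

Q = np.random.rand(num_states, num_actions)
skip_states = list(absorbing_states.keys()) + [blocked_state]
Q[skip_states] = 0

def step(state, action):
    """Deterministic move; invalid moves (edge or blocked cell) leave the state unchanged."""
    row, col = divmod(state, n_cols)
    dr, dc = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}[action]
    new_row, new_col = row + dr, col + dc
    new_state = new_row * n_cols + new_col
    if not (0 <= new_row < n_rows and 0 <= new_col < n_cols) or new_state == blocked_state:
        new_state = state
    reward = absorbing_states.get(new_state, step_reward)
    return new_state, reward, new_state in absorbing_states

for episode in range(max_episodes):
    # start each episode at a random non-terminal, non-blocked state
    state = np.random.choice([s for s in range(num_states) if s not in skip_states])
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            action = Q[state].argmax()
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the greedy value of the next state
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

Because the values of the absorbing and blocked states are fixed at zero, Q[next_state].max() evaluates to zero when an episode terminates, so the final update reduces to the terminal reward.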