October 2018
Intermediate to advanced
472 pages
10h 57m
English
The following replay function is called inside the train function (defined in the next section) at the end of the game to train the agent. In this function we define the target for each state using the Bellman equation for the Q-function:
```python
import random

import numpy as np


def replay(epsilon, gamma, epsilon_min, epsilon_decay, model, training_data, batch_size=64):
    """Train the agent on a batch of data."""
    # Sample up to batch_size stored transitions without replacement
    idx = random.sample(range(len(training_data)), min(len(training_data), batch_size))
    train_batch = [training_data[j] for j in idx]
    for state, new_state, reward, done, action in train_batch:
        # Terminal transition: the target is just the immediate reward
        target = reward
        if not done:
            # Bellman target: r + gamma * max_a' Q(s', a')
            target = reward + gamma * np.amax(model.predict(new_state)[0])
        #print('target', target)
        target_f = model.predict(state)
        #print('target_f', ...
```
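To make the target computation concrete, here is a minimal, self-contained sketch. It replaces the neural network with a toy lookup table so it runs without Keras; TabularQModel, one_hot, bellman_target and build_training_pair are hypothetical helpers, not part of the book's code. The target is the reward r for a terminal transition and r + gamma * max_a' Q(s', a') otherwise, matching the listing above. The [0] index reflects that predict on a single state returns an array of shape (1, n_actions). The final helper shows the common DQN convention of overwriting only the Q-value of the action actually taken; treat it as an assumption about what usually follows model.predict(state), not as the elided remainder of replay.

```python
import numpy as np


class TabularQModel:
    """Hypothetical stand-in for a Keras-style model: predict() maps a one-hot
    state of shape (1, n_states) to Q-values of shape (1, n_actions)."""

    def __init__(self, q_table):
        self.q_table = np.asarray(q_table, dtype=float)

    def predict(self, state):
        # A (1, n_states) one-hot row times the table selects that state's Q-values
        return state @ self.q_table


def one_hot(index, size):
    """Encode a discrete state index as a (1, size) one-hot row vector."""
    vec = np.zeros((1, size))
    vec[0, index] = 1.0
    return vec


def bellman_target(model, reward, new_state, done, gamma):
    """Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * np.amax(model.predict(new_state)[0])


def build_training_pair(model, state, action, target):
    """Assumed DQN convention: copy the current prediction and overwrite only the
    entry for the action taken, so the other actions contribute zero error."""
    target_f = model.predict(state).copy()
    target_f[0][action] = target
    return target_f


if __name__ == "__main__":
    q = TabularQModel([[0.0, 1.0], [2.0, 5.0]])
    s0, s1 = one_hot(0, 2), one_hot(1, 2)
    t = bellman_target(q, reward=1.0, new_state=s1, done=False, gamma=0.9)
    print(t)                                   # 1.0 + 0.9 * max(2.0, 5.0)
    print(build_training_pair(q, s0, action=0, target=t))
```

With gamma = 0.9, a reward of 1.0 and next-state Q-values of 2.0 and 5.0, the target is 1.0 + 0.9 * 5.0 = 5.5, and only the first entry of the prediction for s0 changes. Leaving the other entries at their predicted values means that, when the network is fitted on this vector, the loss only pushes the Q-value of the chosen action toward the target.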