April 2019
Intermediate to advanced
212 pages
5h 34m
English
We store the result of each action step in a list called self.memory. The remember function appends each of these memories to the list. When we run the update function, we draw a random sample of the stored memories and update the Q-values of the state-action pairs they represent.
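As a minimal sketch of the storage side, assuming each memory is a (state, action, reward, next_state, done) tuple as in the update loop of this section: the class name ReplayMemory and the sample helper are hypothetical and used only for illustration, not taken from the book's agent.

```python
import random


class ReplayMemory:
    """Hypothetical stand-in for the agent's self.memory and remember function."""

    def __init__(self):
        # A plain list, matching the self.memory described in the text.
        self.memory = []

    def remember(self, state, action, reward, next_state, done):
        # Store one action step as a single tuple.
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw batch_size distinct memories uniformly at random.
        return random.sample(self.memory, batch_size)


buffer = ReplayMemory()
for step in range(5):
    # Dummy transitions: state i leads to state i + 1 with reward 1.0.
    buffer.remember(step, 0, 1.0, step + 1, False)

batch = buffer.sample(2)
```

Sampling at random rather than replaying the most recent steps in order breaks up the correlation between consecutive experiences, which is the usual motivation for keeping a memory at all.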
This works much as if we were updating the Q-values after each action, except that we delay the updates and apply them to a batch of memories instead of performing them at every step:
def update(self):
    if len(self.memory) < self.batch_size:
        return
    batch = random.sample(self.memory, self.batch_size)
    for state, action, reward, next_state, done in batch:
        q_update = reward
        #predict and update Q-values
        q_values = self.model.predict(state)
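The listing above stops at the prediction step. As a rough sketch of how such an update could be completed, the example below assumes the standard Q-learning target (the reward plus the discounted best Q-value of the next state, unless the episode is done). A dictionary of Q-values stands in for self.model so that the example runs without a neural network library; the class name TabularReplayAgent, the q_table attribute, and the gamma and alpha parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


class TabularReplayAgent:
    """Illustrative agent; a dict of Q-values stands in for self.model."""

    def __init__(self, n_actions, batch_size=4, gamma=0.9, alpha=0.5):
        self.n_actions = n_actions
        self.batch_size = batch_size
        self.gamma = gamma  # discount factor (assumed value)
        self.alpha = alpha  # learning rate (assumed value)
        self.memory = []
        # Unseen states start with a Q-value of 0.0 for every action.
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def update(self):
        # Wait until enough memories exist to fill one batch.
        if len(self.memory) < self.batch_size:
            return
        batch = random.sample(self.memory, self.batch_size)
        for state, action, reward, next_state, done in batch:
            q_update = reward
            if not done:
                # Bootstrap from the best estimated value of the next state.
                q_update = reward + self.gamma * max(self.q_table[next_state])
            # Move the stored estimate part of the way toward the target.
            current = self.q_table[state][action]
            self.q_table[state][action] = current + self.alpha * (q_update - current)


agent = TabularReplayAgent(n_actions=2)
for _ in range(4):
    # Terminal transition: action 1 in state "s" yields reward 1.0.
    agent.remember("s", 1, 1.0, "end", True)
agent.update()
```

With a neural network in place of the table, the last two lines of the loop would typically become a prediction for the state, an overwrite of the entry for the chosen action with q_update, and a single training step on that corrected vector.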