October 2018
Intermediate to advanced
252 pages
6h 49m
English
At first, the agent selects its action at random with a certain probability, called the exploration rate or epsilon. Early in training, the agent tries all kinds of actions before it starts to learn the patterns. When it is not acting randomly, the agent predicts the reward value of each action based on the current state and picks the action with the highest predicted reward. np.argmax() is the function that returns the index of the highest value among the two elements in act_values[0]:
def act(self, state):
    if np.random.rand() <= self.epsilon:
        return random.randrange(self.action_size)
    act_values = self.model.predict(state)
    return np.argmax(act_values[0])  # returns action
act_values[0] looks like this: [14.145181, 11.2012205]. Each number represents the reward of picking action ...
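To make the selection rule concrete, here is a minimal standalone sketch of the same epsilon-greedy logic, written without the agent class or the model. The function name choose_action and its rng parameter are hypothetical and introduced only for illustration. The array reuses the two values quoted above for act_values[0].

```python
import random

import numpy as np


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a 1-D sequence of predicted rewards.

    Hypothetical helper for illustration; it mirrors the logic of act()
    but takes the predicted values directly instead of calling a model.
    """
    if rng.random() <= epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: np.argmax returns the index of the largest value,
    # not the value itself.
    return int(np.argmax(q_values))


# Shape (1, 2): one state, two actions, as returned for a batch of one.
act_values = np.array([[14.145181, 11.2012205]])

# With epsilon = 0 the first draw (about 0.84 for seed 0) exceeds epsilon,
# so the greedy branch runs and returns index 0 (14.145181 > 11.2012205).
greedy = choose_action(act_values[0], epsilon=0.0, rng=random.Random(0))

# With epsilon = 1 every call explores, so both indices can appear.
explored = [
    choose_action(act_values[0], epsilon=1.0, rng=random.Random(seed))
    for seed in range(20)
]
```

Passing a seeded random.Random instance keeps the sketch reproducible. The original method relies on the global random state instead, which is fine for training but harder to test.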