So far, we have picked actions at random and applied them to the game. Now, let's use DQN to select the actions for playing the PacMan game.
- We define the q_nn policy function as follows:
def policy_q_nn(obs, env):
    # Exploration strategy - Select a random action
    if np.random.random() < explore_rate:
        action = env.action_space.sample()
    # Exploitation strategy - Select the action with the highest q
    else:
        action = np.argmax(q_nn.predict(np.array([obs])))
    return action
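To see the explore/exploit split in isolation, here is a minimal standalone sketch that does not need the environment or the network. It is an illustration under assumptions, not the code used in this section: the function name `epsilon_greedy` and the list of Q-value estimates are hypothetical, and the Q-values are made-up numbers standing in for the output of `q_nn.predict`.

```python
import random

def epsilon_greedy(q_values, explore_rate, rng=random):
    """Return an action index from a list of per-action Q-value estimates."""
    n_actions = len(q_values)
    if rng.random() < explore_rate:
        # Exploration: ignore the estimates and pick uniformly at random
        return rng.randrange(n_actions)
    # Exploitation: index of the largest Q-value (the first one on ties)
    return max(range(n_actions), key=lambda a: q_values[a])

# Hypothetical Q-value estimates for four actions (made-up numbers)
q_values = [0.1, 0.7, 0.3, 0.2]
greedy_action = epsilon_greedy(q_values, explore_rate=0.0)  # always exploits
random_action = epsilon_greedy(q_values, explore_rate=1.0)  # always explores
```

With `explore_rate=0.0` the function always returns the index of the largest estimate, and with `explore_rate=1.0` it always samples at random. Values in between trade off the two behaviours.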
- Next, we modify the episode function to calculate the q_values and to train the neural network on experiences sampled from the experience buffer. This is shown in the following code:
def episode(env, policy, r_max=0, t_max=0):
    # create the empty list to contain ...
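The two ideas named in this step, an experience buffer to sample from and Q-value targets to train on, can be sketched independently of the episode function. The following is only a sketch under stated assumptions and not the body of `episode`: the names `q_targets`, `discount_rate`, `buffer`, and `q_next` are hypothetical, the transitions are toy data, and the per-action Q-values are made-up numbers standing in for the network's predictions. It uses the standard one-step Q-learning target, reward plus the discounted maximum Q-value of the next state.

```python
import random
from collections import deque

def q_targets(batch, q_next, discount_rate):
    """Compute one-step Q-learning targets for a batch of transitions.

    batch:  list of (obs, action, reward, next_obs, done) tuples
    q_next: one list of per-action Q-value estimates per transition,
            evaluated at that transition's next_obs
    """
    targets = []
    for (_, _, reward, _, done), next_qs in zip(batch, q_next):
        # A terminal transition has no future reward to bootstrap from
        future = 0.0 if done else discount_rate * max(next_qs)
        targets.append(reward + future)
    return targets

# Bounded experience buffer: the oldest transitions are dropped when full
buffer = deque(maxlen=1000)
buffer.append(("s0", 1, 1.0, "s1", False))  # toy, non-terminal transition
buffer.append(("s1", 0, 5.0, "s2", True))   # toy, terminal transition

# Sample a training batch from the buffer (order is random)
batch = random.sample(list(buffer), 2)
# Stand-in for the network's predictions on each next_obs (made-up numbers)
q_next = [[0.5, 2.0] for _ in batch]
targets = q_targets(batch, q_next, discount_rate=0.9)
```

In a full DQN loop, each target would replace the predicted Q-value of the action actually taken, and the network would then be fitted on the sampled observations against these updated values.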