How it works...
In Step 2, the DQN class takes in four parameters: the number of input states, the number of output actions, the number of hidden nodes (here we use just one hidden layer as an example), and the learning rate. It initializes a neural network with one hidden layer followed by a ReLU activation function. The network takes n_state inputs and produces n_action outputs, which are the predicted values for the individual actions. An Adam optimizer is initialized along with the model, and the loss function is the mean squared error.
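The class described above can be sketched as follows. This is a minimal illustration, not the book's exact code; the parameter names (n_state, n_action, n_hidden, lr) and the predict helper are assumptions based on the description:

```python
import torch
import torch.nn as nn

class DQN:
    """Sketch of the DQN class: one hidden layer, Adam, MSE loss.
    Names and defaults are assumptions, not the book's exact code."""

    def __init__(self, n_state, n_action, n_hidden=50, lr=0.05):
        self.criterion = nn.MSELoss()            # mean squared error loss
        self.model = nn.Sequential(              # one hidden layer + ReLU
            nn.Linear(n_state, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_action),       # one predicted value per action
        )
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr)

    def predict(self, s):
        """Return the predicted values of all actions for state s."""
        with torch.no_grad():
            return self.model(torch.Tensor(s))
```

With n_state=4 and n_action=2, predict returns a tensor of two action values for a given 4-dimensional state.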
Step 3 updates the network: given a training data point, the predicted result and the target value are used to compute the loss and its gradients, and the neural network model is then updated via ...
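The update step can be sketched as a standalone function; the function name and signature are assumptions for illustration, not the book's exact code:

```python
import torch
import torch.nn as nn

def update(model, optimizer, criterion, s, y):
    """One training step: compute the loss between the prediction and
    the target, backpropagate, and update the network weights."""
    y_pred = model(torch.Tensor(s))            # predicted action values
    loss = criterion(y_pred, torch.Tensor(y))  # mean squared error vs. target
    optimizer.zero_grad()                      # clear previous gradients
    loss.backward()                            # backpropagate the loss
    optimizer.step()                           # apply the gradient update
    return loss.item()

# usage: a tiny network matching the one-hidden-layer description
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
opt = torch.optim.Adam(net.parameters(), lr=0.01)
loss = update(net, opt, nn.MSELoss(), [0.1, 0.2, 0.3, 0.4], [1.0, 0.0])
```

Each call performs a single gradient step toward the target values for the given state.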