Taxi drop-off using Q-Network

If we consider the preceding Taxi drop-off example, our neural network will consist of 500 input neurons (the state represented by 1×500 one-hot vector) and 6 output neurons, each neuron representing the Q-value for the particular action for the given state.  The neural network will here approximate the Q-value for each action. Hence, the network should be trained so that its approximated Q-value and the target Q-value are same. The target Q-value as obtained from the Bellman Equation is as follows:

We train the neural network so that the square error of the difference between the target Q and predicted Q is minimized—that ...

Get Hands-On Artificial Intelligence for IoT now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.