Taxi drop-off using Q-Network

If we consider the preceding Taxi drop-off example, our neural network will consist of 500 input neurons (the state represented by 1×500 one-hot vector) and 6 output neurons, each neuron representing the Q-value for the particular action for the given state. The neural network will here approximate the Q-value for each action. Hence, the network should be trained so that its approximated Q-value and the target Q-value are same. The target Q-value as obtained from the Bellman Equation is as follows:

We train the neural network so that the square error of the difference between the target Q and predicted Q is minimized—that ...

Get Hands-On Artificial Intelligence for IoT now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.