When Q-learning is combined with a deep neural network, the network learns a set of weights that approximate the Q-value function. The Q-value function is therefore parametrized by $\theta$ (the weights of the network) and written as follows:

$$Q_\theta(s, a)$$
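To make the idea of a Q-value function parametrized by network weights $\theta$ concrete, here is a minimal pure-Python sketch. The architecture (one hidden ReLU layer and one output per action), the layer sizes, and the helper names `init_params` and `q_values` are hypothetical choices made for illustration. The text does not prescribe a particular network design, and training is not shown here.

```python
import random


def init_params(state_dim, hidden_dim, n_actions, seed=0):
    """Randomly initialise theta: weights and biases of a hypothetical one-hidden-layer MLP."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(state_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "W2": [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(n_actions)],
        "b2": [0.0] * n_actions,
    }


def q_values(theta, state):
    """Return [Q_theta(s, a) for every action a] from a single forward pass."""
    # Hidden layer: affine transform followed by a ReLU non-linearity.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(theta["W1"], theta["b1"])
    ]
    # Output layer: one linear unit per action, no activation.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(theta["W2"], theta["b2"])
    ]


theta = init_params(state_dim=4, hidden_dim=16, n_actions=2)
q = q_values(theta, [0.1, -0.2, 0.0, 0.3])
greedy_action = max(range(len(q)), key=q.__getitem__)
```

Producing the values of all actions in one forward pass, rather than feeding $(s, a)$ pairs one at a time, is a common design choice: the greedy action then reduces to a single `max` over the output list.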
To adapt Q-learning to deep neural networks (a combination known as deep Q-learning), we have to come up with a loss function (or objective) to minimize.
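As you may recall, the tabular Q-learning update is as follows:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Here, $s'$ is the state at the next step, $\alpha$ is the learning rate, and $\gamma$ is the discount factor. This update is done online on each sample ...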
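The tabular update described above can be sketched in Python for a single sample $(s, a, r, s')$. The function name `q_learning_update`, the dictionary-backed table, and the `done` flag (which drops the bootstrap term when $s'$ is terminal) are illustrative assumptions rather than details taken from the text.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Apply one online tabular Q-learning update for the sample (s, a, r, s_next)."""
    # Assumption: at a terminal next state there is no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next          # r + gamma * max_a' Q(s', a')
    td_error = td_target - Q[(s, a)]           # target minus current estimate
    Q[(s, a)] += alpha * td_error              # move Q(s, a) a step of size alpha toward the target
    return td_error


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = [0, 1]
Q[("s1", 1)] = 2.0
err = q_learning_update(Q, "s0", 0, r=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.9)
```

With these numbers the target is $1 + 0.9 \cdot 2 = 2.8$, so the update moves $Q(s_0, 0)$ from $0$ halfway toward it, to $1.4$.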