Q-learning with neural networks

When Q-learning is combined with a deep neural network, the network learns a set of weights to approximate the Q-value function. The Q-value function is therefore parametrized by $\theta$ (the weights of the network) and written as follows:

$$Q_\theta(s, a)$$
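To make the idea of a Q-value function parametrized by network weights concrete, here is a minimal sketch in plain Python. It is an illustration, not the book's implementation: the one-hidden-layer architecture, the layer sizes, the tanh activation, and the names `init_params`, `affine`, and `q_values` are all assumptions chosen for brevity. The network takes a state vector and returns one Q-value per action.

```python
import math
import random


def init_params(state_dim, hidden_dim, n_actions, seed=0):
    """Create the weights (theta) of a small one-hidden-layer network.

    The sizes and the uniform initialization are illustrative assumptions.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden_dim, state_dim),
        "b1": [0.0] * hidden_dim,
        "W2": matrix(n_actions, hidden_dim),
        "b2": [0.0] * n_actions,
    }


def affine(W, b, x):
    """Compute W @ x + b using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def q_values(theta, state):
    """Return the approximate Q-value of every action in the given state."""
    hidden = [math.tanh(h) for h in affine(theta["W1"], theta["b1"], state)]
    return affine(theta["W2"], theta["b2"], hidden)


# Hypothetical 4-dimensional state and 2 discrete actions.
theta = init_params(state_dim=4, hidden_dim=8, n_actions=2)
q = q_values(theta, [0.1, -0.2, 0.0, 0.3])

# The greedy action is the one with the largest estimated Q-value.
greedy_action = max(range(len(q)), key=lambda a: q[a])
```

Changing the entries of `theta` changes every Q-value estimate at once, which is what distinguishes this setting from a table that stores one independent entry per state-action pair.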

To adapt Q-learning to deep neural networks (a combination known as deep Q-learning), we need a loss function (or objective) to minimize.

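As you may recall, the tabular Q-learning update is as follows:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$$

Here, $s'$ is the state at the next step. This update is done online on each sample ...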
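As a reminder of what the tabular update looks like in practice, the following sketch applies one standard Q-learning step to a dictionary-backed table. The function name `q_update`, the default values of `alpha` and `gamma`, the state labels, and the `done` flag for terminal transitions are assumptions made for this illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning step to the table Q, in place.

    Q[s, a] <- Q[s, a] + alpha * (r + gamma * max_a' Q[s', a'] - Q[s, a])
    """
    # Assumption: a terminal next state contributes no future value.
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Unseen state-action pairs default to a Q-value of 0.0.
Q = defaultdict(float)
actions = [0, 1]

# One online update from a single hypothetical transition (s0, 1, 1.0, s1).
new_value = q_update(Q, s="s0", a=1, r=1.0, s_next="s1", actions=actions)
```

Each call consumes a single transition and changes only the entry for the visited state-action pair.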
