N-step DQN

The first improvement that we'll implement and evaluate is quite an old one; it was first introduced by Richard Sutton ([2] Sutton, 1988). To understand the idea, let's look at the Bellman update used in Q-learning once again.

$$Q(s_t, a_t) = r_t + \gamma \max_a Q(s_{t+1}, a)$$

This equation is recursive, which means that we can express $Q(s_{t+1}, a)$ in terms of itself, which gives us this result:

$$Q(s_t, a_t) = r_t + \gamma \max_{a'} \left[ r_{a', t+1} + \gamma \max_{a''} Q(s_{t+2}, a'') \right]$$

The value $r_{a, t+1}$ means the local reward at time $t+1$, after issuing action $a$. However, if we assume that our action ...
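The unrolling above generalizes to any number of steps: accumulate $n$ discounted local rewards, then bootstrap from the Q-value of the state reached after those $n$ steps. A minimal sketch of that computation (not the book's code; the function name and arguments are illustrative) could look like this:

```python
def n_step_return(rewards, bootstrap_q, gamma=0.99):
    """Compute the n-step return.

    rewards: the n local rewards r_t, ..., r_{t+n-1} along the trajectory
    bootstrap_q: the estimate of max_a Q(s_{t+n}, a) from the network
    """
    ret = bootstrap_q
    # Walk backwards so each reward is discounted the right number of times:
    # ret = r_t + gamma * (r_{t+1} + ... + gamma * bootstrap_q)
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

# Example: a 3-step return with rewards 1.0, 0.0, 2.0 and bootstrap value 5.0
# equals 1.0 + 0.99 * (0.0 + 0.99 * (2.0 + 0.99 * 5.0))
print(n_step_return([1.0, 0.0, 2.0], 5.0))
```

With $n = 1$ this reduces to the ordinary one-step Bellman target; larger $n$ propagates reward information faster at the cost of relying on the behavior policy's actions being near-optimal.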
