N-step DQN
The first improvement that we'll implement and evaluate is quite an old one. It was first introduced by Richard Sutton ([2] Sutton, 1988). To get the idea, let's look at the Bellman update used in Q-learning once again:

Q(s_t, a_t) = r_t + γ max_a Q(s_{t+1}, a)
This equation is recursive, which means that we can express Q(s_{t+1}, a) in terms of itself, which gives us this result:

Q(s_t, a_t) = r_t + γ max_a [r_{a,t+1} + γ max_{a'} Q(s_{t+2}, a')]
The value r_{a,t+1} means the local reward at time t+1, after issuing action a. However, if we assume that our action ...
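To make the unrolling above concrete, here is a minimal sketch (not the book's code) of how an n-step discounted return is computed from a short trajectory of rewards: each reward is accumulated with one extra factor of γ, and the value estimate of the final reached state is used as a bootstrap. The function name and signature are illustrative assumptions, not part of the original text.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Compute r_t + gamma*r_{t+1} + ... + gamma^n * V(s_{t+n}).

    rewards: the local rewards collected over n consecutive steps
    gamma: the discount factor
    bootstrap_value: an estimate of the value of the state reached
        after the last step (e.g. max_a Q(s_{t+n}, a))
    """
    total = bootstrap_value
    # Fold the rewards from the last step backwards, discounting
    # everything already accumulated by gamma at each step.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three steps with gamma=0.5 and a bootstrap value of 8.0:
# 1 + 0.5 * (2 + 0.5 * (3 + 0.5 * 8.0)) = 3.75
print(n_step_return([1.0, 2.0, 3.0], gamma=0.5, bootstrap_value=8.0))
```

With n=1 and bootstrap_value set to max_a Q(s_{t+1}, a), this reduces to the standard one-step Bellman target shown above.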