January 2018
Beginner to intermediate
284 pages
8h 35m
English
Another cause of instability in Q-learning is that the target function changes frequently, because the target is computed from the same parameters that are being updated. This is shown in the following equations:
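In the usual deep Q-learning notation (assumed here, since the symbols are not defined in the text above), with network parameters \theta, reward r, discount factor \gamma, and next state s', the target and the loss can be written as follows. Note that \theta appears in both the prediction and the target, so every gradient step also moves the target:

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta)

L(\theta) = \mathbb{E}\left[ \left( y - Q(s, a; \theta) \right)^2 \right]
          = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \right)^2 \right]
```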
The target network trick holds the parameters of the target function fixed for a specified number of steps (for example, 1,000 steps). At the end of each such interval, the parameters are updated with the latest values from the network:
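Written with assumed symbols, the target is computed from a frozen copy \theta^- of the network parameters \theta, as y = r + \gamma \max_{a'} Q(s', a'; \theta^-), and the copy is refreshed with \theta^- \leftarrow \theta once per interval. The following is a minimal sketch of that schedule, not the book's implementation: the `LinearQ` class, the `train` function, the random transitions that stand in for an environment, and all hyperparameter values except the 1,000-step interval are hypothetical and chosen only for illustration.

```python
import numpy as np


class LinearQ:
    """Hypothetical linear Q-function: Q(s, .) = s @ W, one column per action."""

    def __init__(self, n_features, n_actions, rng):
        self.W = rng.normal(scale=0.1, size=(n_features, n_actions))

    def predict(self, state):
        return state @ self.W


def train(num_steps=3000, sync_every=1000, gamma=0.99, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    online = LinearQ(4, 2, rng)   # parameters theta, updated every step
    target = LinearQ(4, 2, rng)   # parameters theta^-, held fixed between syncs
    target.W = online.W.copy()
    sync_count = 0

    for step in range(1, num_steps + 1):
        # Random transition (s, a, r, s') standing in for a real environment.
        s = rng.normal(size=4)
        a = int(rng.integers(2))
        r = float(rng.normal())
        s_next = rng.normal(size=4)

        # The target is computed from the frozen parameters, so it does not
        # shift while the online parameters are being updated.
        y = r + gamma * np.max(target.predict(s_next))
        td_error = y - online.predict(s)[a]

        # Semi-gradient step on the online parameters only.
        online.W[:, a] += lr * td_error * s

        # Every `sync_every` steps, copy the latest online parameters
        # into the target function.
        if step % sync_every == 0:
            target.W = online.W.copy()
            sync_count += 1

    return online, target, sync_count
```

Copying the array with `.copy()` matters here: assigning `target.W = online.W` would make both names refer to the same array, and the target would then change on every step again.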