October 2019
Intermediate to advanced
366 pages
12h 4m
English
To put this strategy into code, we have to create two critics with different initializations, compute the target action value as in (8.7), and optimize both critics.
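As a minimal sketch of the target computation, assuming (8.7) is the usual clipped double-Q target (the reward plus the discounted minimum of the two target critics' estimates), the step could look like the following NumPy code. The function names, argument names, and the default discount factor are illustrative assumptions, not the book's implementation.

```python
import numpy as np


def clipped_double_q_target(rewards, dones, q1_target, q2_target, gamma=0.99):
    """Hypothetical helper: y = r + gamma * (1 - done) * min(Q1', Q2').

    q1_target and q2_target are the two target critics' values for the
    next state and the target actor's action (assumed precomputed).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Taking the element-wise minimum counters overestimation of the action value.
    min_q = np.minimum(q1_target, q2_target)
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * min_q


def critic_losses(q1_online, q2_online, targets):
    """Mean squared error of each online critic against the shared target."""
    targets = np.asarray(targets, dtype=np.float64)
    loss1 = np.mean((np.asarray(q1_online, dtype=np.float64) - targets) ** 2)
    loss2 = np.mean((np.asarray(q2_online, dtype=np.float64) - targets) ** 2)
    return loss1, loss2
```

Both critics regress toward the same target, so the only difference between them comes from their initializations and the resulting gradient updates.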
As for the double critic, you just have to create the networks by calling deterministic_actor_double_critic twice, once for the target networks and once for the online networks, as done in DDPG. The code will be similar ...