October 2019
Intermediate to advanced
366 pages
12h 4m
English
Overall, as we have seen so far, the AC algorithm is very similar to REINFORCE with a baseline, where the baseline is the state-value function estimated by the critic. To recap, the algorithm is summarized in the following pseudocode:
    Initialize with random weights
    Initialize environment
    for episode 1..M do
        Initialize empty buffer
        > Generate a few episodes
        for step 1..MaxSteps do
            > Collect experience by acting on the environment
            if ...:
                > Compute the ...
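The recap above can be made concrete with a small executable sketch. The NumPy code below is a minimal illustration under stated assumptions, not the book's implementation. The actor is a hypothetical linear-softmax policy with parameters `theta`, and the critic is a hypothetical linear state-value function with weights `w`. The target is the full discounted reward-to-go, and the helper names (`reward_to_go`, `softmax`, `actor_critic_update`) and learning rates `lr_pi` and `lr_v` are illustrative. The sketch isolates what separates AC from plain REINFORCE. The policy gradient is weighted by the advantage A_t = G_t - V(s_t), where V comes from the critic, and the critic itself is regressed toward G_t.

```python
import numpy as np

# Assumption: a linear-softmax actor and a linear critic stand in for the
# function approximators of a real AC implementation.


def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go G_t = sum_{k>=t} gamma^(k-t) * r_k for one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def actor_critic_update(theta, w, states, actions, returns, lr_pi=0.01, lr_v=0.01):
    """One batch update of the actor (theta) and the critic (w).

    states:  (T, obs_dim) array
    actions: (T,) integer array of chosen actions
    returns: (T,) reward-to-go targets G_t
    """
    values = states @ w                      # critic estimates V(s_t)
    adv = returns - values                   # advantage A_t = G_t - V(s_t)

    probs = softmax(states @ theta)          # pi(.|s_t), shape (T, n_actions)
    one_hot = np.eye(theta.shape[1])[actions]
    # grad of log pi(a_t|s_t) w.r.t. theta is outer(s_t, one_hot(a_t) - pi(.|s_t))
    grad_logp = states[:, :, None] * (one_hot - probs)[:, None, :]

    # Actor: gradient ascent on the advantage-weighted log-likelihood
    theta = theta + lr_pi * np.mean(adv[:, None, None] * grad_logp, axis=0)
    # Critic: gradient descent on 0.5 * (G_t - V(s_t))^2
    w = w + lr_v * np.mean(adv[:, None] * states, axis=0)
    return theta, w, adv


# Usage on a made-up 5-step episode (random states, illustrative only)
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 3))
actions = np.array([0, 1, 1, 0, 1])
returns = reward_to_go([1.0, 0.0, 0.0, 1.0, 1.0], gamma=0.9)
theta, w = np.zeros((3, 2)), np.zeros(3)
new_theta, new_w, adv = actor_critic_update(theta, w, states, actions, returns)
```

Because the critic starts at zero here, the first advantages equal the returns, which is exactly REINFORCE without a baseline. As `w` is trained, V(s_t) absorbs the predictable part of G_t and the advantage shrinks to what the chosen action contributed. Replacing the Monte Carlo target with a bootstrapped n-step return would only change how `returns` is computed.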