Overall, as we have seen so far, the AC algorithm is very similar to the REINFORCE algorithm with the state-value function as a baseline. As a recap, the algorithm is summarized in the following code:
Initialize with random weight
Initialize environment
for episode 1..M do
    Initialize empty buffer

    > Generate a few episodes
    for step 1..MaxSteps do
        > Collect experience by acting on the environment
        if :
            > Compute the ...
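To make the baseline idea concrete, the following is a minimal Python sketch of how a state-value estimate can be subtracted from the discounted rewards to go. It is an illustration under stated assumptions, not the exact computation of the algorithm above: the function names `discounted_rewards_to_go` and `advantages`, the `gamma` parameter, and the sample numbers are all hypothetical, and the critic's predictions are passed in as plain lists rather than produced by a network.

```python
def discounted_rewards_to_go(rewards, last_value, gamma=0.99):
    """Discounted return from each step, bootstrapped with last_value.

    last_value is the critic's estimate for the state that follows the final
    reward (an assumption of this sketch); pass 0.0 if the episode terminated.
    """
    returns = [0.0] * len(rewards)
    running = last_value
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, values):
    """Subtract the state-value baseline V(s_t) from each return."""
    return [g - v for g, v in zip(returns, values)]


# Hypothetical three-step trajectory that ended in a terminal state.
rewards = [1.0, 1.0, 1.0]
values = [2.0, 1.5, 1.0]  # made-up critic predictions for each visited state
rtg = discounted_rewards_to_go(rewards, last_value=0.0, gamma=0.5)
adv = advantages(rtg, values)
print(rtg)
print(adv)
```

In REINFORCE with a baseline, the returns would come from complete episodes only. Passing a nonzero `last_value` is what lets an actor-critic style update use a trajectory that was cut off before the episode ended.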