In Chapter 8, Implementing an Intelligent Autonomous Car Driving Agent Using the Deep Actor-Critic algorithm, we walked you through the derivation of the policy gradient theorem and reproduced the following for bringing in context:
You may recall that the policy we considered was a stochastic function that assigned a probability to each action given the state (s) and the parameters (). In deterministic policy gradients, the stochastic policy is replaced by a deterministic policy that prescribes a fixed policy for a given state ...