Core concepts

In Chapter 8, Implementing an Intelligent Autonomous Car Driving Agent Using the Deep Actor-Critic algorithm, we walked you through the derivation of the policy gradient theorem and reproduced the following for bringing in context:

You may recall that the policy we considered was a stochastic function that assigned a probability to each action given the state (s) and the parameters (). In deterministic policy gradients, the stochastic policy is replaced by a deterministic policy that prescribes a fixed policy for a given state ...

Get Hands-On Intelligent Agents with OpenAI Gym now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.