Policy
The policy chooses the actions to be taken in a given situation and can be categorized as deterministic or stochastic.
A deterministic policy is denoted as a_t = µ(s_t), while a stochastic policy is denoted as a_t ~ π(·|s_t), where the tilde symbol (~) means "is distributed according to". Stochastic policies are used when it is better to work with a distribution over actions; for example, when it is preferable to inject noise into the action applied to the system.
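The difference between the two kinds of policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the one-dimensional state, the "push toward zero" rule, and the noise level are all hypothetical and are not taken from the text.

```python
import random


def deterministic_policy(state):
    """mu(s): the same state always yields the same action."""
    # Hypothetical rule for illustration: push the state back toward zero.
    return -1.0 if state > 0 else 1.0


def stochastic_policy(state, rng, noise_std=0.1):
    """pi(.|s): the action is sampled from a distribution conditioned on the state."""
    # Assumption: Gaussian noise centered on the deterministic action.
    return rng.gauss(deterministic_policy(state), noise_std)


rng = random.Random(0)  # seeded generator so runs are repeatable
state = 0.5

a_det = deterministic_policy(state)
a_samples = [stochastic_policy(state, rng) for _ in range(5)]

print("deterministic action:", a_det)
print("stochastic actions:  ", a_samples)
```

Calling `deterministic_policy` repeatedly with the same state returns the same action every time, whereas each call to `stochastic_policy` draws a new, slightly different action around it.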
Generally, stochastic policies are either categorical or Gaussian. A categorical policy resembles a classification problem: the action probabilities are computed by applying a softmax function across the categories. In a Gaussian policy, the actions are sampled from a Gaussian distribution described by a mean and a standard deviation (or variance). ...
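As a rough sketch of the two cases, the snippet below samples a discrete action from a softmax over scores and a continuous action from a Gaussian. It uses only the Python standard library; the helper names, the example scores (`logits`), and the mean and standard deviation are assumptions chosen for illustration. In practice these quantities would typically be produced by a model rather than hard-coded.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    """Categorical policy: pick an action index with softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_gaussian(mean, std, rng):
    """Gaussian policy: draw a continuous action from N(mean, std^2)."""
    return rng.gauss(mean, std)


rng = random.Random(0)  # seeded generator so runs are repeatable

logits = [2.0, 0.5, -1.0]  # hypothetical scores for three discrete actions
probs = softmax(logits)
discrete_action = sample_categorical(logits, rng)

continuous_action = sample_gaussian(0.0, 0.2, rng)  # hypothetical mean and std

print("action probabilities:", probs)
print("sampled discrete action:", discrete_action)
print("sampled continuous action:", continuous_action)
```

The categorical case returns an index into a finite set of actions, while the Gaussian case returns a real number, which is why the latter is the usual choice for continuous action spaces.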