The basic intuition behind policy gradient (PG) methods is that we move from learning a value function that implies a deterministic policy to directly learning a stochastic policy, where a set of parameters defines a probability distribution over actions. Viewed this way, our policy π is a parameterized function: by adjusting the parameters θ, we shape the probability of taking a given action in a given state. Mathematically, we can define this as:

π_θ(a|s) = P[a | s, θ]

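As a concrete illustration (not from the text), here is a minimal sketch of such a parameterized stochastic policy, using a softmax over linear action preferences; the function name `softmax_policy` and the feature setup are assumptions for the example:

```python
import numpy as np

def softmax_policy(theta, state_features):
    """Return pi_theta(a|s): a probability distribution over actions.

    theta: (n_actions, n_features) parameter matrix
    state_features: (n_features,) feature vector describing the state
    """
    preferences = theta @ state_features   # one scalar preference per action
    preferences -= preferences.max()       # shift for numerical stability
    exp_prefs = np.exp(preferences)
    return exp_prefs / exp_prefs.sum()     # normalize into probabilities

# Example: 3 actions, 4 state features
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))
state = rng.normal(size=4)
probs = softmax_policy(theta, state)
print(probs)  # probabilities over the 3 actions, summing to 1
```

Adjusting θ changes which actions become more or less likely, which is exactly the knob that PG methods turn during learning.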