December 2018
Beginner to intermediate
684 pages
21h 9m
English
At any point in time, the policy defines the agent's behavior. It maps any state the agent may encounter to one or several actions. In a simple environment with a limited number of states and actions, the policy can be a simple lookup table that's filled in during training.
With continuous states and actions, the policy takes the form of a function that ML can help to approximate. The policy may also involve significant computation, as in the case of AlphaZero, which uses tree search to decide on the best action for a given game state. The policy may also be stochastic and assign probabilities to actions given a state.