Policy
The policy defines how the agent selects an action given a state. An optimal policy chooses the action that maximizes the expected cumulative reward from that state, not the action with the largest immediate reward; in other words, it keeps the agent focused on its long-term goal. For example, suppose a car has 30 km left to reach its destination but only 10 km of range remaining, and the next gas stations are 1 km and 60 km away. Such a policy will choose to refuel at the first station (1 km away) so that the car does not run out of fuel. This decision is not optimal in the short term, since refueling takes time, but it ensures that the car ultimately reaches its destination.
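To make the contrast between immediate and cumulative reward concrete, here is a minimal sketch of the fuel example. The reward values (a cost of -1 for stopping to refuel, +10 for arriving, -100 for running out of fuel) and the 50 km range after a refuel are hypothetical assumptions, not taken from the text, and the names `greedy_policy` and `long_term_policy` are illustrative. The long-term policy here is a hand-written one-step lookahead over a known, deterministic model; an RL agent would instead have to learn these values from experience.

```python
from dataclasses import dataclass

# Distances come from the example in the text. The reward values and
# TANK_RANGE_KM are ASSUMPTIONS chosen only for illustration.
DISTANCE_TO_GOAL_KM = 30
TANK_RANGE_KM = 50          # assumed range after a full refuel
REFUEL_COST = -1.0          # assumed immediate reward for stopping (it takes time)
ARRIVAL_REWARD = 10.0       # assumed reward for reaching the destination
STRANDED_PENALTY = -100.0   # assumed reward for running out of fuel

ACTIONS = ("refuel", "drive_on")


@dataclass(frozen=True)
class State:
    position_km: float  # distance travelled from the start
    range_km: float     # how far the car can still go on its current fuel


def step(state: State, action: str) -> tuple[State, float]:
    """Apply an action at a gas station and return (next_state, immediate_reward)."""
    if action == "refuel":
        return State(state.position_km, TANK_RANGE_KM), REFUEL_COST
    return state, 0.0


def drive_to_goal(state: State) -> float:
    """Reward for driving straight to the destination.

    The second station (60 km) lies beyond the destination (30 km),
    so there are no further decisions after the first station.
    """
    remaining = DISTANCE_TO_GOAL_KM - state.position_km
    return ARRIVAL_REWARD if state.range_km >= remaining else STRANDED_PENALTY


def cumulative_return(state: State, action: str) -> float:
    """Immediate reward of the action plus the reward of the rest of the trip."""
    next_state, reward = step(state, action)
    return reward + drive_to_goal(next_state)


def greedy_policy(state: State) -> str:
    """Pick the action with the largest immediate reward only."""
    return max(ACTIONS, key=lambda a: step(state, a)[1])


def long_term_policy(state: State) -> str:
    """Pick the action with the largest cumulative reward."""
    return max(ACTIONS, key=lambda a: cumulative_return(state, a))


# After driving 1 km, the car is at the first station with 9 km of range left.
at_first_station = State(position_km=1, range_km=9)
print(greedy_policy(at_first_station))     # prints: drive_on
print(long_term_policy(at_first_station))  # prints: refuel
```

The greedy policy skips the station because stopping has a negative immediate reward, and the car is stranded 9 km later; the cumulative-reward policy accepts the small refueling cost (total return 9.0) to avoid the much larger penalty (total return -100.0).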
The following diagram shows a simple example where an actor moving ...