June 2018
The long-term reward is called the utility. To decide which action to take, an agent can greedily choose the action that produces the highest utility. The utility of performing an action a at a state s is written as a function Q(s, a), called the utility function. Given a state and an action as input, the utility function predicts the immediate reward plus the future rewards the agent would collect by following an optimal policy afterwards, as shown in the following diagram:
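
The greedy selection described above can also be sketched in a few lines of Python. This is only a minimal illustration, assuming the utilities are already known and stored in a lookup table; the names `q_table` and `greedy_action`, the states `s0` and `s1`, the actions, and the numeric values are all hypothetical and do not come from the text.

```python
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def greedy_action(
    q_table: Dict[Tuple[State, Action], float],
    state: State,
    actions: Sequence[Action],
) -> Action:
    """Return the action with the highest utility Q(state, action).

    Pairs missing from the table are treated as having utility 0.0
    (an assumption made for this sketch).
    """
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


# Hypothetical utilities Q(s, a) for two states and two actions.
q_table = {
    ("s0", "left"): 0.2,
    ("s0", "right"): 0.7,
    ("s1", "left"): 0.5,
    ("s1", "right"): 0.1,
}
actions = ["left", "right"]

best_in_s0 = greedy_action(q_table, "s0", actions)
best_in_s1 = greedy_action(q_table, "s1", actions)
print(best_in_s0, best_in_s1)
```

Here the agent simply compares Q(s, a) across the available actions and picks the largest one. In practice the utility values are not given up front and have to be learned or approximated.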
