Java Deep Learning Projects by Md. Rezaul Karim

Utility

The long-term reward is called the utility. To decide which action to take, an agent can greedily select the action that produces the highest utility. The utility of performing an action a at a state s is written as a function Q(s, a), called the utility function. Given a state and an action as input, the utility function predicts the immediate reward plus the future rewards expected under an optimal policy, as shown in the following diagram:

Using a utility function
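
The greedy choice described above can be sketched in a few lines of Java. This is only an illustrative sketch, not code from the book: it assumes the utility function Q(s, a) is stored as a small lookup table, and the class name `GreedyUtilityDemo`, the method `greedyAction`, and the table values are all hypothetical.

```java
public class GreedyUtilityDemo {

    // Returns the index of the action with the highest utility Q(s, a)
    // for the given state. Ties go to the lowest action index.
    public static int greedyAction(double[][] q, int state) {
        double[] utilities = q[state];
        int best = 0;
        for (int a = 1; a < utilities.length; a++) {
            if (utilities[a] > utilities[best]) {
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical utility table: rows are states, columns are actions.
        double[][] q = {
            {0.1, 0.7, 0.2},
            {0.5, 0.4, 0.9}
        };
        for (int s = 0; s < q.length; s++) {
            System.out.println("state " + s + " -> action " + greedyAction(q, s));
        }
    }
}
```

With the made-up table above, the agent picks action 1 in state 0 and action 2 in state 1, because those entries hold the largest utility in their rows. In practice the table would be learned rather than hard-coded, and a purely greedy agent never explores, so this shows only the selection step.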
