Value function

The second component an agent can have is called the value function. As mentioned previously, it is useful to assess your position, good or bad, in a given state. In a game of chess, a player would like to know the likelihood that they are going to win in a board state. An agent navigating a maze would like to know how close it is to the destination. The value function serves this purpose; it predicts the expected future reward an agent would receive in a given state. In other words, it measures whether a given state is desirable for the agent. More formally, the value function takes a state and a policy as input and returns a scalar value representing the expected cumulative reward:

Take our maze example, and suppose the ...

Get Python Reinforcement Learning Projects now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.