October 2019
Intermediate to advanced
366 pages
12h 4m
English
The return provides good insight into a trajectory's overall value, but it gives no indication of the quality of the individual states visited. That per-state quality matters because the policy can use it to choose the next action: it simply picks the action that leads to the next state with the highest quality. The value function provides exactly this measure: it estimates the quality of a state as the expected return obtained by starting from that state and following a policy. Formally, the value function is defined as follows:

$$V_\pi(s) = \mathbb{E}_\pi\left[\, G \mid s_0 = s \,\right]$$

Here $G$ denotes the return and $\pi$ the policy being followed.
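To make the idea of "expected return from a state following a policy" concrete, here is a minimal sketch that estimates a state's value by averaging sampled discounted returns. Everything in it is an assumption made for illustration: the four-state chain environment, the discount factor of 0.9, and the names `step`, `discounted_return`, `estimate_value`, and `greedy_action` are hypothetical. The greedy step uses a one-step lookahead (immediate reward plus discounted value of the next state), which is one common way to turn state values into an action choice.

```python
import random

GAMMA = 0.9  # assumed discount factor


def step(state, action):
    """Hypothetical chain with states 0..3, where state 3 is terminal.

    Action 0 moves left (staying put at state 0), action 1 moves right.
    Entering state 3 yields reward 1; every other transition yields 0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == 3
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma=GAMMA):
    """Return of one trajectory: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_value(state, policy, episodes=2000, max_steps=50, seed=0):
    """Monte Carlo estimate of the value of `state` under `policy`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        s, rewards = state, []
        for _ in range(max_steps):
            s, r, done = step(s, policy(s, rng))
            rewards.append(r)
            if done:
                break
        total += discounted_return(rewards)
    return total / episodes


def always_right(state, rng):
    return 1


def random_policy(state, rng):
    return rng.choice([0, 1])


def greedy_action(state, values, gamma=GAMMA):
    """Pick the action whose one-step lookahead value is highest."""
    def lookahead(action):
        next_state, reward, _ = step(state, action)
        return reward + gamma * values[next_state]
    return max([0, 1], key=lookahead)


if __name__ == "__main__":
    values = {s: estimate_value(s, always_right, episodes=10) for s in range(3)}
    values[3] = 0.0  # the terminal state has no future return
    print(values)
    print([greedy_action(s, values) for s in range(3)])
```

Under the `always_right` policy the environment is deterministic, so a handful of episodes is enough; a stochastic policy such as `random_policy` needs many more samples for the average to settle.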
The action-value function, similar to the value function, is the expected return from a state ...