December 2018 · Beginner to intermediate · 684 pages · 21h 9m · English
The reward provides immediate feedback on actions. However, solving an RL problem requires decisions that create value in the long run. This is where the value function comes in: it summarizes the long-term utility of states, or of actions taken in a given state, in terms of the rewards they are expected to yield.
In other words, the value of a state is the total reward an agent can expect to receive in the future when starting in that state. The immediate reward may be a good proxy for future rewards, but the agent also needs to account for cases where low rewards are likely to be followed by much better outcomes (or the reverse).
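As a minimal sketch of this idea, the quantity a value function estimates is the discounted return: the sum of future rewards, each weighted by a discount factor. The function name and the reward sequence below are illustrative, not taken from the text:

```python
def discounted_return(rewards, gamma=0.9):
    """Total discounted reward for a sequence of future rewards:
    G = r_0 + gamma * r_1 + gamma^2 * r_2 + ...
    Computed backwards so each step is a single multiply-add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Low immediate rewards followed by a much better later outcome:
rewards = [0.0, 0.0, 10.0]
print(discounted_return(rewards, gamma=0.9))  # 10 * 0.9**2 = 8.1
```

Note how the zero immediate rewards contribute nothing, yet the state still has substantial value because of the payoff two steps ahead; this is exactly the situation where the immediate reward alone is a poor proxy.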
Hence, value estimates aim to predict future rewards. Rewards are the key input, ...