September 2018
Intermediate to advanced
288 pages
7h 38m
English
In the previous section, we said that the goal of reinforcement learning is to learn a policy that, for each state s in which the system is located, indicates to the agent an action to maximize the total reward received during the entire action sequence. How can we maximize the total reinforcement received during the entire sequence of actions?
The total reinforcement derived from the policy is calculated as follows:

Here, rT represents the reward of the action that drives the environment in the terminal state sT.
A possible solution to the problem is to associate the action that provides the highest reward to ...
Read now
Unlock full access