April 2018
Intermediate to advanced
334 pages
10h 18m
English
Until now, we have covered the blocks that create an MDP problem, that is, states, actions, transition models, and rewards, now comes the solution. The policy is the solution to an MDP problem.

The policy is a function that takes the state as an input and outputs the action to be taken. Therefore, the policy is a command that the agent has to obey.
is called the optimal policy, which maximizes the expected reward. Among all the policies taken, the optimal policy is the one that optimizes to maximize the amount of reward received or expected ...