April 2018
Intermediate to advanced
334 pages
10h 18m
English
Since the optimal
policy is the policy that maximizes the expected rewards, therefore,
,where
means the expected value of the rewards obtained from the sequence of states agent observes if it follows the
policy. Thus,
outputs ...
Read now
Unlock full access