January 2019
Intermediate to advanced
386 pages
11h 13m
English
One of the most important distinctions between RL and other machine learning (ML) approaches is that rewards in RL are often delayed. That is, the agent might have to take a number of actions before the environment provides any reward signal. For example, in the maze game, the reward might come only at the end, when the maze exit square is reached. Therefore, when evaluating an action, the agent has to consider the problem in its entirety and not just the action's immediate consequences. This is unlike supervised learning, where the algorithm receives some kind of feedback (such as a label) for each training sample and has no knowledge of (or interest in) the end goal. The various RL system elements we ...
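To make the delayed-reward idea concrete, here is a minimal sketch of how a single reward at the maze exit can be credited back to the earlier actions that led there. It assumes the common discounted-return formulation, which this passage does not itself introduce; the discount factor `gamma`, the function name `discounted_returns`, and the five-step episode are all hypothetical choices made for illustration.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    gamma is an assumed discount factor, not a value taken from the text.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step's return includes everything that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 5-step maze episode: no reward until the exit square is reached.
maze_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
maze_returns = discounted_returns(maze_rewards)

for step, (r, g) in enumerate(zip(maze_rewards, maze_returns)):
    print(f"step {step}: immediate reward = {r:.1f}, return = {g:.4f}")
```

Every step before the last has an immediate reward of zero, yet its return is positive and grows as the agent gets closer to the exit. This is one way to express that an action has to be judged by what it eventually leads to rather than by its immediate feedback alone.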