January 2018
Intermediate to advanced
470 pages
11h 9m
English
In RL lingo, a strategy is called a policy, and the goal of RL is to discover a good one. One of the most common ways to find a good policy is to observe the long-term consequences of the actions taken in each state. The short-term consequence is easy to calculate: it's just the reward. But although performing an action yields an immediate reward, it is not always a good idea to greedily choose the action with the highest immediate reward. That is a lesson in life too, because the thing that looks best right now may not be the most satisfying in the long run. The best possible policy is called the optimal policy, and it is often the holy grail of RL. Figure 3 shows such a policy: the optimal action for any given state.
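To make the difference between a greedy choice and a long-term choice concrete, here is a minimal sketch in Python. It is not taken from the book: the three-state problem, the names `TRANSITIONS`, `GAMMA`, `greedy_policy`, `optimal_policy`, and `policy_return`, the reward values, and the discount factor of 0.9 are all assumptions made for illustration. The optimal policy is found with a simple value-iteration loop, which is only one of several ways to do it.

```python
# Hypothetical toy problem (an assumption for illustration).
# TRANSITIONS[state][action] = (immediate_reward, next_state)
TRANSITIONS = {
    "start": {"grab": (1.0, "end"), "wait": (0.0, "mid")},
    "mid": {"grab": (10.0, "end"), "wait": (0.0, "end")},
    "end": {},  # terminal state: no actions available
}
GAMMA = 0.9  # assumed discount factor for future rewards


def greedy_policy(transitions):
    """Pick the action with the highest immediate reward in each state."""
    return {
        state: max(actions, key=lambda a: actions[a][0])
        for state, actions in transitions.items()
        if actions
    }


def value_iteration(transitions, gamma, sweeps=50):
    """Estimate the long-term value of each state by repeated backups."""
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if actions:
                values[state] = max(
                    reward + gamma * values[nxt]
                    for reward, nxt in actions.values()
                )
    return values


def optimal_policy(transitions, gamma):
    """Pick the action with the best reward plus discounted future value."""
    values = value_iteration(transitions, gamma)
    return {
        state: max(
            actions,
            key=lambda a: actions[a][0] + gamma * values[actions[a][1]],
        )
        for state, actions in transitions.items()
        if actions
    }


def policy_return(policy, transitions, gamma, state="start"):
    """Follow a policy from `state` and add up the discounted rewards."""
    total, discount = 0.0, 1.0
    while transitions[state]:  # stop at a terminal state
        reward, state = transitions[state][policy[state]]
        total += discount * reward
        discount *= gamma
    return total


if __name__ == "__main__":
    for name, policy in [
        ("greedy", greedy_policy(TRANSITIONS)),
        ("optimal", optimal_policy(TRANSITIONS, GAMMA)),
    ]:
        print(name, policy, policy_return(policy, TRANSITIONS, GAMMA))
```

In this toy setup the greedy policy grabs the small reward immediately, while the policy based on long-term value waits one step and collects the larger reward. Both are plain dictionaries from state to action, which is the "optimal action, given any state" idea in its simplest form.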