OpenAI Gym offers plenty of fun environments and makes it easy to swap in and test new ones. As we have seen, this lets us compare the results of different algorithms far more easily. However, as we have also seen, every algorithm has its limitations, and the new environment we explore in this section introduces the limitation of time. That is, it places a time limit on the agent as part of the goal. Under such a constraint, our previous algorithms, DP and MC, become unable to solve the problem, which makes this a good place to also introduce time-based, or time-critical, rewards.
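To make the idea of a time limit concrete, here is a minimal sketch of how such a constraint works, in the style of Gym's TimeLimit wrapper, which truncates any episode that runs past a maximum number of steps. The environment and wrapper below are toy illustrations, not Gym's actual classes; the names `SimpleEnv` and `max_steps` are assumptions for the sketch.

```python
class SimpleEnv:
    """Toy environment: the agent reaches the goal at state 5."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action          # action in {0, 1}
        done = self.state >= 5        # goal reached
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class TimeLimit:
    """Ends any episode that runs longer than max_steps (illustrative)."""
    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            done = True               # truncated by the time limit
        return state, reward, done


env = TimeLimit(SimpleEnv(), max_steps=3)
state = env.reset()
done, steps = False, 0
while not done:
    # An agent that never moves: only the time limit can end the episode.
    state, reward, done = env.step(0)
    steps += 1
print(steps)  # → 3
```

The point to notice is that the episode can now end without the goal being reached, so an agent that only learns from complete, successful episodes (as our MC methods do) receives no useful signal from runs cut short by the clock.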
What better way to introduce time-dependent learning than to think of a time-dependent task? There are plenty of tasks, but ...