Alex used hierarchical reinforcement learning (HRL) to tackle the multi-task agent learning problem that OTC requires you to solve. HRL is another method, outside Meta-RL, that has been used successfully to solve multi-task problems. Prierarchy-RL refines this by building a prior hierarchy in which an action or action-state is characterized by its entropy, or uncertainty. High-entropy, or highly uncertain, actions become high-level or top-based actions. This is somewhat abstract in concept, so let's look at a code example to see how this comes together:
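To make the entropy idea concrete first, here is a minimal sketch, not Alex's implementation. It computes the Shannon entropy of a discrete action distribution and flags a decision as high-level when that entropy is large. The helper `is_high_level`, the cut-off `threshold`, and the two example distributions are all hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_high_level(probs, threshold):
    """Hypothetical rule: treat a decision as high-level when the entropy
    of its action distribution exceeds `threshold` (an assumed cut-off)."""
    return entropy(probs) > threshold


# Hypothetical action distributions over four actions in two different states.
confident = [0.97, 0.01, 0.01, 0.01]  # nearly deterministic: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]  # uniform: maximum entropy, log(4)

# Assumed cut-off: half of the maximum possible entropy for four actions.
threshold = 0.5 * math.log(4)

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, round(entropy(probs), 3), is_high_level(probs, threshold))
```

Under this assumed rule, the nearly deterministic distribution falls below the cut-off and the uniform one falls above it. In that sense, uncertain decisions sit higher in the hierarchy.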
- The base agent used to win the challenge was PPO; the following is a full source listing of that agent, which also serves as a refresher on PPO:
import itertools
import ...