October 2019
Intermediate to advanced
366 pages
12h 4m
English
The expert should be a policy that takes a state as input and returns the best action. Beyond that requirement, it can be anything. For these experiments, we used an agent trained with Proximal Policy Optimization (PPO) as the expert. In principle, there is little point in imitating an agent that reinforcement learning can already train, but we adopted this solution for academic purposes, because it makes integration with the imitation learning algorithms easier.
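To make the "state in, action out" requirement concrete, here is a minimal sketch in plain Python. It is not the PPO expert used in these experiments: `rule_based_expert`, `label_states`, and the type aliases are hypothetical names introduced only for illustration, and the decision rule is an arbitrary placeholder.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed types for illustration: a state is a sequence of floats,
# an action is a discrete integer index.
State = Sequence[float]
Action = int

# Any callable with this signature can play the role of the expert.
Expert = Callable[[State], Action]


def rule_based_expert(state: State) -> Action:
    """Hypothetical hand-written expert: pick action 1 when the first
    feature is positive, otherwise action 0."""
    return 1 if state[0] > 0 else 0


def label_states(expert: Expert, states: List[State]) -> List[Tuple[State, Action]]:
    """Query the expert on each state and return (state, action) pairs."""
    return [(s, expert(s)) for s in states]


demo = label_states(rule_based_expert, [(0.5, -1.0), (-0.2, 0.3)])
print(demo)
```

Because the rest of the code depends only on the `Expert` signature, a trained agent wrapped in a function with the same signature could presumably be swapped in without changing the code that queries it.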
The expert's model, trained with PPO, has been saved to a file so that we can easily restore it with its trained weights. Three steps are required to restore the graph and make it usable: