October 2019
Intermediate to advanced
366 pages
12h 4m
English
Specifically, DAgger iterates the following procedure. In the first iteration, a dataset D of trajectories is collected with the expert policy and used to train a first policy that fits those trajectories well without overfitting them. Then, in each later iteration i, new trajectories are collected with the current learned policy, the states they visit are labeled with the expert's actions, and the result is added to D. Finally, the aggregated dataset D, containing both old and new trajectories, is used to train a new policy, which is the one used for collection in the next iteration.
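The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's or the paper's implementation: the one-dimensional environment, the `expert_policy` rule, and the least-squares `fit_policy` learner are all hypothetical stand-ins chosen to keep the example self-contained. It also assumes that the expert is queried for the action at every state the learner visits, and it omits the expert/learner mixing schedule discussed in the DAgger paper.

```python
import numpy as np

def expert_policy(state):
    # Hypothetical expert: always push the 1-D state toward the origin.
    return -1.0 if state > 0 else 1.0

def rollout(policy, rng, horizon=20):
    """Run `policy` in the toy environment and return the visited states."""
    state = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(horizon):
        states.append(state)
        # Toy dynamics (an assumption): move 0.1 in the chosen direction plus noise.
        state = state + 0.1 * policy(state) + rng.normal(0.0, 0.02)
    return states

def fit_policy(states, actions):
    """Fit a linear classifier by least squares: action = sign(w * state + b)."""
    X = np.column_stack([states, np.ones(len(states))])
    w, b = np.linalg.lstsq(X, np.asarray(actions), rcond=None)[0]
    return lambda s: 1.0 if w * s + b >= 0 else -1.0

def dagger(n_iterations=5, rollouts_per_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    data_states, data_actions = [], []

    # First iteration: build D from expert trajectories and train a first policy.
    for _ in range(rollouts_per_iter):
        states = rollout(expert_policy, rng)
        data_states += states
        data_actions += [expert_policy(s) for s in states]
    policy = fit_policy(data_states, data_actions)

    # Later iterations: collect with the learned policy, label with the expert,
    # aggregate into D, and retrain on the whole dataset.
    for _ in range(n_iterations):
        for _ in range(rollouts_per_iter):
            states = rollout(policy, rng)
            data_states += states
            data_actions += [expert_policy(s) for s in states]
        policy = fit_policy(data_states, data_actions)

    return policy, len(data_states)

if __name__ == "__main__":
    policy, dataset_size = dagger()
    print("dataset size:", dataset_size)
    print("action at state 0.8:", policy(0.8))
```

The dataset only ever grows: each retraining step sees every state visited so far, including states the expert alone would not have reached, which is the point of aggregating rather than replacing the data.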
As per the report in the DAgger paper (https://arxiv.org/pdf/1011.0686.pdf ...