October 2019
Intermediate to advanced
366 pages
12h 4m
English
In the first part of ME-TRPO, the dynamics of the environment (that is, the ensemble of models) are learned. The algorithm starts by interacting with the environment with a random policy,
, to collect a dataset of transitions,
. This dataset is then used to train all the dynamic models,
, in a supervised fashion. The models, , are initialized with different random weights and are trained with different mini-batches. To avoid ...
Read now
Unlock full access