October 2019
Intermediate to advanced
366 pages
12h 4m
English
Before showing the results of the imitation learning approach, we want to provide some numbers from a reinforcement learning algorithm for comparison. This is not a fair comparison, since the two algorithms work under very different conditions, but the numbers nevertheless underline why imitation learning can be rewarding when an expert is available.
The expert was trained with proximal policy optimization (PPO) for about 2 million steps; after about 400,000 steps, it reached a plateau score of about 138.
We tested DAgger on Flappy Bird with the following hyperparameters:
| Hyperparameter | Variable name | Value |
| --- | --- | --- |
| Learner hidden layers | hidden_sizes | 16,16 |
| DAgger iterations | dagger_iterations ... |
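To make the role of these two hyperparameters concrete, here is a minimal sketch of a DAgger loop. It is an illustration under stated assumptions, not the book's implementation: `DaggerConfig`, `dagger`, `rollout`, `expert_action`, and `fit` are hypothetical names, and `dagger_iterations` has no default because its value is not shown in the table above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = float
Action = int
Policy = Callable[[State], Action]


@dataclass
class DaggerConfig:
    # Number of DAgger iterations; the caller must supply it (hypothetical field).
    dagger_iterations: int
    # Learner hidden layer sizes, matching the "16,16" entry in the table.
    hidden_sizes: Tuple[int, ...] = (16, 16)


def dagger(
    config: DaggerConfig,
    rollout: Callable[[Policy], List[State]],
    expert_action: Policy,
    fit: Callable[[List[Tuple[State, Action]], Tuple[int, ...]], Policy],
) -> Tuple[Policy, List[Tuple[State, Action]]]:
    """Sketch of DAgger: aggregate expert-labelled states, then refit the learner."""
    dataset: List[Tuple[State, Action]] = []
    policy: Policy = expert_action  # the first rollout uses the expert itself
    for _ in range(config.dagger_iterations):
        # Visit states with the current policy (expert first, learner afterwards).
        states = rollout(policy)
        # Label every visited state with the expert's action and aggregate.
        dataset.extend((s, expert_action(s)) for s in states)
        # Retrain the learner on the whole aggregated dataset.
        policy = fit(dataset, config.hidden_sizes)
    return policy, dataset
```

In this sketch, `rollout` would run the environment with the given policy and return the visited states, while `fit` would train a network with the given hidden layer sizes on the aggregated state-action pairs. Both are left to the caller, so the loop itself does not depend on any particular environment or learning library.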