In this section, we will look at how on-policy, model-free methods such as PPO can be improved by introducing multiple agents that train the same policy. The example exercise you use in this section is entirely up to you, and it should be one you are familiar with and/or interested in. For our purposes, we will explore a sample we have already looked at extensively: the Hallway/VisualHallway. If you have followed most of the exercises in this book, you should be more than capable of adapting this example. Note, however, that for this exercise we want a sample that is set up to use multiple agents for training.
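Before we dive in, it helps to see the core idea in isolation: several agents act in parallel, but all of their experience feeds a single shared policy, which is updated once per batch. The sketch below is purely conceptual (it is not the ML-Agents API); the names `SharedPolicy` and `collect_step`, and the stand-in observations and rewards, are all illustrative assumptions.

```python
import random

class SharedPolicy:
    """One policy object shared by every agent."""
    def __init__(self):
        self.updates = 0  # number of batched updates applied so far

    def act(self, observation):
        # Placeholder decision rule; a real PPO policy would sample
        # an action from a learned distribution instead.
        return random.choice([0, 1])

    def update(self, batch):
        # A real implementation would run a PPO clipped-objective
        # gradient step on the batch; here we only count the update.
        self.updates += 1

def collect_step(policy, num_agents):
    """Each agent queries the SAME policy, and the combined
    experience then drives a single shared update."""
    batch = []
    for agent_id in range(num_agents):
        obs = agent_id   # stand-in observation
        action = policy.act(obs)
        reward = 0.0     # stand-in reward
        batch.append((obs, action, reward))
    policy.update(batch)
    return batch

policy = SharedPolicy()
batch = collect_step(policy, num_agents=4)
print(len(batch), policy.updates)  # 4 experiences, 1 shared update
```

The point of the sketch is the data flow: four agents contribute four experiences per step, yet the policy receives only one consolidated update, which is what lets multiple agents accelerate training of a single behavior.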
Previously, we avoided discussing multiple agents; we avoided this ...