Chapter 9. Cooperative Learning

In this chapter, we’re going to take another step forward with our simulations and reinforcement learning, and create a simulation environment in which multiple agents must work together toward a common goal. These sorts of simulations involve cooperative learning: agents usually receive their rewards as a group, rather than individually, including agents that might not have contributed to the actions that earned those rewards.
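
To make this concrete, here’s a minimal sketch of how group rewards are expressed in ML-Agents code, using the SimpleMultiAgentGroup class from the com.unity.ml-agents package. RegisterAgent, AddGroupReward, EndGroupEpisode, and GroupEpisodeInterrupted are real ML-Agents APIs, but the CoopArea class, its fields, and the reward values are hypothetical placeholders for the environment we’ll build shortly:

    using Unity.MLAgents;
    using UnityEngine;

    // Hypothetical controller for a cooperative environment: every agent is
    // registered with a single SimpleMultiAgentGroup, and rewards are given
    // to the group rather than to any one agent.
    public class CoopArea : MonoBehaviour
    {
        public Agent[] agents; // the cooperating agents in this environment
        private SimpleMultiAgentGroup group;

        void Start()
        {
            group = new SimpleMultiAgentGroup();
            foreach (var agent in agents)
            {
                // Group rewards and group episode endings now apply to this agent
                group.RegisterAgent(agent);
            }
        }

        // Called by the environment when the shared goal is achieved
        public void OnGoalReached()
        {
            group.AddGroupReward(1.0f); // every registered agent shares this reward
            group.EndGroupEpisode();    // end the episode for the whole group at once
        }

        // Called when the group runs out of time without reaching the goal
        public void OnTimeout()
        {
            // Interrupt rather than end: the final state isn't a true terminal
            // state, which matters for how the trainer bootstraps value estimates
            group.GroupEpisodeInterrupted();
        }
    }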

In Unity ML-Agents, the preferred training algorithm and approach for cooperative learning is known as Multi-Agent POsthumous Credit Assignment (or MA-POCA, for short). MA-POCA trains a centralized critic, which acts as a kind of coach for a whole group of agents, while each agent still learns its own policy. The “posthumous” part refers to the fact that agents removed from the group partway through an episode can still be assigned credit for rewards the group earns after they’re gone. This means each agent can still learn what it needs to do, even though the group is the entity being rewarded.
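
On the training side, you choose MA-POCA in your trainer configuration YAML by setting trainer_type to poca. Here’s a minimal sketch; the behavior name CoopAgent is a placeholder, and the hyperparameter values shown are generic starting points rather than values tuned for this chapter’s environment:

    behaviors:
      CoopAgent:
        trainer_type: poca        # use the MA-POCA trainer instead of ppo
        hyperparameters:
          batch_size: 1024
          buffer_size: 10240
          learning_rate: 0.0003
        network_settings:
          hidden_units: 128
          num_layers: 2
        reward_signals:
          extrinsic:
            gamma: 0.99
            strength: 1.0
        max_steps: 500000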

Tip

In cooperative learning environments, you can still give rewards to individual agents if you want; we’ll come back to this later, but the sketch below shows the basic idea. You can also use other algorithms, or plain PPO as usual, but MA-POCA has features specifically designed to make cooperative learning work better. You could wire together a collection of individually PPO-trained agents to approximate the same result, but we don’t recommend it.
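
Mixing the two kinds of rewards looks something like this: AddReward is the standard per-agent reward method on the Agent class, while AddGroupReward applies to every agent registered with the group. The CoopRewards class, its method names, and the reward values are hypothetical:

    using Unity.MLAgents;

    // Hypothetical reward logic that mixes individual and group rewards
    public static class CoopRewards
    {
        // One agent does something individually useful (say, picking up
        // a key): only that agent receives the reward
        public static void OnIndividualContribution(Agent agent)
        {
            agent.AddReward(0.1f); // per-agent, just like non-cooperative training
        }

        // The shared goal is reached: the whole group is rewarded,
        // including agents that didn't directly contribute
        public static void OnSharedGoal(SimpleMultiAgentGroup group)
        {
            group.AddGroupReward(1.0f);
        }
    }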

A Simulation for Cooperation

Let’s build a simulation environment with a collection of agents that need to work together. This environment has a lot of pieces, so take your time, step through slowly, and take notes if you need to.

Our environment will involve ...
