Chapter 4. Learning in Multiplayer Stochastic Games

4.1 Introduction

The agents in a multiagent system can, to some degree, be preprogrammed with behaviors designed in advance. However, a typical multiagent system is so complex that fully preprogramming it is, for practical reasons, impossible. Moreover, the dynamics of the agents and of the environment can change over time, so learning and adaptation are required. It is therefore often necessary that the agents be able to learn online, so that the performance of the multiagent system improves as it operates.

In early work on multiagent reinforcement learning (MARL) for stochastic games [1], it was recognized that no agent works in a vacuum. In his seminal paper, Littman [1] focused on just two agents with directly opposing goals. This means that a single reward function could be used, which one agent tried to maximize and the other tried to minimize. Each agent had to contend with a competing opponent and therefore had to behave so as to maximize its reward in the worst possible case. Littman also recognized the need for mixed strategies, because a player could not be certain of the action its opponent would take. To address this setting, Littman [1] introduced the minimax Q-learning algorithm, whose main idea we have already presented in Chapter 3, Section 3.2.
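To make the idea concrete, the following is a minimal sketch of tabular minimax Q-learning for a two-player zero-sum stochastic game, in the spirit of Littman's algorithm. The environment interface is a hypothetical one assumed for illustration: env.reset() returns an integer state, and env.step(a) returns the next state, the opponent's observed action, the reward, and a termination flag. The maximin step is solved as a small linear program with scipy.optimize.linprog.

```python
# Sketch of minimax Q-learning; env interface and hyperparameters are
# illustrative assumptions, not part of any particular library.
import numpy as np
from scipy.optimize import linprog

def maximin_value(Q_s):
    """Solve max_pi min_o sum_a pi(a) * Q_s[a, o] by linear programming.

    Q_s is the |A| x |O| payoff matrix at one state. Returns the game
    value v and the maximizing mixed strategy pi over the agent's actions.
    """
    n_a, n_o = Q_s.shape
    # Decision variables: pi(a_1), ..., pi(a_n), v. linprog minimizes,
    # so minimize -v to maximize v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi(a) * Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The strategy must be a probability distribution: sum_a pi(a) = 1.
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n_a + [(None, None)])
    return res.x[-1], res.x[:-1]

def minimax_q(env, n_states, n_actions, n_opp_actions,
              episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.2):
    """Tabular minimax Q-learning over a Q(s, a, o) table."""
    Q = np.zeros((n_states, n_actions, n_opp_actions))
    V = np.zeros(n_states)
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Explore, or play the current maximin mixed strategy.
            if rng.random() < epsilon:
                a = rng.integers(n_actions)
            else:
                _, pi = maximin_value(Q[s])
                a = rng.choice(n_actions, p=pi / pi.sum())
            s_next, o, r, done = env.step(a)  # opponent action o is observed
            # Minimax-Q update: bootstrap with the game value V(s').
            Q[s, a, o] += alpha * (r + gamma * V[s_next] - Q[s, a, o])
            V[s], _ = maximin_value(Q[s])
            s = s_next
    return Q, V
```

The only structural difference from ordinary single-agent Q-learning is the value backup: instead of taking max over the agent's own actions, the agent backs up the value of the matrix game at each state, computed by the linear program, which is what yields worst-case-optimal mixed strategies.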

In a rational multiagent game, each agent must keep track in some way of what the other learning agents are doing. The types of games and situations that the learning agent may encounter include fully ...
