MDPs and POMDPs are mathematical models that have been used successfully to formalize sequential decision-theoretic problems under uncertainty. They allow an agent to decide how to act in a stochastic environment in order to maximize a given performance measure. The agent’s decision is based on complete (MDPs) or partial (POMDPs) observability of the environment. These models have proved to be effective tools for single-agent control problems, and they have been applied to many domains, such as mobile robots [BER 01], spoken dialog managers [ROY 00], and inventory management [PUT 94]. This encourages us to consider extending these models to cooperative multiagent systems.
EXAMPLE 9.1. Let us go back to the car maintenance example introduced earlier (see Chapter 1, section 1.1). We now consider several garage employees, who do not always coordinate with one another and who are responsible for repairing and maintaining cars. We can imagine a futuristic scenario in which the garage is equipped with a set of robots, each dedicated to a specific task: one for brakes, one for tires, another for oil, etc. Thus, several agents may decide to act simultaneously on the same car (or on the same part of the car) while possibly having different and partial knowledge of the underlying state of the car. One key question then is: how can the agents cooperate in order to optimize the outcome of their joint action (which results ...