Multiagent Markov Decision Process (MMDP)
Formalized in 1996 by Craig Boutilier, the Multiagent MDP is one of the earliest, and simplest, formalizations of an MDP framework for multiple decision-making agents. The MMDP specifies the transition of the world state as a function not of a single action variable (as in the MDP) but of a joint action comprising the agents' individual actions.
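For n agents with individual action sets A_1, …, A_n, the joint action space is the Cartesian product A_1 × … × A_n, so the transition model takes the form T(s' | s, a_1, …, a_n). A minimal sketch (the agent names and action sets here are illustrative, not from Boutilier's paper):

```python
from itertools import product

# Two agents, each with its own individual action set (illustrative values).
A1 = ["push", "wait"]
A2 = ["push", "wait"]

# The MMDP's action space is the Cartesian product of the individual sets:
# every combination of one action per agent is one joint action.
joint_actions = list(product(A1, A2))
print(joint_actions)
# Four joint actions: ('push', 'push'), ('push', 'wait'),
#                     ('wait', 'push'), ('wait', 'wait')
```

With n agents each having k actions, the joint action space has k^n elements, which is why coordination (and not just per-agent optimization) matters.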
Although Boutilier's treatment described no particular application problem, the MMDP was clearly designed for fully cooperative agents that act collectively and hence must coordinate their actions.
- Full observability: all agents observe the world state directly at each time step.
Because the MMDP is simply an MDP with a joint action, it resides in the same complexity class: it is P-complete. Note, however, that the joint action space grows exponentially with the number of agents.
Consequently, any solution method that applies to MDPs can also be used to solve MMDPs.
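For instance, standard value iteration solves an MMDP once joint actions are treated as the atomic actions of an ordinary MDP. A toy sketch, assuming a hypothetical two-agent "push the box" problem (all names, dynamics, and rewards are invented for illustration):

```python
from itertools import product

# Hypothetical toy MMDP: two agents must push a box to the goal together.
states = ["left", "goal"]
agent_actions = [["push", "wait"], ["push", "wait"]]  # per-agent action sets
joint_actions = list(product(*agent_actions))          # the MMDP's joint actions

def transition(s, joint_a):
    # Deterministic for simplicity: the box reaches the goal only when
    # both agents push at once -- the coordination requirement.
    if s == "left" and joint_a == ("push", "push"):
        return "goal"
    return s

def reward(s, joint_a, s_next):
    # Reward 1 for the step that first reaches the goal, 0 otherwise.
    return 1.0 if s_next == "goal" and s != "goal" else 0.0

def value_iteration(gamma=0.9, tol=1e-6):
    # Ordinary MDP value iteration: the max ranges over joint actions,
    # which play exactly the role a single action plays in an MDP.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward(s, a, transition(s, a)) + gamma * V[transition(s, a)]
                for a in joint_actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
```

The catch is the one noted above: the max over joint actions ranges over a set that is exponential in the number of agents, so this reduction is exact but not cheap.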
Craig Boutilier. Planning, Learning and Coordination in Multiagent Decision Processes. In TARK, pages 195-210. Morgan Kaufmann, 1996.