Multiagent Markov Decision Process (MMDP)

Formalized in 1996 by Craig Boutilier [1], the Multiagent MDP (MMDP) is one of the earliest extensions of the MDP framework to multiple decision-making agents, and also one of the simplest. The MMDP specifies the transition of the world state as a function not of a single action variable (as in the MDP) but of a joint action comprising the individual actions of $ n $ agents.
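
As a rough illustration of the joint-action idea, the sketch below builds the joint action space of a small two-agent problem as a Cartesian product and defines a transition function over (state, joint action) pairs. The state names, action names, and toy dynamics are assumptions made up for this example and do not come from Boutilier's paper.

```python
from itertools import product

# Hypothetical two-agent problem: every name here is illustrative only.
states = ("s0", "s1")
agent_actions = [("left", "right"), ("left", "right")]  # A_1 and A_2

# The joint action space is the Cartesian product A_1 x ... x A_n.
joint_actions = list(product(*agent_actions))


def transition(state, joint_action):
    """Return {next_state: probability} for a state and a joint action."""
    # Assumed toy dynamics: the world moves to s1 only if both agents
    # pick the same individual action; otherwise the state is unchanged.
    if joint_action[0] == joint_action[1]:
        return {"s1": 1.0}
    return {state: 1.0}


def reward(state, joint_action):
    """Single reward signal shared by all agents (fully cooperative)."""
    return 1.0 if state == "s1" else 0.0
```

With two agents and two actions each, `joint_actions` contains four tuples; in general its size is the product of the individual action set sizes.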

Motivating Problems

Boutilier's treatment describes no particular motivating problem, but the MMDP is clearly intended for fully cooperative agents that act collectively, share a common reward, and must therefore coordinate their actions.

Key Assumptions

  • Full observability: all agents observe the world state directly at each time step.

Theoretical Properties

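Because the MMDP is simply an MDP whose action is a joint action, it resides in the same complexity class as the MDP: P-complete. This bound is polynomial in the size of the joint action space, which itself grows exponentially with the number of agents.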

Solution Methods

All methods that apply to MDPs, such as value iteration and policy iteration, can also be used to solve MMDPs by treating each joint action as a single action.
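
To make this concrete, here is a minimal sketch of standard value iteration applied to a tiny MMDP by enumerating joint actions. The two-state coordination problem, the function names, and the discount factor are assumptions chosen for illustration; this is one possible MDP solver, not a method prescribed by the original paper.

```python
from itertools import product

# Hypothetical toy problem: two agents each choose "a" or "b".
states = ("s0", "s1")
joint_actions = list(product(("a", "b"), repeat=2))


def T(s, ja):
    """Assumed dynamics: matching actions in s0 lead to the absorbing s1."""
    if s == "s0" and ja[0] == ja[1]:
        return {"s1": 1.0}
    return {s: 1.0}


def R(s, ja):
    """Assumed shared reward: 1 for coordinating in s0, otherwise 0."""
    return 1.0 if s == "s0" and ja[0] == ja[1] else 0.0


def q_value(s, ja, V, gamma):
    """One-step lookahead value of taking joint action ja in state s."""
    return R(s, ja) + gamma * sum(p * V[s2] for s2, p in T(s, ja).items())


def value_iteration(gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Maximize over joint actions exactly as over actions in an MDP.
            best = max(q_value(s, ja, V, gamma) for ja in joint_actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy joint policy with respect to the converged values.
    policy = {
        s: max(joint_actions, key=lambda ja: q_value(s, ja, V, gamma))
        for s in states
    }
    return V, policy


V, policy = value_iteration()
```

The resulting policy maps each state to a joint action, so executing it still requires every agent to select its own component of the same joint action.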


[1] Planning, Learning and Coordination in Multiagent Decision Processes. Craig Boutilier. TARK, pages 195-210. Morgan Kaufmann, 1996.

models-and-methods/mmdp.txt · Last modified: 2014/05/06 05:05 by stefan