Multiagent Markov Decision Process (MMDP)
Formalized in 1996 by Craig Boutilier [1], the Multiagent MDP (MMDP) is one of the earliest, and simplest, MDP frameworks for multiple decision-making agents. An MMDP specifies the transition of the world state as a function not of a single action variable (as in the MDP) but of a joint action comprising the agents' individual actions.
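Concretely, the model can be written as a tuple; the notation below follows common MDP conventions rather than being verbatim from [1]:

```latex
% A common way of writing the MMDP tuple (conventional notation,
% not copied verbatim from Boutilier's paper).
\[
  \mathcal{M} \;=\; \langle\, n,\; S,\; \{A_i\}_{i=1}^{n},\; T,\; R \,\rangle
\]
where $n$ is the number of agents, $S$ is the set of world states,
$A_i$ is agent $i$'s individual action set, the joint action space is
$A = A_1 \times \cdots \times A_n$,
$T : S \times A \times S \to [0,1]$ gives the transition probability
$T(s, \mathbf{a}, s') = \Pr(s' \mid s, \mathbf{a})$ for a joint action
$\mathbf{a} \in A$, and $R : S \times A \to \mathbb{R}$ is a single
reward function shared by all agents.
```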
Motivating Problems
Boutilier's treatment describes no particular motivating problem, but the MMDP is clearly intended for fully cooperative agents that act collectively and hence must coordinate their actions.
Key Assumptions
- Full observability: all agents observe the world state directly at each time step.
- Shared reward: all agents receive the same reward, making the setting fully cooperative.
Theoretical Properties
Because the MMDP is simply an MDP with a joint action, it resides in the same complexity class as the MDP: it is P-complete.
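A caveat worth noting (a standard observation, not specific to [1]): the joint action space is the Cartesian product of the individual action sets, so the flat MDP encoding grows exponentially with the number of agents:

```latex
% Growth of the joint action space (illustrative numbers, not from [1]).
\[
  |A| \;=\; \prod_{i=1}^{n} |A_i| \;=\; k^{n}
  \quad \text{if } |A_i| = k \text{ for all } i,
\]
e.g.\ $n = 10$ agents with $k = 5$ individual actions each already yield
$5^{10} \approx 9.8 \times 10^{6}$ joint actions.
```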
Solution Methods
Consequently, any method that solves MDPs (value iteration, policy iteration, linear programming, Q-learning, and so on) can also be used to solve MMDPs, with the joint action playing the role of the single action.
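As a minimal sketch of this reduction, the following Python value iteration treats joint actions as ordinary MDP actions. The toy two-agent problem (its states, dynamics, and rewards) is invented for illustration and is not from [1]:

```python
# Value iteration on a toy MMDP: the only multiagent ingredient is that
# the Bellman max ranges over joint actions rather than single actions.
import itertools

states = ["s0", "s1"]
agent_actions = [["a", "b"], ["a", "b"]]            # per-agent action sets
joint_actions = list(itertools.product(*agent_actions))  # A = A_1 x A_2
gamma = 0.95                                        # discount factor

# T[(s, ja)] -> list of (next_state, probability); R[(s, ja)] -> shared reward.
# Toy dynamics: matching (coordinated) actions tend to reach s1, which pays off.
T, R = {}, {}
for s in states:
    for ja in joint_actions:
        coordinated = ja[0] == ja[1]
        T[(s, ja)] = ([("s1", 0.9), ("s0", 0.1)] if coordinated
                      else [("s0", 0.9), ("s1", 0.1)])
        R[(s, ja)] = 1.0 if coordinated and s == "s1" else 0.0

# Standard value iteration over the joint-action MDP.
V = {s: 0.0 for s in states}
for _ in range(1000):
    V_new = {
        s: max(
            R[(s, ja)] + gamma * sum(p * V[s2] for s2, p in T[(s, ja)])
            for ja in joint_actions
        )
        for s in states
    }
    if max(abs(V_new[s] - V[s]) for s in states) < 1e-8:
        V = V_new
        break
    V = V_new

# Greedy joint policy: each state maps to one joint action, i.e. one
# individual action per agent.
policy = {
    s: max(
        joint_actions,
        key=lambda ja: R[(s, ja)] + gamma * sum(p * V[s2] for s2, p in T[(s, ja)]),
    )
    for s in states
}
print(V, policy)
```

Note that the greedy policy is a mapping from states to joint actions; because the reward is shared, each agent can execute its component of the joint action and the team behaves optimally, provided ties are broken consistently (the coordination issue Boutilier's paper addresses).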
References
[1] Craig Boutilier. Planning, Learning and Coordination in Multiagent Decision Processes. In Proceedings of TARK, pages 195-210. Morgan Kaufmann, 1996.