Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments

Miao Liu; Christopher Amato; Emily Anesta; John Griffith; Jonathan How

AAAI 2016

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for multiagent sequential decision-making under uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions (i.e., temporally-extended actions) has significantly increased the size of problems that can be solved. However, current methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. To accommodate more realistic scenarios, when such information is not available, this paper presents a policy-based reinforcement learning approach, which learns the agent policies based solely on trajectories generated by previous interaction with the environment (e.g., demonstrations). We show that our approach is able to generate valid macro-action controllers and develop an expectation-maximization (EM) algorithm (called Policy-based EM or PoEM), which has convergence guarantees for batch learning. Our experiments show PoEM is a scalable learning method that can learn optimal policies and improve upon hand-coded “expert” solutions.
