Modeling Difference Rewards for Multiagent Learning

Scott Proper; Kagan Tumer

AAMAS 2012

Conference Paper (Extended Abstract), Autonomous Agents and Multiagent Systems

Abstract

Difference rewards (a particular instance of reward shaping) have been used to allow multiagent domains to scale to large numbers of agents, but they remain difficult to compute in many domains. We present an approach that models the global reward using function approximation, allowing shaped difference rewards to be computed quickly. We demonstrate that this model can yield significant improvements in behavior on two air traffic control problems. We also show that the model of the global reward may be learned either on-line or off-line using a linear combination of neural networks.
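
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the definition of a difference reward that is common in the literature, D_i(z) = G(z) - G(z_-i + c_i): the global reward minus the global reward obtained when agent i's action is replaced by a fixed counterfactual c_i. The abstract does not state this formula, and every name here (`congestion_reward`, `difference_reward`, the sector-load toy problem) is hypothetical. The toy reward stands in for a learned model of G; the abstract describes a linear combination of neural networks, which this sketch does not reproduce.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical encoding: an action is the index of the sector an agent
# routes through; None means the agent is removed from the system.
Action = Optional[int]


def congestion_reward(actions: Sequence[Action], n_sectors: int = 3) -> float:
    """Toy stand-in for a model of the global reward G.

    Assumption for illustration only: the reward is the negative sum of
    squared sector loads, so crowded sectors are penalized more heavily.
    """
    loads = [0] * n_sectors
    for action in actions:
        if action is not None:
            loads[action] += 1
    return -float(sum(load * load for load in loads))


def difference_reward(
    g_model: Callable[[Sequence[Action]], float],
    actions: Sequence[Action],
    agent: int,
    counterfactual: Action = None,
) -> float:
    """Return G(z) - G(z with the agent's action replaced by the counterfactual).

    Only two evaluations of the model are needed per agent, which is why
    a cheap approximation of G makes this quantity quick to compute.
    """
    altered: List[Action] = list(actions)
    altered[agent] = counterfactual
    return g_model(actions) - g_model(altered)


# Three agents crowd sector 0; one agent uses sector 1.
actions = [0, 0, 0, 1]
rewards = [
    difference_reward(congestion_reward, actions, i) for i in range(len(actions))
]
print(rewards)  # [-5.0, -5.0, -5.0, -1.0]
```

In this toy case the global reward is the same -10 for every agent, whereas the difference reward singles out the three agents in the crowded sector (-5 each) from the agent in the quiet one (-1).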

Keywords

Multiagent Coordination
Reward Shaping
Scaling
Air Traffic Control
Function Approximation
Neural Networks

Context

Venue: International Conference on Autonomous Agents and Multiagent Systems
Paper id: 49390203717093801