Sooraj Bhat Papers

AAMAS Conference 2007 Conference Paper

A Globally Optimal Algorithm for TTD-MDPs

Sooraj Bhat
David L. Roberts
Mark J. Nelson
Charles L. Isbell
Michael Mateas

In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs), a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space, as a general agent-coordination framework.

AAMAS Conference 2006 Conference Paper

A Decision-Theoretic Approach to File Consistency in Constrained Peer-to-Peer Device Networks

David Roberts
Sooraj Bhat
Charles Isbell
Brian Cooper
Jeffrey S. Pierce

AAAI Conference 2006 Conference Paper

On the Difficulty of Modular Reinforcement Learning for Real-World Partial Programming

Sooraj Bhat

In recent years there has been a great deal of interest in “modular reinforcement learning” (MRL). Typically, problems are decomposed into concurrent subgoals, allowing increased scalability and state abstraction. An arbitrator combines the subagents’ preferences to select an action. In this work, we contrast treating an MRL agent as a set of subagents with the same goal with treating an MRL agent as a set of subagents who may have different, possibly conflicting goals. We argue that the latter is a more realistic description of real-world problems, especially when building partial programs. We address a range of algorithms for single-goal MRL, and leveraging social choice theory, we present an impossibility result for applications of such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling that avoids the assumptions of comparability of preference that are implicit in single-goal MRL. A notable feature of this formulation is the explicit codification of the tradeoffs between the subproblems. Finally, we introduce A²BL, a language that encapsulates many of these ideas.
