Arrow Research search

Author name cluster

Alec Koppel

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

TMLR Journal 2025 Journal Article

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

  • Peihong Yu
  • Manav Mishra
  • Alec Koppel
  • Carl Busart
  • Priya Narayan
  • Dinesh Manocha
  • Amrit Singh Bedi
  • Pratap Tokekar

Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements; naively imitating them will therefore not achieve cooperation, due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of individual agent behavior with demonstrations, and the second regulates incentives based on whether the behaviors lead to the desired outcome. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL’s capability of leveraging joint demonstrations in the StarCraft scenario and converging effectively even with demonstrations from non-co-trained policies.

RLC Conference 2025 Conference Paper

Building Sequential Resource Allocation Mechanisms without Payments

  • Sihan Zeng
  • Sujay Bhatt
  • Alec Koppel
  • Sumitra Ganesh

We study allocating divisible resources of limited quantities to agents who submit requests for the resources one or multiple times over a finite horizon. This is referred to as the sequential or online resource allocation problem, as irrevocable allocations need to be made as the requests arrive, without observing future requests. The existing work on sequential resource allocation (in the payment-free setting) mainly focuses on optimizing social welfare and designs mechanisms under the assumption that the agents make truthful requests. Such mechanisms are easily exploitable -- strategic agents may misreport their requests to inflate their allocations. Our aim in this work is to design sequential resource allocation mechanisms that balance the competing objectives of social welfare maximization (promoting overall agent satisfaction) and incentive compatibility (ensuring that agents do not have incentives to misreport). We do not design these mechanisms from scratch. Instead, as the incentive-compatible mechanism design problem has been well studied in the one-shot setting (horizon length equal to one), we propose a general meta-algorithm for transforming a one-shot mechanism into its sequential counterpart. The meta-algorithm can plug in any one-shot mechanism and approximately carry over the properties that the one-shot mechanism already satisfies to the sequential setting. We establish theoretical results validating these claims and illustrate their superior performance relative to baselines in experiments.

RLJ Journal 2025 Journal Article

Building Sequential Resource Allocation Mechanisms without Payments

  • Sihan Zeng
  • Sujay Bhatt
  • Alec Koppel
  • Sumitra Ganesh

We study allocating divisible resources of limited quantities to agents who submit requests for the resources one or multiple times over a finite horizon. This is referred to as the sequential or online resource allocation problem, as irrevocable allocations need to be made as the requests arrive, without observing future requests. The existing work on sequential resource allocation (in the payment-free setting) mainly focuses on optimizing social welfare and designs mechanisms under the assumption that the agents make truthful requests. Such mechanisms are easily exploitable -- strategic agents may misreport their requests to inflate their allocations. Our aim in this work is to design sequential resource allocation mechanisms that balance the competing objectives of social welfare maximization (promoting overall agent satisfaction) and incentive compatibility (ensuring that agents do not have incentives to misreport). We do not design these mechanisms from scratch. Instead, as the incentive-compatible mechanism design problem has been well studied in the one-shot setting (horizon length equal to one), we propose a general meta-algorithm for transforming a one-shot mechanism into its sequential counterpart. The meta-algorithm can plug in any one-shot mechanism and approximately carry over the properties that the one-shot mechanism already satisfies to the sequential setting. We establish theoretical results validating these claims and illustrate their superior performance relative to baselines in experiments.

ICLR Conference 2025 Conference Paper

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment

  • Souradip Chakraborty
  • Sujay Bhatt
  • Udari Madhushani
  • Soumya Suvra Ghosal
  • Jiahao Qiu
  • Mengdi Wang 0001
  • Dinesh Manocha
  • Furong Huang

Alignment of Large Language Models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique to align LLMs with human preferences and broader utilities, but it requires updating billions of model parameters, which is computationally expensive. Controlled Decoding, by contrast, provides a mechanism for aligning a model at inference time without retraining. However, single-agent decoding approaches often struggle to adapt to diverse tasks due to the complexity and variability inherent in these tasks. To strengthen test-time performance w.r.t. the target task, we propose a mixture-of-agents decoding strategy leveraging existing off-the-shelf aligned LLM policies. Treating each prior policy as an agent in the spirit of mixture-of-agents collaboration, we develop a decoding method that allows for inference-time alignment through a token-level selection strategy among multiple agents. For each token, the most suitable LLM is dynamically chosen from a pool of models based on a long-term utility metric. This policy-switching mechanism ensures optimal model selection at each step, enabling efficient collaboration and alignment among LLMs during decoding. Theoretical analysis of our proposed algorithm establishes optimal performance with respect to the target task represented via a target reward, for the given off-the-shelf models. We conduct comprehensive empirical evaluations with open-source aligned models on diverse tasks and preferences, which demonstrates the merits of this approach over single-agent decoding baselines. Notably, COLLAB surpasses the current SoTA decoding strategy, achieving an improvement of up to 1.56x in average reward and 71.89% in GPT-4-based win-tie rate.
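
As a rough illustration of the token-level switching rule described above, the sketch below scores each candidate agent's greedy next token by log-likelihood plus a stand-in utility, and lets the highest-scoring agent emit the token. Everything here (the toy agents, `toy_utility`, the tiny vocabulary) is hypothetical scaffolding, not the paper's actual models or long-term utility metric.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def make_agent():
    """Toy stand-in for one off-the-shelf aligned policy: next-token
    logits depend only on the previous token."""
    W = rng.normal(size=(VOCAB, VOCAB))
    return lambda prefix: W[prefix[-1] if prefix else 0]

def toy_utility(prefix, token):
    """Hypothetical stand-in for the long-term utility / target reward."""
    return float(np.sin(0.7 * token + 0.1 * len(prefix)))

agents = [make_agent() for _ in range(3)]
prefix = []
for step in range(6):
    scored = []
    for i, logits in enumerate(agents):
        p = np.exp(logits(prefix) - logits(prefix).max())
        p /= p.sum()
        tok = int(p.argmax())                      # each agent's greedy token
        scored.append((np.log(p[tok]) + toy_utility(prefix, tok), i, tok))
    _, agent_id, tok = max(scored)                 # switch to the best agent
    prefix.append(tok)
    print(f"step {step}: agent {agent_id} emits token {tok}")
```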

IROS Conference 2025 Conference Paper

Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation

  • Bhrij Patel
  • Kasun Weerakoon
  • Wesley A. Suttle
  • Alec Koppel
  • Brian M. Sadler
  • Tianyi Zhou 0001
  • Dinesh Manocha
  • Amrit Singh Bedi

Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error. However, real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies due to the sample inefficiency of RL. In this work, we introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficiency in RL-based robotic navigation without modifying the reward function. Unlike existing approaches, such as entropy regularization and reward shaping, which can introduce instability by altering rewards, CCE dynamically adjusts trajectory length based on policy entropy. Specifically, it shortens trajectories when uncertainty is high to enhance exploration and extends them when confidence is high to prioritize exploitation. CCE is a principled and practical solution inspired by a theoretical connection between policy entropy and gradient estimation. It integrates seamlessly with on-policy and off-policy RL methods and requires minimal modifications. We validate CCE across REINFORCE, PPO, and SAC in both simulated and real-world navigation tasks. CCE outperforms fixed-trajectory and entropy-regularized baselines, achieving an 18% higher success rate, 20-38% shorter paths, and 9.32% lower elevation costs under a fixed training sample budget. Finally, we deploy CCE on a Clearpath Husky robot, demonstrating its effectiveness in complex outdoor environments.
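
A minimal sketch of the core scheduling idea, assuming a simple linear map from policy entropy to rollout length (the paper's actual schedule may differ): high-entropy, uncertain policies get short trajectories, while low-entropy, confident policies get long ones.

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy of the current policy's action distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def cce_trajectory_length(probs, k_min=16, k_max=256):
    """Map policy entropy to a rollout length: uncertain policy -> short
    trajectories (more exploration episodes), confident policy -> long
    trajectories (exploitation). The linear schedule is an assumption."""
    h = policy_entropy(probs)
    h_max = np.log(len(probs))      # maximum entropy for this action set
    confidence = 1.0 - h / h_max    # 0 = uniform policy, 1 = deterministic
    return int(k_min + confidence * (k_max - k_min))

print(cce_trajectory_length(np.ones(4) / 4))                       # -> 16
print(cce_trajectory_length(np.array([0.97, 0.01, 0.01, 0.01])))   # -> near 256
```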

AAAI Conference 2025 Conference Paper

Decentralized Convergence to Equilibrium Prices in Trading Networks

  • Edwin Lock
  • Benjamin Patrick Evans
  • Eleonora Kreacic
  • Sujay Bhatt
  • Alec Koppel
  • Sumitra Ganesh
  • Paul W. Goldberg

We propose a decentralized market model in which agents can negotiate bilateral contracts. This builds on a similar, but centralized, model of trading networks introduced by Hatfield et al. in 2013. Prior work has established that fully-substitutable preferences guarantee the existence of competitive equilibria which can be centrally computed. Our motivation comes from the fact that prices in markets such as over-the-counter markets and used car markets arise from decentralized negotiation among agents, which has left open an important question as to whether equilibrium prices can emerge from agent-to-agent bilateral negotiations. We design a best response dynamic intended to capture such negotiations between market participants. We assume fully substitutable preferences for market participants. In this setting, we provide proofs of convergence for sparse markets (covering many real world markets of interest), and experimental results for more general cases, demonstrating that prices indeed reach equilibrium, quickly, via bilateral negotiations. Our best response dynamic, and its convergence behavior, forms an important first step in understanding how decentralized markets reach, and retain, equilibrium.

ICLR Conference 2025 Conference Paper

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment

  • Yuancheng Xu
  • Udari Madhushani
  • Alec Koppel
  • Sicheng Zhu
  • Bang An 0001
  • Furong Huang
  • Sumitra Ganesh

Large Language Models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences. Traditional training-time methods finetune LLMs using human preference datasets but incur significant training costs and require repeated training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide frozen LLMs without retraining. However, existing test-time approaches rely on trajectory-level RMs which are designed to evaluate complete responses, making them unsuitable for autoregressive text generation that requires computing next-token rewards from partial responses. To address this, we introduce GenARM, a test-time alignment approach that leverages the Autoregressive Reward Model—a novel reward parametrization designed to predict next-token rewards for efficient and effective autoregressive generation. Theoretically, we demonstrate that this parametrization can provably guide frozen LLMs toward any distribution achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experimental results show that GenARM significantly outperforms prior test-time alignment baselines and matches the performance of training-time methods. Additionally, GenARM enables efficient weak-to-strong guidance, aligning larger LLMs with smaller RMs without the high costs of training larger models. Furthermore, GenARM supports multi-objective alignment, allowing real-time trade-offs between preference dimensions and catering to diverse user preferences without retraining. Our project page is available at: https://genarm.github.io.
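
Within the KL-regularized framework the abstract cites, reward-guided decoding takes a simple form: the next-token distribution is the base model's distribution reweighted by exponentiated per-token rewards. The sketch below illustrates one decoding step with toy logits standing in for both the frozen LLM and the autoregressive reward model; `beta` plays the role of the KL-regularization strength.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def genarm_step(base_logits, reward_logits, beta=1.0):
    """One guided decoding step: p(a) proportional to
    p_base(a) * exp(r(a) / beta). Smaller beta weights the reward
    more heavily relative to the frozen base model."""
    return softmax(base_logits + reward_logits / beta)

base_logits = rng.normal(size=16)    # frozen LLM next-token logits (toy)
reward_logits = rng.normal(size=16)  # autoregressive RM token rewards (toy)
probs = genarm_step(base_logits, reward_logits, beta=0.5)
next_token = rng.choice(len(probs), p=probs)
print(next_token, probs[next_token])
```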

NeurIPS Conference 2025 Conference Paper

Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis

  • Sihan Zeng
  • Benjamin Patrick Evans
  • Sujay Bhatt
  • Leo Ardon
  • Sumitra Ganesh
  • Alec Koppel

We study policy optimization in Stackelberg mean field games (MFGs), a hierarchical framework for modeling the strategic interaction between a single leader and an infinitely large population of homogeneous followers. The objective can be formulated as a structured bi-level optimization problem, in which the leader needs to learn a policy maximizing its reward, anticipating the response of the followers. Existing methods for solving these (and related) problems often rely on restrictive independence assumptions between the leader’s and followers’ objectives, use samples inefficiently due to nested-loop algorithm structure, and lack finite-time convergence guarantees. To address these limitations, we propose AC-SMFG, a single-loop actor-critic algorithm that operates on continuously generated Markovian samples. The algorithm alternates between (semi-)gradient updates for the leader, a representative follower, and the mean field, and is simple to implement in practice. We establish the finite-time and finite-sample convergence of the algorithm to a stationary point of the Stackelberg objective. To our knowledge, this is the first Stackelberg MFG algorithm with non-asymptotic convergence guarantees. Our key assumption is a "gradient alignment" condition, which requires that the full policy gradient of the leader can be approximated by a partial component of it, relaxing the existing leader-follower independence assumption. Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed.

TMLR Journal 2024 Journal Article

Byzantine-Resilient Decentralized Multi-Armed Bandits

  • Jingxuan Zhu
  • Alec Koppel
  • Alvaro Velasquez
  • Ji Liu

In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient behavior when an unknown fraction of the agents can be Byzantine, that is, communicate arbitrarily wrong information in the form of reward mean-estimates or confidence sets. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets. Our key contribution is the development of a fully decentralized resilient upper confidence bound (UCB) algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This truncation step enables us to establish that the performance of each normal agent is no worse than the classic single-agent UCB1 algorithm in terms of regret, and more importantly, the cumulative regret of all normal agents is strictly better than the non-cooperative case, provided that each agent has at least $3f+1$ neighbors, where $f$ is the maximum possible number of Byzantine agents in each agent's neighborhood. Extensions to time-varying neighbor graphs and minimax lower bounds on the achievable regret are further established. Experiments corroborate the merits of this framework in practice.
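
A minimal sketch of the truncation-plus-fusion step, assuming a symmetric trimmed mean as the robust aggregator (the paper's exact truncation rule may differ): with at least $3f+1$ reports, discarding the $f$ highest and $f$ lowest caps the damage any $f$ Byzantine neighbors can do to the fused mean that feeds the UCB index.

```python
import numpy as np

def trimmed_mean(values, f):
    """Discard the f smallest and f largest reports before averaging;
    this bounds the influence of up to f arbitrarily wrong (Byzantine)
    reports in the neighborhood."""
    v = np.sort(np.asarray(values))
    return v[f:len(v) - f].mean()

# One round of fused UCB for a single arm (toy numbers).
own_estimate = 0.52
neighbor_reports = [0.50, 0.49, 0.55, 99.0]   # the last report is Byzantine
f = 1
fused = trimmed_mean([own_estimate] + neighbor_reports, f)
pulls, t = 40, 200
ucb_index = fused + np.sqrt(2 * np.log(t) / pulls)
print(f"fused mean {fused:.3f}, UCB index {ucb_index:.3f}")
```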

ICLR Conference 2024 Conference Paper

Efficient Inverse Multiagent Learning

  • Denizalp Goktas
  • Amy Greenwald
  • Sadie Zhao
  • Alec Koppel
  • Sumitra Ganesh

In this paper, we study inverse game theory (resp. inverse multiagent learning), in which the goal is to find parameters of a game’s payoff functions for which the expected (resp. sampled) behavior is an equilibrium. We formulate these problems as generative-adversarial (i.e., min-max) optimization problems, which we develop polynomial-time algorithms to solve, the former of which relies on an exact first-order oracle, and the latter, a stochastic one. We extend our approach to solve inverse multiagent simulacral learning in polynomial time and with a polynomial number of samples. In these problems, we seek a simulacrum, meaning parameters and an associated equilibrium that replicate the given observations in expectation. We find that our approach outperforms the widely-used ARIMA method in predicting prices in Spanish electricity markets based on time-series data.

ICML Conference 2024 Conference Paper

Information-Directed Pessimism for Offline Reinforcement Learning

  • Alec Koppel
  • Sujay Bhatt
  • Jiacheng Guo
  • Joe Eappen
  • Mengdi Wang 0001
  • Sumitra Ganesh

Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when collecting data from a current policy is not possible. This setting incurs distribution mismatch between the batch training data and trajectories from the current policy. Pessimistic offsets estimate this mismatch using concentration bounds, which possess strong theoretical guarantees and simplicity of implementation. These offsets may be overly conservative in sparse data regions and less so otherwise, which can cause them to under-perform their no-penalty variants in practice. We derive a new pessimistic penalty as the distance between the data and the true distribution using an evaluable one-sample test known as the Stein Discrepancy that requires minimal smoothness conditions, and notably, allows a mixture family representation of the distribution over next states. This entity forms a quantifier of information in offline data, which justifies calling this approach information-directed pessimism (IDP) for offline RL. We further establish that this new penalty based on the discrete Stein discrepancy yields practical gains in performance while generalizing the regret of prior art to multimodal distributions.

ICML Conference 2024 Conference Paper

MaxMin-RLHF: Alignment with Diverse Human Preferences

  • Souradip Chakraborty
  • Jiahao Qiu
  • Hui Yuan 0002
  • Alec Koppel
  • Dinesh Manocha
  • Furong Huang
  • Amrit Singh Bedi
  • Mengdi Wang 0001

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, the single reward model overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. Next, we propose to learn a mixture of reward models via an expectation-maximization algorithm and solve a MaxMin alignment objective inspired by the Egalitarian principle in social choice theory to better honor diverse human preferences. We present comprehensive experimental results on small-scale (GPT-2) and large-scale (Tulu2-7B) language models and show the efficacy of the proposed approach in the presence of diversity among human preferences. We remark that our findings in this work are not limited to language models but also extend to reinforcement learning in general.

JMLR Journal 2024 Journal Article

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

  • Amrit Singh Bedi
  • Anjaly Parayil
  • Junyu Zhang
  • Mengdi Wang
  • Alec Koppel

Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions with heavier tails, governed by a tail-index parameter $\alpha$, which increases the likelihood of jumping in state space. Doing so invalidates smoothness conditions of the score function common to PG. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index $\alpha$, a Hölder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit and transition time analysis of a suitably defined Markov chain, identifying that policies associated with Lévy Processes of a heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, which we corroborate also manifests in improved performance of policy search, especially when myopic and farsighted incentives are misaligned.
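
The key structural point, that heavier tails yield a bounded score function, can be seen directly by comparing a Gaussian policy with a Student-t one, using the degrees of freedom as a stand-in for the tail index $\alpha$ (an illustrative simplification, not the paper's exact parameterization):

```python
import numpy as np

def gaussian_score_mu(x, mu, sigma):
    """d/dmu log N(x; mu, sigma^2) -- grows without bound in |x - mu|."""
    return (x - mu) / sigma**2

def student_t_score_mu(x, mu, sigma, nu):
    """d/dmu of the Student-t log-density with nu degrees of freedom
    (a proxy for the tail index): bounded in |x - mu|, the property the
    heavy-tailed analysis exploits."""
    return (nu + 1) * (x - mu) / (nu * sigma**2 + (x - mu) ** 2)

xs = np.array([0.1, 1.0, 10.0, 100.0])
print(gaussian_score_mu(xs, 0.0, 1.0))            # 0.1, 1, 10, 100: unbounded
print(student_t_score_mu(xs, 0.0, 1.0, nu=2.0))   # peaks, then decays to 0
```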

ICLR Conference 2024 Conference Paper

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

  • Souradip Chakraborty
  • Amrit Singh Bedi
  • Alec Koppel
  • Huazheng Wang
  • Dinesh Manocha
  • Mengdi Wang 0001
  • Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility- or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment, due to a lack of precise characterization of the dependence of the alignment objective on the data generated by policy trajectories. This shortfall contributes to the sub-optimal performance observed in contemporary algorithms. Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable (the optimal policy for the designed reward). Interestingly, from an optimization perspective, our formulation leads to a new class of stochastic bilevel problems where the stochasticity at the upper objective depends upon the lower-level variable. To the best of our knowledge, this work presents the first formulation of RLHF as a bilevel optimization problem, which generalizes existing RLHF formulations and addresses their distribution shift issues. To demonstrate the efficacy of our formulation in resolving alignment issues in RL, we devised an algorithm named A-PARL to solve the PARL problem, establishing sample complexity bounds of order $\mathcal{O}(1/T)$. Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements (up to 63% in terms of required samples) for policy alignment in large-scale environments of the Deepmind control suite and Meta world tasks.

TMLR Journal 2024 Journal Article

Regularized Proportional Fairness Mechanism for Resource Allocation Without Money

  • Sihan Zeng
  • Sujay Bhatt
  • Alec Koppel
  • Sumitra Ganesh

Mechanism design in resource allocation studies dividing limited resources among self-interested agents whose satisfaction with the allocation depends on privately held utilities. We consider the problem in a payment-free setting, with the aim of maximizing social welfare while enforcing incentive compatibility (IC), i.e., agents cannot inflate allocations by misreporting their utilities. The well-known proportional fairness (PF) mechanism achieves the maximum possible social welfare but incurs an undesirably high exploitability (the maximum unilateral utility gain from misreporting, a measure of deviation from IC). In fact, it is known that no mechanism can achieve the maximum social welfare and exact incentive compatibility simultaneously without the use of monetary incentives (Cole et al., 2013). Motivated by this fact, we propose learning an approximate mechanism that desirably trades off the competing objectives. Our main contribution is to design an innovative neural network architecture tailored to the resource allocation problem, which we name Regularized Proportional Fairness Network (RPF-Net). RPF-Net regularizes the output of the PF mechanism by a learned function approximator of the most exploitable allocation, with the aim of reducing the incentive for any agent to misreport. We derive generalization bounds that guarantee the mechanism performance when trained under finite and out-of-distribution samples and experimentally demonstrate the merits of the proposed mechanism compared to the state-of-the-art. The PF mechanism acts as an important benchmark for comparing the social welfare of any mechanism. However, there exists no established way of computing its exploitability. The challenge here is that we need to find the maximizer of an optimization problem for which the gradient is only implicitly defined. We provide, for the first time, a systematic method for finding such (sub)gradients, which enables the evaluation of the exploitability of the PF mechanism through iterative (sub)gradient ascent.

ICML Conference 2024 Conference Paper

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

  • Bhrij Patel
  • Wesley A. Suttle
  • Alec Koppel
  • Vaneet Aggarwal
  • Brian M. Sadler
  • Dinesh Manocha
  • Amrit Singh Bedi

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for global convergence in average-reward MDPs. Furthermore, our approach exhibits the tightest available mixing-time dependence of $\mathcal{O}(\sqrt{\tau_{mix}})$ relative to prior work. With a 2D grid-world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average-reward settings.
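
A minimal sketch of the MLMC gradient estimator the MAC framework relies on: draw a random level $J$ with $P(J=j) = 2^{-j}$ and correct a cheap one-sample estimate with a scaled difference of two batch averages. In expectation this matches the largest-batch average while keeping the expected cost small, which is what removes the need to pick batch sizes from an unknown mixing time. The toy gradient oracle below is hypothetical, and the real estimator couples the levels along a single trajectory rather than drawing them independently.

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_avg(n):
    """Average of n one-sample gradient estimates (toy: noisy copies of 1.0)."""
    return (1.0 + rng.normal(size=n)).mean()

def mlmc_gradient(j_max=6):
    """Multi-level Monte Carlo estimator: draw J with P(J=j) = 2^-j, then
    return g_1 + 2^J * (g_{2^J} - g_{2^{J-1}}).  Unbiased for the
    2^j_max-sample average, at a small expected sample cost."""
    J = rng.geometric(0.5)
    g = grad_avg(1)
    if J <= j_max:
        g += 2**J * (grad_avg(2**J) - grad_avg(2 ** (J - 1)))
    return g

est = np.mean([mlmc_gradient() for _ in range(4000)])
print(f"MLMC estimate of the gradient: {est:.3f}")   # close to 1.0
```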

JAIR Journal 2023 Journal Article

Achieving Zero Constraint Violation for Concave Utility Constrained Reinforcement Learning via Primal-Dual Approach

  • Qinbo Bai
  • Amrit Singh Bedi
  • Mridul Agarwal
  • Alec Koppel
  • Vaneet Aggarwal

Reinforcement learning (RL) is widely used in applications where one needs to perform sequential decision-making while interacting with the environment. The standard RL problem with safety constraints is generally mathematically modeled by constrained Markov Decision Processes (CMDP), which are linear in both the objective and the constraints in occupancy measure space; the problem becomes challenging when the model is unknown a priori. The problem becomes further challenging when the decision requirement includes optimizing a concave utility while satisfying some nonlinear safety constraints. To solve such a nonlinear problem, we propose a conservative stochastic primal-dual algorithm (CSPDA) via a randomized primal-dual approach. By leveraging a generative model, we prove that CSPDA not only exhibits Õ(1/ε²) sample complexity, but also achieves zero constraint violation for the concave utility CMDP. Compared with previous works, the best available sample complexity for CMDP with zero constraint violation is Õ(1/ε⁵). Hence, the proposed algorithm provides a significant improvement over the state of the art.
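
The conservative primal-dual idea can be shown on a two-action toy problem: tighten the cost budget by a small slack so the converged policy satisfies the true constraint exactly, while a multiplier λ is updated by dual ascent. This is a hand-rolled illustration of the mechanism under made-up numbers, not the CSPDA algorithm itself.

```python
import numpy as np

# Toy constrained problem: two actions with (reward, cost); maximize expected
# reward while keeping expected cost <= 0.3, via a randomized primal-dual scheme.
rewards = np.array([1.0, 0.2])
costs = np.array([0.9, 0.1])
budget = 0.3
lam, eta = 0.0, 0.05
slack = 0.02   # conservative tightening that drives violation to zero

for t in range(2000):
    # Primal step: softmax best response to the Lagrangian r - lam * c.
    lagr = rewards - lam * costs
    p = np.exp(5.0 * (lagr - lagr.max()))
    p /= p.sum()
    # Dual step: gradient ascent on the (tightened) constraint violation.
    lam = max(0.0, lam + eta * (p @ costs - (budget - slack)))

print(f"policy {np.round(p, 3)}, expected cost {p @ costs:.3f} <= {budget}")
```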

ICML Conference 2023 Conference Paper

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

  • Wesley A. Suttle
  • Amrit Singh Bedi
  • Bhrij Patel
  • Brian M. Sadler
  • Alec Koppel
  • Dinesh Manocha

Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, making the step size inoperable. In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. This method, which we call Multi-level Actor-Critic (MAC), is developed specifically for infinite-horizon average-reward settings and neither relies on oracle knowledge of the mixing time in its parameter selection nor assumes its exponential decay; it is therefore readily applicable to applications with slower mixing times. Nonetheless, it achieves a convergence rate comparable to SOTA actor-critic algorithms. We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.

ICRA Conference 2023 Conference Paper

Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policy Optimization

  • Souradip Chakraborty
  • Amrit Singh Bedi
  • Kasun Weerakoon
  • Prithvi Poddar
  • Alec Koppel
  • Pratap Tokekar
  • Dinesh Manocha

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-SPG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse rewards are common in continuous control robotics tasks such as manipulation and navigation and make the learning problem hard due to the non-trivial estimation of value functions over the state space. This demands either reward shaping or expert demonstrations for the sparse reward environment. However, obtaining high-quality demonstrations is quite expensive and sometimes even impossible. We propose a heavy-tailed policy parametrization along with a modified momentum-based policy gradient tracking scheme (HT-SPG) to induce a stable exploratory behavior in the algorithm. The proposed algorithm does not require access to expert demonstrations. We test the performance of HT-SPG on various benchmark tasks of continuous control with sparse rewards such as 1D Mario, Pathological Mountain Car, Sparse Pendulum in OpenAI Gym, and Sparse MuJoCo environments (Hopper-v2, Half-Cheetah, Walker-2D). We show consistent performance improvement across all tasks in terms of high average cumulative reward without requiring access to expert demonstrations. We further demonstrate that a navigation policy trained using HT-SPG can be easily transferred into a Clearpath Husky robot to perform real-world navigation tasks.

ICRA Conference 2023 Conference Paper

Decentralized Multi-agent Exploration with Limited Inter-agent Communications

  • Hans He
  • Alec Koppel
  • Amrit Singh Bedi
  • Daniel J. Stilwell
  • Mazen Farhood
  • Benjamin Biggs

We consider the problem of decentralized multiagent environmental learning through maximizing the joint information gain among a team of agents. Inspired by subsea applications where bandwidth is severely limited, we explicitly consider the challenge of restricted communication between agents. The environment is modeled as a Gaussian process (GP), and the global information gain maximization problem in a GP is a set-valued optimization problem involving all agents' locally acquired data. We develop a decentralized method to solve it based on decomposition of information gain and exchange of limited subsets of data between agents. A key technical novelty of our approach is that we formulate the incentives for information exchange among agents as a submodular set optimization problem in terms of the log-determinant of their local covariance matrices. Numerical experiments on real-world data demonstrate the ability of our algorithm to explore the trade-off between objectives. In particular, we demonstrate favorable performance on mapping problems where both decentralized information gathering and limited information exchange are essential.
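
A small sketch of the submodular selection step, assuming an RBF-kernel GP covariance (the measurement locations and budget are made up): greedily keep the subset of measurements whose kernel submatrix has the largest log-determinant, the information-gain surrogate named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(4)

def greedy_logdet(K, budget):
    """Greedily pick `budget` indices maximizing log det of the covariance
    submatrix -- a submodular information-gain surrogate for deciding which
    local measurements are worth transmitting."""
    chosen = []
    for _ in range(budget):
        best, best_gain = None, -np.inf
        for i in range(len(K)):
            if i in chosen:
                continue
            idx = chosen + [i]
            gain = np.linalg.slogdet(K[np.ix_(idx, idx)])[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen

# Toy GP covariance over 6 measurement locations.
X = rng.uniform(0, 5, size=(6, 1))
K = np.exp(-0.5 * (X - X.T) ** 2) + 1e-6 * np.eye(6)
print(greedy_logdet(K, budget=3))
```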

AAAI Conference 2023 Conference Paper

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

  • Souradip Chakraborty
  • Amrit Singh Bedi
  • Pratap Tokekar
  • Alec Koppel
  • Brian Sadler
  • Furong Huang
  • Dinesh Manocha

Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting where the transition model is Gaussian or Lipschitz, and demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). The aforementioned compression step is then computed in terms of greedily retaining only those samples which are more than a certain KSD away from the previous model estimate. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up to a 50% reduction in wall-clock time in some continuous control environments.

NeurIPS Conference 2023 Conference Paper

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

  • Donghao Ying
  • Yunkai Zhang
  • Yuhao Ding
  • Alec Koppel
  • Javad Lavaei

We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploration, or imitation. The exponential growth of the state-action space size with the number of agents presents challenges for global observability, further exacerbated by the global coupling arising from agents' safety constraints. To tackle this issue, we propose a primal-dual method utilizing shadow reward and $\kappa$-hop neighbor truncation under a form of correlation decay property, where $\kappa$ is the communication radius. In the exact setting, our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$. In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP with an approximation error of $\mathcal{O}(\phi_0^{2\kappa})$, where $\phi_0\in (0, 1)$. Finally, we demonstrate the effectiveness of our model through extensive numerical experiments.

ICML Conference 2023 Conference Paper

STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

  • Souradip Chakraborty
  • Amrit Singh Bedi
  • Alec Koppel
  • Mengdi Wang 0001
  • Furong Huang
  • Dinesh Manocha

Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we develop fundamentally new variants of KSD for discrete conditional distributions. We further establish that STEERING achieves sublinear Bayesian regret, improving upon prior learning rates of information-augmented MBRL, IDS included. Experimentally, we show that the proposed algorithm is computationally affordable and outperforms several prior approaches.
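
Since the kernelized Stein discrepancy may be unfamiliar, here is a minimal V-statistic KSD estimate for a 1-D Gaussian target with an RBF kernel (the paper's novel variants are for discrete conditional distributions, so this is an illustrative simplification). The point is that the discrepancy needs only the target's score function and samples, with no normalizing constant or intractable information-gain estimate:

```python
import numpy as np

rng = np.random.default_rng(5)

def ksd_gaussian_target(x, mu=0.0, sigma=1.0, h=1.0):
    """V-statistic kernelized Stein discrepancy between samples x and a 1-D
    Gaussian target, using an RBF kernel with bandwidth h."""
    s = -(x - mu) / sigma**2                 # target score at the samples
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * h**2))
    dk_dx = -d / h**2 * k                    # d/dx  k(x, x')
    dk_dy = d / h**2 * k                     # d/dx' k(x, x')
    d2k = (1 / h**2 - d**2 / h**4) * k       # d^2/(dx dx') k(x, x')
    u = (s[:, None] * s[None, :] * k + s[:, None] * dk_dy
         + s[None, :] * dk_dx + d2k)         # Stein kernel u_p(x, x')
    return u.mean()

print(ksd_gaussian_target(rng.normal(size=300)))             # ~0: matches target
print(ksd_gaussian_target(rng.normal(3.0, 1.0, size=300)))   # large: mismatch
```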

AAAI Conference 2022 Conference Paper

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

  • Qinbo Bai
  • Amrit Singh Bedi
  • Mridul Agarwal
  • Alec Koppel
  • Vaneet Aggarwal

Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner to achieve ε-optimal cumulative reward with ε-feasible policies. An ε-feasible policy implies that it suffers from constraint violation. An important question here is whether we can achieve ε-optimal cumulative reward with zero constraint violation or not. To achieve that, we advocate the use of a randomized primal-dual approach to solve the CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit Õ(1/ε²) sample complexity to achieve ε-optimal cumulative reward with zero constraint violation. In prior works, the best available sample complexity for the ε-optimal policy with zero constraint violation is Õ(1/ε⁵). Hence, the proposed algorithm provides a significant improvement as compared to the state of the art.

IROS Conference 2022 Conference Paper

Distributed Riemannian Optimization with Lazy Communication for Collaborative Geometric Estimation

  • Yulun Tian
  • Amrit Singh Bedi
  • Alec Koppel
  • Miguel Calvo-Fullana
  • David M. Rosen
  • Jonathan P. How

We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the need to transmit potentially sensitive information about the agents themselves (such as their locations). Furthermore, to alleviate the burden of communication during iterative optimization, we design a set of communication triggering conditions that enable agents to selectively upload a targeted subset of local information that is useful to global optimization. Our approach thus achieves significant communication reduction with minimal impact on optimization performance. As our main theoretical contribution, we prove that our method converges to first-order critical points with a global sublinear convergence rate. Numerical evaluations on bundle adjustment problems from collaborative SLAM and SfM datasets show that our method performs competitively against existing distributed techniques, while achieving up to 78% total communication reduction.
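
A minimal sketch of an event-triggered ("lazy") communication rule of the kind the abstract describes, with a hypothetical drift threshold `tol`; the paper's actual triggering conditions are tailored to Riemannian optimization.

```python
import numpy as np

def should_upload(x_local, x_last_sent, tol):
    """Event-triggered communication: an agent uploads its local variable
    to the server only when it has drifted far enough from the last
    transmitted copy; otherwise the server keeps using the stale value."""
    return np.linalg.norm(x_local - x_last_sent) > tol

x_sent = np.zeros(3)
for step, x in enumerate(np.cumsum(np.full((6, 3), 0.05), axis=0)):
    if should_upload(x, x_sent, tol=0.2):
        x_sent = x.copy()
        print(f"step {step}: upload")
    else:
        print(f"step {step}: skip (saved bandwidth)")
```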

AAAI Conference 2022 Conference Paper

Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic

  • Junyu Zhang
  • Amrit Singh Bedi
  • Mengdi Wang
  • Alec Koppel

We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team’s long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC) in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the “shadow reward”. DSAC converges to a stationary point at a sublinear rate with high probability, depending on the amount of communications. Under proper conditions, we further establish the non-existence of spurious stationary points for this problem, that is, DSAC finds the globally optimal policy. Experiments demonstrate the merits of goals beyond the cumulative return in cooperative MARL.

ICML Conference 2022 Conference Paper

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

  • Amrit Singh Bedi
  • Souradip Chakraborty
  • Anjaly Parayil
  • Brian M. Sadler
  • Pratap Tokekar
  • Alec Koppel

We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.

ICML Conference 2022 Conference Paper

Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood

  • Qiujiang Jin
  • Alec Koppel
  • Ketan Rajawat
  • Aryan Mokhtari

Non-asymptotic analysis of quasi-Newton methods has received a lot of attention recently. In particular, several works have established a non-asymptotic superlinear rate of $\mathcal{O}((1/\sqrt{t})^t)$ for the (classic) BFGS method by exploiting the fact that its error of Newton direction approximation approaches zero. Moreover, a greedy variant of the BFGS method was recently proposed which accelerates the convergence of BFGS by directly approximating the Hessian matrix, instead of the Newton direction, and achieves a fast local quadratic convergence rate. Alas, the local quadratic convergence of Greedy-BFGS requires many more updates compared to the number of iterations that BFGS requires for a local superlinear rate. This is due to the fact that in Greedy-BFGS the Hessian is directly approximated and the Newton direction approximation may not be as accurate as the one for BFGS. In this paper, we close this gap and present a novel BFGS method that has the best of both worlds. More precisely, it leverages the approximation ideas of both BFGS and Greedy-BFGS to properly approximate both the Newton direction and the Hessian matrix. Our theoretical results show that our method outperforms both BFGS and Greedy-BFGS in terms of convergence rate, while it reaches its quadratic convergence rate with fewer steps compared to Greedy-BFGS. Numerical experiments on various datasets also confirm our theoretical findings.
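
For readers unfamiliar with the objects being compared, the classic BFGS update that all of these methods refine is a rank-two correction built from one secant pair; the sketch below applies it repeatedly on a quadratic, where the exact secant pairs $y = As$ pull the approximation toward the true Hessian (a plain-BFGS illustration, not the sharpened method from the paper).

```python
import numpy as np

def bfgs_update(B, s, y):
    """Classic BFGS update of the Hessian approximation B from the iterate
    displacement s = x_new - x and gradient displacement y = g_new - g."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Quadratic sanity check: for f(x) = 0.5 x'Ax every secant pair satisfies
# y = As, and repeated updates drive B toward A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)
rng = np.random.default_rng(6)
for _ in range(50):
    s = rng.normal(size=2)
    B = bfgs_update(B, s, A @ s)
print(np.round(B, 3))   # approaches A
```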

IROS Conference 2021 Conference Paper

Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

  • Michael E. Kepler
  • Alec Koppel
  • Amrit Singh Bedi
  • Daniel J. Stilwell

Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data. In this work, we seek to overcome these issues through (i) employing variational free energy approximations of GPs operating in tandem with online expectation propagation steps; and (ii) introducing a local splitting step which instantiates a new GP whenever the posterior distribution changes significantly, as quantified by the Wasserstein metric over posterior distributions. Over time, then, this yields an ensemble of sparse GPs which may be updated incrementally, and adapts to locality, heterogeneity, and non-stationarity in training data. We provide a 1-dimensional example to illustrate the motivation behind our approach, and compare the performance of our approach to other Gaussian process methods across various data sets; it often achieves competitive, if not superior, predictive performance relative to other locality-based GP regression methods in which hyperparameters are learned in an online manner.
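
The splitting trigger reduces to a closed-form test because the 2-Wasserstein distance between Gaussians is analytic. A scalar-posterior sketch follows (a simplifying assumption relative to the paper's full GP posteriors, and `eps` is a hypothetical threshold):

```python
import numpy as np

def w2_gaussian_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between 1-D Gaussians."""
    return np.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)

def should_split(prev_post, new_post, eps=0.5):
    """Instantiate a fresh local GP when the posterior has moved too far
    in Wasserstein distance -- the splitting trigger from the abstract,
    shown here for scalar (mean, std) posteriors."""
    return w2_gaussian_1d(*prev_post, *new_post) > eps

print(should_split((0.0, 1.0), (0.1, 1.05)))   # False: keep updating same GP
print(should_split((0.0, 1.0), (1.2, 0.4)))    # True: spawn a new local GP
```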

JMLR Journal 2020 Journal Article

A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning

  • Aryan Mokhtari
  • Alec Koppel
  • Martin Takac
  • Alejandro Ribeiro

We consider learning problems over training sets in which both the number of training examples and the dimension of the feature vectors are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple parallel processors to operate on a randomly chosen subset of blocks of the feature vector. RAPSA is doubly stochastic since each processor utilizes a random set of functions to compute the stochastic gradient associated with a randomly chosen set of variable coordinates. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is strongly convex. Moreover, we present an accelerated version of RAPSA (ARAPSA) that incorporates the objective function curvature information by premultiplying the descent direction by a Hessian approximation matrix. We further extend the results for asynchronous settings and show that if the processors perform their updates without any coordination the algorithms are still convergent to the optimal argument. RAPSA and its extensions are then numerically evaluated on a linear estimation problem and a binary image classification task using the MNIST handwritten digit dataset.
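
A compact sketch of the doubly stochastic pattern on a least-squares problem: every iteration draws both a random minibatch of training examples and a random block of coordinates, and only that block is updated. This simulates the parallel processors sequentially and omits the asynchrony and the ARAPSA curvature correction.

```python
import numpy as np

rng = np.random.default_rng(7)

# Least-squares objective f(x) = 0.5 ||Ax - b||^2, n samples, d coordinates.
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true
x = np.zeros(d)

for t in range(3000):
    # Doubly stochastic step: draw a random minibatch of samples AND a
    # random block of coordinates; update only that block.
    samples = rng.choice(n, size=10, replace=False)
    block = rng.choice(d, size=3, replace=False)
    residual = A[samples] @ x - b[samples]
    grad_block = A[np.ix_(samples, block)].T @ residual / len(samples)
    x[block] -= 0.05 * grad_block

print(f"error {np.linalg.norm(x - x_true):.4f}")   # near zero
```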

IROS Conference 2020 Conference Paper

Dense Incremental Metric-Semantic Mapping via Sparse Gaussian Process Regression

  • Ehsan Zobeidi
  • Alec Koppel
  • Nikolay Atanasov 0001

We develop an online probabilistic metric-semantic mapping approach for autonomous robots relying on streaming RGB-D observations. We cast this problem as a Bayesian inference task, requiring encoding both the geometric surfaces and semantic labels (e.g., chair, table, wall) of the unknown environment. We propose an online Gaussian Process (GP) training and inference approach, which avoids the complexity of GP classification by regressing a truncated signed distance function representation of the regions occupied by different semantic classes. Online regression is enabled through sparse GP approximation, compressing the training data to a finite set of inducing points, and through spatial domain partitioning into an Octree data structure with overlapping leaves. Our experiments demonstrate the effectiveness of this technique for large-scale probabilistic metric-semantic mapping of 3D environments. A distinguishing feature of our approach is that the generated maps contain full continuous distributional information about the geometric surfaces and semantic labels, making them appropriate for uncertainty-aware planning.

NeurIPS Conference 2020 Conference Paper

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

  • Junyu Zhang
  • Alec Koppel
  • Amrit Singh Bedi
  • Csaba Szepesvari
  • Mengdi Wang

In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. Further, we prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, and we also establish its rate of convergence that matches or improves the convergence rate available in the case of RL with cumulative rewards.
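
The objects in play are easy to exhibit on a tiny MDP: the discounted state occupancy measure solves a linear system, and a "general utility" is any function of it, such as its entropy, for which no Bellman equation exists. A minimal sketch (the transition matrix and policy below are made up):

```python
import numpy as np

# A 3-state MDP under a fixed policy: P_pi[s, s'] = transition probabilities.
P_pi = np.array([[0.7, 0.3, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.2, 0.0, 0.8]])
mu0 = np.array([1.0, 0.0, 0.0])   # initial state distribution
gamma = 0.9

# Discounted state occupancy: lam = (1 - gamma) * mu0 + gamma * P_pi^T lam.
lam = np.linalg.solve(np.eye(3) - gamma * P_pi.T, (1 - gamma) * mu0)

# A 'general utility' is any function of the occupancy measure -- e.g. its
# entropy (an exploration objective), which admits no Bellman equation.
entropy = -(lam * np.log(lam)).sum()
print(np.round(lam, 4), f"entropy {entropy:.4f}")
```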

JMLR Journal 2019 Journal Article

Parsimonious Online Learning with Kernels via Sparse Projections in Function Space

  • Alec Koppel
  • Garrett Warnell
  • Ethan Stump
  • Alejandro Ribeiro

Despite their attractiveness, popular perception is that techniques for nonparametric function approximation do not scale to streaming data due to an intractable growth in the amount of storage they require. To solve this problem in a memory-affordable way, we propose an online technique based on functional stochastic gradient descent in tandem with supervised sparsification based on greedy function subspace projections. The method, called parsimonious online learning with kernels (POLK), provides a controllable tradeoff between its solution accuracy and the amount of memory it requires. We derive conditions under which the generated function sequence converges almost surely to the optimal function, and we establish that the memory requirement remains finite. We evaluate POLK for kernel multi-class logistic regression and kernel hinge-loss classification on three canonical data sets: a synthetic Gaussian mixture model, the MNIST hand-written digits, and the Brodatz texture database. On all three tasks, we observe a favorable trade-off of objective function evaluation, classification performance, and complexity of the nonparametric regressor extracted by the proposed method.
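
A rough sketch of the sparsification idea, assuming an RBF kernel and a single destructive sweep (the paper's matching-pursuit-based projection is more refined): a dictionary point is dropped whenever re-fitting the remaining weights keeps the RKHS-norm approximation error below a budget `eps`, which is how the memory requirement stays finite.

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf(X, Y, h=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

def prune_dictionary(D, w, eps):
    """Simplified destructive sweep: drop dictionary point j whenever the
    best RKHS approximation on the remaining points stays within eps of
    the current expansion sum_i w_i k(d_i, .)."""
    keep = list(range(len(D)))
    for j in list(keep):
        rest = [i for i in keep if i != j]
        K_ff = rbf(D[keep], D[keep])
        K_rf = rbf(D[rest], D[keep])
        K_rr = rbf(D[rest], D[rest]) + 1e-8 * np.eye(len(rest))
        wk = w[keep]
        # Squared RKHS error of the best re-fit on the reduced dictionary.
        v = np.linalg.solve(K_rr, K_rf @ wk)
        err2 = wk @ K_ff @ wk - wk @ K_rf.T @ v
        if err2 < eps**2:
            keep = rest
    return keep

D = rng.normal(size=(12, 2))
D[6:] = D[:6] + 0.01            # near-duplicate points are redundant
w = rng.normal(size=12) * 0.3
print(f"kept {len(prune_dictionary(D, w, eps=0.05))} of {len(D)} points")
```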

IROS Conference 2018 Conference Paper

Composable Learning with Sparse Kernel Representations

  • Ekaterina I. Tolstaya
  • Ethan Stump
  • Alec Koppel
  • Alejandro Ribeiro

We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space. We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function (NAF). This representation of the policy enables efficiently composing multiple learned models without additional training samples or interaction with the environment. We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment. We apply the composition operation to various policy combinations and test them to show that the composed policies retain the performance of their components. We also transfer the composed policy directly to a physical platform operating in an arena with obstacles in order to demonstrate a degree of generalization.

IROS Conference 2016 Conference Paper

Online learning for characterizing unknown environments in ground robotic vehicle models

  • Alec Koppel
  • Jonathan Fink
  • Garrett Warnell
  • Ethan Stump
  • Alejandro Ribeiro

In pursuit of increasing the operational tempo of a ground robotics platform in unknown domains, we consider the problem of predicting the distribution of structural state-estimation error due to poorly-modeled platform dynamics as well as environmental effects. Such predictions are a critical component of any modern control approach that utilizes uncertainty information to provide robustness in control design. We use an online learning algorithm based on matrix factorization techniques to fit a statistical model of error that provides enough expressive power to enable prediction directly from motion control signals and low-level visual features. Moreover, we empirically demonstrate that this technique compares favorably to predictors that do not incorporate this information.

IROS Conference 2015 Conference Paper

D4L: Decentralized dynamic discriminative dictionary learning

  • Alec Koppel
  • Garrett Warnell
  • Ethan Stump
  • Alejandro Ribeiro

We consider discriminative dictionary learning in a distributed online setting, where a team of networked robots aims to jointly learn both a common basis of the feature space and a classifier over this basis from sequentially observed signals. We formulate this problem as a distributed stochastic program with a non-convex objective and present a block variant of the Arrow-Hurwicz saddle point algorithm to solve it. Only neighboring nodes in the communications network need to exchange information, and we penalize the discrepancy between the individual feature bases and classifiers using Lagrange multipliers. The application we consider is for a team of robots to collaboratively recognize objects of interest in dynamic environments. As a preliminary performance benchmark, we consider the problem of learning a texture classifier across a network of robots moving around an urban setting where separate training examples are sequentially observed at each robot. Results are shown for both a standard texture dataset and a new dataset from an urban training facility, and we compare the performance of the standard centralized construction to the new distributed algorithm for the case when distinct samples from all classes are seen by the robots. These experiments yield comparable performance between the decentralized and the centralized cases, demonstrating the proposed method's practical utility.