Arrow Research search

Author name cluster

Quanming Yao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

AAAI Conference 2026 Conference Paper

Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games

  • Bingyu Hui
  • Lebin Yu
  • Quanming Yao
  • Yunpeng Qu
  • Xudong Zhang
  • Jian Wang

Zero-shot coordination (ZSC), a key challenge in multi-agent game theory, has recently become a hot topic in reinforcement learning (RL) research, especially in complex evolving games. It focuses on the generalization ability of agents, requiring them to coordinate well, without any fine-tuning, with previously unseen collaborators drawn from a diverse and potentially evolving pool of partners. Population-based training, which approximates such an evolving partner pool, has been proven to provide good zero-shot coordination performance; nevertheless, existing methods are limited by computational resources, mainly focusing on optimizing diversity in small populations while neglecting the potential performance gains from scaling up the population size. To address this issue, this paper proposes Scalable Population Training (ScaPT), an efficient RL training framework comprising two key components: a meta-agent that efficiently realizes a population by selectively sharing parameters across agents, and a mutual information regularizer that guarantees population diversity. To empirically validate the effectiveness of ScaPT, this paper evaluates it along with representative frameworks in the Hanabi cooperative game and confirms its superiority.
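
The meta-agent idea, realizing a whole population through shared parameters plus small per-agent components, can be sketched as follows. This is a minimal illustration under my own assumptions (a shared trunk modulated by per-agent embeddings); the names and architecture are hypothetical, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HID, ACT_DIM, POP_SIZE = 16, 32, 4, 64

# Shared trunk parameters: one set of weights realizes the whole population.
W_shared = rng.normal(scale=0.1, size=(OBS_DIM, HID))
# Small per-agent modulation vectors: the only unshared parameters.
agent_embed = rng.normal(scale=0.1, size=(POP_SIZE, HID))
W_head = rng.normal(scale=0.1, size=(HID, ACT_DIM))

def policy_logits(obs, agent_id):
    """Meta-agent: a shared trunk, modulated by a per-agent embedding."""
    h = np.tanh(obs @ W_shared) * (1.0 + agent_embed[agent_id])
    return h @ W_head

obs = rng.normal(size=OBS_DIM)
print(policy_logits(obs, agent_id=3))
```

The point of the sketch: the per-agent state grows only with HID, not with the full network size, so the population can be scaled far beyond what independent agents would allow.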

NeurIPS Conference 2025 Conference Paper

Adaptive Preference Arithmetic: A Personalized Agent with Adaptive Preference Arithmetic for Dynamic Preference Modeling

  • Hongyi Nie
  • Yaqing Wang
  • Mingyang Zhou
  • Feiyang Pan
  • Quanming Yao
  • Zhen Wang

As large language models (LLMs) are increasingly used as personalized user assistants, effectively adapting to users' evolving preferences is critical for delivering high-quality personalized responses. While user preferences are often stable in content, their relative strengths shift over time due to changing goals and contexts. Therefore, modeling these dynamic preference strengths can enable finer-grained personalization. However, current methods face two major challenges: (i) limited user feedback makes it difficult to estimate preference strengths accurately, and (ii) natural language ambiguity limits the controllability of preference-guided generation. To address these issues, we propose AdaPA-Agent, an LLM-agent personalization framework that models dynamic preference strengths via Adaptive Preference Arithmetic. First, instead of requiring additional user feedback, AdaPA-Agent employs an alignment-based strength estimation module to estimate the strength of user preferences from existing user-agent interactions. Then, it guides controllable personalized generation by linearly combining next-token distributions, weighted by the estimated strengths of individual preferences. Experiments on two personalization tasks, conversational recommendation and personalized web interaction, demonstrate that AdaPA-Agent aligns better with users' changing intents, achieving over 18.9% and 14.2% improvements, respectively, compared to ReAct, the widely used agent framework.
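
The "linearly combining next-token distributions" step lends itself to a small sketch. The following is a toy illustration in my own notation (the function name and normalization choices are mine, not the paper's implementation):

```python
import numpy as np

def combine_next_token_probs(prob_dists, strengths):
    """Linearly combine per-preference next-token distributions,
    weighted by the estimated strength of each preference."""
    w = np.asarray(strengths, dtype=float)
    w = w / w.sum()                      # normalize strengths into weights
    mixed = np.tensordot(w, np.asarray(prob_dists), axes=1)
    return mixed / mixed.sum()           # renormalize for safety

# Toy vocabulary of 5 tokens; two preferences with different strengths.
p_pref_a = np.array([0.70, 0.10, 0.10, 0.05, 0.05])
p_pref_b = np.array([0.05, 0.05, 0.10, 0.10, 0.70])
print(combine_next_token_probs([p_pref_a, p_pref_b], strengths=[0.8, 0.2]))
```

As the estimated strengths drift over a session, the mixture shifts accordingly, which is what gives the generation its controllability.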

IJCAI Conference 2025 Conference Paper

Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution

  • Xiaohan Zheng
  • Lanning Wei
  • Yong Li
  • Quanming Yao

Effective decision-making on networks often relies on learning from graph-structured data, where Graph Neural Networks (GNNs) play a central role, but they take effort to configure and tune. In this demo, we propose LLMNet, which shows how to automate GNN design through Large Language Models. Our system develops a set of agents that construct graph-related knowledge bases and then leverages Retrieval-Augmented Generation (RAG) to support automated configuration and refinement of GNN models through a knowledge-guided evolution process. These agents, equipped with specialized knowledge bases, extract insights into tasks and graph structures by interacting with the knowledge bases. Empirical results show that LLMNet excels on twelve datasets across three graph learning tasks, validating its effectiveness in GNN model design.

ICLR Conference 2025 Conference Paper

Curriculum-aware Training for Discriminating Molecular Property Prediction Models

  • Hansi Yang
  • Quanming Yao
  • James T. Kwok

Despite their wide application across various fields, current molecular property prediction models struggle with the challenge of activity cliffs, which refer to situations where molecules with similar chemical structures display remarkably different properties. This phenomenon hinders existing models' ability to learn distinctive representations for molecules with similar chemical structures, and results in inaccurate predictions on molecules with activity cliffs. To address this limitation, we first present empirical evidence demonstrating the ineffectiveness of standard training pipelines on molecules with activity cliffs. We then propose a novel approach that reformulates molecular property prediction as a node classification problem, introducing two innovative tasks at both the node and edge levels to improve learning outcomes for these challenging molecules. Our method is versatile, allowing seamless integration with a variety of base models, whether pre-trained or randomly initialized. Extensive evaluation across different molecular property prediction datasets validates the effectiveness of our approach.

ICLR Conference 2025 Conference Paper

Erasing Concept Combination from Text-to-Image Diffusion Model

  • Hongyi Nie
  • Quanming Yao
  • Yang Liu
  • Zhen Wang 0004
  • Yatao An Bian

Advancements in the text-to-image diffusion model have raised security concerns due to their potential to generate images with inappropriate themes such as societal biases and copyright infringements. Current studies have made notable progress in preventing the model from generating images containing specific high-risk visual concepts. However, these methods neglect the issue that inappropriate themes may also arise from the combination of benign visual concepts. A crucial challenge arises because the same image theme can be represented through multiple distinct visual concept combinations, and the model's ability to generate individual concepts may become distorted when processing these combinations. Consequently, effectively erasing such visual concept combinations from the diffusion model remains a formidable challenge. To tackle this problem, we formalize the problem as the Concept Combination Erasing (CCE) problem and propose a Concept Graph-based high-level Feature Decoupling framework (CoGFD) to address CCE. CoGFD identifies and decomposes visual concept combinations with a consistent image theme from an LLM-induced concept logic graph, and erases these combinations through decoupling co-occurrent high-level features. These techniques enable CoGFD to eliminate undesirable visual concept combinations while minimizing adverse effects on the generative fidelity of related individual concepts, outperforming state-of-the-art baselines. Extensive experiments across diverse visual concept combination scenarios verify the effectiveness of CoGFD.

ICML Conference 2025 Conference Paper

Hierarchical Graph Tokenization for Molecule-Language Alignment

  • Yongqiang Chen 0002
  • Quanming Yao
  • Juzheng Zhang
  • James Cheng
  • Yatao An Bian

Recently, there has been a surge of interest in extending the success of large language models (LLMs) from texts to molecules. Most existing approaches adopt a graph neural network to represent a molecule as a series of node tokens for molecule-language alignment, which, however, overlook the inherent hierarchical structures in molecules. Notably, higher-order molecular structures contain rich semantics of functional groups, which encode crucial biochemical functionalities of the molecules. We show that neglecting the hierarchical information in tokenization leads to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). HIGHT employs a hierarchical graph tokenizer that encodes the hierarchy of atom, motif, and molecular levels of informative tokens to improve the molecular perception of LLMs. HIGHT also adopts an augmented instruction tuning dataset, enriched with the hierarchical graph information, to further enhance the molecule-language alignment. Extensive experiments on 14 real-world benchmarks verify the effectiveness of HIGHT, reducing hallucination by 40% and bringing significant improvements in various molecule-language downstream tasks. The project is available at https://higraphllm.github.io/.

NeurIPS Conference 2025 Conference Paper

Learning to Learn with Contrastive Meta-Objective

  • Shiguang Wu
  • Yaqing Wang
  • Yatao Bian
  • Quanming Yao

Meta-learning enables learning systems to adapt quickly to new tasks, similar to humans. Different meta-learning approaches all work within the mini-batch episodic training framework, which naturally provides information about task identity; this can serve as additional supervision during meta-training to improve generalizability. We propose to exploit task identity as additional supervision in meta-training, inspired by the alignment and discrimination abilities intrinsic to humans' fast learning. This is achieved by contrasting what meta-learners learn, i.e., model representations. The proposed ConML evaluates and optimizes this contrastive meta-objective under a problem- and learner-agnostic meta-training framework. We demonstrate that ConML integrates seamlessly with existing meta-learners, as well as in-context learning models, and brings a significant boost in performance at a small implementation cost.

TIST Journal 2025 Journal Article

Modeling N-ary Relational Knowledge Bases with Tensor Decomposition

  • Yu Liu
  • Quanming Yao
  • Yong Li

The binary relational knowledge base (KB, a.k.a. knowledge graph), representing real-world knowledge with binary relations and entities, has been an important research topic in artificial intelligence; however, considerable knowledge also involves beyond-binary relations. Recently, the field has proposed to model n-ary relational KBs with both binary and beyond-binary relations included. However, most current models are extended from translational distance and neural network models in binary relational KBs, which suffer from weak expressiveness and high complexity, respectively. To overcome such issues, in this work, we propose a novel two-step modeling framework, GETD, generalizing the powerful tensor decomposition technique from binary relational KBs to the n-ary case. For n-ary relational KBs with single-arity relations, the GETD framework introduces Tucker decomposition and Tensor Ring decomposition for expressive and efficient modeling. Furthermore, the framework is technically extended for the representation of n-ary relational KBs with mixed-arity relations. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that the GETD framework is fully expressive, i.e., able to completely represent any KB. Empirical results on two representative datasets show that the proposed framework significantly outperforms the state-of-the-art methods, achieving 11–26% and 4–7% improvements on Hits@10 for the single-arity and the mixed-arity cases, respectively.
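
As a hedged illustration of the tensor-decomposition view (notation mine, following standard Tucker conventions; the paper's exact parameterization, including the Tensor Ring factorization of the core, may differ), the plausibility of an n-ary fact with relation r and entities e_1, ..., e_n can be scored as a multilinear product against a core tensor:

```latex
\phi(r, e_1, \dots, e_n)
  = \mathcal{Z} \times_1 \mathbf{r} \times_2 \mathbf{e}_1 \times_3 \mathbf{e}_2 \cdots \times_{n+1} \mathbf{e}_n
```

Here \mathcal{Z} is a learnable core tensor and \times_k denotes the mode-k tensor-vector product; the Tensor Ring step further factorizes \mathcal{Z} into a ring of small cores so that the parameter count stays manageable as the arity n grows.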

IJCAI Conference 2025 Conference Paper

Unified Molecule-Text Language Model with Discrete Token Representation

  • Shuhan Guo
  • Yatao Bian
  • Ruibing Wang
  • Nan Yin
  • Zhen Wang
  • Quanming Yao

The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to equally integrate molecule and text modalities and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecular structures into sequences of tokens exhibiting causal dependency, thereby encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT unifies molecule and text modalities under a shared token representation and an autoregressive training paradigm. This enables the model to process molecular structures as a distinct linguistic system and generate them in textual form. Through a four-stage training scheme, UniMoT functions as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.

ICLR Conference 2025 Conference Paper

Why In-Context Learning Models are Good Few-Shot Learners?

  • Shiguang Wu 0002
  • Yaqing Wang 0002
  • Quanming Yao

We explore in-context learning (ICL) models from a learning-to-learn perspective. Unlike studies that identify specific learning algorithms in ICL models, we compare ICL models with typical meta-learners to understand their superior performance. We theoretically prove the expressiveness of ICL models as learning algorithms and examine their learnability and generalizability. Our findings show that ICL with transformers can effectively construct data-dependent learning algorithms instead of directly following existing ones (including gradient-based, metric-based, and amortization-based meta-learners). The construction of such a learning algorithm is determined by the pre-training process, as a function fitting the training distribution, which raises generalizability as an important issue. With this understanding, we propose strategies to transfer techniques for classical deep networks to the meta-level to further improve ICL. As examples, we implement meta-level meta-learning for domain adaptability with limited data and meta-level curriculum learning for accelerated convergence during pre-training, demonstrating their empirical effectiveness.

NeurIPS Conference 2024 Conference Paper

Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

  • Haotong Du
  • Quanming Yao
  • Juzheng Zhang
  • Yang Liu
  • Zhen Wang

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model’s adaptability.

ICLR Conference 2024 Conference Paper

Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs

  • Zhanke Zhou
  • Yongqi Zhang
  • Jiangchao Yao
  • Quanming Yao
  • Bo Han 0003

To deduce new facts on a knowledge graph (KG), a link predictor learns from the graph structure and collects local evidence to find the answer to a given query. However, existing methods suffer from a severe scalability problem due to the utilization of the whole KG for prediction, which hinders their promise on large-scale KGs and cannot be directly addressed by vanilla sampling methods. In this work, we propose one-shot-subgraph link prediction to achieve efficient and adaptive prediction. The design principle is that, instead of directly acting on the whole KG, the prediction procedure is decoupled into two steps: (i) extracting only one subgraph according to the query and (ii) predicting on this single, query-dependent subgraph. We reveal that the non-parametric and computation-efficient heuristic Personalized PageRank (PPR) can effectively identify the potential answers and supporting evidence. With efficient subgraph-based prediction, we further introduce automated searching of the optimal configurations in both data and model spaces. Empirically, we achieve improved efficiency and leading performance on five large-scale benchmarks. The code is publicly available at: https://github.com/tmlr-group/one-shot-subgraph.
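
The PPR-based extraction step can be illustrated with a small power-iteration sketch. This is a generic PPR implementation under my own assumptions (dense adjacency, fixed iteration count), not the paper's code:

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.85, iters=50):
    """Power iteration for PPR scores, restarting at the query's seed nodes."""
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)
    s = seed / seed.sum()
    p = s.copy()
    for _ in range(iters):
        p = alpha * (P.T @ p) + (1 - alpha) * s
    return p

# Toy 6-node graph; extract the top-3 nodes as the one-shot subgraph.
A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)
seed = np.zeros(6); seed[0] = 1.0          # query entity = node 0
scores = personalized_pagerank(A, seed)
subgraph_nodes = np.argsort(-scores)[:3]   # keep only the highest-scoring nodes
print(subgraph_nodes, scores.round(3))
```

Because PPR is non-parametric, this extraction adds no trainable components, which is what keeps the prediction step decoupled and cheap.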

IJCAI Conference 2024 Conference Paper

PACIA: Parameter-Efficient Adapter for Few-Shot Molecular Property Prediction

  • Shiguang Wu
  • Yaqing Wang
  • Quanming Yao

Molecular property prediction (MPP) plays a crucial role in biomedical applications, but it often encounters challenges due to a scarcity of labeled data. Existing works commonly adopt a gradient-based strategy to update a large number of parameters for task-level adaptation. However, the increase in adaptive parameters can lead to overfitting and poor performance. Observing that a graph neural network (GNN) performs well as both encoder and predictor, we propose PACIA, a parameter-efficient GNN adapter for few-shot MPP. We design a unified adapter to generate a few adaptive parameters to modulate the message-passing process of the GNN. We then adopt a hierarchical adaptation mechanism to adapt the encoder at the task level and the predictor at the query level via the unified GNN adapter. Extensive results show that PACIA obtains state-of-the-art performance on few-shot MPP problems, and that our proposed hierarchical adaptation mechanism is rational and effective.

AAAI Conference 2024 Conference Paper

Robust Communicative Multi-Agent Reinforcement Learning with Active Defense

  • Lebin Yu
  • Yunbo Qiu
  • Quanming Yao
  • Yuan Shen
  • Xudong Zhang
  • Jian Wang

Communication in multi-agent reinforcement learning (MARL) has recently been proven to effectively promote cooperation among agents. Since communication in real-world scenarios is vulnerable to noise and adversarial attacks, it is crucial to develop robust communicative MARL techniques. However, existing research in this domain has predominantly focused on passive defense strategies, where agents receive all messages equally, making it hard to balance performance and robustness. We propose an active defense strategy, where agents automatically reduce the impact of potentially harmful messages on the final decision. Implementing this strategy raises two challenges: defining unreliable messages and properly adjusting their impact on the final decision. To address them, we design an Active Defense Multi-Agent Communication framework (ADMAC), which estimates the reliability of received messages and adjusts their impact on the final decision accordingly with the help of a decomposable decision structure. The superiority of ADMAC over existing methods is validated by experiments in three communication-critical tasks under four types of attacks.

AAAI Conference 2024 Conference Paper

Towards Human-like Learning from Relational Structured Data

  • Quanming Yao

Relational structured data is a way of representing knowledge using nodes and edges, while also capturing the meaning of that knowledge in a structured form that can be used for machine learning. Compared with vision and natural language data, relational structured data represents and manipulates structured knowledge, which can be beneficial for tasks that involve reasoning or inference. On the other hand, vision and NLP deal more with unstructured data (like images and text), and they often require different types of models and algorithms to extract useful information or features from the data. Human-like Learning develops methods that can harness relational structures and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. With Human-like Learning, the learning algorithm is efficient and can adapt to new or unseen situations, which is crucial in real-world applications where environments may change unpredictably. Moreover, the models are easier for humans to understand and interpret, which is important for transparency and trust in AI systems. In this talk, we present our recent attempts towards human-like learning from relational structured data.

ICLR Conference 2024 Conference Paper

Understanding Expressivity of GNN in Rule Learning

  • Haiquan Qiu
  • Yongqi Zhang
  • Yong Li 0008
  • Quanming Yao

Rule learning is critical to improving knowledge graph (KG) reasoning due to its ability to provide logical and interpretable explanations. Recently, Graph Neural Networks (GNNs) with tail entity scoring have achieved state-of-the-art performance on KG reasoning. However, theoretical understanding of these GNNs is either lacking or focused on single-relational graphs, leaving the kinds of rules these GNNs can learn an open problem. We propose to fill this gap in this paper. Specifically, GNNs with tail entity scoring are unified into a common framework. Then, we analyze their expressivity by formally describing the rule structures they can learn and theoretically demonstrating their superiority. These results further inspire us to propose a novel labeling strategy to learn more rules in KG reasoning. Experimental results are consistent with our theoretical findings and verify the effectiveness of our proposed method. The code is publicly available at https://github.com/LARS-research/Rule-learning-expressivity.

NeurIPS Conference 2023 Conference Paper

Combating Bilateral Edge Noise for Robust Link Prediction

  • Zhanke Zhou
  • Jiangchao Yao
  • Jiaxu Liu
  • Xiawei Guo
  • Quanming Yao
  • Li He
  • Liang Wang
  • Bo Zheng

Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), the potential robustness under edge noise is still less investigated. To close this gap, we first conduct an empirical study to disclose that the edge noise bilaterally perturbs both input topology and target label, yielding severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.
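
For context, the "basic information bottleneck" that RGIB generalizes can be written as below. This is the standard graph-information-bottleneck-style objective in my notation (H is the learned representation, A the input topology, Y the labels); it is not RGIB's exact formulation, which further decouples the dependencies among noisy topology and noisy labels:

```latex
\min_{\mathbf{H}}\; -\, I(\mathbf{H}; Y) + \lambda\, I(\mathbf{H}; A)
```

That is, keep the representation predictive of the labels while compressing away information about the (possibly noisy) input topology.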

ICLR Conference 2023 Conference Paper

Combating Exacerbated Heterogeneity for Robust Models in Federated Learning

  • Jianing Zhu
  • Jiangchao Yao
  • Tongliang Liu
  • Quanming Yao
  • Jianliang Xu
  • Bo Han 0003

Privacy and security concerns in real-world applications have led to the development of adversarially robust federated models. However, the straightforward combination of adversarial training and federated learning in one framework can lead to undesired robustness deterioration. We discover that the reason behind this phenomenon is that the generated adversarial data can exacerbate the data heterogeneity among local clients, making the wrapped federated learning perform poorly. To deal with this problem, we propose a novel framework called Slack Federated Adversarial Training (SFAT), which assigns client-wise slack during aggregation to combat the intensified heterogeneity. Theoretically, we analyze the convergence of the proposed method to properly relax the objective when combining federated learning and adversarial training. Experimentally, we verify the rationality and effectiveness of SFAT on various benchmark and real-world datasets with different adversarial training and federated optimization methods. The code is publicly available at: https://github.com/ZFancy/SFAT.

NeurIPS Conference 2023 Conference Paper

Efficient Hyper-parameter Optimization with Cubic Regularization

  • Zhenqian Shen
  • Hansi Yang
  • Yong Li
  • James Kwok
  • Quanming Yao

As hyper-parameters are ubiquitous and can significantly affect model performance, hyper-parameter optimization is extremely important in machine learning. In this paper, we consider a sub-class of hyper-parameter optimization problems where the hyper-gradients are not available. Such problems frequently appear when the performance metric is non-differentiable or the hyper-parameter is not continuous. However, existing algorithms, like Bayesian optimization and reinforcement learning, often get trapped in local optima with poor performance. To address the above limitations, we propose to use cubic regularization to accelerate convergence and avoid saddle points. First, we adopt stochastic relaxation, which allows obtaining gradient and Hessian information without hyper-gradients. Then, we exploit the rich curvature information via cubic regularization. Theoretically, we prove that the proposed method can converge to approximate second-order stationary points, and that convergence is also guaranteed when the lower-level problem is solved inexactly. Experiments on synthetic and real-world data demonstrate the effectiveness of our proposed method.
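
The cubic-regularization step at the heart of the method follows the classical Nesterov–Polyak update; a hedged sketch in my notation (lambda_t is the hyper-parameter iterate, g_t and H_t the gradient and Hessian estimates obtained through stochastic relaxation):

```latex
s_t = \arg\min_{s}\; g_t^{\top} s + \tfrac{1}{2}\, s^{\top} H_t\, s + \tfrac{\sigma}{3}\, \lVert s \rVert^{3},
\qquad
\lambda_{t+1} = \lambda_t + s_t
```

The cubic term with coefficient sigma bounds the step and is what allows escaping saddle points and converging to approximate second-order stationary points.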

ICAPS Conference 2023 Conference Paper

Improving Zero-Shot Coordination Performance Based on Policy Similarity

  • Lebin Yu
  • Yunbo Qiu
  • Quanming Yao
  • Xudong Zhang 0001
  • Jian Wang 0030

In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, where agents are trained by playing with a fixed group of agents. However, in the face of zero-shot coordination, where an agent must coordinate with unseen partners, self-play agents may fail. Several methods have been proposed to handle this problem, but they either take a lot of time or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly linearly correlated with the similarity between an agent

ICLR Conference 2023 Conference Paper

Learning Symbolic Models for Graph-structured Physical Mechanism

  • Hongzhi Shi
  • Jingtao Ding
  • Yufan Cao 0002
  • Quanming Yao
  • Li Liu
  • Yong Li 0008

Graph-structured physical mechanisms are ubiquitous in real-world scenarios, thus revealing the underlying formulas is of great importance for scientific discovery. However, classical symbolic regression methods fail on this task since they can only handle input-output pairs that are not graph-structured. In this paper, we propose a new approach that generalizes symbolic regression to graph-structured physical mechanisms. The essence of our method is to model the formula skeleton with a message-passing flow, which helps transform the discovery of the skeleton into the search for the message-passing flow. Such a transformation guarantees that we are able to search for a message-passing flow that is efficient and Pareto-optimal in terms of both accuracy and simplicity. Subsequently, the underlying formulas can be identified by interpreting the component functions of the searched message-passing flow, reusing classical symbolic regression methods. We conduct extensive experiments on datasets from different physical domains, including mechanics, electricity, and thermology, and on real-world datasets of pedestrian dynamics without ground-truth formulas. The experimental results not only verify the rationale of our design but also demonstrate that the proposed method can automatically learn precise and interpretable formulas for graph-structured physical mechanisms.

ICML Conference 2023 Conference Paper

On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation

  • Zhanke Zhou
  • Chenyu Zhou
  • Xuan Li 0005
  • Jiangchao Yao
  • Quanming Yao
  • Bo Han 0003

Although powerful graph neural networks (GNNs) have boosted numerous real-world applications, the potential privacy risk is still underexplored. To close this gap, we perform the first comprehensive study of the graph reconstruction attack, which aims to reconstruct the adjacency of nodes. We show that a range of factors in GNNs can lead to the surprising leakage of private links. Especially by taking GNNs as a Markov chain and attacking GNNs via a flexible chain approximation, we systematically explore the underlying principles of the graph reconstruction attack, and propose two information-theory-guided mechanisms: (1) a chain-based attack method with adaptive designs for extracting more private information; (2) a chain-based defense method that sharply reduces the attack fidelity with moderate accuracy loss. These two objectives disclose a critical belief: to recover better in attack, you must extract more multi-aspect knowledge from the trained GNN; while to learn safer for defense, you must forget more link-sensitive information in training GNNs. Empirically, we achieve state-of-the-art results on six datasets and three common GNNs. The code is publicly available at: https://github.com/tmlr-group/MC-GRA.

TMLR Journal 2023 Journal Article

Understanding and Simplifying Architecture Search in Spatio-Temporal Graph Neural Networks

  • Zhen Xu
  • Quanming Yao
  • Yong Li
  • Qiang Yang

Compiling together spatial and temporal modules via a unified framework, Spatio-Temporal Graph Neural Networks (STGNNs) have been popularly used in multivariate spatio-temporal forecasting tasks, e.g., traffic prediction. After numerous manually designed architectures had been proposed, researchers became interested in Neural Architecture Search (NAS) for STGNNs. Existing methods suffer from two issues: (1) hyperparameters like the learning rate and channel size cannot be integrated into the NAS framework, which makes model evaluation less accurate and can mislead the architecture search; (2) the current search space, which basically mimics DARTS-like methods, is too large for the search algorithm to find a sufficiently good candidate. In this work, we deal with both issues at the same time. We first re-examine the importance and transferability of the training hyperparameters to ensure a fair and fast comparison. Next, we set up a framework that disentangles architecture design into three disjoint angles according to how spatio-temporal representations flow and transform in architectures, which allows us to understand the behavior of architectures from a distributional perspective. This way, we obtain good guidelines to reduce the STGNN search space and find state-of-the-art architectures by simple random search. As an illustrative example, we combine these principles with random search, which already significantly outperforms both state-of-the-art hand-designed models and recently automatically searched ones.

ICML Conference 2022 Conference Paper

Fast and Provable Nonconvex Tensor RPCA

  • Haiquan Qiu
  • Yao Wang 0003
  • Shaojie Tang 0001
  • Deyu Meng
  • Quanming Yao

In this paper, we study nonconvex tensor robust principal component analysis (RPCA) based on the $t$-SVD. We first propose an alternating projection method, i.e., APT, which converges linearly to the ground truth under the incoherence conditions of tensors. However, as the projection onto the low-rank tensor space in APT can be slow, we further propose to speed up this process by utilizing properties of the tangent space of low-rank tensors. The resulting algorithm, i.e., EAPT, is not only more efficient than APT but also keeps the linear convergence. Compared with existing tensor RPCA works, the proposed method, especially EAPT, is not only more effective, due to the recovery guarantee and adaptation to the transformed (frequency) domain, but also more efficient, due to a faster convergence rate and lower iteration complexity. These benefits are also empirically verified on both synthetic data and real applications, e.g., hyperspectral image denoising and video background subtraction.

JMLR Journal 2022 Journal Article

Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization

  • Quanming Yao
  • Yaqing Wang
  • Bo Han
  • James T. Kwok

Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it to low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm avoids expensive tensor folding/unfolding operations. A special “sparse plus low-rank” structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Lojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space, and more accurate than the existing state-of-the-art.
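
The proximal average idea that the solver builds on can be stated compactly. A hedged sketch in my notation (r_k is the penalty on the mode-k unfolding of the tensor X; the paper's nonconvex extension and folding-free computation add further structure on top of this):

```latex
\bar r(\mathcal{X}) = \frac{1}{K} \sum_{k=1}^{K} r_k(\mathcal{X}),
\qquad
\operatorname{prox}_{\gamma \bar r}(\mathcal{X}) \approx \frac{1}{K} \sum_{k=1}^{K} \operatorname{prox}_{\gamma r_k}(\mathcal{X})
```

So one never needs the intractable joint proximal step of the overlapped regularizer: each summand's proximal step reduces to singular-value thresholding of a single unfolding, and the "sparse plus low-rank" iterate structure keeps those steps cheap.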

NeurIPS Conference 2021 Conference Paper

Automorphic Equivalence-aware Graph Neural Network

  • Fengli Xu
  • Quanming Yao
  • Pan Hui
  • Yong Li

Distinguishing the automorphic equivalence of nodes in a graph plays an essential role in many scientific domains, e.g., computational biology and social network analysis. However, existing graph neural networks (GNNs) fail to capture this important property. To make GNNs aware of automorphic equivalence, we first introduce a localized variant of this concept: ego-centered automorphic equivalence (Ego-AE). Then, we design a novel variant of GNN, i.e., GRAPE, that uses learnable AE-aware aggregators to explicitly differentiate the Ego-AE of each node's neighbors with the aid of various subgraph templates. As the design of subgraph templates can be hard, we further propose a genetic algorithm to automatically search for them from graph data. Moreover, we theoretically prove that GRAPE is expressive in terms of generating distinct representations for nodes with different Ego-AE features, which fills a fundamental gap in existing GNN variants. Finally, we empirically validate our model on eight real-world graph datasets, including a social network, an e-commerce co-purchase network, and citation networks, and show that it consistently outperforms existing GNNs. The source code is publicly available at https://github.com/tsinghua-fib-lab/GRAPE.

NeurIPS Conference 2021 Conference Paper

Progressive Feature Interaction Search for Deep Sparse Network

  • Chen Gao
  • Yinfeng Li
  • Quanming Yao
  • Depeng Jin
  • Yong Li

Deep sparse networks (DSNs), of which the crux is exploring the high-order feature interactions, have become the state of the art on prediction tasks with high-sparsity features. However, these models suffer from low computational efficiency, including large model size and slow model inference, which largely limits their application value. In this work, we approach this problem with neural architecture search, automatically searching for the critical component in DSNs: the feature-interaction layer. We propose a distilled search space to cover the desired architectures with fewer parameters. We then develop a progressive search algorithm to search the space efficiently and to capture the order-priority property in sparse prediction tasks. Experiments on three real-world benchmark datasets show promising results of the resulting method, PROFIT, in both accuracy and efficiency. Further studies validate the feasibility of our designed search space and search algorithm.

NeurIPS Conference 2021 Conference Paper

Property-Aware Relation Networks for Few-Shot Molecular Property Prediction

  • Yaqing Wang
  • Abulikemu Abuduweili
  • Quanming Yao
  • Dejing Dou

Molecular property prediction plays a fundamental role in drug discovery to identify candidate molecules with target properties. However, molecular property prediction is essentially a few-shot problem, which makes it hard to use regular machine learning models. In this paper, we propose Property-Aware Relation networks (PAR) to handle this problem. In comparison to existing works, we leverage the fact that both the relevant substructures and the relationships among molecules change across different molecular properties. We first introduce a property-aware embedding function to transform generic molecular embeddings into a substructure-aware space relevant to the target property. Further, we design an adaptive relation graph learning module to jointly estimate the molecular relation graph and refine molecular embeddings w.r.t. the target property, such that the limited labels can be effectively propagated among similar molecules. We adopt a meta-learning strategy where the parameters are selectively updated within tasks in order to model generic and property-aware knowledge separately. Extensive experiments on benchmark molecular property prediction datasets show that PAR consistently outperforms existing methods and can obtain property-aware molecular embeddings and model the molecular relation graph properly.

AAAI Conference 2020 Conference Paper

Efficient Neural Architecture Search via Proximal Iterations

  • Quanming Yao
  • Ju Xu
  • Wei-Wei Tu
  • Zhanxing Zhu

Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, as they can obtain high-performance architectures in several days. However, they still suffer from huge computation costs and inferior performance due to the construction of the supernet. In this paper, we propose an efficient NAS method based on proximal iterations (denoted as NASP). Different from previous works, NASP reformulates the search process as an optimization problem with a discrete constraint on architectures and a regularizer on model complexity. As the new objective is hard to solve, we further propose an efficient algorithm inspired by proximal iterations for optimization. In this way, NASP is not only much faster than existing differentiable search methods, but can also find better architectures and balance model complexity. Finally, extensive experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a more than 10x speedup over the state of the art.

NeurIPS Conference 2020 Conference Paper

Interstellar: Searching Recurrent Architecture for Knowledge Graph Embedding

  • Yongqi Zhang
  • Quanming Yao
  • Lei Chen

Knowledge graph (KG) embedding is well-known in learning representations of KGs. Many models have been proposed to learn the interactions between entities and relations of the triplets. However, long-term information among multiple triplets is also important to KG. In this work, based on the relational paths, which are composed of a sequence of triplets, we define the Interstellar as a recurrent neural architecture search problem for the short-term and long-term information along the paths. First, we analyze the difficulty of using a unified model to work as the Interstellar. Then, we propose to search for recurrent architecture as the Interstellar for different KG tasks. A case study on synthetic data illustrates the importance of the defined search problem. Experiments on real datasets demonstrate the effectiveness of the searched models and the efficiency of the proposed hybrid-search algorithm.

ICML Conference 2020 Conference Paper

Searching to Exploit Memorization Effect in Learning with Noisy Labels

  • Quanming Yao
  • Hansi Yang
  • Bo Han 0003
  • Gang Niu 0001
  • James T. Kwok

Sample selection approaches are popular in robust learning from noisy labels. However, how to properly control the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we model this issue as a function approximation problem. Specifically, we design a domain-specific search space based on general patterns of the memorization effect and propose a novel Newton algorithm to solve the bi-level optimization problem efficiently. We further provide a theoretical analysis of the algorithm, which ensures a good approximation to critical points. Experiments are performed on both benchmark and real-world data sets. Results demonstrate that the proposed method is much better than the state-of-the-art noisy-label-learning approaches, and also much more efficient than existing AutoML algorithms.

ICML Conference 2020 Conference Paper

SIGUA: Forgetting May Make Learning with Noisy Labels More Robust

  • Bo Han 0003
  • Gang Niu 0001
  • Xingrui Yu
  • Quanming Yao
  • Miao Xu 0001
  • Ivor W. Tsang
  • Masashi Sugiyama

Given data with noisy labels, over-parameterized deep networks can gradually memorize the data, and fit everything in the end. Although equipped with corrections for noisy labels, many learning methods in this area still suffer overfitting due to undesired memorization. In this paper, to relieve this issue, we propose stochastic integrated gradient underweighted ascent (SIGUA): in a mini-batch, we adopt gradient descent on good data as usual, and learning-rate-reduced gradient ascent on bad data; the proposal is a versatile approach where data goodness or badness is w.r.t. desired or undesired memorization given a base learning method. Technically, SIGUA pulls optimization back for generalization when their goals conflict with each other; philosophically, SIGUA shows forgetting undesired memorization can reinforce desired memorization. Experiments demonstrate that SIGUA successfully robustifies two typical base learning methods, so that their performance is often significantly improved.
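
The descent-on-good / reduced-ascent-on-bad update can be sketched in a few lines. A minimal illustration, assuming a per-sample goodness mask is available from some base method; all names and the toy setup are mine, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def sigua_style_step(model, optimizer, x, y, good_mask, ascent_scale=0.01):
    """One SIGUA-style mini-batch update: gradient descent on 'good'
    samples, learning-rate-reduced gradient ascent on 'bad' ones."""
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    # +1 weight descends on good data; a small negative weight ascends on bad data.
    weights = torch.where(good_mask, torch.ones_like(per_sample),
                          -ascent_scale * torch.ones_like(per_sample))
    loss = (weights * per_sample).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a linear model and a random good/bad split.
model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
good = torch.rand(32) > 0.3   # placeholder for a real goodness criterion
print(sigua_style_step(model, opt, x, y, good))
```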

NeurIPS Conference 2020 Conference Paper

Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

  • Jingtao Ding
  • Yuhan Quan
  • Quanming Yao
  • Yong Li
  • Depeng Jin

Negative sampling approaches are prevalent in implicit collaborative filtering for obtaining negative labels from massive unlabeled data. As two major concerns in negative sampling, efficiency and effectiveness are still not fully achieved by recent works that use complicated structures and overlook the risk of false negative instances. In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and that false negatives tend to have stable predictions over many training iterations. The above findings motivate us to simplify the model by sampling from a designed memory that only stores a few important candidates and, more importantly, to tackle the untouched false negative problem by favouring high-variance samples stored in the memory, which achieves efficient sampling of true negatives with high quality. Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method. The implementation is available at https://github.com/dingjingtao/SRNS.
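
The variance-favouring selection can be illustrated with a toy sampler. A hedged sketch under my own assumptions (softmax sampling over mean-plus-std priorities; the paper's actual scoring and memory-update rules differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negative(memory_scores, beta=1.0):
    """Pick a negative from a small candidate memory, favouring items whose
    predicted scores vary a lot across recent iterations (false negatives
    tend to have stable predictions, so high variance suggests a true negative)."""
    scores = np.asarray(memory_scores, dtype=float)  # (candidates, iterations)
    priority = scores.mean(axis=1) + beta * scores.std(axis=1)
    probs = np.exp(priority - priority.max())        # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(priority), p=probs)

# 4 memory candidates, each with predictions from the last 5 iterations.
memory = [[0.9, 0.9, 0.9, 0.9, 0.9],   # stable and high: likely false negative
          [0.2, 0.8, 0.1, 0.9, 0.3],   # high variance: promising true negative
          [0.1, 0.1, 0.2, 0.1, 0.1],
          [0.5, 0.4, 0.6, 0.5, 0.5]]
print(sample_negative(memory))
```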

ICML Conference 2019 Conference Paper

Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations

  • Quanming Yao
  • James T. Kwok
  • Bo Han 0003

Nonconvex regularizers have been successfully used in low-rank matrix learning. In this paper, we extend this to the more challenging problem of low-rank tensor completion. Based on the proximal average algorithm, we develop an efficient solver that avoids expensive tensor folding and unfolding. A special “sparse plus low-rank” structure, which is essential for fast computation of individual proximal steps, is maintained throughout the iterations. We also incorporate adaptive momentum to further speed up empirical convergence. Convergence results to critical points are provided under smoothness and Kurdyka-Lojasiewicz conditions. Experimental results on a number of synthetic and real-world data sets show that the proposed algorithm is more efficient in both time and space, and is also more accurate than existing approaches.

IJCAI Conference 2019 Conference Paper

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction

  • Quanming Yao
  • Xiawei Guo
  • James Kwok
  • Weiwei Tu
  • Yuqiang Chen
  • Wenyuan Dai
  • Qiang Yang

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of improving predictive performance via ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done either by sample-based or feature-based partitioning. However, we prove that when privacy budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction on the RUIJIN data set, where privacy is of significant concern.

IJCAI Conference 2019 Conference Paper

Robust Learning from Noisy Side-information by Semidefinite Programming

  • En-Liang Hu
  • Quanming Yao

Robustness has recently become one of the major concerns in the machine learning community, since learning algorithms are usually vulnerable to outliers or corruptions. Motivated by this trend and these needs, we pursue robustness in semi-definite programming (SDP) in this paper. Specifically, this is done by replacing the commonly used squared loss with the more robust L1-loss in the low-rank SDP. However, the resulting objective becomes neither convex nor smooth. As no existing algorithms can be applied, we design an efficient algorithm, based on majorization-minimization, to optimize the objective. The proposed algorithm not only has cheap iterations and low space complexity but also theoretically converges to some critical points. Finally, an empirical study shows that the new objective, armed with the proposed algorithm, outperforms the state-of-the-art in terms of both speed and accuracy.

NeurIPS Conference 2018 Conference Paper

Co-teaching: Robust training of deep neural networks with extremely noisy labels

  • Bo Han
  • Quanming Yao
  • Xingrui Yu
  • Gang Niu
  • Miao Xu
  • Weihua Hu
  • Ivor Tsang
  • Masashi Sugiyama

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data with clean labels and then those with noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data with possibly clean labels; secondly, the two networks communicate with each other what data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
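
The cross-update loop reads naturally as code. A minimal sketch of the small-loss selection and peer exchange described above (toy linear models and a fixed keep ratio; the actual Co-teaching anneals the ratio over epochs, and all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net1, net2, opt1, opt2, x, y, keep_ratio=0.7):
    """One Co-teaching-style update: each network picks its small-loss
    (likely clean) samples, and its peer trains on them."""
    with torch.no_grad():
        loss1 = F.cross_entropy(net1(x), y, reduction="none")
        loss2 = F.cross_entropy(net2(x), y, reduction="none")
    k = int(keep_ratio * len(y))
    idx1 = torch.topk(-loss1, k).indices   # small-loss samples chosen by net1
    idx2 = torch.topk(-loss2, k).indices   # small-loss samples chosen by net2

    # Each network updates on the samples selected by its *peer*.
    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
    opt1.step()
    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
    opt2.step()

net1, net2 = torch.nn.Linear(10, 3), torch.nn.Linear(10, 3)
opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
co_teaching_step(net1, net2, opt1, opt2, x, y)
```

The peer exchange is the key design choice: because the two networks start differently, they make different selection mistakes, so errors are less likely to be self-reinforcing than in single-network sample selection.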

JMLR Journal 2018 Journal Article

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

  • Quanming Yao
  • James T. Kwok

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, a popular subclass of $\ell_1$-based nonconvex sparsity-inducing and low-rank regularizers is considered. This includes nonconvex variants of the lasso, sparse group lasso, tree-structured lasso, nuclear norm and total variation regularizers. We propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed into a familiar convex one, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe algorithm, alternating direction method of multipliers and stochastic gradient descent). This is further extended to cases where the convexified regularizer does not have a closed-form proximal step, and where the loss function is nonconvex and nonsmooth. Extensive experiments on a variety of machine learning application scenarios show that optimizing the transformed problem is much faster than running the state-of-the-art on the original problem.
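
A hedged reconstruction of the transformation, in my own notation, for a sparsity-inducing instance: suppose the regularizer is r(x) = lambda * sum_i kappa(|x_i|) with kappa concave and increasing. Splitting off the linearization at zero gives

```latex
f(x) + \lambda \sum_i \kappa(|x_i|)
  = \underbrace{f(x) + \lambda \sum_i \big( \kappa(|x_i|) - \kappa_0 |x_i| \big)}_{\text{new smooth loss } \bar f(x)}
  + \underbrace{\lambda \kappa_0 \lVert x \rVert_1}_{\text{convex regularizer}},
\qquad \kappa_0 := \kappa'(0^{+})
```

so the nonconvex (concave) part is absorbed into the loss, and the remaining regularizer is the plain ell-1 norm, whose proximal step is ordinary soft-thresholding. The smoothness of the new loss and the exact conditions on kappa are what the paper establishes; the display above is only the shape of the idea.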

ICML Conference 2018 Conference Paper

Online Convolutional Sparse Coding with Sample-Dependent Dictionary

  • Yaqing Wang 0002
  • Quanming Yao
  • James T. Kwok
  • Lionel M. Ni

Convolutional sparse coding (CSC) has been popularly used for the learning of shift-invariant dictionaries in image and signal processing. However, existing methods have limited scalability. In this paper, instead of convolving with a dictionary shared by all samples, we propose the use of a sample-dependent dictionary in which each filter is a linear combination of a small set of base filters learned from data. This added flexibility allows a large number of sample-dependent patterns to be captured, which is especially useful in the handling of large or high-dimensional data sets. Computationally, the resultant model can be efficiently learned by online learning. Extensive experimental results on a number of data sets show that the proposed method outperforms existing CSC algorithms with significantly reduced time and space complexities.

NeurIPS Conference 2018 Conference Paper

Scalable Robust Matrix Factorization with Nonconvex Loss

  • Quanming Yao
  • James Kwok

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.

IJCAI Conference 2017 Conference Paper

Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems

  • Quanming Yao
  • James T. Kwok
  • Fei Gao
  • Wei Chen
  • Tie-Yan Liu

While the proximal gradient algorithm was originally designed for convex optimization, several variants have recently been proposed for nonconvex problems. Among them, nmAPG [Li and Lin, 2015] is the state of the art. However, it is inefficient when the proximal step does not have a closed-form solution, or when such a solution exists but is expensive, as it requires more than one proximal step to be solved exactly in each iteration. In this paper, we propose an efficient inexact accelerated proximal gradient (niAPG) algorithm for nonconvex problems. In each iteration, it requires only one inexact (less expensive) proximal step. Convergence to a critical point is still guaranteed, and an O(1/k) convergence rate is derived. Experiments on image inpainting and matrix completion problems demonstrate that the proposed algorithm has comparable performance to the state-of-the-art, but is much faster.

AAAI Conference 2017 Conference Paper

Efficient Sparse Low-Rank Tensor Completion Using the Frank-Wolfe Algorithm

  • Xiawei Guo
  • Quanming Yao
  • James Kwok

Most tensor problems are NP-hard, and low-rank tensor completion is much more difficult than low-rank matrix completion. In this paper, we propose a time- and space-efficient low-rank tensor completion algorithm by using the scaled latent nuclear norm for regularization and the Frank-Wolfe (FW) algorithm for optimization. We show that all the steps can be performed efficiently. In particular, FW's linear subproblem has a closed-form solution which can be obtained from rank-one SVD. By utilizing the sparsity of the observed tensor, we only need to maintain sparse tensors and a set of small basis matrices. Experimental results show that the proposed algorithm is more accurate, much faster and more scalable than the state-of-the-art.

ICML Conference 2016 Conference Paper

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

  • Quanming Yao
  • James T. Kwok

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, for a large class of nonconvex regularizers, we propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed into a familiar convex regularizer, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the standard proximal algorithm and Frank-Wolfe algorithm). Moreover, it can be shown that critical points of the transformed problem are also critical points of the original problem. Extensive experiments on a number of nonconvex regularization problems show that the proposed procedure is much faster than state-of-the-art nonconvex solvers.

IJCAI Conference 2016 Conference Paper

Greedy Learning of Generalized Low-Rank Models

  • Quanming Yao
  • James T. Kwok

Learning of low-rank matrices is fundamental to many machine learning applications. A state-of-the-art algorithm is the rank-one matrix pursuit (R1MP). However, it can only be used in matrix completion problems with the square loss. In this paper, we develop a more flexible greedy algorithm for generalized low-rank models whose optimization objective can be smooth or nonsmooth, general convex or strongly convex. The proposed algorithm has low per-iteration time complexity and fast convergence rate. Experimental results show that it is much faster than the state-of-the-art, with comparable or even better prediction performance.

IJCAI Conference 2015 Conference Paper

Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion

  • Quanming Yao
  • James T. Kwok

Matrix factorization tries to recover a low-rank matrix from limited observations. A state-of-the-art algorithm is Soft-Impute, which exploits a special “sparse plus low-rank” structure of the matrix iterates to allow efficient SVD in each iteration. Though Soft-Impute is also a proximal gradient algorithm, it is generally believed that acceleration techniques are not useful and would destroy the special structure. In this paper, we show that Soft-Impute can indeed be accelerated without compromising the “sparse plus low-rank” structure. To further reduce the per-iteration time complexity, we propose an approximate singular value thresholding scheme based on the power method. Theoretical analysis shows that the proposed algorithm enjoys the fast O(1/T^2) convergence rate of accelerated proximal gradient algorithms. Extensive experiments on both synthetic and large recommendation data sets show that the proposed algorithm is much faster than Soft-Impute and other state-of-the-art matrix completion algorithms.
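
The “sparse plus low-rank” iterate is easy to see in a toy Soft-Impute loop. A minimal numpy sketch under my own simplifications (dense matrices, full SVD instead of the paper's power-method approximation, a fixed threshold); it illustrates the structure, not the accelerated algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_impute_step(X_obs, mask, U, V, tau):
    """One Soft-Impute-style iteration. The SVD input keeps the 'sparse plus
    low-rank' form: a sparse residual on observed entries plus the current
    low-rank estimate U @ V.T (densified here only for brevity)."""
    Z = mask * (X_obs - U @ V.T) + U @ V.T       # sparse part + low-rank part
    Uf, s, Vft = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - tau, 0.0)                 # singular value soft-thresholding
    r = int((s > 0).sum())
    return Uf[:, :r] * s[:r], Vft[:r].T          # new low-rank factors

# Toy 20x15 rank-2 completion problem with ~50% observed entries.
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
mask = rng.random((20, 15)) < 0.5
U, V = np.zeros((20, 1)), np.zeros((15, 1))
for _ in range(50):
    U, V = soft_impute_step(mask * A, mask, U, V, tau=0.5)
print("observed-entry RMSE:", np.sqrt(((mask * (U @ V.T - A))**2).mean()))
```

The paper's contribution is showing that the acceleration step can be interleaved without breaking the structure fed to the SVD, plus a power-method approximation of the thresholding that the full SVD above stands in for.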

AAAI Conference 2015 Conference Paper

Colorization by Patch-Based Local Low-Rank Matrix Completion

  • Quanming Yao
  • James T. Kwok

Colorization aims at recovering the original color of a monochrome image from only a few color pixels. A state-of-the-art approach is based on matrix completion, which assumes that the target color image is low-rank. However, this low-rank assumption is often invalid on natural images. In this paper, we propose a patch-based approach that divides the image into patches and then imposes a low-rank structure only on groups of similar patches. Each local matrix completion problem is solved by an accelerated version of the alternating direction method of multipliers (ADMM), and each ADMM subproblem is solved efficiently by divide-and-conquer. Experiments on a number of benchmark images demonstrate that the proposed method outperforms existing approaches.