Arrow Research search

Author name cluster

Weizhe Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

TMLR · 2025 · Journal Article

Solving Multi-agent Path Finding as an LLM Benchmark: How, How Good and Why

  • Weizhe Chen
  • Sven Koenig
  • Bistra Dilkina

The rapid success of large language models (LLMs) has spurred extensive research into their ability to solve a wide range of tasks. However, their potential in multi-agent planning remains underexplored. Multi-agent planning presents unique challenges due to the combined complexity of coordination and long-horizon reasoning, which often makes it difficult to leverage external tools for assistance. In this paper, we introduce Multi-Agent Path Finding (MAPF), also known as multi-robot route planning, as a novel benchmark for evaluating the reasoning capabilities of LLMs. We first describe how the MAPF benchmark can be adapted for LLM-based evaluation, including dataset curation and an agentic workflow for LLMs. We show the motivating success of single-agent planning and multi-agent path finding on an empty room map without obstacles, and then the failure to plan on the harder room and maze maps of the standard MAPF benchmark. We present our position on why directly solving MAPF with LLMs has not yet been successful, and we use various experiments to support our hypothesis. Based on our results, we discuss how researchers with different backgrounds can help with this problem from different perspectives.

IJCAI · 2023 · Conference Paper

DiSProD: Differentiable Symbolic Propagation of Distributions for Planning

  • Palash Chatterjee
  • Ashutosh Chapagain
  • Weizhe Chen
  • Roni Khardon

The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well-suited for dealing with sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, sparsity of rewards, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.

IJCAI · 2021 · Conference Paper

Temporal Induced Self-Play for Stochastic Bayesian Games

  • Weizhe Chen
  • Zihan Zhou
  • Yi Wu
  • Fei Fang

One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement-learning-based framework for finding strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm, TISP-PG. We prove that TISP-based algorithms can find an approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical-programming-based methods and significantly outperforms other learning-based methods.

AAAI · 2020 · Conference Paper

Bi-Level Actor-Critic for Multi-Agent Coordination

  • Haifeng Zhang
  • Weizhe Chen
  • Zeren Huang
  • Minne Li
  • Yaodong Yang
  • Weinan Zhang
  • Jun Wang

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence) while their actions can still be executed simultaneously and in a distributed manner. We give a convergence proof and test the resulting learning algorithm against the state of the art. We find that the proposed bi-level actor-critic algorithm successfully converges to Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway-merge environment.

IJCAI · 2020 · Conference Paper

When to Follow the Tip: Security Games with Strategic Informants

  • Weiran Shen
  • Weizhe Chen
  • Taoan Huang
  • Rohit Singh
  • Fei Fang

Although security games have attracted intensive research attention over the past years, few existing works consider how information from local communities would affect the game. In this paper, we introduce a new player -- a strategic informant, who can observe and report upcoming attacks -- into the defender-attacker security game setting. Characterized by a private type, the informant has his own utility structure, which leads to his strategic behavior. We model the game as a 3-player extensive-form game and propose a novel solution concept of Strong Stackelberg-perfect Bayesian equilibrium. To compute the optimal defender strategy, we first show that although the informant can have infinitely many types in general, the optimal defense plan need only include a finite (exponential) number of different patrol strategies. We then prove that there exists a defense plan with only a linear number of patrol strategies that achieves the optimal defender utility, which significantly reduces the computational burden and allows us to solve the game in polynomial time using linear programming. Finally, we conduct extensive experiments to show the effect of the strategic informant and demonstrate the effectiveness of our algorithm.