Author name cluster

Yixuan Even Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

1 author row

TMLR Journal 2026 Journal Article

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Yixuan Even Xu
Yash Savani
Fei Fang
J Zico Kolter

Reinforcement learning with verifiable rewards (RLVR) has emerged as the leading approach for enhancing reasoning capabilities in large language models. However, it faces a fundamental compute and memory asymmetry: rollout generation is embarrassingly parallel and memory-light, whereas policy updates are communication-heavy and memory-intensive. To address this, we introduce PODS (Policy Optimization with Down-Sampling), which decouples rollout generation from policy updates by training only on a strategically selected subset of rollouts, maintaining learning quality while dramatically reducing update costs. We propose a principled subset selection criterion, max-variance down-sampling, that maximizes the variance of reward in the selected subset, and provide an efficient $O(n\log n)$ implementation of this rule. Empirically, Group Relative Policy Optimization (GRPO) coupled with PODS achieves the peak test accuracy of vanilla GRPO at least $1.7\times$ faster across the different reasoning benchmarks and hardware configurations we tested.
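The max-variance selection rule described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function name `max_variance_subset` and its signature are hypothetical, and the sketch relies on the assumption that a variance-maximizing subset of size m can always be formed from some number of the lowest rewards plus some number of the highest rewards, so sorting once and scanning the m + 1 possible splits with prefix sums gives an $O(n\log n)$ procedure.

```python
from typing import List


def max_variance_subset(rewards: List[float], m: int) -> List[int]:
    """Return indices of m rollouts whose rewards have maximal population variance.

    Hypothetical helper for illustration only. Assumes the optimum takes the
    (m - k) lowest and k highest rewards for some k in 0..m.
    """
    n = len(rewards)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= len(rewards)")

    # Sort rollout indices by reward: O(n log n).
    order = sorted(range(n), key=lambda i: rewards[i])

    # Prefix sums of sorted rewards and of their squares.
    prefix_sum = [0.0]
    prefix_sq = [0.0]
    for i in order:
        prefix_sum.append(prefix_sum[-1] + rewards[i])
        prefix_sq.append(prefix_sq[-1] + rewards[i] ** 2)

    best_var = float("-inf")
    best_k = 0
    for k in range(m + 1):  # k = number of rollouts taken from the top
        low = m - k         # number of rollouts taken from the bottom
        total = prefix_sum[low] + (prefix_sum[n] - prefix_sum[n - k])
        total_sq = prefix_sq[low] + (prefix_sq[n] - prefix_sq[n - k])
        var = total_sq / m - (total / m) ** 2  # E[x^2] - (E[x])^2
        if var > best_var:
            best_var, best_k = var, k

    low = m - best_k
    return order[:low] + order[n - best_k:]
```

For example, with rewards `[0.0, 0.0, 1.0, 1.0, 0.5, 0.5]` and `m = 2`, the sketch selects one rollout with reward 0 and one with reward 1, the pair with the largest spread.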

AAAI Conference 2025 Conference Paper

Deviate or Not: Learning Coalition Structures with Multiple-bit Observations in Games

Yixuan Even Xu
Zhe Feng
Fei Fang

We consider the Coalition Structure Learning (CSL) problem in multi-agent systems, motivated by the existence of coalitions in many real-world systems, e.g., trading platforms and auction systems. In this problem, there is a hidden coalition structure within a set of $n$ agents, which affects the behavior of the agents in games. Our goal is to actively design a sequence of games for the agents to play, such that observations in these games can be used to learn the hidden coalition structure. In particular, we consider the setting where, in each round, we design and present a game together with a strategy profile to the agents and receive a multiple-bit observation: for each agent, we observe whether or not they would like to deviate from the specified strategy. We show that we can learn the coalition structure in $O(\log n)$ rounds if we are allowed to design any normal-form game, matching the information-theoretic lower bound. For practicality, we extend the result to settings where we can only choose games of a specific format, and design algorithms to learn the coalition structure in these settings. For most settings, our complexity matches the theoretical lower bound up to a constant factor.
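One way to see why a logarithmic number of rounds is unavoidable is a counting argument. The sketch below is an illustration, not necessarily the proof used in the paper; it assumes each round yields exactly one bit per agent, writes $T$ for the number of rounds, and uses $B_n$ (the Bell number) for the number of ways to partition $n$ agents into coalitions. Since $T$ rounds produce at most $2^{nT}$ distinct observation sequences, distinguishing all $B_n$ structures requires

$$
nT \;\ge\; \log_2 B_n = \Theta(n \log n) \quad\Longrightarrow\quad T = \Omega(\log n).
$$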

AAAI Conference 2025 Conference Paper

Improving Community-Participated Patrol for Anti-Poaching

Yufei Wu
Yixuan Even Xu
Xuming Zhang
Duo Liu
Shibing Zhu
Fei Fang

Community engagement plays a critical role in anti-poaching efforts, yet existing mathematical models aimed at enhancing this engagement often overlook direct participation by community members as alternative patrollers. Unlike professional rangers, community members typically lack flexibility and experience, resulting in new challenges in optimizing patrol resource allocation. To address this gap, we propose a novel game-theoretic model for community-participated patrol, where a conservation agency strategically deploys both professional rangers and community members to safeguard wildlife against a best-responding poacher. In addition to a mixed-integer linear program formulation, we introduce a Two-Dimensional Binary Search algorithm and a novel Hybrid Waterfilling algorithm to efficiently solve the game in polynomial time. Through extensive experiments and a detailed case study focused on a protected tiger habitat in Northeast China, we demonstrate the effectiveness of our algorithms and the practical applicability of our model.

AAAI Conference 2024 Conference Paper

Learning Coalition Structures with Games

Yixuan Even Xu
Chun Kai Ling
Fei Fang

Coalitions naturally exist in many real-world systems involving multiple decision makers such as ridesharing, security, and online ad auctions, but the coalition structure among the agents is often unknown. We propose and study an important yet previously overlooked problem -- Coalition Structure Learning (CSL), where we aim to carefully design a series of games for the agents and infer the underlying coalition structure by observing their interactions in those games. We establish a lower bound on the sample complexity -- defined as the number of games needed to learn the structure -- of any algorithm for CSL and propose the Iterative Grouping (IG) algorithm for designing normal-form games to achieve the lower bound. We show that IG can be extended to other succinct games such as congestion games and graphical games. Moreover, we solve CSL in a more restrictive and practical setting: auctions. We show that a variant of IG solves CSL in the auction setting even if we cannot design the bidder valuations. Finally, we conduct experiments to evaluate IG in the auction setting, and the results align with our theoretical analysis.

AAAI Conference 2024 Conference Paper

Non-excludable Bilateral Trade between Groups

Yixuan Even Xu
Hanrui Zhang
Vincent Conitzer

Bilateral trade is one of the most natural and important forms of economic interaction: A seller has a single, indivisible item for sale, and a buyer is potentially interested. The two parties typically have different, privately known valuations for the item, and ideally, they would like to trade if the buyer values the item more than the seller. The celebrated impossibility result by Myerson and Satterthwaite shows that any mechanism for this setting must violate at least one important desideratum. In this paper, we investigate a richer paradigm of bilateral trade, with many self-interested buyers and sellers on both sides of a single trade who cannot be excluded from the trade. We show that this allows for more positive results. In fact, we establish a dichotomy in the possibility of trading efficiently. If in expectation, the buyers value the item more, we can achieve efficiency in the limit. If this is not the case, then efficiency cannot be achieved in general. En route, we characterize trading mechanisms that encourage truth-telling, which may be of independent interest. We also evaluate our trading mechanisms experimentally, and the experiments align with our theoretical results.