Arrow Research search

Author name cluster

Yuqi Pan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

TMLR Journal 2026 Journal Article

SpikingBrain: Spiking Brain-inspired Large Models

  • Yuqi Pan
  • Yupeng Feng
  • Jinghao Zhuang
  • Siyu Ding
  • Han Xu
  • Zehao Liu
  • Bohan Sun
  • Yuhong Chou

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly. These constraints limit their ability to process long sequences effectively. In addition, building large models on non-NVIDIA computing platforms poses major challenges in achieving stable and efficient training and deployment. To address these issues, we introduce SpikingBrain, a new family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to the MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using exceptionally low data resources (continual pre-training of approximately 150B tokens). Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.

NeurIPS Conference 2025 Conference Paper

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

  • Davin XianJun Choo
  • Yuqi Pan
  • Tonghan Wang
  • Milind Tambe
  • Alastair van Heerden
  • Cheryl Johnson

We study a sequential decision-making problem on an $n$-node graph $\mathcal{G}$ where each node has an unknown label from a finite set $\mathbf{\Omega}$, drawn from a joint distribution $\mathcal{P}$ that is Markov with respect to $\mathcal{G}$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $\mathcal{G}$ is a forest. Our implementation runs in $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|^2)$ time while using $\mathcal{O}(n \cdot |\mathbf{\Omega}|^2)$ oracle calls to $\mathcal{P}$ and $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|)$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming other baselines.
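The frontier-exploration constraint described in the abstract can be illustrated with a small sketch: at every step, only neighbors of already-selected nodes are eligible. This is a generic frontier-constrained greedy loop in which a caller-supplied `score` function stands in for the paper's Gittins index; all names (`frontier_explore`, `adj`, `score`) are illustrative, not from the paper.

```python
import heapq

def frontier_explore(adj, score, root, budget):
    """Greedy frontier exploration: starting from `root`, repeatedly
    select the highest-scoring node among neighbors of already-selected
    nodes (the frontier constraint from the abstract).  `score` is any
    node -> float priority function standing in for the Gittins index."""
    selected = {root}
    total = score(root)
    frontier = []  # max-heap via negated scores
    for v in adj[root]:
        heapq.heappush(frontier, (-score(v), v))
    while frontier and len(selected) < budget:
        neg, v = heapq.heappop(frontier)
        if v in selected:
            continue
        selected.add(v)
        total += -neg
        for u in adj[v]:
            if u not in selected:
                heapq.heappush(frontier, (-score(u), u))
    return selected, total
```

Note that node 3 below is never reachable within the budget even though it has the highest score: the frontier constraint forces the walk to pass through nodes 1 and 2 first.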

AAMAS Conference 2025 Conference Paper

Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation

  • Guojun Xiong
  • Haichuan Wang
  • Yuqi Pan
  • Saptarshi Mandal
  • Sanket Shah
  • Niclas Boehmer
  • Milind Tambe

Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains. However, in many practical settings with highly scarce resources, where each agent can receive at most one resource, such as healthcare intervention programs, the standard RMAB framework falls short. To tackle such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can only be pulled once. This single-pull constraint introduces additional complexity, rendering many existing RMAB solutions suboptimal or ineffective. To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint. We then design a lightweight index policy for this expanded system. For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\widetilde{O}(1/\rho^{1/2})$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster. Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.
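The dummy-state construction mentioned in the abstract can be sketched for a single arm: append one absorbing "already pulled" state so that pulling the arm a second time has no effect, which lets a standard RMAB index policy respect the single-pull constraint. The function and variable names here are illustrative assumptions, not the paper's notation.

```python
def expand_single_pull(P_passive, P_active, R):
    """Expand a single arm's MDP with an absorbing dummy state.
    Pulling (active action) in any original state jumps to the dummy
    state, so the arm can be pulled at most once; the dummy state
    yields zero reward and is absorbing under both actions.
    P_passive/P_active: n x n transition matrices; R: length-n rewards."""
    n = len(R)
    d = n  # index of the dummy state
    size = n + 1
    new_passive = [[0.0] * size for _ in range(size)]
    new_active = [[0.0] * size for _ in range(size)]
    for s in range(n):
        for t in range(n):
            new_passive[s][t] = P_passive[s][t]  # passive dynamics unchanged
        new_active[s][d] = 1.0                   # pulling consumes the single pull
    new_passive[d][d] = 1.0                      # dummy state is absorbing
    new_active[d][d] = 1.0
    new_R = list(R) + [0.0]
    return new_passive, new_active, new_R
```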

UAI Conference 2025 Conference Paper

Robust Optimization with Diffusion Models for Green Security

  • Lingkai Kong
  • Haichuan Wang
  • Yuqi Pan
  • Cheol Woo Kim
  • Mingxiao Song
  • Alayna Nguyen
  • Tonghan Wang
  • Haifeng Xu

In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. These behaviors are often highly uncertain and complex. Prior work has leveraged game theory to design robust patrol strategies under uncertainty, but existing adversarial behavior models primarily rely on Gaussian processes or linear models, which lack the expressiveness needed to capture intricate behavioral patterns. To address this limitation, we propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. To the best of our knowledge, this is the first application of diffusion models in the green security domain. Integrating diffusion models into game-theoretic optimization, however, presents new challenges, including a constrained mixed strategy space and the need to sample from an unnormalized distribution to estimate utilities. To tackle these challenges, we introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling. Theoretically, our algorithm is guaranteed to converge to an \(\epsilon\)-equilibrium with high probability using a finite number of iterations and samples. Empirically, we evaluate our approach on both synthetic and real-world poaching datasets, demonstrating its effectiveness.

NeurIPS Conference 2024 Conference Paper

Contextual Decision-Making with Knapsacks Beyond the Worst Case

  • Zhaohua Chen
  • Rui Ai
  • Mingwei Yang
  • Yuqi Pan
  • Chang Wang
  • Xiaotie Deng

We study the framework of a dynamic decision-making scenario with resource constraints. In this framework, an agent, whose target is to maximize the total reward under the initial inventory, selects an action in each round upon observing a random request, leading to a reward and resource consumptions that are further associated with an unknown random external factor. While previous research has already established an $\widetilde{O}(\sqrt{T})$ worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one for the worst-case gap between benchmarks and another for logarithmic regret rates. We first show that an $\Omega(\sqrt{T})$ distance between the commonly used fluid benchmark and the online optimum is unavoidable when the former has a degenerate optimal solution. On the algorithmic side, we merge the re-solving heuristic with distribution estimation techniques and propose an algorithm that achieves an $\widetilde{O}(1)$ regret as long as the fluid LP has a unique and non-degenerate solution. Furthermore, we prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases and extend these results to the setting where the request and external factor are continuous. Regarding information structure, our regret results are obtained under two feedback models, respectively, where the algorithm accesses the external factor at the end of each round and at the end of a round only when a non-null action is executed.

AAAI Conference 2024 Conference Paper

Dynamic Budget Throttling in Repeated Second-Price Auctions

  • Zhaohua Chen
  • Chang Wang
  • Qian Wang
  • Yuqi Pan
  • Zhuming Shi
  • Zheng Cai
  • Yukun Ren
  • Zhihua Zhu

In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
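Throttling, as described above, controls spend by selecting a subset of auctions to enter rather than shading bids. The sketch below is a generic primal-dual threshold rule in that spirit, not the paper's OGD-CB algorithm: the advertiser participates only when her value clears a dual threshold, and the threshold is adjusted online so that spend tracks the per-round budget. All names and the win/payment simplification are assumptions.

```python
def throttle_run(values, prices, budget, eta=0.05):
    """Threshold-throttling sketch for repeated second-price auctions.
    Participate only if value >= lam * price and the budget permits;
    for simplicity, assume the advertiser wins and pays `price` (the
    second-highest bid) whenever her value exceeds it.  `lam` is a
    dual variable updated by a gradient step on the budget constraint."""
    T = len(values)
    target = budget / T          # per-round spending target
    lam, spent, utility = 0.0, 0.0, 0.0
    for v, p in zip(values, prices):
        participate = v >= lam * p and spent + p <= budget
        cost = p if (participate and v >= p) else 0.0
        if cost:
            spent += cost
            utility += v - p
        # raise lam when overspending the per-round target, lower otherwise
        lam = max(0.0, lam + eta * (cost - target))
    return spent, utility
```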

NeurIPS Conference 2024 Conference Paper

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

  • Yuhong Chou
  • Man Yao
  • Kexin Wang
  • Yuqi Pan
  • Ruijie Zhu
  • Yiran Zhong
  • Yu Qiao
  • Jibin Wu

Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: (1) Dynamic memory ability; (2) Static approximation ability; (3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.
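The "linear attention form" under which the abstract unifies these models can be sketched by contrast with softmax attention: instead of attending over all prefix tokens at each step, a running state $S = \sum_j k_j v_j^\top$ is updated in $O(1)$ per token and read out as $o_t = S^\top q_t$. This is the generic unnormalized causal linear attention recurrence, not MetaLA itself; all names are illustrative.

```python
def linear_attention(Q, K, V):
    """Causal linear attention sketch: maintain the running state
    S[i][j] = sum over past t of k_t[i] * v_t[j], and read out
    o_t[j] = sum_i q_t[i] * S[i][j] in O(d^2) per step, instead of
    recomputing attention over all prefixes.  Q, K, V are lists of
    d-dimensional vectors (one per token)."""
    d = len(Q[0])
    S = [[0.0] * d for _ in range(d)]
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d):
            for j in range(d):
                S[i][j] += k[i] * v[j]          # rank-1 state update
        o = [sum(q[i] * S[i][j] for i in range(d)) for j in range(d)]
        out.append(o)
    return out
```

Expanding the readout shows it equals the unnormalized attention sum $o_t = \sum_{j \le t} (q_t \cdot k_j)\, v_j$, which is why the recurrence is exact rather than an approximation of this form.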

AAMAS Conference 2023 Conference Paper

Altruism, Collectivism and Egalitarianism: On a Variety of Prosocial Behaviors in Binary Networked Public Goods Games

  • Jichen Li
  • Xiaotie Deng
  • Yukun Cheng
  • Yuqi Pan
  • Xuanzhi Xia
  • Zongjun Yang
  • Jan Xie

A binary networked public goods (BNPG) game consists of a network $G = (V, E)$ with $n$ players residing as nodes and making a YES/NO decision on whether to invest in a public project. Examples of such public projects include face mask wearing during a pandemic, crime reporting, and vaccination. Most conventional models of BNPG games solely posit egoism as the motivation of players: they only care about their own benefits. However, a series of real-world examples shows that people exhibit a wide range of prosocial behaviors in making decisions. To address this, we introduce a novel extension of BNPG games that accounts for three kinds of prosocial motivations: altruism, collectivism, and egalitarianism. We revise utility functions to reflect different prosocial motivations with respect to the welfare of others, mediated by a prosocial graph. We develop computational complexity results for deciding the existence of a pure strategy Nash equilibrium in these models, for cases where the prosocial graph is a tree, a clique, or a general network. We further discuss the Prosocial Network Modification (PNM) problem, in which a principal can change the network structure within a budget constraint to induce a given strategy profile with respect to an equilibrium. For all three types of PNM problems, we completely characterize their computational complexity.
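For small instances, the existence of a pure strategy Nash equilibrium in a binary-action game (the object of the complexity results above) can be checked by brute-force enumeration. The sketch below is generic: the paper's prosocial variants correspond to different choices of the `utility` function, and all names are illustrative.

```python
from itertools import product

def has_pure_nash(n, utility):
    """Brute-force pure-Nash check for an n-player binary-action game:
    a profile is an equilibrium iff no player gains by flipping her own
    YES/NO decision.  `utility(profile, i)` returns player i's payoff;
    altruistic, collectivist, or egalitarian variants are obtained by
    plugging in the corresponding utility function."""
    for profile in product((0, 1), repeat=n):
        if all(utility(profile, i) >=
               utility(profile[:i] + (1 - profile[i],) + profile[i + 1:], i)
               for i in range(n)):
            return True
    return False
```

For example, a two-player best-shot public goods game (benefit 1 if anyone invests, investment cost 0.5) has a pure equilibrium in which exactly one player invests, while a matching-pennies-style game has none.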