Arrow Research search

Author name cluster

Yuqi Pan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

TMLR Journal 2026 Journal Article

SpikingBrain: Spiking Brain-inspired Large Models

  • Yuqi Pan
  • Yupeng Feng
  • Jinghao Zhuang
  • Siyu Ding
  • Han Xu
  • Zehao Liu
  • Bohan Sun
  • Yuhong Chou

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly. These constraints limit their ability to process long sequences effectively. In addition, building large models on non-NVIDIA computing platforms poses major challenges in achieving stable and efficient training and deployment. To address these issues, we introduce SpikingBrain, a new family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to the MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using exceptionally low data resources (continual pre-training of approximately 150B tokens). Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.

NeurIPS Conference 2025 Conference Paper

Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing

  • Davin XianJun Choo
  • Yuqi Pan
  • Tonghan Wang
  • Milind Tambe
  • Alastair van Heerden
  • Cheryl Johnson

We study a sequential decision-making problem on an $n$-node graph $\mathcal{G}$ where each node has an unknown label from a finite set $\mathbf{\Omega}$, drawn from a joint distribution $\mathcal{P}$ that is Markov with respect to $\mathcal{G}$. At each step, selecting a node reveals its label and yields a label-dependent reward. The goal is to adaptively choose nodes to maximize expected accumulated discounted rewards. We impose a frontier exploration constraint, where actions are limited to neighbors of previously selected nodes, reflecting practical constraints in settings such as contact tracing and robotic exploration. We design a Gittins index-based policy that applies to general graphs and is provably optimal when $\mathcal{G}$ is a forest. Our implementation runs in $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|^2)$ time while using $\mathcal{O}(n \cdot |\mathbf{\Omega}|^2)$ oracle calls to $\mathcal{P}$ and $\mathcal{O}(n^2 \cdot |\mathbf{\Omega}|)$ space. Experiments on synthetic and real-world graphs show that our method consistently outperforms natural baselines, including in non-tree, budget-limited, and undiscounted settings. For example, in HIV testing simulations on real-world sexual interaction networks, our policy detects nearly all positive cases with only half the population tested, substantially outperforming other baselines.
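The frontier-exploration constraint described in the abstract can be illustrated with a small sketch: at every step, only neighbors of already-selected nodes are eligible. This is a generic frontier-constrained greedy loop in which a caller-supplied `score` function stands in for the paper's Gittins index; all names (`frontier_explore`, `adj`, `score`) are illustrative, not from the paper.

```python
import heapq

def frontier_explore(adj, score, root, budget):
    """Greedy frontier exploration: starting from `root`, repeatedly
    select the highest-scoring node among neighbors of already-selected
    nodes (the frontier constraint from the abstract).  `score` is any
    node -> float priority function standing in for the Gittins index."""
    selected = {root}
    total = score(root)
    frontier = []  # max-heap via negated scores
    for v in adj[root]:
        heapq.heappush(frontier, (-score(v), v))
    while frontier and len(selected) < budget:
        neg, v = heapq.heappop(frontier)
        if v in selected:
            continue
        selected.add(v)
        total += -neg
        for u in adj[v]:
            if u not in selected:
                heapq.heappush(frontier, (-score(u), u))
    return selected, total
```

Note that node 3 below is never reachable within the budget even though it has the highest score: the frontier constraint forces the walk to pass through nodes 1 and 2 first.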

AAMAS Conference 2025 Conference Paper

Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation

  • Guojun Xiong
  • Haichuan Wang
  • Yuqi Pan
  • Saptarshi Mandal
  • Sanket Shah
  • Niclas Boehmer
  • Milind Tambe

Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains. However, in many practical settings with highly scarce resources, where each agent can receive at most one resource, such as healthcare intervention programs, the standard RMAB framework falls short. To tackle such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can only be pulled once. This single-pull constraint introduces additional complexity, rendering many existing RMAB solutions suboptimal or ineffective. To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint. We then design a lightweight index policy for this expanded system. For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\widetilde{O}(1/\rho^{1/2})$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster. Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.
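The dummy-state construction mentioned in the abstract can be sketched for a single arm: append one absorbing "already pulled" state so that pulling the arm a second time has no effect, which lets a standard RMAB index policy respect the single-pull constraint. The function and variable names here are illustrative assumptions, not the paper's notation.

```python
def expand_single_pull(P_passive, P_active, R):
    """Expand a single arm's MDP with an absorbing dummy state.
    Pulling (active action) in any original state jumps to the dummy
    state, so the arm can be pulled at most once; the dummy state
    yields zero reward and is absorbing under both actions.
    P_passive/P_active: n x n transition matrices; R: length-n rewards."""
    n = len(R)
    d = n  # index of the dummy state
    size = n + 1
    new_passive = [[0.0] * size for _ in range(size)]
    new_active = [[0.0] * size for _ in range(size)]
    for s in range(n):
        for t in range(n):
            new_passive[s][t] = P_passive[s][t]  # passive dynamics unchanged
        new_active[s][d] = 1.0                   # pulling consumes the single pull
    new_passive[d][d] = 1.0                      # dummy state is absorbing
    new_active[d][d] = 1.0
    new_R = list(R) + [0.0]
    return new_passive, new_active, new_R
```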

UAI Conference 2025 Conference Paper

Robust Optimization with Diffusion Models for Green Security

  • Lingkai Kong
  • Haichuan Wang
  • Yuqi Pan
  • Cheol Woo Kim
  • Mingxiao Song
  • Alayna Nguyen
  • Tonghan Wang
  • Haifeng Xu

In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. These behaviors are often highly uncertain and complex. Prior work has leveraged game theory to design robust patrol strategies under uncertainty, but existing adversarial behavior models primarily rely on Gaussian processes or linear models, which lack the expressiveness needed to capture intricate behavioral patterns. To address this limitation, we propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. To the best of our knowledge, this is the first application of diffusion models in the green security domain. Integrating diffusion models into game-theoretic optimization, however, presents new challenges, including a constrained mixed strategy space and the need to sample from an unnormalized distribution to estimate utilities. To tackle these challenges, we introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling. Theoretically, our algorithm is guaranteed to converge to an \(\epsilon\)-equilibrium with high probability using a finite number of iterations and samples. Empirically, we evaluate our approach on both synthetic and real-world poaching datasets, demonstrating its effectiveness.

NeurIPS Conference 2024 Conference Paper

Contextual Decision-Making with Knapsacks Beyond the Worst Case

  • Zhaohua Chen
  • Rui Ai
  • Mingwei Yang
  • Yuqi Pan
  • Chang Wang
  • Xiaotie Deng

We study the framework of a dynamic decision-making scenario with resource constraints. In this framework, an agent, whose target is to maximize the total reward under the initial inventory, selects an action in each round upon observing a random request, leading to a reward and resource consumptions that are further associated with an unknown random external factor. While previous research has already established an $\widetilde{O}(\sqrt{T})$ worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one for the worst-case gap between benchmarks and another for logarithmic regret rates. We first show that an $\Omega(\sqrt{T})$ distance between the commonly used fluid benchmark and the online optimum is unavoidable when the former has a degenerate optimal solution. On the algorithmic side, we merge the re-solving heuristic with distribution estimation techniques and propose an algorithm that achieves an $\widetilde{O}(1)$ regret as long as the fluid LP has a unique and non-degenerate solution. Furthermore, we prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases and extend these results to the setting where the request and external factor are continuous. Regarding information structure, our regret results are obtained under two feedback models, respectively, where the algorithm accesses the external factor at the end of each round and at the end of a round only when a non-null action is executed.

AAAI Conference 2024 Conference Paper

Dynamic Budget Throttling in Repeated Second-Price Auctions

  • Zhaohua Chen
  • Chang Wang
  • Qian Wang
  • Yuqi Pan
  • Zhuming Shi
  • Zheng Cai
  • Yukun Ren
  • Zhihua Zhu

In today's online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser's total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser's dynamic budget throttling process in repeated second-price auctions. We first establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser's values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.
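Throttling, as described above, controls spend by selecting a subset of auctions to enter rather than shading bids. The sketch below is a generic primal-dual threshold rule in that spirit, not the paper's OGD-CB algorithm: the advertiser participates only when her value clears a dual threshold, and the threshold is adjusted online so that spend tracks the per-round budget. All names and the win/payment simplification are assumptions.

```python
def throttle_run(values, prices, budget, eta=0.05):
    """Threshold-throttling sketch for repeated second-price auctions.
    Participate only if value >= lam * price and the budget permits;
    for simplicity, assume the advertiser wins and pays `price` (the
    second-highest bid) whenever her value exceeds it.  `lam` is a
    dual variable updated by a gradient step on the budget constraint."""
    T = len(values)
    target = budget / T          # per-round spending target
    lam, spent, utility = 0.0, 0.0, 0.0
    for v, p in zip(values, prices):
        participate = v >= lam * p and spent + p <= budget
        cost = p if (participate and v >= p) else 0.0
        if cost:
            spent += cost
            utility += v - p
        # raise lam when overspending the per-round target, lower otherwise
        lam = max(0.0, lam + eta * (cost - target))
    return spent, utility
```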

NeurIPS Conference 2024 Conference Paper

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

  • Yuhong Chou
  • Man Yao
  • Kexin Wang
  • Yuqi Pan
  • Ruijie Zhu
  • Yiran Zhong
  • Yu Qiao
  • Jibin Wu

Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: (1) Dynamic memory ability; (2) Static approximation ability; (3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.
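The "linear attention form" under which the abstract unifies these models can be sketched by contrast with softmax attention: instead of attending over all prefix tokens at each step, a running state $S = \sum_j k_j v_j^\top$ is updated in $O(1)$ per token and read out as $o_t = S^\top q_t$. This is the generic unnormalized causal linear attention recurrence, not MetaLA itself; all names are illustrative.

```python
def linear_attention(Q, K, V):
    """Causal linear attention sketch: maintain the running state
    S[i][j] = sum over past t of k_t[i] * v_t[j], and read out
    o_t[j] = sum_i q_t[i] * S[i][j] in O(d^2) per step, instead of
    recomputing attention over all prefixes.  Q, K, V are lists of
    d-dimensional vectors (one per token)."""
    d = len(Q[0])
    S = [[0.0] * d for _ in range(d)]
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d):
            for j in range(d):
                S[i][j] += k[i] * v[j]          # rank-1 state update
        o = [sum(q[i] * S[i][j] for i in range(d)) for j in range(d)]
        out.append(o)
    return out
```

Expanding the readout shows it equals the unnormalized attention sum $o_t = \sum_{j \le t} (q_t \cdot k_j)\, v_j$, which is why the recurrence is exact rather than an approximation of this form.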

AAMAS Conference 2023 Conference Paper

Altruism, Collectivism and Egalitarianism: On a Variety of Prosocial Behaviors in Binary Networked Public Goods Games

  • Jichen Li
  • Xiaotie Deng
  • Yukun Cheng
  • Yuqi Pan
  • Xuanzhi Xia
  • Zongjun Yang
  • Jan Xie

A binary networked public goods (BNPG) game consists of a network $G = (V, E)$ with $n$ players residing as nodes and making a YES/NO decision on whether to invest in a public project. Examples of such public projects include face mask wearing during a pandemic, crime reporting, and vaccination. Most conventional models of BNPG games solely posit egoism as the motivation of players: they only care about their own benefits. However, a series of real-world examples shows that people exhibit a wide range of prosocial behaviors in making decisions. To address this, we introduce a novel extension of BNPG games that accounts for three kinds of prosocial motivations: altruism, collectivism, and egalitarianism. We revise utility functions to reflect different prosocial motivations with respect to the welfare of others, mediated by a prosocial graph. We develop computational complexity results for deciding the existence of a pure strategy Nash equilibrium in these models, for cases where the prosocial graph is a tree, a clique, or a general network. We further discuss the Prosocial Network Modification (PNM) problem, in which a principal can change the network structure within a budget constraint to induce a given strategy profile with respect to an equilibrium. For all three types of PNM problems, we completely characterize their computational complexity.
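For small instances, the existence of a pure strategy Nash equilibrium in a binary-action game (the object of the complexity results above) can be checked by brute-force enumeration. The sketch below is generic: the paper's prosocial variants correspond to different choices of the `utility` function, and all names are illustrative.

```python
from itertools import product

def has_pure_nash(n, utility):
    """Brute-force pure-Nash check for an n-player binary-action game:
    a profile is an equilibrium iff no player gains by flipping her own
    YES/NO decision.  `utility(profile, i)` returns player i's payoff;
    altruistic, collectivist, or egalitarian variants are obtained by
    plugging in the corresponding utility function."""
    for profile in product((0, 1), repeat=n):
        if all(utility(profile, i) >=
               utility(profile[:i] + (1 - profile[i],) + profile[i + 1:], i)
               for i in range(n)):
            return True
    return False
```

For example, a two-player best-shot public goods game (benefit 1 if anyone invests, investment cost 0.5) has a pure equilibrium in which exactly one player invests, while a matching-pennies-style game has none.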