Arrow Research search

Author name cluster

Shanda Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization

  • Weiwei Sun
  • Shengyu Feng
  • Shanda Li
  • Yiming Yang

Although LLM-based agents have attracted significant attention in domains such as software engineering and machine learning research, their role in advancing combinatorial optimization (CO) remains relatively underexplored. This gap underscores the need for a deeper understanding of their potential in tackling structured, constraint-intensive problems---a pursuit currently limited by the absence of comprehensive benchmarks for systematic investigation. To address this, we introduce CO-Bench, a benchmark suite featuring 36 real-world CO problems drawn from a broad range of domains and complexity levels. CO-Bench includes structured problem formulations and curated data to support rigorous investigation of LLM agents. We evaluate multiple agentic frameworks against established human-designed algorithms, revealing the strengths and limitations of existing LLM agents and identifying promising directions for future research.

TMLR Journal 2026 Journal Article

CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

  • Shanda Li
  • Tanya Marwah
  • Junhong Shen
  • Weiwei Sun
  • Andrej Risteski
  • Yiming Yang
  • Ameet Talwalkar

Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability. In this work, we frame PDE solving as a code generation task and introduce CodePDE, the first inference framework for generating PDE solvers using large language models (LLMs). With CodePDE, we present a thorough evaluation on critical capacities of LLM for PDE solving: reasoning, debugging, self-refinement, and test-time scaling. CodePDE shows that, with advanced inference-time algorithms and scaling strategies, LLMs can achieve strong performance across a range of representative PDE problems. We also identify novel insights into LLM-driven solver generation, such as trade-offs between solver reliability and sophistication, design principles for LLM-powered PDE solving agents, and failure modes for LLM on hard tasks. These insights offer guidance for building more capable and reliable LLM-based scientific engines.

ICLR Conference 2025 Conference Paper

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

  • Yangzhen Wu
  • Zhiqing Sun
  • Shanda Li
  • Sean Welleck
  • Yiming Yang 0002

While the scaling laws of large language models (LLMs) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. As a first step towards understanding and designing compute-optimal inference methods, we studied cost-performance trade-offs for inference strategies such as greedy search, majority voting, best-of-$n$, weighted voting, and two different tree search algorithms, using different model sizes and compute budgets. Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters. Additionally, smaller models combined with advanced inference algorithms offer Pareto-optimal trade-offs in cost and performance. For example, the Llemma-7B model, when paired with our novel tree search algorithm, consistently outperforms the Llemma-34B model across all tested inference strategies on the MATH benchmark. We hope these insights contribute to a deeper understanding of inference scaling laws (test-time scaling laws) for LLMs.
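The sampling-based strategies named in this abstract (majority voting, best-of-$n$, weighted voting) can be illustrated with a minimal sketch. The function names, the sampled answers, and the scores below are hypothetical, and the scores stand in for whatever reward or verifier model is used; this is not the paper's implementation.

```python
from collections import defaultdict


def majority_vote(answers):
    """Return the answer that appears most often among the samples."""
    counts = defaultdict(int)
    for answer in answers:
        counts[answer] += 1
    return max(counts, key=counts.get)


def best_of_n(answers, scores):
    """Return the single sampled answer with the highest score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_vote(answers, scores):
    """Sum the scores of identical answers and return the top total."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Hypothetical final answers from six sampled solutions, with made-up scores.
answers = ["42", "41", "42", "7", "41", "42"]
scores = [0.30, 0.20, 0.25, 0.90, 0.85, 0.10]

print(majority_vote(answers))          # most frequent answer
print(best_of_n(answers, scores))      # highest single score
print(weighted_vote(answers, scores))  # highest summed score
```

On this toy input the three strategies pick three different answers, which is why their cost-performance trade-offs have to be compared at matched compute budgets.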

ICML Conference 2025 Conference Paper

Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators

  • Shanda Li
  • Shinjae Yoo
  • Yiming Yang 0002

Fourier Neural Operators (FNOs) offer a principled approach for solving complex partial differential equations (PDEs). However, scaling them to handle more complex PDEs requires increasing the number of Fourier modes, which significantly expands the number of model parameters and makes hyperparameter tuning computationally impractical. To address this, we introduce $\mu$Transfer-FNO, a zero-shot hyperparameter transfer technique that enables optimal configurations, tuned on smaller FNOs, to be directly applied to billion-parameter FNOs without additional tuning. Building on the Maximal Update Parametrization ($\mu$P) framework, we mathematically derive a parametrization scheme that facilitates the transfer of optimal hyperparameters across models with different numbers of Fourier modes in FNOs, which is validated through extensive experiments on various PDEs. Our empirical study shows that $\mu$Transfer-FNO reduces computational cost for tuning hyperparameters on large FNOs while maintaining or improving accuracy.

ICLR Conference 2025 Conference Paper

TFG-Flow: Training-free Guidance in Multimodal Generative Flow

  • Haowei Lin
  • Shanda Li
  • Haotian Ye
  • Yiming Yang
  • Stefano Ermon
  • Yitao Liang
  • Jianzhu Ma

Given an unconditional generative model and a predictor for a target property (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. As a highly efficient technique for steering generative models toward flexible outcomes, training-free guidance has gained increasing attention in diffusion models. However, existing methods only handle data in continuous spaces, while many scientific applications involve both continuous and discrete data (referred to as multimodality). Another emerging trend is the growing use of the simple and general flow matching framework in building generative foundation models, where guided generation remains under-explored. To address this, we introduce TFG-Flow, a novel training-free guidance method for multimodal generative flow. TFG-Flow addresses the curse-of-dimensionality while maintaining the property of unbiased sampling in guiding discrete variables. We validate TFG-Flow on four molecular design tasks and show that TFG-Flow has great potential in drug design by generating molecules with desired properties.

EAAI Journal 2024 Journal Article

A visual detection algorithm for autonomous driving road environment perception

  • Peichao Cong
  • Hao Feng
  • Shanda Li
  • Tianheng Li
  • Yutao Xu
  • Xin Zhang

Achieving accurate and real-time perception of environmental targets in complex traffic scenes based on visual sensors is a challenging research problem in the field of autonomous driving technology. In methods to date, it is difficult to effectively balance the detection accuracy and speed. To this end, this paper proposes an interactive and lightweight visual detection algorithm – YRDM (Your Region Decision-Making) – based on the concepts of efficient mining and utilisation of target feature information, lightweight network structure, and optimisation of label allocation for highly practical detection of ambient targets in autonomous driving scenarios. First, a two-stage algorithm architecture consisting of four low-parameter subnetworks is constructed with the goal of efficiently mining and utilising target feature information, and the accuracy and effectiveness of the algorithm are balanced through the interaction of information between the subnetworks. Second, in order to further improve the detection speed, lightweight convolution is introduced into the structure of the YRDM network to construct the DSC3 module, which allows lightweight processing of the subnetwork structure. Finally, by converting the label assignment problem into an optimal transport problem, adaptation to the global nature of the samples by YRDM is improved, allowing better detection accuracy. The algorithm is tested with two major public datasets, BDD100K and KITTI, and a large number of experimental results show that the comprehensive performance of YRDM is better than other existing algorithms. In addition, ablation experiments and mobile terminal device deployment experiments further demonstrate the effectiveness and real-time performance of this algorithm.

ICLR Conference 2024 Conference Paper

Functional Interpolation for Relative Positions improves Long Context Transformers

  • Shanda Li
  • Chong You
  • Guru Guruganesh
  • Joshua Ainslie
  • Santiago Ontañón
  • Manzil Zaheer
  • Sumit Sanghai
  • Yiming Yang 0002

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits on the input sequence lengths it can process, the choice of position encoding used during training can limit the performance of these models on longer inputs. We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple. We next empirically show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.

NeurIPS Conference 2022 Conference Paper

Is $L^2$ Physics Informed Loss Always Suitable for Training Physics Informed Neural Network?

  • Chuwei Wang
  • Shanda Li
  • Di He
  • Liwei Wang

The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de-facto standard in training Physics-Informed Neural Networks. In this paper, we challenge this common practice by investigating the relationship between the loss function and the approximation quality of the learned solution. In particular, we leverage the concept of stability in the literature of partial differential equation to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) Equation, and prove that for general $L^p$ Physics-Informed Loss, a wide class of HJB equation is stable only if $p$ is sufficiently large. Therefore, the commonly used $L^2$ loss is not suitable for training PINN on those equations, while $L^{\infty}$ loss is a better choice. Based on the theoretical insight, we develop a novel PINN training algorithm to minimize the $L^{\infty}$ loss for HJB equations which is in a similar spirit to adversarial training. The effectiveness of the proposed algorithm is empirically demonstrated through experiments. Our code is released at https://github.com/LithiumDA/L_inf-PINN.
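As a rough numerical intuition for why the choice of $p$ matters (an illustration only, not the paper's stability argument or its training algorithm), the sketch below compares empirical $L^p$ norms of a hypothetical PDE residual that is zero everywhere except for one sharp spike. The residual values and helper names are assumptions made for this example.

```python
def lp_norm(residuals, p):
    """Empirical L^p norm over sample points: (mean |r|^p)^(1/p)."""
    mean_power = sum(abs(r) ** p for r in residuals) / len(residuals)
    return mean_power ** (1.0 / p)


def linf_norm(residuals):
    """Empirical L^infinity norm: the largest absolute residual."""
    return max(abs(r) for r in residuals)


# Hypothetical residual samples: exact at 9,999 points, off by 1.0 at one point.
residuals = [0.0] * 9999 + [1.0]

l2 = lp_norm(residuals, 2)    # about 0.01: the spike is almost invisible
l16 = lp_norm(residuals, 16)  # noticeably larger than the L^2 value
linf = linf_norm(residuals)   # the spike itself

print(l2, l16, linf)
```

A small $L^2$ value can therefore coexist with a large pointwise error, while larger $p$ weights the worst-case point more heavily.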

NeurIPS Conference 2022 Conference Paper

Your Transformer May Not be as Powerful as You Expect

  • Shengjie Luo
  • Shanda Li
  • Shuxin Zheng
  • Tie-Yan Liu
  • Liwei Wang
  • Di He

Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions. One may naturally assume the answer is in the affirmative---RPE-based Transformers are universal function approximators. However, we present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is. One key reason lies in that most RPEs are placed in the softmax attention that always generates a right stochastic matrix. This restricts the network from capturing positional information in the RPEs and limits its capacity. To overcome the problem and make the model more powerful, we first present sufficient conditions for RPE-based Transformers to achieve universal function approximation. With the theoretical guidance, we develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions. Therefore, the corresponding URPE-based Transformers become universal function approximators. Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications. The code will be made publicly available at https://github.com/lsj2408/URPE.
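One way to see the restriction imposed by a right stochastic attention matrix is a toy example: when every input token is identical, each output is a convex combination of the same value, so the RPE bias cannot make the output depend on position. The single-head scalar attention and the bias function below are simplifying assumptions for illustration, not the architecture analyzed in the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax; the result is non-negative and sums to 1."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_rpe(x, bias):
    """Toy single-head attention on scalar tokens (query = key = value = x[i]).

    bias(i, j) is a hypothetical additive RPE term placed inside the softmax.
    """
    n = len(x)
    outputs = []
    for i in range(n):
        logits = [x[i] * x[j] + bias(i, j) for j in range(n)]
        weights = softmax(logits)  # row i of a right stochastic matrix
        outputs.append(sum(w * x[j] for j, w in enumerate(weights)))
    return outputs


def rpe_bias(i, j):
    """Arbitrary illustrative bias that depends only on the offset i - j."""
    return -0.5 * abs(i - j)


constant_input = [2.0] * 5
outputs = attention_with_rpe(constant_input, rpe_bias)
print(outputs)  # every position returns the shared input value, up to rounding
```

Changing `rpe_bias` to any other function leaves the outputs unchanged for this input, because the attention weights in each row still sum to one.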

NeurIPS Conference 2021 Conference Paper

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

  • Shengjie Luo
  • Shanda Li
  • Tianle Cai
  • Di He
  • Dinglan Peng
  • Shuxin Zheng
  • Guolin Ke
  • Liwei Wang

The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of the kernelized attention. Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using Fast Fourier Transform (FFT). With FFT, our method achieves $\mathcal{O}(n\log n)$ time complexity. Interestingly, we further demonstrate that properly using relative positional encoding can mitigate the training instability problem of vanilla kernelized attention. On a wide range of tasks, we empirically show that our models can be trained from scratch without any optimization issues. The learned model performs better than many efficient Transformer variants and is faster than standard Transformer in the long-sequence regime.
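The building block behind the $\mathcal{O}(n\log n)$ claim is that a Toeplitz matrix-vector product can be computed with FFTs by embedding the matrix in a circulant one. The sketch below shows only that generic step in plain Python, with a hypothetical offset function `rpe`; how the paper combines it with kernelized attention is not reproduced here.

```python
import cmath
import math


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def toeplitz_matvec_fft(b, x):
    """Compute y[i] = sum_j b(i - j) * x[j] via a circulant embedding."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:  # circulant size: power of two, at least 2n - 1
        m *= 2
    c = [0.0] * m  # first column of the circulant matrix
    for k in range(n):
        c[k] = b(k)  # non-negative offsets
    for k in range(1, n):
        c[m - k] = b(-k)  # negative offsets wrap around
    x_padded = list(x) + [0.0] * (m - n)
    product = [u * v for u, v in zip(fft(c), fft(x_padded))]
    y = fft(product, invert=True)
    return [(v / m).real for v in y[:n]]  # divide by m to finish the inverse


def toeplitz_matvec_naive(b, x):
    """Quadratic-time reference used to check the FFT version."""
    n = len(x)
    return [sum(b(i - j) * x[j] for j in range(n)) for i in range(n)]


def rpe(offset):
    """Hypothetical, asymmetric relative-position term for the demo."""
    return math.exp(-0.3 * abs(offset)) + 0.1 * offset


x = [1.0, -2.0, 0.5, 3.0, 4.0]
fast = toeplitz_matvec_fft(rpe, x)
slow = toeplitz_matvec_naive(rpe, x)
print(max(abs(f - s) for f, s in zip(fast, slow)))  # tiny rounding error
```

The FFT path costs $\mathcal{O}(m\log m)$ with $m < 4n$, compared with $\mathcal{O}(n^2)$ for the naive product.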