Arrow Research search

Author name cluster

Cheng Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

AAAI Conference 2026 Conference Paper

Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space

  • Cheng Yan
  • Wuyang Zhang
  • Zhiyuan Ning
  • Fan Xu
  • Ziyang Tao
  • Lu Zhang
  • Bing Yin
  • Yanyong Zhang

The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.
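The decoupling the abstract describes can be pictured with a toy sketch: each model is summarized by a fixed profile vector in a shared latent space, a query encoder maps queries into the same space, and onboarding a new model only means registering one more profile vector. All model names, features, and scoring rules below are invented for illustration; they are not ZeroRouter's actual components.

```python
import numpy as np

# Hypothetical model profiles in a shared 4-d "difficulty" latent space,
# plus a per-call cost. A new model is onboarded by adding one profile
# vector -- no retraining of the query encoder is needed.
MODELS = {
    "small":  {"profile": np.array([0.9, 0.2, 0.1, 0.3]), "cost": 1.0},
    "medium": {"profile": np.array([0.8, 0.6, 0.5, 0.6]), "cost": 3.0},
    "large":  {"profile": np.array([0.9, 0.9, 0.9, 0.9]), "cost": 10.0},
}

def encode_query(query: str) -> np.ndarray:
    """Stand-in for a learned context-aware encoder that maps a query to
    the latent space. Here we fake it with a crude length-based feature."""
    difficulty = min(len(query) / 200.0, 1.0)
    return np.array([1.0, difficulty, difficulty, difficulty])

def predicted_score(model: dict, z: np.ndarray) -> float:
    # Model-agnostic scoring: how well the model's profile covers the
    # query's latent difficulty (1.0 = fully covered, lower = shortfall).
    return float(np.clip(model["profile"] - z, None, 0).sum() + 1.0)

def route(query: str, min_score: float = 0.8) -> str:
    """Pick the cheapest model whose predicted score clears the bar."""
    z = encode_query(query)
    ok = [(m["cost"], name) for name, m in MODELS.items()
          if predicted_score(m, z) >= min_score]
    # Fall back to the most capable model if nothing qualifies.
    return min(ok)[1] if ok else "large"

print(route("What is 2+2?"))                # easy query -> "small"
print(route("Prove the theorem ..." * 20))  # hard query -> "large"
```

The point of the sketch is the shape of the interface: query characterization (`encode_query`) never looks at the model list, so adding a model touches only `MODELS`.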

JBHI Journal 2025 Journal Article

Drug Repositioning Based on Expert Knowledge Augmented Graph Neural Network

  • Zhenpeng Wu
  • Cheng Yan
  • Jiamin Chen
  • Siyang Xiao
  • Jianliang Gao

Drug repositioning is critical in accelerating drug discovery: it identifies new indications for existing drugs by modeling drug–disease associations. Compared to traditional methods, graph neural networks (GNNs) have recently gained widespread attention due to their ability to effectively aggregate information from neighboring nodes in drug–disease heterogeneous graphs. GNN-based methods need effective node embeddings for information aggregation. However, they generate node embeddings by random initialization rather than incorporating the high-quality expert knowledge about biological mechanisms available in curated databases. This limits their capacity to generate interpretable node embeddings aligned with expert knowledge. To bridge this gap, we develop a novel framework dubbed DReKGNN (Drug Repositioning based on expert Knowledge augmented Graph Neural Network). Specifically, DReKGNN first adopts large language models (LLMs) as a semantic bridge between expert knowledge and GNNs. To ensure the accuracy of the expert knowledge, DReKGNN does not rely on LLM prompt templates to generate knowledge descriptions for drugs and diseases; instead, it extracts expert knowledge directly from the DrugBank and OMIM databases. Effective node embeddings with interpretable semantic information are then generated by LLMs from these expert-knowledge descriptions of biological mechanisms. We further demonstrate the need to mitigate noise when LLM node embeddings serve drug repositioning prediction tasks; considering this design need, we integrate GNNs with LLM node embeddings via a mean aggregation strategy. Experimental results from performance comparisons and a case study show the effectiveness of DReKGNN in predicting drug–disease associations. The code is available at https://github.com/csubigdata-Organization/DReKGNN.
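The mean aggregation strategy mentioned in the abstract can be sketched in a few lines: node embeddings are seeded from vectors that stand in for LLM-generated embeddings of DrugBank/OMIM descriptions (random here, purely for illustration) and then averaged over graph neighborhoods. The toy graph, dimensions, and scoring below are hypothetical and are not DReKGNN's actual architecture.

```python
import numpy as np

# Toy drug-disease bipartite graph given as adjacency lists.
# llm_embed stands in for per-node vectors an LLM would produce from
# DrugBank/OMIM expert-knowledge descriptions; here it is random.
rng = np.random.default_rng(0)
num_nodes, dim = 5, 8
llm_embed = rng.normal(size=(num_nodes, dim))
neighbors = {0: [3, 4], 1: [3], 2: [4], 3: [0, 1], 4: [0, 2]}

def mean_aggregate(h: np.ndarray) -> np.ndarray:
    """One GNN layer with mean aggregation: each node's new embedding is
    the average of its own embedding and its neighbors' embeddings."""
    out = np.empty_like(h)
    for v, nbrs in neighbors.items():
        out[v] = np.mean(h[[v, *nbrs]], axis=0)
    return out

# Instead of random task-specific initialization, seed the GNN with the
# LLM embeddings so aggregated representations stay anchored to the
# expert-knowledge descriptions they came from.
h1 = mean_aggregate(llm_embed)

# A drug-disease association score could then be a simple dot product
# between a drug node and a disease node (an invented readout).
score = float(h1[0] @ h1[3])
print(h1.shape, round(score, 3))
```

Mean aggregation (rather than, say, attention) is the simplest way to blend a node's LLM-derived semantics with its neighborhood without letting any single noisy embedding dominate.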

ICML Conference 2025 Conference Paper

Information Bottleneck-guided MLPs for Robust Spatial-temporal Forecasting

  • Min Chen
  • Guansong Pang
  • Wenjun Wang 0002
  • Cheng Yan

Spatial-temporal forecasting (STF) plays a pivotal role in urban planning and computing. Spatial-Temporal Graph Neural Networks (STGNNs) excel at modeling spatial-temporal dynamics, thus being robust against noise perturbations. However, they often suffer from relatively poor computational efficiency. Simplifying the architectures can improve efficiency but also weakens robustness with respect to noise interference. In this study, we investigate the problem: can simple neural networks such as Multi-Layer Perceptrons (MLPs) achieve robust spatial-temporal forecasting while remaining efficient? To this end, we first reveal the dual noise effect in spatial-temporal data and propose a theoretically grounded principle termed Robust Spatial-Temporal Information Bottleneck (RSTIB), which holds strong potential for improving model robustness. We then design an implementation named RSTIB-MLP, together with a new training regime incorporating a knowledge distillation module, to enhance the robustness of MLPs for STF while maintaining their efficiency. Comprehensive experiments demonstrate that RSTIB-MLP achieves an excellent trade-off between robustness and efficiency, outperforming state-of-the-art STGNNs and MLP-based models. Our code is publicly available at: https://github.com/mchen644/RSTIB.
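The knowledge-distillation part of the training regime can be illustrated with a minimal linear "MLP" student fitted against a blend of ground truth and a teacher's predictions. This sketch uses invented toy data and a plain mixed MSE loss; it does not reproduce the RSTIB objective or the paper's actual training code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy spatial-temporal batch: 4 sensors x 12 past steps -> next-step values.
x = rng.normal(size=(32, 4 * 12))
teacher_pred = rng.normal(size=(32, 4))            # stand-in for STGNN output
y = teacher_pred + 0.1 * rng.normal(size=(32, 4))  # noisy ground truth

# A single-linear-layer student trained by gradient descent on a
# distillation objective:
#   loss = (1 - alpha) * MSE(student, y) + alpha * MSE(student, teacher)
W = np.zeros((4 * 12, 4))
alpha, lr = 0.5, 1e-3
for _ in range(200):
    pred = x @ W
    grad = (1 - alpha) * (pred - y) + alpha * (pred - teacher_pred)
    W -= lr * x.T @ grad / len(x)

final = float(np.mean((x @ W - y) ** 2))
print(round(final, 4))  # well below the untrained loss mean(y**2)
```

The `alpha` knob is the usual distillation trade-off: higher values pull the student toward the (smoother) teacher signal rather than the noisy labels, which is the intuition behind distilling a robust teacher into an efficient MLP.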

NeurIPS Conference 2025 Conference Paper

LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought

  • Cheng Yan
  • Felix Mohr
  • Tom Viering

Sample-wise learning curves plot performance versus training set size. They are useful for studying scaling laws and speeding up hyperparameter tuning and model selection. Learning curves are often assumed to be well-behaved: monotone (i.e., improving with more data) and convex. By constructing the Learning Curves Database 1.1 (LCDB 1.1), a large-scale database with high-resolution learning curves including more modern learners (CatBoost, TabNet, RealMLP, and TabPFN), we show that learning curves are less often well-behaved than previously thought. Using statistically rigorous methods, we observe significant ill-behavior in approximately 15% of the learning curves, almost twice as much as in previous estimates. We also identify which learners are to blame and show that specific learners are more ill-behaved than others. Additionally, we demonstrate that different feature scalings rarely resolve ill-behavior. We evaluate the impact of ill-behavior on downstream tasks, such as learning curve fitting and model selection, and find it poses significant challenges, underscoring the relevance and potential of LCDB 1.1 as a challenging benchmark for future research.
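The well-behavedness notions above have simple finite-difference formulations on an error-versus-size curve: monotone means the first differences of the error are non-positive (error never rises with more data), convex means the second differences are non-negative (improvements shrink). The toy curves below are invented for illustration and are not LCDB 1.1 data, and these point checks are not the paper's statistically rigorous tests.

```python
import numpy as np

def is_monotone(err: np.ndarray) -> bool:
    """Well-behaved curves improve with more data: error never increases."""
    return bool(np.all(np.diff(err) <= 0))

def is_convex(err: np.ndarray) -> bool:
    """Convexity: consecutive improvements shrink, i.e. second differences
    of the error are non-negative."""
    return bool(np.all(np.diff(err, n=2) >= 0))

# Error rate vs. increasing training-set size (toy values).
well_behaved = np.array([0.40, 0.25, 0.17, 0.13, 0.11])
peaking      = np.array([0.40, 0.25, 0.30, 0.20, 0.18])  # ill-behaved bump

print(is_monotone(well_behaved), is_convex(well_behaved))  # True True
print(is_monotone(peaking), is_convex(peaking))            # False False
```

The second curve shows the kind of "peaking" ill-behavior the abstract refers to: error temporarily rises at an intermediate sample size, violating both assumptions at once.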

NeurIPS Conference 2021 Conference Paper

Joint Semantic Mining for Weakly Supervised RGB-D Salient Object Detection

  • Jingjing Li
  • Wei Ji
  • Qi Bi
  • Cheng Yan
  • Miao Zhang
  • Yongri Piao
  • Huchuan Lu
  • Li Cheng

Training saliency detection models with weak supervision, e.g., image-level tags or captions, is appealing as it removes the costly demand of per-pixel annotations. Despite the rapid progress of RGB-D saliency detection in the fully-supervised setting, it remains an unexplored territory when only weak supervision signals are available. This paper is set to tackle the problem of weakly-supervised RGB-D salient object detection. The key insight in this effort is the idea of maintaining per-pixel pseudo-labels with iterative refinements by reconciling the multimodal input signals in our joint semantic mining (JSM). Considering the large variations in the raw depth map and the lack of explicit pixel-level supervision, we propose spatial semantic modeling (SSM) to capture saliency-specific depth cues from the raw depth and produce depth-refined pseudo-labels. Moreover, tags and captions are incorporated via fill-in-the-blank training in our textual semantic modeling (TSM) to estimate the confidences of competing pseudo-labels. At test time, our model involves only a light-weight sub-network of the training pipeline, i.e., it requires only an RGB image as input, thus allowing efficient inference. Extensive evaluations demonstrate the effectiveness of our approach under the weakly-supervised setting. Importantly, our method can also be adapted to work in both fully-supervised and unsupervised paradigms. In each of these scenarios, our approach attains superior performance compared to state-of-the-art dedicated methods. As a by-product, a CapS dataset is constructed by augmenting an existing benchmark training set with additional image tags and captions.
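The iterative pseudo-label refinement idea can be caricatured in a few lines: a noisy initial mask is repeatedly reconciled with a second modality's cue (here, a depth-like region) and re-binarized. The array sizes, mixing weights, and cue below are all invented; this is a crude stand-in for intuition, not JSM's actual update rule.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 8x8 "image": a noisy initial pseudo-label mask (as might come from
# image-level tags) and a depth-derived saliency cue to reconcile it with.
coarse = (rng.random((8, 8)) > 0.5).astype(float)  # noisy initial mask
depth_cue = np.zeros((8, 8))
depth_cue[2:6, 2:6] = 1.0                          # depth-cued salient region

pseudo = coarse.copy()
for _ in range(10):
    # Reconcile the two signals: pull the pseudo-labels toward the depth
    # cue, then re-binarize -- one round of "iterative refinement".
    pseudo = 0.3 * pseudo + 0.7 * depth_cue
    pseudo = (pseudo > 0.5).astype(float)

print(int(pseudo.sum()))  # -> 16, the 4x4 depth-cued region
```

Even this caricature shows the mechanism's appeal: per-pixel labels emerge from coarse, image-level supervision plus a complementary modality, with no pixel annotations in the loop.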