Arrow Research search

Author name cluster

Kai Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

BioDPP: Dynamic Prompt Policy Learning for Biomedical Vision-Language Models

  • Pingyi Miao
  • Xianlai Chen
  • Kai Sun
  • Yunbo Wang
  • Shuang Zhao
  • Ying An

Foundational vision-language models (VLMs), such as CLIP, are emerging as a promising paradigm in vision tasks due to their strong generalization ability. Nevertheless, adapting them to downstream tasks remains challenging, especially in biomedical imaging, where scarce annotations, low-contrast features, and complex patterns hinder model adaptation. Prompt tuning is therefore employed to facilitate the adaptation of VLMs. However, current prompt tuning methods such as Context Optimization (CoOp) mainly learn a single static prompt that is applied to all images, and such a one-size-fits-all prompt cannot describe the case-specific diagnostic cues in biomedical data, compromising the adaptation of VLMs. To this end, we propose a Dynamic Prompt Policy learning method that enables efficient adaptation of Biomedical VLMs (BioDPP) for accurate and highly generalizable few-shot biomedical image classification. Specifically, we conceptualize the learnable context as an agent and present a paradigm of learning a dynamic prompting policy, rather than obtaining a single static prompt. A dual-reward mechanism guides policy learning via feedback on both the classification decision and the consistency between the prompt and the context, steering the agent to generate context-aware prompts. Moreover, we devise adaptive baseline stabilization to dynamically regulate the reward advantage value throughout training, enabling policy refinement in a complex reward space tailored to biomedical VLMs. Extensive experiments on 10 biomedical datasets reveal that BioDPP achieves superior performance, demonstrating more efficient prompt optimization in biomedical VLMs.

AAAI Conference 2026 Conference Paper

Enhancing Pre-training Data Detection in LLMs Through Discriminative and Symmetric Prefix Selection

  • Kai Sun
  • Yuxin Lin
  • Bo Dong
  • Jingyao Zhang
  • Bin Shi

The rapid development of large language models (LLMs) has relied on access to high-quality, large-scale datasets, yet growing concerns around data privacy and security have spurred substantial research into pre-training data detection. While state-of-the-art (SOTA) methods such as RECALL and CON-RECALL leverage auxiliary prefixes to enhance detection performance, their dependence on individual prefixes introduces notable instability across varying prefix conditions. To address this, we first conduct a theoretical analysis to assess the impact of prefixes on existing prefix-based methods. Building on this analysis, we propose a novel prefix selection method to identify optimal prefixes. Specifically, our method derives two key criteria: Discriminability and Symmetry. These criteria quantify the effectiveness of prefixes in detecting pre-training data, enabling precise selection of high-performing candidate prefixes. Experiments on the WikiMIA dataset demonstrate that our method consistently improves the performance of RECALL and CON-RECALL, achieving gains of up to 21.1% in AUC scores while significantly enhancing robustness.

AAAI Conference 2026 Conference Paper

Scope Delineation Before Localization: A Two-Stage Framework for Enhancing Failure Attribution in Multi-Agent Systems

  • Kai Sun
  • Wenqiang Li
  • Bo Dong
  • Yuxin Lin
  • Jingyao Zhang
  • Bin Shi

Large language models (LLMs) are seeing growing adoption in multi-agent systems. In these systems, efficient failure attribution is critical for ensuring robustness and interpretability. Current LLM-based attribution methods often struggle with lengthy logs and a lack of expert knowledge. Drawing inspiration from human debugging strategies, we propose an automated failure attribution framework, Scope Delineation Before Localization, which operates in two key stages: (1) identifying the failure scope and (2) pinpointing the failure step. By decoupling failure attribution into these two stages, our approach alleviates the reasoning workload of LLMs, enabling more precise failure attribution. To support scope delineation, we further introduce two strategies: Stepwise Scope Delineation and Expertise-Assisted Scope Delineation. Experiments on the Who&When dataset validate the efficacy of our two-stage framework, demonstrating substantial improvements over prior methods (up to 24.27% on step-level accuracy).

ECAI Conference 2025 Conference Paper

CogniSNN: An Exploration to Random Graph Architecture Based Spiking Neural Networks with Enhanced Depth-Scalability and Path-Plasticity

  • Yongsheng Huang
  • Peibo Duan
  • Zhipeng Liu
  • Kai Sun
  • Changsheng Zhang 0001
  • Bin Zhang 0001
  • Mingkun Xu

Currently, most spiking neural networks (SNNs) still mimic the chain-like hierarchical architecture in traditional artificial neural networks (ANNs). This method significantly differs from random connections between neurons found in biological brains, limiting the ability to model the evolving mechanisms of neural pathways in biological neural systems, particularly in terms of dynamic depth-scalability and adaptive path-plasticity. This paper develops a new modeling paradigm for SNNs with random graph architecture (RGA), termed Cognition-aware SNN (CogniSNN). Furthermore, we model the depth-scalability and path-plasticity in CogniSNN by introducing a modified spiking residual neural node (ResNode) to counteract network degradation in deeper graph pathways, as well as a critical path-based algorithm that enables CogniSNN to perform path reusability on new tasks leveraging the features of the data and the RGA learned in old tasks. Experiments show that the performance of CogniSNN with redesigned ResNode is comparable, even superior, to current state-of-the-art SNNs on neuromorphic datasets. The critical path-based approach effectively achieves path reuse capability while maintaining expected performance in learning new tasks that are similar to or distinct from the old ones. This study showcases the potential of RGA-based SNNs and paves a new path for modeling the fusion of computational neuroscience and deep intelligent agents. The code is available at github.com/Yongsheng124/CogniSNN.

IJCAI Conference 2025 Conference Paper

ILIF: Temporal Inhibitory Leaky Integrate-and-Fire Neuron for Overactivation in Spiking Neural Networks

  • Kai Sun
  • Peibo Duan
  • Levin Kuhlmann
  • Beilun Wang
  • Bin Zhang

The Spiking Neural Network (SNN) has drawn increasing attention for its energy-efficient, event-driven processing and biological plausibility. To train SNNs via backpropagation, surrogate gradients are used to approximate the non-differentiable spike function, but they only maintain nonzero derivatives within a narrow range of membrane potentials near the firing threshold—referred to as the surrogate gradient support width gamma. We identify a major challenge, termed the dilemma of gamma: a relatively large gamma leads to overactivation, characterized by excessive neuron firing, which in turn increases energy consumption, whereas a small gamma causes vanishing gradients and weakens temporal dependencies. To address this, we propose a temporal Inhibitory Leaky Integrate-and-Fire (ILIF) neuron model, inspired by biological inhibitory mechanisms. This model incorporates interconnected inhibitory units for membrane potential and current, effectively mitigating overactivation while preserving gradient propagation. Theoretical analysis demonstrates ILIF’s effectiveness in overcoming the gamma dilemma, and extensive experiments on multiple datasets show that ILIF improves energy efficiency by reducing firing rates, stabilizes training, and enhances accuracy. The code is available at github.com/kaisun1/ILIF.

ICRA Conference 2025 Conference Paper

Multi-Segment Soft Robot Control Via Deep Koopman-Based Model Predictive Control

  • Lei Lv
  • Lei Liu 0076
  • Lei Bao
  • Fuchun Sun 0001
  • Jiahong Dong
  • Jianwei Zhang 0001
  • Xuemei Shan
  • Kai Sun

Compared to conventional rigid robots, soft robots, whose multiple segments of soft material provide flexibility and compliance, offer the advantages of safe interaction and dexterous operation in the environment. However, their high dimensionality, nonlinearity, time-varying nature, and infinite degrees of freedom make precise and dynamic control, such as trajectory tracking and position reaching, challenging. To address these challenges, we propose a framework of Deep Koopman-based Model Predictive Control (DK-MPC) for handling multi-segment soft robots. We first employ a deep learning approach with sampled data to approximate the Koopman operator, which linearizes the high-dimensional nonlinear dynamics of the soft robots into a finite-dimensional linear representation. Second, this linearized model is utilized within a model predictive control framework to compute optimal control inputs that minimize the tracking error between the desired and actual state trajectories. Real-world experiments on the soft robot “Chordata” demonstrate that DK-MPC achieves high-precision control, showing the potential of DK-MPC for future applications to soft robots. More visualization results can be found at https://pinkmoon-io.github.io/DKMPC/.

ICML Conference 2025 Conference Paper

Slimming the Fat-Tail: Morphing-Flow for Adaptive Time Series Modeling

  • Tianyu Liu
  • Kai Sun
  • Fuchun Sun 0001
  • Yu Luo
  • Yuanlong Zhang

Temporal sequences, even after stationarization, often exhibit leptokurtic distributions with fat tails and persistent distribution shifts. These properties destabilize feature dynamics, amplify model variance, and hinder model convergence in time series forecasting. To address this, we propose Morphing-Flow (MoF), a framework that combines a spline-based transform layer (Flow) and a test-time-trained method (Morph), which adaptively normalizes non-stationary, fat-tailed distributions while preserving critical extreme features. MoF ensures that inputs remain within a network’s effective activation space—a structured, normal-like distribution—even under distributional drift. Experiments across eight datasets show that MoF achieves state-of-the-art performance: With a simple linear backbone architecture, it matches the performance of state-of-the-art models on datasets such as Electricity and ETTh2. When paired with a patch-based Mamba architecture, MoF outperforms its closest competitor by 6.3% on average and reduces forecasting errors in fat-tailed datasets such as Exchange by 21.7%. Moreover, MoF acts as a plug-and-play module, boosting performance in existing models without architectural changes.

AAAI Conference 2025 Conference Paper

VERO: Verification and Zero-Shot Feedback Acquisition for Few-Shot Multimodal Aspect-Level Sentiment Classification

  • Kai Sun
  • Hao Wu
  • Bin Shi
  • Samuel Mensah
  • Peng Liu
  • Bo Dong

Deep learning approaches for multimodal aspect-level sentiment classification (MALSC) often require extensive data, which is costly and time-consuming to obtain. To mitigate this, current methods typically fine-tune small-scale pretrained models like BERT and BART with few-shot examples. While these models have shown success, Large Vision-Language Models (LVLMs) offer significant advantages due to their greater capacity and ability to understand nuanced language in both zero-shot and few-shot settings. However, there is limited work on fine-tuning LVLMs for MALSC. A major challenge lies in selecting few-shot examples that effectively capture the underlying patterns in data for these LVLMs. To bridge this research gap, we propose an acquisition function designed to select challenging samples for the few-shot learning of LVLMs for MALSC. We compare our approach, Verification and ZERO-shot feedback acquisition (VERO), with diverse acquisition functions for few-shot learning in MALSC. Our experiments show that VERO outperforms prior methods, achieving an F1 score improvement of up to 6.07% on MALSC benchmark datasets.

NeurIPS Conference 2025 Conference Paper

VisualLens: Personalization through Task-Agnostic Visual History

  • Wang Bill Zhu
  • Deqing Fu
  • Kai Sun
  • Yi Lu
  • Zhaojiang Lin
  • Seungwhan Moon
  • Kanika Narang
  • Mustafa Canim

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible and generalizable for multimodal recommendation. We hypothesize that a user's visual history—comprising images from daily life—can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum of user-profile signals from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.

NeurIPS Conference 2024 Conference Paper

CRAG - Comprehensive RAG Benchmark

  • Xiao Yang
  • Kai Sun
  • Hao Xin
  • Yushi Sun
  • Nikita Bhalla
  • Xiangsen Chen
  • Sajal Choudhary
  • Rongze D. Gui

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the lack of knowledge in Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve at most 34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https://github.com/facebookresearch/CRAG/.

NeurIPS Conference 2024 Conference Paper

Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning

  • Jiapu Wang
  • Kai Sun
  • Linhao Luo
  • Wei Wei
  • Yongli Hu
  • Alan W. Liew
  • Shirui Pan
  • Baocai Yin

Temporal Knowledge Graph Reasoning (TKGR) is the process of utilizing temporal information to capture complex relations within a Temporal Knowledge Graph (TKG) to infer new knowledge. Conventional methods in TKGR typically depend on deep learning algorithms or temporal logical rules. However, deep learning-based TKGRs often lack interpretability, whereas rule-based TKGRs struggle to effectively learn temporal rules that capture temporal patterns. Recently, Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning. Consequently, the employment of LLMs for TKGR has sparked increasing interest among researchers. Nonetheless, LLMs are known to function as black boxes, making it challenging to comprehend their reasoning process. Additionally, due to the resource-intensive nature of fine-tuning, promptly updating LLMs to integrate evolving knowledge within TKGs for reasoning is impractical. To address these challenges, we propose a Large Language Models-guided Dynamic Adaptation (LLM-DA) method for reasoning on TKGs. Specifically, LLM-DA harnesses the capabilities of LLMs to analyze historical data and extract temporal logical rules. These rules unveil temporal patterns and facilitate interpretable reasoning. To account for the evolving nature of TKGs, a dynamic adaptation strategy is proposed to update the LLM-generated rules with the latest events. This ensures that the extracted rules always incorporate the most recent knowledge and better generalize to predictions on future events. Experimental results show that, without the need for fine-tuning, LLM-DA significantly improves the accuracy of reasoning over several common datasets, providing a robust framework for TKGR tasks.

ICML Conference 2024 Conference Paper

Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to K-Level Stochastic Optimizations

  • Xiaokang Pan
  • Xingyu Li
  • Jin Liu
  • Tao Sun
  • Kai Sun
  • Lixing Chen
  • Zhe Qu

STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly evident during the transition from one to $K$-level optimization contexts. This paper provides a comprehensive generalization analysis of three representative STORM-based algorithms: STORM, COVER, and SVMR, for one, two, and $K$-level stochastic optimizations under both convex and strongly convex settings based on algorithmic stability. Firstly, we define stability for $K$-level optimizations and link it to generalization. Then, we detail the stability results for the three prominent STORM-based algorithms. Finally, we derive their excess risk bounds by balancing stability results with optimization errors. Our theoretical results provide strong insights into STORM-based algorithms: (1) Each estimator may decrease stability owing to the variance with respect to its estimation target. (2) Every additional level might escalate the generalization error, influenced by the stability and the variance between its cumulative stochastic gradient and the true gradient. (3) Increasing the batch size for the initial computation of estimators presents a favorable trade-off, enhancing the generalization performance.

AAAI Conference 2021 Conference Paper

Progressive Multi-task Learning with Controlled Information Flow for Joint Entity and Relation Extraction

  • Kai Sun
  • Richong Zhang
  • Samuel Mensah
  • Yongyi Mao
  • Xudong Liu

Multitask learning has shown promising performance in learning multiple related tasks simultaneously, and variants of model architectures have been proposed, especially for supervised classification problems. One goal of multitask learning is to extract a good representation that sufficiently captures the relevant part of the input about the output for each learning task. To achieve this objective, in this paper we design a multitask learning architecture based on the observation that correlations exist between outputs of some related tasks (e.g., entity recognition and relation extraction tasks), and they reflect the relevant features that need to be extracted from the input. As outputs are unobserved, our proposed model exploits task predictions in lower layers of the neural model, also referred to as early predictions in this work. But we control the injection of early predictions to ensure that we extract good task-specific representations for classification. We refer to this model as a Progressive Multitask learning model with Explicit Interactions (PMEI). Extensive experiments on multiple benchmark datasets produce state-of-the-art results on the joint entity and relation extraction task.

JBHI Journal 2021 Journal Article

Transfer Learning for Nonrigid 2D/3D Cardiovascular Images Registration

  • Shaoya Guan
  • Tianmiao Wang
  • Kai Sun
  • Cai Meng

Cardiovascular image registration is an essential approach to combine the advantages of preoperative 3D computed tomography angiography (CTA) images and intraoperative 2D X-ray/digital subtraction angiography (DSA) images in minimally invasive vascular interventional surgery (MIVI). Recent studies have shown that a convolutional neural network (CNN) regression model can register these two modalities of vascular images with fast speed and satisfactory accuracy. However, a CNN regression model trained on tens of thousands of images from one patient often cannot be applied to another patient due to the large difference and deformation of vascular structure between patients. To overcome this challenge, we evaluate the ability of transfer learning (TL) for the registration of 2D/3D deformable cardiovascular images. Frozen weights in the convolutional layers were optimized to find the best common feature extractors for TL. After TL, the training data set size was reduced to 200 for a randomly selected patient to obtain accurate registration results. We compared the effectiveness of our proposed nonrigid registration model after TL not only with the model without TL but also with several traditional intensity-based methods, showing that our nonrigid model with TL performs better on deformable cardiovascular image registration.

AAAI Conference 2020 Conference Paper

Relation Extraction with Convolutional Network over Learnable Syntax-Transport Graph

  • Kai Sun
  • Richong Zhang
  • Yongyi Mao
  • Samuel Mensah
  • Xudong Liu

A large majority of approaches have been proposed to leverage the dependency tree in the relation classification task. Recent works have focused on pruning irrelevant information from the dependency tree. The state-of-the-art Attention Guided Graph Convolutional Networks (AGGCNs) transforms the dependency tree into a weighted graph to distinguish the relevance of nodes and edges for relation classification. However, in their approach, the graph is fully connected, which destroys the structure information of the original dependency tree. How to effectively make use of relevant information while ignoring irrelevant information from the dependency trees remains a challenge in the relation classification task. In this work, we learn to transform the dependency tree into a weighted graph by considering the syntax dependencies of the connected nodes and persisting the structure of the original dependency tree. We refer to this graph as a syntax-transport graph. We further propose a learnable syntax-transport attention graph convolutional network (LST-AGCN) which operates on the syntax-transport graph directly to distill the final representation which is sufficient for classification. Experiments on SemEval-2010 Task 8 and TACRED show our approach outperforms previous methods.