Arrow Research search

Author name cluster

Yang Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

169 papers
2 author rows

Possible papers

169

AAAI Conference 2026 Conference Paper

Deeply Seeking Boundary for Lunar Regolith Segmentation

  • Yifeng Wang
  • Lingxin Wang
  • Lu Zhang
  • Yang Li
  • Chao Xu
  • Weiwei Zhang
  • Junyue Tang
  • Yanhong Zheng

The sharp, intricate contours of lunar regolith particles hold critical clues to the Moon's geological evolution and inform engineering applications from habitat construction to spacecraft design, making their precise segmentation a task of significant scientific and engineering value. However, this task exposes a weakness in deep learning models known as spectral bias, an inherent tendency to learn smooth, low-frequency functions which causes them to systematically erase the very high-frequency boundary details that are of primary interest. To resolve this conflict, we propose a framework to deeply seek object boundaries. First, we propose High-Frequency Initialized LoRA (HiFi-LoRA) to counteract spectral bias. By initializing the LoRA adaptation matrices as the optimal low-rank approximation of a high-pass filter, it fundamentally enhances the model's high-frequency perception and injects a strong preference for edges. Second, we propose the Wavelet Energy Modulation (WEM) regularizer. It guides the model to learn the intrinsic correlation between contour complexity and mask area, forcing the model to build a geometric understanding of contour morphology upon its high-frequency perception, thereby enabling the generation of boundary details commensurate with the object's scale. Experimentally, we constructed the Lunar Regolith Segmentation Dataset (LRSD), the first large-scale benchmark with expert-annotated contours. Extensive experiments demonstrate that our method sets a new state of the art on this challenging benchmark, not only achieving top performance on regional metrics like mIoU and DSC but, more critically, drastically outperforming existing models on boundary accuracy. This work not only provides a powerful computational tool for lunar science but also offers a robust and synergistic design pattern for other fine-grained segmentation challenges.
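The abstract describes initializing the LoRA adaptation matrices as the optimal low-rank approximation of a high-pass filter. A minimal sketch of that idea, assuming a 3x3 Laplacian kernel as the high-pass filter and a truncated SVD for the low-rank factorization (the paper's actual filter and rank are not given in the abstract):

```python
import numpy as np

def hifi_lora_init(rank=2):
    """Sketch: factor a high-pass filter into LoRA-style matrices A, B so that
    B @ A is the best rank-`rank` approximation of the filter (Eckart-Young).
    The Laplacian kernel below is an assumed stand-in for the paper's filter."""
    hp = np.array([[-1., -1., -1.],
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])  # 3x3 Laplacian high-pass kernel
    U, S, Vt = np.linalg.svd(hp)
    B = U[:, :rank] * np.sqrt(S[:rank])           # left factor, shape (3, rank)
    A = np.sqrt(S[:rank])[:, None] * Vt[:rank]    # right factor, shape (rank, 3)
    return A, B

A, B = hifi_lora_init(rank=2)
approx = B @ A  # closest rank-2 matrix to the kernel in Frobenius norm
```

Initializing A and B this way biases the adapted weights toward edge-sensitive (high-frequency) responses from the first training step, which is the stated intent of HiFi-LoRA.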

AAAI Conference 2026 Conference Paper

DHCM-CACL: Dynamic Hierarchical Cross-modal Mamba with Confidence-Adaptive Contrastive Learning for Multimodal Emotion Recognition

  • Baiqiang Wu
  • Yang Li

Multimodal emotion recognition plays a crucial role in enhancing the intelligence of human-computer interaction and emotional understanding. However, conventional approaches face challenges such as scarcity of annotated data, significant modality heterogeneity, and temporal misalignment. To address these issues, we propose DHCM-CACL, a novel self-supervised emotion recognition framework integrating EEG and facial expressions. During the pre-training phase, we propose a Dynamic Hierarchical Cross-modal Mamba module (DHCM), which models long-term dependencies through dynamic state matrices, incorporates forgetting gates for noise suppression, and constructs a hierarchical cross-modal interaction structure, effectively achieving cross-modal temporal alignment and mitigating modality heterogeneity. Subsequently, we propose a Confidence-Adaptive Contrastive Learning module (CACL) that dynamically adjusts sample weights using gated confidence signals derived from DHCM to compute loss, prioritizing reliable samples while suppressing noisy instances through adaptive weighting, thereby enhancing representation reliability and generalization in data-scarce scenarios. During the fine-tuning phase, we integrate a cross-modal attention gating mechanism to reinforce temporal associations and adopt an evidence-aware joint optimization objective, providing probabilistic credibility outputs for emotion prediction. Experimental results on the DEAP and MAHNOB-HCI datasets demonstrate that our approach achieves state-of-the-art performance in emotion classification under both subject-dependent and subject-independent settings.
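The confidence-adaptive weighting described above can be sketched as scaling each sample's contrastive loss by a confidence score before averaging. This is an illustrative reduction, assuming confidences in [0, 1] produced by the gating signal; the paper's actual loss and gate are not reproduced:

```python
import numpy as np

def confidence_weighted_loss(per_sample_losses, confidences):
    """Sketch: scale each sample's loss by its confidence so reliable samples
    dominate the update and noisy ones are down-weighted; normalizing by the
    weight sum keeps the overall loss scale stable."""
    losses = np.asarray(per_sample_losses, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float((w * losses).sum() / max(w.sum(), 1e-8))
```

A sample with confidence 0 contributes nothing, so a noisy instance cannot drag the representation, which is the behavior the abstract attributes to CACL.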

JBHI Journal 2026 Journal Article

DTQFL: A Digital Twin-Assisted Quantum Federated Learning Algorithm for Intelligent Diagnosis in 5G Mobile Network

  • Zhiguo Qu
  • Yang Li
  • Bo Liu
  • Deepak Gupta
  • Prayag Tiwari

Smart healthcare aims to revolutionize medical services by integrating artificial intelligence (AI). The limitations of classical machine learning include privacy concerns that prevent direct data sharing among medical institutions, untimely updates, and long training times. To address these issues, this study proposes a digital twin-assisted quantum federated learning algorithm (DTQFL). By leveraging the 5G mobile network, digital twins (DT) of patients can be created instantly using data from various Internet of Medical Things (IoMT) devices, while simultaneously reducing communication time in federated learning (FL). DTQFL generates DTs for patients with specific diseases, allowing for synchronous training and updating of the variational quantum neural network (VQNN) without disrupting the VQNN in the real world. This study used DTQFL to train a personalized VQNN for each hospital, taking both privacy and training speed into account. The personalized VQNN of each hospital was then obtained through further local iterations of the final global parameters. The results indicate that DTQFL can train a good VQNN without collecting local data while achieving accuracy comparable to that of data-centralized algorithms. In addition, after personalized training, the VQNN can achieve higher accuracy than without personalized training.

AAAI Conference 2026 Conference Paper

From Imitation to Discrimination: Toward a Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

  • Changpeng Yang
  • Jinyang Wu
  • Yuchen Liu
  • Shuai Zhang
  • Yang Li
  • Qiliang Liang
  • Hongzhen Wang
  • Shuai Nie

Reinforcement learning has emerged as a paradigm for post-training large language models, boosting their reasoning capabilities. Such approaches compute an advantage value for each sample, reflecting better or worse performance than expected, thereby yielding both positive and negative signals for training. However, existing approaches often mix them indiscriminately, especially in the early stages, leading to ambiguous guidance and limited gains. To address this issue, we propose CAPO (Curriculum Advantage Policy Optimization), an adaptive curriculum mechanism based on advantage signals. The proposed mechanism bootstraps imitation learning with positive-only advantage samples to establish robust foundations, and subsequently introduces negative signals to cultivate discriminative capabilities, thereby improving generalization across complex scenarios. Compatible with diverse optimization methods including GRPO, PPO, RLOO, and Reinforce++, our method consistently achieves stable and significant improvements in mathematical reasoning tasks, and further generalizes effectively to multimodal Graphical User Interface (GUI) reasoning scenarios, establishing itself as a versatile and robust optimization paradigm.
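The curriculum over advantage signals can be sketched as a simple gate: early in training only positive advantages pass through (imitation-style learning), and after a switch point negative advantages are admitted as well. The step-threshold schedule below is an assumption for illustration, not the paper's actual curriculum rule:

```python
import numpy as np

def curriculum_advantages(advantages, step, switch_step=1000):
    """Sketch: zero out negative advantages before `switch_step` so only
    positive signals drive early updates; afterwards pass both positive and
    negative advantages through to cultivate discrimination."""
    adv = np.asarray(advantages, dtype=float)
    if step < switch_step:
        return np.where(adv > 0, adv, 0.0)
    return adv
```

Because the gate operates only on the advantage values, it composes with any of the optimizers named in the abstract (GRPO, PPO, RLOO, Reinforce++).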

AAAI Conference 2026 Conference Paper

FVNet: Harnessing Liquid Neural Dynamics for Lightweight Visual Representation

  • Zhenzhe Hou
  • Xiaohui Chu
  • Runze Hu
  • Yang Li
  • Yutao Liu

Efficient visual backbone design remains crucial for resource-constrained computer vision applications. Inspired by the adaptive continuous-time dynamics observed in biological neurons, we propose FVNet, a novel lightweight architecture that integrates liquid neural dynamics for efficient and dynamic visual feature extraction. Central to FVNet is the Fluid Temporal Flow Unit (FTFU), which employs continuous-time equations with learnable time constants to capture spatio-temporal dependencies adaptively. By further stacking these units in a Multi-Phase Fluid Block (MPFB), our model processes features across parallel temporal scales, enabling context-aware feature encoding without incurring excessive computational overhead. Through a discrete closed-form solution, FVNet achieves the representational power of continuous-time models while avoiding the instability and overhead of iterative numerical solvers. Extensive experiments on various vision tasks demonstrate that FVNet achieves superior performance and efficiency over existing state-of-the-art lightweight networks.

AAAI Conference 2026 Conference Paper

GT2-GS: Geometry-aware Texture Transfer for Gaussian Splatting

  • Wenjie Liu
  • Zhongliang Liu
  • Junwei Shu
  • Changbo Wang
  • Yang Li

Transferring 2D textures onto complex 3D scenes plays a vital role in enhancing the efficiency and controllability of 3D multimedia content creation. However, existing 3D style transfer methods primarily focus on transferring abstract artistic styles to 3D scenes. These methods often overlook the geometric information of the scene, which makes it challenging to achieve high-quality 3D texture transfer results. In this paper, we present GT2-GS, a geometry-aware texture transfer framework for Gaussian splatting. First, we propose a geometry-aware texture transfer loss that enables view-consistent texture transfer by leveraging prior view-dependent feature information and texture features augmented with additional geometric parameters. Moreover, an adaptive fine-grained control module is proposed to address the degradation of scene information caused by low-granularity texture features. Finally, a geometry preservation branch is introduced. This branch refines the geometric parameters using additionally bound Gaussian color priors, thereby decoupling the optimization objectives of appearance and geometry. Extensive experiments demonstrate the effectiveness and controllability of our method. Through geometric awareness, our approach achieves texture transfer results that better align with human visual perception.

JBHI Journal 2026 Journal Article

HSGO: Harmonized Swarm Learning With Guided Optimization for Multi-Center sMRI Classification of Alzheimer's Disease

  • Fangtao Song
  • Yang Li
  • Mingfeng Jiang
  • Kaicheng Li
  • Jucheng Zhang
  • Yinlong Zhang
  • Zhibo Pang

Developing robust Alzheimer's Disease (AD) classification models necessitates extensive training data, but aggregating multi-center medical data poses privacy risks. Although Federated Learning (FL) and Swarm Learning (SL) allow training generic models without data sharing, their performance is limited by variations in AD pathology features and sample class imbalances across centers. To address this issue, we propose a novel Harmonized Swarm Learning framework with Guided Optimization (HSGO) to enhance multi-center collaboration while preserving data privacy. Our framework employs a class-balanced loss function to train a robust generic model and guides the optimization of personalized models towards the generic model, eliminating extra AD pathology feature extraction steps. Furthermore, we design a dynamic feature similarity storage mechanism to facilitate personalized training. Experiments performed under two different multi-center data partitioning scenarios demonstrate that HSGO achieves competitive performance when compared with five baseline methods. Additionally, Layer-wise Relevance Propagation (LRP) analysis indicates that HSGO may help identify potential key brain regions in AD by integrating local and global features compared to traditional SL.
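The abstract names a class-balanced loss for handling sample class imbalance across centers. One common instantiation, shown here as an assumed example (the paper's exact formulation is not given in the abstract), is the effective-number-of-samples weighting of Cui et al. (2019):

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Sketch: per-class loss weights proportional to
    (1 - beta) / (1 - beta**n_c), i.e. inverse effective sample number,
    normalized so the weights sum to the number of classes."""
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w * len(counts) / w.sum()
```

Rare classes (e.g. underrepresented diagnostic labels at a small center) receive larger weights, counteracting the imbalance the abstract identifies as a limitation of plain FL/SL training.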

AAAI Conference 2026 Conference Paper

HyperDiag: Temporal–Regional Hypergraph Learning via Topology-Enhanced State Propagation for Brain Disease Diagnosis

  • Yulan Ma
  • Fangkun Li
  • Wenchao Yang
  • Qian Si
  • Chenglong Yu
  • Yang Li

Dynamic brain networks provide a powerful representation for capturing temporal variations in functional brain connectivity and have gained increasing attention in brain disease diagnosis. However, most existing methods extract features from isolated time windows, making it difficult to capture the high-order dynamic evolution of brain activity. Moreover, these methods often neglect the functional heterogeneity among brain regions, thereby limiting diagnostic performance. To address these limitations, we propose HyperDiag, a novel temporal-regional Hypergraph learning via topology-enhanced state propagation for brain disease Diagnosis. Specifically, we first design a dual-level hypergraph learning strategy: a temporally-evolving hypergraph message passing strategy to capture dynamic high-order dependencies within and across time windows, and meanwhile, a region-wise functional hypergraph learning strategy to capture regional dependencies. Subsequently, we construct a topology-enhanced selective state-space propagation network to integrate complementary information from both the temporally-evolving and region-wise features. Extensive experiments on four brain disorder datasets (ABIDE-I, ADNI, REST-meta-MDD, and Epilepsy) demonstrate that HyperDiag not only outperforms state-of-the-art methods but also identifies biologically meaningful abnormal connections, offering potential biomarkers for clinical interpretation.

AAAI Conference 2026 Conference Paper

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation

  • Zhenyu Bi
  • Gaurav Srivastava
  • Yang Li
  • Swastik Roy
  • Meng Lu
  • Morteza Ziyadi
  • Xuan Wang

While small language models (SLMs) have shown promise on various reasoning tasks, their ability to judge the correctness of answers remains unclear compared to large language models (LLMs). Prior work on LLM-as-a-judge frameworks typically relies on comparing candidate answers against ground-truth labels or other candidate answers using predefined metrics like entailment. However, this approach is inherently indirect and difficult to fully automate, offering limited support for fine-grained and scalable evaluation of reasoning outputs. In this work, we propose JudgeBoard, a novel evaluation pipeline that directly queries models to assess the correctness of candidate answers without requiring extra answer comparisons. We focus on two core reasoning domains: mathematical reasoning and science/commonsense reasoning, and construct task-specific evaluation leaderboards using both accuracy-based ranking and an Elo-based rating system across five benchmark datasets, enabling consistent model comparison as judges rather than comparators. To improve judgment performance in lightweight models, we propose MAJ (Multi-Agent Judging), a novel multi-agent evaluation framework that leverages multiple interacting SLMs with distinct reasoning profiles to approximate LLM-level judgment accuracy through collaborative deliberation. Experimental results reveal a significant performance gap between SLMs and LLMs in isolated judging tasks. However, our MAJ framework substantially improves the reliability and consistency of SLMs. On the MATH dataset, MAJ using smaller-sized models as backbones performs comparatively well or even better than their larger-sized counterparts. Our findings highlight that multi-agent SLM systems can potentially match or exceed LLM performance in judgment tasks, with implications for scalable and efficient assessment.
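The aggregation step of multi-agent judging can be sketched as a majority vote over the individual judges' verdicts. This omits the collaborative deliberation the abstract describes and shows only the final vote, with boolean verdicts as an assumed interface:

```python
from collections import Counter

def multi_agent_judge(verdicts):
    """Sketch: each small judge model returns True (answer correct) or
    False (incorrect); the ensemble verdict is the majority vote."""
    votes = Counter(verdicts)
    return votes[True] > votes[False]
```

With judges that err independently, the majority is correct more often than any single judge, which is one intuition for why MAJ narrows the gap to larger models.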

AAAI Conference 2026 Conference Paper

Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer

  • Enming Zhang
  • Liwen Cao
  • Yanru Wu
  • Zhao Zijie
  • Yang Li

Prompt tuning has emerged as a lightweight strategy for adapting foundation models to downstream tasks, particularly for resource-constrained systems. As pre-trained prompts become valuable assets, combining multiple source prompts offers a promising approach to enhance generalization for new tasks by leveraging complementary knowledge. However, naive aggregation often overlooks that different source prompts contribute differently to the target task. To address this, we propose HGPrompt, a dynamic framework that learns optimal ensemble weights. These weights are optimized by jointly maximizing an information-theoretic metric for transferability and minimizing gradient conflicts via a novel regularization strategy. Specifically, we propose a differentiable prompt transferability metric to capture the discriminability of prompt-induced features on the target task. Meanwhile, HGPrompt matches the gradient variances with respect to different source prompts based on Hessian and Fisher Information, ensuring stable and coherent knowledge transfer while suppressing gradient conflicts among them. Extensive experiments on the large-scale VTAB benchmark demonstrate the state-of-the-art performance of HGPrompt, validating its effectiveness in learning an optimal ensemble for effective multi-source prompt transfer.
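The core combination step, learning ensemble weights over source prompts, can be sketched as a softmax-weighted sum of prompt tensors. The transferability metric and Hessian/Fisher regularization that actually drive the weights in HGPrompt are not reproduced here; the raw logits are an assumed learnable parameter:

```python
import numpy as np

def ensemble_prompts(source_prompts, logits):
    """Sketch: combine several source prompts into one target prompt using
    softmax ensemble weights over raw logits (assumed learnable)."""
    w = np.exp(logits - np.max(logits))
    w = w / w.sum()                        # softmax ensemble weights
    prompts = np.stack(source_prompts)     # (num_sources, prompt_len, dim)
    return np.tensordot(w, prompts, axes=1)  # weighted sum over sources
```

In the full method, gradients through the weighted sum would be shaped by the transferability objective and the gradient-conflict regularizer described in the abstract.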

JBHI Journal 2026 Journal Article

PAM-CDR: Property-Aware Multi-Modal Drug Representation Learning for Accurate Cancer Drug Response Prediction

  • Yang Li
  • Chang Liu
  • Haijie Cui
  • Jianli Ma

Accurate prediction of cancer drug response is essential for advancing precision oncology, enabling tailored therapies that account for the molecular heterogeneity of tumors. While deep learning has shown promise in this domain, many existing approaches fail to incorporate physicochemical properties of drug compounds, limiting the biological interpretability and generalizability of learned representations. To address this gap, we present PAM-CDR, a property-aware multi-modal representation learning framework that integrates molecular graphs, fingerprints, and physicochemical descriptors with transcriptomic and genomic profiles of cancer cell lines. PAM-CDR employs a three-stage hierarchical fusion strategy to enable fine-grained representation learning across drug and cell modalities. In the first stage, property-guided attention injects biologically meaningful context to enrich molecular graph and fingerprint features. In the second stage, bidirectional cross-modality interactions capture complementary patterns and enhance multi-omic cellular representations. In the final stage, unified drug and cell line embeddings are integrated to accurately predict drug responses. Benefiting from these designs, PAM-CDR consistently outperforms competitive baselines, achieving an AUC of 0.9161 and an AUPR of 0.9313. Ablation studies confirm the critical contribution of physicochemical priors, while embedding visualizations reveal improved biological coherence in the learned molecular representations. The code is publicly available at https://github.com/catly/PAM-CDR.

AAAI Conference 2026 Conference Paper

PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure

  • Ke Jia
  • Yuheng Ma
  • Yang Li
  • Feifei Wang

We revisit the problem of generating synthetic data under differential privacy. To address the core limitations of marginal-based methods, we propose the Private Adaptive Generative Adversarial Network with Bayes Network Structure (PrAda-GAN), which integrates the strengths of both GAN-based and marginal-based approaches. Our method adopts a sequential generator architecture to capture complex dependencies among variables, while adaptively regularizing the learned structure to promote sparsity in the underlying Bayes network. Theoretically, we establish diminishing bounds on the parameter distance, variable selection error, and Wasserstein distance. Our analysis shows that leveraging dependency sparsity leads to significant improvements in convergence rates. Empirically, experiments on both synthetic and real-world datasets demonstrate that PrAda-GAN outperforms existing tabular data synthesis methods in terms of the privacy–utility trade-off.

AAAI Conference 2026 Conference Paper

Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection

  • Zihao Zhang
  • Yang Li
  • Aming WU
  • Yahong Han

In this paper, we focus on Single-Domain Generalized Object Detection (Single-DGOD), aiming to transfer a detector trained on one source domain to multiple unknown domains. Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity, thereby mitigating the lack of access to target domain data. However, in real-world scenarios such as changes in weather or lighting conditions, domain shifts often occur continuously and gradually. Discrete augmentations and static perturbations fail to effectively capture the dynamic variation of feature distributions, thereby limiting the model's ability to perceive fine-grained cross-domain differences. To this end, we propose a new method, i.e., Liquid Temporal Feature Evolution, which simulates the progressive evolution of features from the source domain to simulated latent distributions by incorporating temporal modeling and liquid neural network–driven parameter adjustment. Specifically, we introduce controllable Gaussian noise injection and multi-scale Gaussian blurring to simulate initial feature perturbations, followed by temporal modeling and a liquid parameter adjustment mechanism to generate adaptive modulation parameters, enabling a smooth and continuous adaptation across domains. By capturing progressive cross-domain feature evolution and dynamically regulating adaptation paths, our method bridges the source-unknown domain distribution gap, significantly boosting generalization and robustness to unseen shifts. Significant performance improvements on the Diverse Weather dataset and Real-to-Art benchmark demonstrate the superiority of our method.

AAAI Conference 2026 Conference Paper

SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability

  • Jiankang Wang
  • Zhihan Zhang
  • Zhihang Liu
  • Yang Li
  • Jiannan Ge
  • Hongtao Xie
  • Yongdong Zhang

Multimodal Large Language Models (MLLMs) have shown remarkable progress in temporal or spatial localization tasks, but struggle with joint spatio-temporal video grounding (STVG). We identify two key bottlenecks hindering this capability: (1) the sheer number of visual tokens makes long-range and fine-grained visual modeling challenging; (2) generating a long sequence of bounding boxes in text makes it hard to accurately align each box with its specific video frame. Distinct from prior efforts that rely on attaching complex modules, we argue for a more elegant paradigm that unlocks the inherent potential of MLLMs and leverages their strengths. To this end, we propose SpaceVLLM, an MLLM equipped with spatio-temporal video grounding capabilities. Specifically, we propose Spatio-Temporal Aware Queries, interleaved with video frames, to guide the MLLM in capturing both static appearance and dynamic motion features. We further present a lightweight Query-Guided Space Head that maps queries to precise spatial coordinates, bypassing the need for direct textual coordinate generation and enabling the MLLM to focus on video understanding. To further facilitate research in this area, we propose an automated data synthesis pipeline to construct the V-STG dataset, comprising 110K STVG instances. Extensive experiments show that SpaceVLLM achieves state-of-the-art performance on STVG benchmarks and maintains strong performance on various video understanding tasks, validating our approach's effectiveness.

AAAI Conference 2026 Conference Paper

Splats in Splats: Robust and Effective 3D Steganography Towards Gaussian Splatting

  • Yijia Guo
  • Wenkai Huang
  • Yang Li
  • Gaolei Li
  • Hang Zhang
  • Liwen Hu
  • Jianhua Li
  • Tiejun Huang

3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical deployment. Here we describe splats in splats, the first 3DGS steganography framework that embeds 3D content in 3DGS itself without modifying any attributes. To achieve this, we take a deep insight into spherical harmonics (SH) and devise an importance-graded SH coefficient encryption strategy to embed the hidden SH coefficients. Furthermore, we employ a convolutional autoencoder to establish a mapping between the original Gaussian primitives' opacity and the hidden Gaussian primitives' opacity. Extensive experiments indicate that our method significantly outperforms existing 3D steganography techniques, with 5.31% higher scene fidelity and 3x faster rendering speed, while ensuring security, robustness, and user experience.

AAAI Conference 2026 Conference Paper

Target-Balanced Score Distillation

  • Zhou Xu
  • Qi Wang
  • Yuxiao Yang
  • Luyuan Zhang
  • Zhang Liang
  • Yang Li

Score Distillation Sampling (SDS) enables 3D asset generation by distilling priors from pretrained 2D text-to-image diffusion models, but vanilla SDS suffers from over-saturation and over-smoothing. To mitigate this issue, recent variants have incorporated negative prompts. However, these methods face a critical trade-off: limited texture optimization, or significant texture gains with shape distortion. In this work, we first conduct a systematic analysis and reveal that this trade-off is fundamentally governed by the utilization of the negative prompts: Target Negative Prompts (TNP), which embed target information in the negative prompts, dramatically enhance texture realism and fidelity but induce shape distortions. Informed by this key insight, we introduce Target-Balanced Score Distillation (TBSD). It formulates generation as a multi-objective optimization problem and introduces an adaptive strategy that effectively resolves the aforementioned trade-off. Extensive experiments demonstrate that TBSD significantly outperforms existing state-of-the-art methods, yielding 3D assets with high-fidelity textures and geometrically accurate shapes.

AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

  • Jun Xu
  • Xinkai Du
  • Yu Ao
  • Peilong Zhao
  • Yang Li
  • Ling Zhong
  • Lin Yuan
  • Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.

AAAI Conference 2026 Conference Paper

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

  • Haotian Jin
  • Yang Li
  • Haihui Fan
  • Lin Shen
  • Xiangfang Li
  • Bo Li

Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to accurately identify their specific forms. Most existing backdoor defense methods are limited to specific types of triggers or rely on an additional clean model for support. To address this issue, we propose a backdoor detection method based on attention similarity, enabling backdoor detection without prior knowledge of the trigger. Our study reveals that models subjected to backdoor attacks exhibit unusually high similarity among attention heads when exposed to triggers. Based on this observation, we propose an attention safety alignment approach combined with head-wise fine-tuning to rectify potentially contaminated attention heads, thereby effectively mitigating the impact of backdoor attacks. Extensive experimental results demonstrate that our method significantly reduces the success rate of backdoor attacks while preserving the model’s performance on downstream tasks.
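The detection signal described above, unusually high similarity among attention heads when a trigger is present, can be sketched as the mean pairwise cosine similarity of head output vectors for one input. The boolean decision would be made against a threshold calibrated on clean inputs, which is assumed here and not specified in the abstract:

```python
import numpy as np

def mean_head_similarity(head_outputs):
    """Sketch: average pairwise cosine similarity among attention-head output
    vectors; values near 1 indicate the anomalously uniform heads the paper
    associates with a triggered backdoor."""
    H = np.asarray(head_outputs, dtype=float)
    H = H / np.linalg.norm(H, axis=1, keepdims=True)  # unit-normalize each head
    sim = H @ H.T                                     # pairwise cosine matrix
    n = len(H)
    return float((sim.sum() - n) / (n * (n - 1)))     # mean off-diagonal entry
```

Heads of a clean model responding to diverse features score low on this statistic, while collapsed, trigger-dominated heads score near 1.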

NeurIPS Conference 2025 Conference Paper

A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning

  • Qingyue Zhang
  • Haohao Fu
  • Guanbo Huang
  • Yaoyuan Liang
  • Chang Chu
  • Tianren Peng
  • Yanru Wu
  • Qi Li

Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? Specifically, we introduce a generalization error measure based on K-L divergence, and minimize it based on high-dimensional statistical analysis to determine the optimal transfer quantity for each source task. Additionally, we develop an architecture-agnostic and data-efficient algorithm OTQMS to implement our theoretical results for target model training in multi-source transfer learning. Experimental studies on diverse architectures and two real-world benchmark datasets show that our proposed algorithm significantly outperforms state-of-the-art approaches in both accuracy and data efficiency. The code is available at https://github.com/zqy0126/OTQMS.

ICML Conference 2025 Conference Paper

Active Evaluation Acquisition for Efficient LLM Benchmarking

  • Yang Li
  • Jie Ma 0005
  • Miguel Ballesteros
  • Yassine Benajiba
  • Graham Horwood

As large language models (LLMs) become increasingly versatile, numerous large-scale benchmarks have been developed to thoroughly assess their capabilities. These benchmarks typically consist of diverse datasets and prompts to evaluate different aspects of LLM performance. However, comprehensive evaluations on hundreds or thousands of prompts incur tremendous costs in terms of computation, money, and time. In this work, we investigate strategies to improve evaluation efficiency by selecting a subset of examples from each benchmark using a learned policy. Our approach models the dependencies across test examples, allowing accurate prediction of the evaluation outcomes for the remaining examples based on the outcomes of the selected ones. Consequently, we only need to acquire the actual evaluation outcomes for the selected subset. We rigorously explore various subset selection policies and introduce a novel RL-based policy that leverages the captured dependencies. Empirical results demonstrate that our approach significantly reduces the number of evaluation prompts required while maintaining accurate performance estimates compared to previous methods.

AAMAS Conference 2025 Conference Paper

Bottom-Up Reputation Promotes Cooperation with Multi-Agent Reinforcement Learning

  • Tianyu Ren
  • Xuan Yao
  • Yang Li
  • Xiao-Jun Zeng

Reputation serves as a powerful mechanism for promoting cooperation in multi-agent systems, as agents are more inclined to cooperate with those of good social standing. While existing multi-agent reinforcement learning methods typically rely on predefined social norms to assign reputations, the question of how a population reaches a consensus on judgement when agents hold private, independent views remains unresolved. In this paper, we propose a novel bottom-up reputation learning method, Learning with Reputation Reward (LR2), designed to promote cooperative behaviour through reward shaping based on assigned reputation. Our agent architecture includes a dilemma policy that determines cooperation by considering the impact on neighbours, and an evaluation policy that assigns reputations to affect the actions of neighbours while optimizing self-objectives. It operates using local observations and interaction-based rewards, without relying on centralized modules or predefined norms. Our findings demonstrate the effectiveness and adaptability of LR2 across various spatial social dilemma scenarios. Interestingly, we find that LR2 stabilizes and enhances cooperation not only with reward reshaping from bottom-up reputation but also by fostering strategy clustering in structured populations, thereby creating environments conducive to sustained cooperation.

NeurIPS Conference 2025 Conference Paper

Bridging Crypto with ML-based Solvers: the SAT Formulation and Benchmarks

  • Xinhao Zheng
  • Xinhao Song
  • Bolin Qiu
  • Yang Li
  • Zhongteng Gui
  • Junchi Yan

The Boolean Satisfiability Problem (SAT) plays a crucial role in cryptanalysis, enabling tasks like key recovery and distinguisher construction. Conflict-Driven Clause Learning (CDCL) has emerged as the dominant paradigm in modern SAT solving, and machine learning has been increasingly integrated with CDCL-based SAT solvers to tackle complex cryptographic problems. However, the lack of a unified evaluation framework, inconsistent input formats, and varying modeling approaches hinder fair comparison. Moreover, cryptographic SAT instances also differ structurally from standard SAT problems, and the absence of standardized datasets further complicates evaluation. To address these issues, we introduce SAT4CryptoBench, the first comprehensive benchmark for assessing machine learning–based solvers in cryptanalysis. SAT4CryptoBench provides diverse SAT datasets in both Arithmetic Normal Form (ANF) and Conjunctive Normal Form (CNF), spanning various algorithms, rounds, and key sizes. Our framework evaluates three levels of machine learning integration: standalone distinguishers for instance classification, heuristic enhancement for guiding solving strategies, and hyperparameter optimization for adapting to specific problem distributions. Experiments demonstrate that ANF-based networks consistently achieve superior performance over CNF-based networks in learning cryptographic features. Nonetheless, current ML techniques struggle to generalize across algorithms and instance sizes, with computational overhead potentially offsetting benefits on simpler cases. Despite this, ML-driven optimization strategies notably improve solver efficiency on cryptographic SAT instances. Additionally, we propose BASIN, a bitwise solver taking plaintext-ciphertext bitstrings as inputs. Crucially, its superior performance on high-round problems highlights the importance of input modeling and the advantage of direct input representations for complex cryptographic structures.

NeurIPS Conference 2025 Conference Paper

Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

  • Wenbo Zhang
  • Tianrun Hu
  • Hanbo Zhang
  • Yanyuan Qiao
  • Yuchu Qin
  • Yang Li
  • Jiajun Liu
  • Tao Kong

We present Chain-of-Action (CoA), a novel visuomotor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next step action(s) forward, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structure: (1) the first token corresponds to a stable keyframe action that encodes the task-specific goals; and (2) subsequent action tokens are generated autoregressively, conditioned on the initial keyframe and previously predicted actions. This backward action reasoning enforces a global-to-local structure, allowing each local action to be tightly constrained by the final goal. To further realize the action reasoning structure, CoA incorporates four complementary designs: continuous action token representation; dynamic stopping for variable-length trajectory generation; reverse temporal ensemble; and multi-token prediction to balance action chunk modeling with global structure. As a result, CoA gives strong spatial generalization capabilities while preserving the flexibility and simplicity of a visuomotor policy. Empirically, we observe that CoA outperforms representative imitation learning algorithms such as ACT and Diffusion Policy across 60 RLBench tasks and 8 real-world tasks.

ICML Conference 2025 Conference Paper

COExpander: Adaptive Solution Expansion for Combinatorial Optimization

  • Jiale Ma
  • Wenzheng Pan
  • Yang Li
  • Junchi Yan

Despite rapid progress in neural combinatorial optimization (NCO) for solving CO problems (COPs), as the problem scale grows, several bottlenecks persist: 1) solvers in the Global Prediction (GP) paradigm struggle in long-range decisions where the overly smooth intermediate heatmaps impede effective decoding, and 2) solvers in the Local Construction (LC) paradigm are time-consuming and incapable of tackling large instances due to the onerous auto-regressive process. Observing these challenges, we propose a new paradigm named Adaptive Expansion (AE) with its instantiation COExpander, positioned to leverage the advantages of both GP and LC. COExpander utilizes informative heatmaps generated by a global predictor, which is learned under the guidance of locally determined partial solutions, to in turn direct the expansion of determined decision variables with adaptive step-sizes. To ensure transparent evaluation, we further take the lead in canonicalizing 29 benchmarks spanning 6 popular COPs (MIS, MCl, MVC, MCut, TSP, ATSP) and various scales (50-10K nodes), upon which experiments demonstrate concrete SOTA performance of COExpander over these tasks. Source code and our standardized datasets will be made public.

NeurIPS Conference 2025 Conference Paper

Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations

  • Ansong Ni
  • Ruta Desai
  • Yang Li
  • Xinjie Lei
  • Dong Wang
  • Jiemin Zhang
  • Jane Yu
  • Ramya Raghavendra

With increasingly powerful large language models (LLMs) and LLM-based agents tackling an ever-growing list of tasks, we envision a future where numerous LLM agents work seamlessly with other AI agents and humans to solve complex problems and enhance daily life. To achieve these goals, LLM agents must develop collaborative skills such as effective persuasion, assertion and disagreement, which are often overlooked in the prevalent single-turn training and evaluation of LLMs. In this work, we present Collaborative Reasoner (Coral), a framework to evaluate and improve the collaborative reasoning abilities of language models. In particular, tasks and metrics in Coral necessitate agents to disagree with incorrect solutions, convince their partners of a correct solution, and ultimately agree as a team to commit to a final solution, all through a natural multi-turn conversation. Through comprehensive evaluation on six collaborative reasoning tasks covering domains of coding, math, scientific QA and social reasoning, we show that current models cannot effectively collaborate due to undesirable social behaviors, collapsing even on problems that they can solve singlehandedly. To improve the collaborative reasoning capabilities of LLMs, we propose a self-play method to generate synthetic multi-turn preference data and further train the language models to be better collaborators. Experiments with Llama-3.1, Ministral and Qwen-2.5 models show that our proposed self-improvement approach consistently outperforms finetuned chain-of-thought performance of the same base model, yielding gains up to 16.7% absolute. Human evaluations show that the models exhibit more effective disagreement and produce more natural conversations after training on our synthetic interaction data.

TIST Journal 2025 Journal Article

Cross-User Federated Recommendation Unlearning

  • Yang Li
  • Enyue Yang
  • Weike Pan
  • Qiang Yang
  • Zhong Ming

Cross-user federated recommendation (CUFR) is a promising solution for providing personalized services without collecting users’ raw data. However, most previous CUFR works mainly focus on providing accurate and privacy-preserving personalized recommendations, but overlook the fact that users can opt out at any time during the training process. In response, we study an emerging and new problem of efficiently training an unlearned model to forget the data of the clients who leave a federated system. It is challenging to simply apply or slightly modify existing machine unlearning or federated unlearning methods to CUFR because of the unique collaboration effect in recommender systems. Although a recent gradient calibration-based method (i.e., FRU) shows promise in training an unlearned model, there are still some limitations: (i) some clients may run out of storage space, (ii) all the remaining clients need to participate in computing the new gradients, (iii) it masks the uniqueness of the local gradients, and (iv) the errors of the calibrated gradients increase gradually with more iterations. In this article, we propose a novel CUFR unlearning (CUFRU) method. Specifically, we design a gradient transfer station (GTS) module for storing the historical gradients while enabling clients to dynamically participate in the computation of the calibrated gradients with the new gradients based on their online status. Moreover, we design a novel iteration-aware gradient calibration mechanism to strike a balance between the weights of the historical and new gradients at the different stages of the unlearning process, alleviating the calibration errors. Finally, we conduct extensive experiments on three real-world datasets to show that our CUFRU can more efficiently train an unlearned model with competitive recommendation performance.

ICRA Conference 2025 Conference Paper

Effective Tuning Strategies for Generalist Robot Manipulation Policies

  • Wenbo Zhang 0009
  • Yang Li
  • Yanyuan Qiao
  • Siyuan Huang 0004
  • Jiajun Liu
  • Feras Dayoub
  • Xiao Ma
  • Lingqiao Liu

Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments. However, existing policies continue to struggle with out-of-distribution scenarios due to the inherent difficulty of collecting sufficient action data to cover extensively diverse domains. While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMP differs significantly with respect to the design choices of fine-tuning strategies. In this work, we first conduct an in-depth empirical study to investigate the effect of key factors in GMP fine-tuning strategies, covering the action space, policy head, supervision signal and the choice of tunable parameters, where 2,500 rollouts are evaluated for a single configuration. We systematically discuss and summarize our findings and identify the key design choices, which we believe give a practical guideline for GMP fine-tuning. We observe that in a low-data regime, with carefully chosen fine-tuning strategies, a GMP significantly outperforms the state-of-the-art imitation learning algorithms. The results presented in this work establish a new baseline for future studies on fine-tuned GMPs.

AAAI Conference 2025 Conference Paper

Enhancing Sequential Recommendation with Global Diffusion

  • Mingxuan Luo
  • Yang Li
  • Chen Lin

Existing sequential recommendation models are mostly based on sequential models, which can be misled by inconsistent items in the local sequence. This study proposes GlobalDiff, a plug-and-play framework to enhance the performance of sequential models by utilizing a diffusion model to restore the global non-sequential data structure of the item universe and compensate for the local sequential context. Several novel techniques are proposed, including training construction, guided reverse approximator, and inference ensemble, to seamlessly integrate the diffusion model with the sequential model. Extensive experiments on various datasets demonstrate that GlobalDiff can enhance advanced sequential models by an average improvement of 9.67%.

AAAI Conference 2025 Conference Paper

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

  • Yiting Dong
  • Xiang He
  • Guobin Shen
  • Dongcheng Zhao
  • Yang Li
  • Yi Zeng

Dynamic Vision Sensors (DVS) capture event data with high temporal resolution and low power consumption, presenting a more efficient solution for visual processing in dynamic and real-time scenarios compared to conventional video capture methods. Event data augmentation serves as an essential method for overcoming the limitation of scale and diversity in event datasets. Our comparative experiments demonstrate that two factors, spatial integrity and temporal continuity, significantly affect the effectiveness of event data augmentation, as they preserve the sparsity and high dynamic range characteristics unique to event data. However, existing augmentation methods often neglect the preservation of spatial integrity and temporal continuity. To address this, we developed a novel event data augmentation strategy, EventZoom, which employs a temporal progressive strategy, embedding transformed samples into the original samples through progressive scaling and shifting. The scaling process avoids the spatial information loss associated with cropping, while the progressive strategy prevents interruptions or abrupt changes in temporal information. We validated EventZoom across various supervised learning frameworks. The experimental results show that EventZoom consistently outperforms existing event data augmentation methods, achieving SOTA performance. For the first time, we concurrently employ semi-supervised and unsupervised learning to verify the feasibility of event augmentation algorithms, demonstrating the applicability and effectiveness of EventZoom as a powerful event-based data augmentation tool for handling real-world scenes in highly dynamic and variable environments.

IS Journal 2025 Journal Article

Explicable Artificial Intelligence for Affective Computing

  • Rui Mao
  • Erik Cambria
  • Yang Li
  • Newton Howard

Artificial intelligence (AI) is increasingly tasked with recognizing and responding to human emotions, making affective computing one of its most consequential frontiers. As AI spreads into finance, policymaking, and mental health, the opacity of deep learning models raises urgent challenges for trust, accountability, and ethics. This special issue addresses explicability not just as algorithmic transparency, but as a paradigm integrating cognitive science, the humanities, and ethical foresight with technical innovation. Guided by the “Seven Pillars for the Future of AI”— multidisciplinarity, task decomposition, parallel analogy, symbol grounding, similarity measure, intention awareness, and trustworthiness—it envisions affective AI as a partner in meaning-making rather than a mere inference engine. The six featured articles span topics from depression detection and sentiment analysis to hate speech moderation and interpretable driving behaviors, advancing affective AI that is accurate, interpretable, and aligned with human dignity.

NeurIPS Conference 2025 Conference Paper

Exploiting Task Relationships in Continual Learning via Transferability-Aware Task Embeddings

  • Yanru Wu
  • Jianning Wang
  • Xiangyu Chen
  • Yang Tan
  • Hanbing Liu
  • Yang Li

Continual learning (CL) has been a critical topic in contemporary deep neural network applications, where higher levels of both forward and backward transfer are desirable for an effective CL performance. Existing CL strategies primarily focus on task models — either by regularizing model updates or by separating task-specific and shared components — while often overlooking the potential of leveraging inter-task relationships to enhance transfer. To address this gap, we propose a transferability-aware task embedding, termed H-embedding, and construct a hypernet framework under its guidance to learn task-conditioned model weights for CL tasks. Specifically, H-embedding is derived from an information theoretic measure of transferability and is designed to be online and easy to compute. Our method is also characterized by notable practicality, requiring only the storage of a low-dimensional task embedding per task and supporting efficient end-to-end training. Extensive evaluations on benchmarks including CIFAR-100, ImageNet-R, and DomainNet show that our framework performs strongly compared to various baseline and SOTA approaches, demonstrating strong potential in capturing and utilizing intrinsic task relationships. Our code is publicly available at https://github.com/viki760/Hembedding_Guided_Hypernet.

IJCAI Conference 2025 Conference Paper

FGeo-HyperGNet: Geometric Problem Solving Integrating FormalGeo Symbolic System and Hypergraph Neural Network

  • Xiaokai Zhang
  • Yang Li
  • Na Zhu
  • Cheng Qin
  • Zhenbing Zeng
  • Tuo Leng

Geometric problem solving has always been a long-standing challenge in the fields of mathematical reasoning and artificial intelligence. We built a neural-symbolic system, called FGeo-HyperGNet, to automatically perform human-like geometric problem solving. The symbolic component is a formal system built on FormalGeo, which can automatically perform geometric relational reasoning and algebraic calculations and organize the solution into a hypergraph with conditions as hypernodes and theorems as hyperedges. The neural component, called HyperGNet, is a hypergraph neural network based on the attention mechanism, including an encoder to effectively encode the structural and semantic information of the hypergraph and a theorem predictor to provide guidance in solving problems. The neural component predicts theorems according to the hypergraph, and the symbolic component applies theorems and updates the hypergraph, thus forming a predict-apply cycle to ultimately achieve readable and traceable automatic solving of geometric problems. Experiments demonstrate the correctness and effectiveness of this neural-symbolic architecture. We achieved state-of-the-art results with a TPA of 93.50% and a PSSR of 88.36% on the FormalGeo7K dataset.

NeurIPS Conference 2025 Conference Paper

From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring

  • Yang Li
  • Qiang Sheng
  • Yehan Yang
  • Xueyao Zhang
  • Juan Cao

Though safety alignment has been applied to most large language models (LLMs), LLM service providers generally deploy a subsequent moderation as the external safety guardrail in real-world products. Existing moderators mainly practice a conventional full detection, which determines the harmfulness based on the complete LLM output, causing high service latency. Recent works pay more attention to partial detection where moderators oversee the generation midway and early stop the output if harmfulness is detected, but they directly apply moderators trained with the full detection paradigm to incomplete outputs, introducing a training-inference gap that lowers the performance. In this paper, we explore how to form a data-and-model solution that natively supports partial detection. For the data, we construct FineHarm, a dataset consisting of 29K prompt-response pairs with fine-grained token-level annotations to provide reasonable supervision for token-level training. Then, we propose the Streaming Content Monitor (SCM), which is trained with dual supervision of response- and token-level labels and can follow the output stream of LLM to make a timely judgment of harmfulness. Experiments show that SCM achieves a macro F1 score above 0.95, comparable to full detection, while seeing only the first 18% of tokens in responses on average. Moreover, the SCM can serve as a pseudo-harmfulness annotator for improving safety alignment and leads to a higher harmlessness score than DPO.

NeurIPS Conference 2025 Conference Paper

Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization

  • Yang Li
  • Lvda Chen
  • Haonan Wang
  • Runzhong Wang
  • Junchi Yan

While diffusion models have shown promise for combinatorial optimization (CO), their inference-time scaling cost-efficiency remains relatively underexplored. Existing methods improve solution quality by increasing denoising steps, but the performance often becomes saturated quickly. This paper proposes GenSCO to systematically scale diffusion solvers by an orthogonal dimension of inference-time computation beyond denoising step expansion, i.e., search-driven generation. GenSCO takes generation as a search operator rather than a complete solving process, where each operator cycle combines solution disruption (via local search operators) and diffusion sampling, enabling iterative exploration of the learned solution space. Rather than over-refining current solutions, this paradigm encourages the model to leave local optima and explore a broader area of the solution space, ensuring a more consistent scaling effect. The search loop is supported by a search-friendly solution-enhancement training procedure that incorporates a rectified flow model learning to establish diffusion trajectories between suboptimal solutions and the optimal ones. The flow model is empowered by a lightweight transformer architecture to learn neural ODEs that linearize solution trajectories, accelerating convergence of the scaling effect with efficiency. The resulting enhanced scaling efficiency and practical scalability lead to synergistic performance improvements. Extensive experiments show that GenSCO delivers performance improvements by orders of magnitude over previous state-of-the-art neural methods. Notably, GenSCO even achieves significant speedups compared to the state-of-the-art classic mathematical solver LKH3, delivering a 141x speedup to reach a 0.000% optimality gap on TSP-100, and approximately a 10x speedup to reach 0.02% on TSP-500.

ICLR Conference 2025 Conference Paper

GLOMA: Global Video Text Spotting with Morphological Association

  • Han Wang
  • Yanjie Wang
  • Yang Li
  • Can Huang

Video Text Spotting (VTS) is a fundamental visual task that aims to predict the trajectories and content of texts in a video. Previous works usually conduct local associations and apply IoU-based distances and complex post-processing procedures to boost performance, ignoring the abundant temporal information and the morphological characteristics in VTS. In this paper, we propose GLOMA to model the tracking problem as global associations and utilize the Gaussian Wasserstein distance to guide the morphological correlation between frames. Our main contributions are threefold: (1) we propose GLOMA, a Transformer-based global tracking method for VTS that associates multiple frames simultaneously; (2) we introduce a Wasserstein distance-based method to conduct positional associations between frames; (3) we conduct extensive experiments on public datasets. On the ICDAR2015 video dataset, GLOMA achieves 56.0 MOTA with a 4.6 absolute improvement over the previous SOTA method and outperforms the previous Transformer-based method by a significant 8.3 MOTA.
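The Gaussian Wasserstein distance mentioned in this abstract models each text box as a 2D Gaussian and compares boxes via the 2-Wasserstein metric. A minimal sketch for the axis-aligned case follows; the box-to-Gaussian convention (center as mean, half-extents as standard deviations) is an assumption, and with diagonal covariances the general formula's matrix square-root term reduces to per-axis differences of standard deviations (rotated text boxes, as in real video, would need the full covariance form):

```python
def box_to_gaussian(box):
    """Axis-aligned box (cx, cy, w, h) -> mean and per-axis std (assumed convention)."""
    cx, cy, w, h = box
    return (cx, cy), (w / 2.0, h / 2.0)

def gaussian_wasserstein2(box_a, box_b):
    """Squared 2-Wasserstein distance between the two box Gaussians.
    With diagonal covariances: W2^2 = ||m1 - m2||^2 + sum_i (s1_i - s2_i)^2."""
    (m1, s1), (m2, s2) = box_to_gaussian(box_a), box_to_gaussian(box_b)
    location = sum((a - b) ** 2 for a, b in zip(m1, m2))  # center displacement
    shape = sum((a - b) ** 2 for a, b in zip(s1, s2))     # width/height mismatch
    return location + shape

# Identical boxes score 0; a copy shifted by 3 px is penalized only by location.
print(gaussian_wasserstein2((0, 0, 4, 2), (0, 0, 4, 2)))  # -> 0.0
print(gaussian_wasserstein2((0, 0, 4, 2), (3, 0, 4, 2)))  # -> 9.0
```

Unlike IoU, which drops to zero as soon as boxes stop overlapping, this score degrades smoothly with displacement and shape mismatch, which is what makes it usable as an association cost across frames.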

IROS Conference 2025 Conference Paper

GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction

  • Tianhao Li
  • Yang Li
  • Mengtian Li
  • Yisheng Deng
  • Weifeng Ge

Accurately perceiving dynamic environments is a fundamental task for autonomous driving and robotic systems. Existing methods inadequately utilize temporal information, relying mainly on local temporal interactions between adjacent frames and failing to leverage global sequence information effectively. To address this limitation, we investigate how to effectively aggregate global temporal features from temporal sequences, aiming to achieve occupancy representations that efficiently utilize global temporal information from historical observations. For this purpose, we propose a global temporal aggregation denoising network named GTAD, introducing a global temporal information aggregation framework as a new paradigm for holistic 3D scene understanding. Our method employs an in-model latent denoising network to aggregate local temporal features from the current moment and global temporal features from historical sequences. This approach enables the effective perception of both fine-grained temporal information from adjacent frames and global temporal patterns from historical observations. As a result, it provides a more coherent and comprehensive understanding of the environment. Extensive experiments on the nuScenes and Occ3D-nuScenes benchmarks and ablation studies demonstrate the superiority of our method.

NeurIPS Conference 2025 Conference Paper

Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks

  • Guobin Shen
  • Dongcheng Zhao
  • Yiting Dong
  • Yang Li
  • Feifei Zhao
  • Yi Zeng

The evolution of the human brain has led to the development of complex synaptic plasticity, enabling dynamic adaptation to a constantly evolving world. This progress inspires our exploration into a new paradigm for Spiking Neural Networks (SNNs): a Plasticity-Driven Learning Framework (PDLF). This paradigm diverges from traditional neural network models that primarily focus on direct training of synaptic weights, leading to static connections that limit adaptability in dynamic environments. Instead, our approach delves into the heart of synaptic behavior, prioritizing the learning of plasticity rules themselves. This shift in focus from weight adjustment to mastering the intricacies of synaptic change offers a more flexible and dynamic pathway for neural networks to evolve and adapt. Our PDLF does not merely adapt existing concepts of functional and Presynaptic-Dependent Plasticity but redefines them, aligning closely with the dynamic and adaptive nature of biological learning. This reorientation enhances key cognitive abilities in artificial intelligence systems, such as working memory and multitasking capabilities, and demonstrates superior adaptability in complex, real-world scenarios. Moreover, our framework sheds light on the intricate relationships between various forms of plasticity and cognitive functions, thereby contributing to a deeper understanding of the brain's learning mechanisms. Integrating this groundbreaking plasticity-centric approach in SNNs marks a significant advancement in the fusion of neuroscience and artificial intelligence. It paves the way for developing AI systems that not only learn but also adapt in an ever-changing world, much like the human brain.

AAAI Conference 2025 Conference Paper

LiON: Learning Point-Wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data

  • Shaocong Xu
  • Pengfei Li
  • Qianpu Sun
  • Xinyu Liu
  • Yang Li
  • Shihui Guo
  • Zhen Wang
  • Bo Jiang

LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying outlier points in a LiDAR point cloud is challenging as LiDAR point clouds lack semantically-rich information. While former SOTA methods adopt heuristic architectures, we revisit this problem from the perspective of Selective Classification, which introduces a selective function into the standard closed-set classification setup. Our solution is built upon the basic idea of abstaining from choosing any inlier categories but learns a point-wise abstaining penalty with a margin-based loss. Apart from learning paradigms, synthesizing outliers to approximate unlimited real outliers is also critical, so we propose a strong synthesis pipeline that generates outliers originating from various factors: object categories, sampling patterns and sizes. We demonstrate that learning different abstaining penalties, apart from point-wise penalty, for different types of (synthesized) outliers can further improve the performance. We benchmark our method on SemanticKITTI and nuScenes and achieve SOTA results.

NeurIPS Conference 2025 Conference Paper

Meta CLIP 2: A Worldwide Scaling Recipe

  • Yung-Sung Chuang
  • Yang Li
  • Dong Wang
  • Ching-Feng Yeh
  • Kehan Lyu
  • Ramya Raghavendra
  • Jim Glass
  • Lifei Huang

Contrastive Language-Image Pretraining (CLIP) is a popular foundation model, supporting tasks from zero-shot classification and retrieval to serving as the encoder for multimodal large language models (MLLMs). Although CLIP is successfully trained on billion-scale image-text pairs from the English world, scaling CLIP's training further to learn from worldwide web data remains challenging: (1) no curation method is available to handle data points from the non-English world; (2) the English performance of existing multilingual CLIP is worse than its English-only counterpart, i.e., the "curse of multilinguality" that is common in LLMs. Here, we present Meta CLIP 2, the first recipe for training CLIP from scratch on worldwide web-scale image-text pairs. To generalize our findings, we conduct rigorous ablations with the minimal changes necessary to address the above challenges and present a recipe enabling mutual benefits from English and non-English world data. In zero-shot ImageNet classification, Meta CLIP 2 ViT-H/14 surpasses its English-only counterpart by 0.8% and mSigLIP by 0.7%, and surprisingly sets a new state of the art without system-level confounding factors (e.g., translation, bespoke architecture changes) on multilingual benchmarks, such as CVQA with 57.4%, Babel-ImageNet with 50.2%, and XM3600 with 64.3% on image-to-text retrieval. Code and model are available at https://github.com/facebookresearch/MetaCLIP.

NeurIPS Conference 2025 Conference Paper

ML4CO-Bench-101: Benchmark Machine Learning for Classic Combinatorial Problems on Graphs

  • Jiale Ma
  • Wenzheng Pan
  • Yang Li
  • Junchi Yan

Combinatorial problems on graphs have attracted extensive efforts from the machine learning community over the past decade. Despite notable progress in this area under the umbrella of ML4CO, a comprehensive categorization, unified reproducibility, and transparent evaluation protocols are still lacking for the emerging and immense pool of neural CO solvers. In this paper, we establish a modular and streamlined framework benchmarking prevalent neural CO methods, dissecting their design choices via a tri-level "paradigm-model-learning" taxonomy to better characterize different approaches. Further, we integrate their shared features and respective strengths to form 3 unified solvers representing global prediction (GP), local construction (LC), and adaptive expansion (AE) mannered neural solvers. We also collate a total of 65 datasets for 7 mainstream CO problems (including both edge-oriented tasks: TSP, ATSP, CVRP, as well as node-oriented tasks: MIS, MCl, MVC, MCut) across scales to facilitate more comparable results among the literature. Extensive experiments upon our benchmark reveal a fair and exact picture of the raw contribution of the learning components in each method, reinforcing the position that pre- and post-inference heuristic tricks are not supposed to compensate for the sub-par capability of the data-driven counterparts. Under this unified benchmark, an up-to-date replication of typical ML4CO methods is maintained, hoping to provide convenient reference and insightful guidelines for both engineering development and academic exploration of the ML4CO community in the future. Code is available at https://github.com/Thinklab-SJTU/ML4CO-Bench-101, and the dataset is at https://huggingface.co/datasets/ML4CO/ML4CO-Bench-101-SL.

AAAI Conference 2025 Conference Paper

Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation

  • Changgu Chen
  • Junwei Shu
  • Gaoqi He
  • Changbo Wang
  • Yang Li

Recent large-scale pre-trained diffusion models have demonstrated a powerful generative ability to produce high-quality videos from detailed text descriptions. However, exerting control over the motion of objects in videos generated by any video diffusion model remains a challenging problem. In this paper, we propose a novel zero-shot moving object trajectory control framework, Motion-Zero, to enable arbitrary single-object-trajectory control for the text-to-video diffusion model. To this end, an initial noise prior module is designed to provide a position-based prior that improves the stability of the moving object's appearance and the accuracy of its position. In addition, based on the attention map of the U-Net, spatial constraints are directly applied to the denoising process of diffusion models, which further ensures the positional consistency of moving objects during inference. Furthermore, temporal consistency is guaranteed with a proposed shift temporal attention mechanism. Our method can be flexibly applied to various state-of-the-art video diffusion models without any training process. Extensive experiments demonstrate that our proposed method can control the motion trajectories of arbitrary objects while preserving the original ability to generate high-quality videos.

NeurIPS Conference 2025 Conference Paper

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

  • Zhenwen Liang
  • Linfeng Song
  • Yang Li
  • Tao Yang
  • Haitao Mi
  • Dong Yu

Automated Theorem Proving (ATP) in formal languages remains a formidable challenge in AI, demanding rigorous logical deduction and navigating vast search spaces. While large language models (LLMs) have shown promising performance, existing stepwise provers often suffer from biased search guidance, leading to inefficiencies and suboptimal proof strategies. This paper introduces the Multi-Perspective Search Prover (MPS-Prover), a novel stepwise ATP system designed to overcome these limitations. MPS-Prover incorporates two key innovations: a highly effective post-training data curation strategy that prunes approximately 40% of redundant training data without sacrificing performance, and a multi-perspective tree search mechanism. This search integrates a learned critic model with strategically designed heuristic rules to diversify tactic selection, prevent getting trapped in unproductive states, and enhance search robustness. Extensive evaluations demonstrate that MPS-Prover achieves state-of-the-art performance on multiple challenging benchmarks, including miniF2F and ProofNet, outperforming prior 7B parameter models. Furthermore, our analyses reveal that MPS-Prover generates significantly shorter and more diverse proofs compared to existing stepwise and whole-proof methods, highlighting its efficiency and efficacy. Our work advances the capabilities of LLM-based formal reasoning and offers a robust framework and a comprehensive analysis for developing more powerful theorem provers.

NeurIPS Conference 2025 Conference Paper

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

  • Weizhe Yuan
  • Jane Yu
  • Song Jiang
  • Karthik Padthe
  • Yang Li
  • Dong Wang
  • Ilia Kulikov
  • Kyunghyun Cho

Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding.

NeurIPS Conference 2025 Conference Paper

Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning

  • Yang Li
  • Aming WU
  • Zihao Zhang
  • Yahong Han

In this paper, we focus on Novel Class Discovery for Point Cloud Segmentation (3D-NCD), aiming to learn a model that can segment unlabeled (novel) 3D classes using only the supervision from labeled (base) 3D classes. The key to this task is to set up the exact correlations between the point representations and their base class labels, as well as the representation correlations between the points from base and novel classes. Coarse or purely statistical correlation learning may lead to confusion in novel class inference. If we impose a causal relationship as a strongly correlated constraint upon the learning process, the essential point cloud representations that accurately correspond to the classes should be uncovered. To this end, we introduce a structural causal model (SCM) to re-formalize the 3D-NCD problem and propose a new method, i.e., Joint Learning of Causal Representation and Reasoning. Specifically, we first analyze hidden confounders in the base class representations and the causal relationships between the base and novel classes through the SCM. We devise a causal representation prototype that eliminates confounders to capture the causal representations of base classes. A graph structure is then used to model the causal relationships between the base classes' causal representation prototypes and the novel class prototypes, enabling causal reasoning from base to novel classes. Extensive experiments and visualization results on 3D and 2D NCD semantic segmentation demonstrate the superiority of our method.

ICML Conference 2025 Conference Paper

Optimal Auction Design in the Joint Advertising

  • Yang Li
  • Yuchao Ma 0002
  • Qi Qi 0003

Online advertising is a vital revenue source for major internet platforms. Recently, joint advertising, which assigns a bundle of two advertisers to an ad slot instead of allocating a single advertiser, has emerged as an effective method for enhancing allocation efficiency and revenue. However, existing mechanisms for joint advertising fail to achieve optimality, as they tend to focus on individual advertisers and overlook bundle structures. This paper identifies an optimal mechanism for joint advertising in a single-slot setting. For multi-slot joint advertising, we propose BundleNet, a novel bundle-based neural network approach specifically designed for joint advertising. Our extensive experiments demonstrate that the mechanisms generated by BundleNet approximate the theoretical analysis results in the single-slot setting and achieve state-of-the-art performance in the multi-slot setting. This significantly increases platform revenue while ensuring approximate dominant strategy incentive compatibility and individual rationality.

ICML Conference 2025 Conference Paper

Parrot: Multilingual Visual Instruction Tuning

  • Hai-Long Sun
  • Da-Wei Zhou 0001
  • Yang Li
  • Shiyin Lu
  • Chao Yi
  • Qing-Guo Chen
  • Zhao Xu
  • Weihua Luo

The rapid development of Multimodal Large Language Models (MLLMs), such as GPT-4, marks a significant step toward artificial general intelligence. Existing methods typically align vision encoders with LLMs via supervised fine-tuning (SFT), but this often deteriorates their ability to handle multiple languages as training progresses. We empirically observe that imbalanced SFT datasets, largely English-centric, degrade performance on non-English languages due to the failure in multilingual token alignment. To address this, we propose Parrot, a novel approach that leverages textual guidance for visual token alignment at the language level. Parrot conditions visual tokens on diverse language inputs and uses Mixture-of-Experts (MoE) to align multilingual tokens. By computing cross-attention between initial visual features and textual embeddings, we select the most relevant experts, converting visual tokens into language-specific representations. Additionally, we introduce the Massive Multilingual Multimodal Benchmark (MMMB), a new benchmark comprising 6 languages, 15 categories, and 12,000 questions, to assess multilingual capabilities. Parrot achieves state-of-the-art performance on both the multilingual benchmarks and a wide range of multimodal tasks. Code and dataset are available at: https://github.com/AIDC-AI/Parrot.

AAAI Conference 2025 Conference Paper

pFedGPA: Diffusion-based Generative Parameter Aggregation for Personalized Federated Learning

  • Jiahao Lai
  • Jiaqi Li
  • Jian Xu
  • Yanru Wu
  • Boshi Tang
  • Siqi Chen
  • Yongfeng Huang
  • Wenbo Ding

Federated Learning (FL) offers a decentralized approach to model training, where data remains local and only model parameters are shared between the clients and the central server. Traditional methods, such as Federated Averaging (FedAvg), linearly aggregate these parameters which are usually trained on heterogeneous data distributions, potentially overlooking the complex, high-dimensional nature of the parameter space. This can result in degraded performance of the aggregated model. While personalized FL approaches can mitigate the heterogeneous data issue to some extent, the limitation of linear aggregation remains unresolved. To alleviate this issue, we investigate the generative approach of diffusion model and propose a novel generative parameter aggregation framework for personalized FL, pFedGPA. In this framework, we deploy a diffusion model on the server to integrate the diverse parameter distributions and propose a parameter inversion method to efficiently generate a set of personalized parameters for each client. This inversion method transforms the uploaded parameters into a latent code, which is then aggregated through denoising sampling to produce the final personalized parameters. By encoding the dependence of a client's model parameters on the specific data distribution using the high-capacity diffusion model, pFedGPA can effectively decouple the complexity of the overall distribution of all clients' model parameters from the complexity of each individual client's parameter distribution. Our experimental results consistently demonstrate the superior performance of the proposed method across multiple datasets, surpassing baseline approaches.
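The linear aggregation that pFedGPA moves beyond can be illustrated with a minimal FedAvg sketch. This is a generic illustration of the classical weighted-averaging rule the abstract contrasts with, not the paper's code; the toy client parameters and dataset sizes are hypothetical.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Linearly aggregate client parameter vectors, weighted by local dataset size.

    Implements the classical FedAvg rule: theta = sum_k (n_k / n) * theta_k.
    Heterogeneous client data makes this linear average a potentially poor
    summary of the parameter distribution, which motivates generative
    aggregation approaches.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()                      # n_k / n
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return np.tensordot(weights, stacked, axes=1)  # weighted sum over clients

# Three toy clients with 2-parameter models and unequal data sizes.
params = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
sizes = [10, 10, 20]
print(fedavg(params, sizes))  # weighted mean: [3.5, 2.5]
```

Note how the third client, holding half the data, pulls the average toward its parameters; a diffusion-based aggregator instead models the joint parameter distribution rather than collapsing it to this single weighted mean.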

ICML Conference 2025 Conference Paper

Policy Guided Tree Search for Enhanced LLM Reasoning

  • Yang Li

Despite their remarkable capabilities, large language models often struggle with tasks requiring complex reasoning and planning. While existing approaches like Chain-of-Thought prompting and tree search techniques show promise, they are limited by their reliance on predefined heuristics and computationally expensive exploration strategies. We propose Policy-Guided Tree Search (PGTS), a framework that combines reinforcement learning with structured tree exploration to efficiently navigate reasoning paths. Our key innovation is a learned policy that dynamically decides between expanding, branching, backtracking, or terminating exploration, eliminating the need for manual heuristics or exhaustive search. Experiments across mathematical reasoning, logical deduction, and planning benchmarks demonstrate that PGTS achieves superior reasoning performance while significantly reducing computational costs compared to existing methods. These results establish PGTS as a scalable and effective solution for tackling complex reasoning tasks with LLMs.

AAAI Conference 2025 Conference Paper

Population Aware Diffusion for Time Series Generation

  • Yang Li
  • Han Meng
  • Zhenyu Bi
  • Ingolv T. Urnes
  • Haipeng Chen

Diffusion models have shown promising ability in generating high-quality time series (TS) data. Despite the initial success, existing works mostly focus on the authenticity of data at the individual level, but pay less attention to preserving the population-level properties on the entire dataset. Such population-level properties include value distributions for each dimension and distributions of certain functional dependencies (e.g., cross-correlation, CC) between different dimensions. For instance, when generating house energy consumption TS data, the value distributions of the outside temperature and the kitchen temperature should be preserved, as well as the distribution of CC between them. Preserving such TS population-level properties is critical in maintaining the statistical insights of the datasets, mitigating model bias, and augmenting downstream tasks like TS prediction. Yet, it is often overlooked by existing models. Hence, data generated by existing models often bear distribution shifts from the original data. We propose Population-aware Diffusion for Time Series (PaD-TS), a new TS generation model that better preserves the population-level properties. The key novelties of PaD-TS include 1) a new training method explicitly incorporating TS population-level property preservation, and 2) a new dual-channel encoder model architecture that better captures the TS data structure. Empirical results in major benchmark datasets show that PaD-TS can improve the average CC distribution shift score between real and synthetic data by 5.9x while maintaining a performance comparable to state-of-the-art models on individual-level authenticity.
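The population-level cross-correlation (CC) property described above can be made concrete: compute one CC value per generated series and compare the distribution of those values against the real data. A minimal NumPy sketch, purely illustrative and not the PaD-TS implementation (the toy data and dimension indices are assumptions):

```python
import numpy as np

def cc_per_sample(batch, i, j):
    """Pearson cross-correlation between dimensions i and j of each series.

    batch: array of shape (num_series, length, num_dims).
    Returns one CC value per series; comparing the distribution of these
    values between real and synthetic datasets is a population-level check,
    as opposed to judging each series individually.
    """
    x = batch[:, :, i] - batch[:, :, i].mean(axis=1, keepdims=True)
    y = batch[:, :, j] - batch[:, :, j].mean(axis=1, keepdims=True)
    num = (x * y).sum(axis=1)
    den = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))
    return num / den

rng = np.random.default_rng(0)
real = rng.normal(size=(100, 50, 2))
real[:, :, 1] += real[:, :, 0]   # induce positive correlation between dims 0 and 1
ccs = cc_per_sample(real, 0, 1)
print(ccs.mean())                # population of CC values skews clearly positive
```

A generator that preserves population-level properties should reproduce not just this mean but the whole shape of the CC distribution.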

NeurIPS Conference 2025 Conference Paper

ProteinConformers: Benchmark Dataset for Simulating Protein Conformational Landscape Diversity and Plausibility

  • Yihang Zhou
  • Chen Wei
  • Minghao Sun
  • Jin Song
  • Yang Li
  • Lin Wang
  • Yang Zhang

Understanding the conformational landscape of proteins is essential for elucidating protein function and facilitating drug design. However, existing protein conformation benchmarks fail to capture the full energy landscape, limiting their ability to evaluate the diversity and physical plausibility of AI-generated structures. We introduce ProteinConformers, a large-scale benchmark dataset comprising over 381,000 physically realistic conformations for 87 CASP targets. These were derived from more than 40,000 structural decoys via extensive all-atom molecular dynamics simulations totaling over 6 million CPU hours. Using this dataset, we propose novel metrics to evaluate conformational diversity and plausibility, and systematically benchmark six protein conformation generative models. Our results highlight that leveraging large-scale protein sequence data can enhance a model's ability to explore conformational space, potentially reducing reliance on MD-derived data. Additionally, we find that PDB and MD datasets influence model performance differently: current models perform well on inter-atomic distance prediction but struggle with inter-residue orientation generation. Overall, our dataset, evaluation metrics, and benchmarking results provide the first comprehensive foundation for assessing generative models in protein conformational modeling. Dataset and instructions are available at https://huggingface.co/datasets/Jim990908/ProteinConformers/tree/main. Codes are stored at https://github.com/auroua/ProteinConformers. An interactive website is available at https://zhanggroup.org/ProteinConformers.

NeurIPS Conference 2025 Conference Paper

REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing

  • Weihan Xu
  • Yimeng Ma
  • Jingyue Huang
  • Yang Li
  • Wenye Ma
  • Taylor Berg-Kirkpatrick
  • Julian McAuley
  • Paul Liang

Short videos are an effective tool for promoting contents and improving knowledge accessibility. While existing extractive video summarization methods struggle to produce a coherent narrative, existing abstractive methods cannot "quote" from the input videos, i.e., insert short video clips in their outputs. In this work, we explore novel video editing models for generating shorts that feature a coherent narrative with embedded video insertions extracted from a long input video. We propose a novel retrieval-embedded generation framework that allows a large language model to quote multimodal resources while maintaining a coherent narrative. Our proposed REGen system first generates the output story script with quote placeholders using a finetuned large language model, and then uses a novel retrieval model to replace the quote placeholders by selecting a video clip that best supports the narrative from a pool of candidate quotable video clips. We examine the proposed method on the task of documentary teaser generation, where short interview insertions are commonly used to support the narrative of a documentary. Our objective evaluations show that the proposed method can effectively insert short video clips while maintaining a coherent narrative. In a subjective survey, we show that our proposed method outperforms existing abstractive and extractive approaches in terms of coherence, alignment, and realism in teaser generation.

JBHI Journal 2025 Journal Article

scSTD: A Swin Transformer-Based Diffusion Model for Recovering scRNA-Seq Data

  • Yang Li
  • Furui Liu
  • Junlei Zhou
  • Fangyuan Shi
  • Zhenhua Yu

Dropout events and technical noise are pervasive challenges in single-cell RNA sequencing (scRNA-seq) data, often obscuring true gene expression profiles and undermining the reliability of downstream analyses. Existing imputation and denoising methods offer partial relief but frequently struggle with over-smoothing and fail to fully capture the complex heterogeneity of cellular states. To address these limitations, we introduce scSTD, a novel imputation and denoising framework that uniquely combines the Swin Transformer (SwinT) architecture with a latent diffusion model. In scSTD, a deep autoencoder first encodes each cell into a compact latent embedding, which is then modeled via a SwinT-based latent diffusion process designed to learn the rich, multimodal distribution of scRNA-seq data. This integration enables scSTD to accurately recover gene expression profiles while preserving subtle biological variation. By synthesizing realistic latent neighbors for each cell and aggregating their decoded outputs, scSTD achieves high-fidelity imputation and denoising. Comprehensive evaluations on both synthetic and real scRNA-seq datasets demonstrate that scSTD significantly outperforms existing methods in recovering true gene expression profiles and maintaining the topological integrity of cellular landscapes.

IROS Conference 2025 Conference Paper

SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement

  • Ze Wang 0009
  • Yang Li
  • Long Xu 0002
  • Hao Shi 0004
  • Zunwang Ma
  • Zhen Chu
  • Chao Li
  • Fei Gao 0011

Dynamic jumping on high platforms and over gaps differentiates legged robots from wheeled counterparts. Compared to walking on rough terrains, dynamic locomotion on abrupt surfaces requires fusing proprioceptive and exteroceptive perception for explosive movements. In this paper, we propose SF-TIM (Simple Framework combining Terrain Imagination and Measurement), a single-policy method that enhances quadrupedal robot jumping agility, while preserving their fundamental blind walking capabilities. In addition, we introduce a terrain-guided reward design specifically to assist quadrupedal robots in high jumping, improving their performance in this task. To narrow the simulation-to-reality gap in quadrupedal robot learning, we introduce a stable and high-speed elevation map generation framework, enabling zero-shot simulation-to-reality transfer of locomotion ability. Our algorithm has been deployed and validated on both the small-/large-size quadrupedal robots, demonstrating its effectiveness in real-world applications: the robot has successfully traversed various high platforms and gaps, showing the robustness of our proposed approach. A demo video has been made available at https://flysoaryun.github.io/SF-TIM.

TMLR Journal 2025 Journal Article

Streamlining Language Models via Semantic Basis Analysis

  • Yang Li
  • Daniel Agyei Asante
  • Changsheng Zhao
  • Ernie Chang
  • Yangyang Shi
  • Vikas Chandra

As the size of language models increases, they deliver substantial performance improvements across a variety of applications. However, this growth also leads to greater computational demands, making deployment on resource-constrained devices—such as personal computers and mobile or wearable devices—more challenging, and significantly raising inference costs on cloud servers. To address these challenges, we introduce Basel, a method to streamline language models by leveraging the semantic structure of their weight matrices. Specifically, Basel treats each weight matrix as a linear combination of bases, selectively retaining those that are associated with essential semantics for the target application, pruning redundant ones, and introducing new bases that enhance task performance. Experimental results demonstrate that Basel achieves significant model size reduction compared to baseline techniques, while maintaining comparable or even superior accuracy across diverse applications.
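The idea of viewing a weight matrix as a linear combination of bases, some of which can be pruned, can be sketched with a truncated SVD. This is a generic low-rank illustration under that reading of the abstract, not Basel's actual semantics-aware selection criterion; the matrix sizes and rank are arbitrary.

```python
import numpy as np

def keep_top_bases(W, k):
    """Approximate W by its k strongest rank-1 bases via SVD.

    W = sum_i s_i * u_i v_i^T; retaining only the top-k terms prunes the
    weakest bases. Basel selects bases by their relevance to the target
    application rather than by singular value alone, but the decomposition
    view is the same.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# A matrix that is exactly rank 2 plus small noise.
W = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 64)) + 0.01 * rng.normal(size=(64, 64))
err = np.linalg.norm(W - keep_top_bases(W, 2)) / np.linalg.norm(W)
print(err)  # two bases already capture almost all of W
```

Keeping 2 of 64 possible bases here shrinks the stored parameters dramatically while the relative reconstruction error stays tiny, which is the size/accuracy trade-off the abstract describes.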

NeurIPS Conference 2025 Conference Paper

StruDiCO: Structured Denoising Diffusion with Gradient-free Inference-stage Boosting for Memory and Time Efficient Combinatorial Optimization

  • Yu Wang
  • Yang Li
  • Junchi Yan
  • Yi Chang

Diffusion models have recently emerged as powerful neural solvers for combinatorial optimization (CO). However, existing approaches fail to reveal how variables are progressively determined during inference, making the final solution opaque until the last step. To address this limitation, we propose a structured denoising diffusion model, StruDiCO, which incrementally constructs solutions through step-wise variable selection. This is achieved via a variable-absorption noising model, wherein the forward process simulates gradual variable deactivation, converging to an empty solution, while the reverse process incrementally selects variables to reconstruct the final solution. This design induces structural continuity across intermediate states, enabling interpretable and trajectory-consistent partial solutions throughout inference. To further improve the reliability of reverse inference, we introduce a constrained consistency sampling strategy, which suppresses low-confidence variable selection at each step to stabilize the reverse process. Leveraging the structure-preserving reverse process, we further propose a lightweight, gradient-free, objective-aware refinement framework, which iteratively improves solution quality by applying structure-aware perturbations to the current solution, performing reverse inference through the constrained consistency model, and decoding with an objective-guided scoring scheme. Extensive experiments on two canonical CO tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), show that StruDiCO outperforms state-of-the-art diffusion-based solvers, achieving up to 3.5× faster inference, 70% lower GPU memory usage, and significantly improved solution quality, with up to 37.7% drop reduction on TSP and an average 38.1% improvement on MIS. The codes are publicly available at https://github.com/yuuuuwang/StruDiCO.

JBHI Journal 2025 Journal Article

TCGAN: Temporal Convolutional Generative Adversarial Network for Fetal ECG Extraction Using Single-Channel Abdominal ECG

  • Zhen-Zhen Huang
  • Wei-Tao Zhang
  • Yang Li
  • Jian Cui
  • Ya-Ru Zhang

Noninvasive fetal ECG (FECG) monitoring holds significant importance in ensuring the normal development of the fetus. Since FECG is usually submerged by maternal ECG (MECG) and background noise in abdominal ECG (AECG), it is challenging to exactly restore the waveform details of FECG from AECG. To address this issue, a temporal convolutional generative adversarial network (TCGAN) is proposed for FECG extraction using single-channel AECG. In order to utilize both the global and local ECG features in the time domain, we built an encoder-decoder architecture for the generator. The model architecture consists of temporal convolution blocks, transpose convolutions, and skip connections. The skip connections amalgamate information from feature maps extracted by convolutional layers using transpose convolution operations, which helps the decoder extract more detailed information. TCGAN is rigorously evaluated using both the synthetic dataset FECGSYDB and the real-world dataset ADFECGDB. The experimental results on these datasets demonstrate the outstanding performance of TCGAN in terms of fetal QRS complex detection, achieving PPVs of 99.54% and 99.02%, respectively. Compared with state-of-the-art methods, TCGAN can extract FECG with well-preserved waveform details. This helps doctors achieve more accurate assessment of fetal development.

AAAI Conference 2025 Conference Paper

TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

  • Dawei Yan
  • Pengcheng Li
  • Yang Li
  • Hao Chen
  • Qingguo Chen
  • Weihua Luo
  • Wei Dong
  • Qingsen Yan

Currently, inspired by the success of vision-language models (VLMs), an increasing number of researchers are focusing on improving VLMs and have achieved promising results. However, most existing methods concentrate on optimizing the connector and enhancing the language model component, while neglecting improvements to the vision encoder itself. In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction. Specifically, inspired by the purpose-driven logic inherent in human behavior, we use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance, refining it. Subsequently, another set of latent embeddings extracts additional detailed text-guided information from high-resolution local patches as auxiliary information. Finally, with the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question. This results in generating better answers. Experiments on various datasets validate the effectiveness of the proposed method. Remarkably, without the need for additional training data, our proposed method can bring more benefits to the baseline (LLaVA-1.5) compared with other concurrent methods. Furthermore, the proposed method consistently brings improvement in different settings.

NeurIPS Conference 2025 Conference Paper

THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations

  • Wenchao Yang
  • Weidong Yan
  • Wenkang Liu
  • Yulan Ma
  • Yang Li

Large-scale pre-trained models hold significant potential for learning universal EEG representations. However, most existing methods, particularly autoregressive (AR) frameworks, primarily rely on straightforward temporal sequencing of multi-channel EEG data, which fails to capture the rich physiological characteristics inherent to EEG signals. Moreover, their time-centered modeling approach also limits the effective representation of the dynamic spatial topology of brain activity. To address these challenges and fully exploit the potential of large-scale EEG models, we propose a novel Topology Hierarchical Derived Brain Autoregressive Modeling (THD-BAR) for EEG generic representations. The core innovation of THD-BAR lies in the introduction of the Brain Topology Hierarchy (BTH), which establishes a multi-scale spatial order for EEG channels. This hierarchical structure enables a redefinition of autoregressive learning as a "next-scale-time prediction" problem, effectively capturing both spatial and temporal dynamics. Based on BTH, we design a Topology-Hierarchical Vector Quantized-Variational Autoencoder (THVQ-VAE) for multi-scale tokenization and develop an enhanced Brain Autoregressive (BAR) module with specialized masking strategies for prediction. Through extensive large-scale pre-training on 17 datasets, followed by rigorous validation on 10 downstream datasets spanning 5 distinct tasks, THD-BAR consistently outperforms existing methods. These results highlight the superior generalization and modeling capabilities of our proposed approach.

NeurIPS Conference 2025 Conference Paper

Theory-Driven Label-Specific Representation for Incomplete Multi-View Multi-Label Learning

  • Quanjiang Li
  • Tianxiang Xu
  • Tingjin Luo
  • Yan Zhong
  • Yang Li
  • Yiyun Zhou
  • Chenping Hou

Multi-view multi-label learning typically suffers from dual data incompleteness due to limitations in feature storage and annotation costs. The interplay of heterogeneous features, numerous labels, and missing information significantly degrades model performance. To tackle these complex yet highly practical challenges, we propose a Theory-Driven Label-Specific Representation (TDLSR) framework. Through constructing the view-specific sample topology and prototype association graph, we develop a proximity-aware imputation mechanism, while deriving class representatives that capture the label correlation semantics. To obtain semantically distinct view representations, we introduce principles of information shift, interaction, and orthogonality, which promote the disentanglement of representation information and mitigate message distortion and redundancy. Besides, label semantic-guided feature learning is employed to identify the discriminative shared and specific representations and refine the label preference across views. Moreover, we theoretically investigate the characteristics of representation learning and the generalization performance. Finally, extensive experiments on public datasets and real-world applications validate the effectiveness of TDLSR.

TAAS Journal 2025 Journal Article

VoI-based Situation-Aware Routing Protocol for Non-linear Underwater Communication Networks

  • Kiran Saleem
  • Lei Wang
  • Rana Zeeshan Ahmed
  • Thippa Reddy Gadekallu
  • Ahmad Almadhor
  • Yang Li

One of the main challenges for underwater applications, such as environmental monitoring and disaster management, is achieving efficient data transmission in environments where conditions change rapidly and the resources needed for data transport are scarce. Evaluating the Value of Information (VoI) enables us to address these problems, and we propose a Value-of-Information-based Situation-Aware Non-Linear Routing (VoI-SANLR) method. It handles critical event scenarios using Belief-Desire-Intention (BDI) logic criteria and prioritizes the timely delivery of data-driven information to the destination. VoI-SANLR is designed to reduce energy consumption, end-to-end latency, and jitter, and to improve the Packet Delivery Ratio (PDR) in underwater communication networks. It introduces priority-based principles and addresses challenges of the underwater environment such as varying channel conditions, scarce energy resources, and real-time decision requirements. Energy optimization analysis reveals consistent outperformance, achieving a remarkable 95% reduction in energy consumption compared to other techniques. Low latency is maintained, ranging from 2.5 to 0.5 seconds, showcasing enhanced efficiency and scalability. VoI-SANLR demonstrates exceptional performance in both throughput and jitter: it achieves the highest data transfer rates, ranging from 100 kbps to 110 kbps, while jitter remains consistently low, between 1.8 ms and 2 ms, ensuring minimal delay variability and improved communication stability. PDR consistently surpasses other techniques, reaching a maximum of 99%. Additionally, network lifetime analysis demonstrates VoI-SANLR's superiority, exhibiting the highest network lifetime at each node and a significant 31.25% improvement at node 100 compared to other methods.

AAAI Conference 2025 Conference Paper

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

  • Yifang Xu
  • Yunzhuo Sun
  • Benxiang Zhai
  • Ming Li
  • Wenxin Liang
  • Yang Li
  • Sidan Du

The target of video moment retrieval (VMR) is to predict temporal spans within a video that semantically match a given linguistic query. Existing VMR methods based on multimodal large language models (MLLMs) rely heavily on expensive high-quality datasets and time-consuming fine-tuning. Although some recent studies introduce a zero-shot setting to avoid fine-tuning, they overlook the inherent language bias in the query, leading to erroneous localization. To tackle these challenges, this paper proposes Moment-GPT, a tuning-free pipeline for zero-shot VMR utilizing frozen MLLMs. Specifically, we first employ LLaMA-3 to correct and rephrase the query to mitigate language bias. Subsequently, we design a span generator combined with MiniGPT-v2 to produce candidate spans adaptively. Finally, to leverage the video comprehension capabilities of MLLMs, we apply Video-ChatGPT and a span scorer to select the most appropriate spans. Our proposed method substantially outperforms state-of-the-art MLLM-based and zero-shot models on several public datasets, including QVHighlights, ActivityNet-Captions, and Charades-STA.
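The three-stage pipeline described in the abstract (query rephrasing, span generation, span scoring) can be sketched with the frozen models abstracted away as callables. This is an illustrative structural sketch only; all function names are hypothetical stand-ins, not the paper's actual interfaces.

```python
def zero_shot_vmr(query, video_frames, rephrase_fn, span_generator, span_scorer, top_k=1):
    """Rephrase the query, generate candidate spans, score them, return the best."""
    # Stage 1: mitigate language bias by rewriting the query (LLaMA-3 in the paper).
    clean_query = rephrase_fn(query)
    # Stage 2: produce candidate (start, end) spans (span generator + MiniGPT-v2).
    candidates = span_generator(clean_query, video_frames)
    # Stage 3: score each span with a video-language model (Video-ChatGPT) and rank.
    scored = sorted(candidates,
                    key=lambda s: span_scorer(clean_query, video_frames, s),
                    reverse=True)
    return scored[:top_k]

# Toy demo with trivial stand-ins for the frozen models.
frames = list(range(10))
best = zero_shot_vmr(
    "person opens the door",
    frames,
    rephrase_fn=lambda q: q.lower(),
    span_generator=lambda q, v: [(0, 3), (2, 7), (5, 9)],
    span_scorer=lambda q, v, s: s[1] - s[0],   # toy scorer: prefer longer spans
)
print(best)  # [(2, 7)]
```

The point of the sketch is that every stage is a frozen, swappable component, which is what makes the pipeline tuning-free.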

AAAI Conference 2024 Conference Paper

A Joint Framework with Heterogeneous-Relation-Aware Graph and Multi-Channel Label Enhancing Strategy for Event Causality Extraction

  • Ruili Pu
  • Yang Li
  • Jun Zhao
  • Suge Wang
  • Deyu Li
  • Jian Liao
  • Jianxing Zheng

Event Causality Extraction (ECE) aims to extract the cause-effect event pairs with their structured event information from plain texts. As far as we know, the existing ECE methods mainly focus on the correlation between arguments, without explicitly modeling the causal relationship between events, and usually design two independent frameworks to extract cause events and effect events, respectively, which cannot effectively capture the dependency between the subtasks. Therefore, we propose a joint multi-label extraction framework for ECE to alleviate the above limitations. In particular, 1) we design a heterogeneous-relation-aware graph module to learn the potential relationships between events and arguments, in which we construct the heterogeneous graph by taking the predefined event types and all the words in the sentence as nodes, and modeling three relationships of "event-event", "event-argument" and "argument-argument" as edges. 2) We also design a multi-channel label enhancing module to better learn the distributed representation of each label in the multi-label extraction framework, and further enhance the interaction between the subtasks by considering the preliminary results of cause-effect type identification and event argument extraction. The experimental results on the benchmark dataset ECE-CCKS show that our approach outperforms previous state-of-the-art methods, and that our model also performs well on the complex samples with multiple cause-effect event pairs.

JBHI Journal 2024 Journal Article

A Transferability-Based Method for Evaluating the Protein Representation Learning

  • Fan Hu
  • Weihong Zhang
  • Huazhen Huang
  • Wang Li
  • Yang Li
  • Peng Yin

Self-supervised pre-trained language models have recently risen as a powerful approach in learning protein representations, showing exceptional effectiveness in various biological tasks, such as drug discovery. Amidst the evolving trend in protein language model development, there is an observable shift towards employing large-scale multimodal and multitask models. However, the predominant reliance on empirical assessments using specific benchmark datasets for evaluating these models raises concerns about the comprehensiveness and efficiency of current evaluation methods. Addressing this gap, our study introduces a novel quantitative approach for estimating the performance of transferring multi-task pre-trained protein representations to downstream tasks. This transferability-based method is designed to quantify the similarities in latent space distributions between pre-trained features and those fine-tuned for downstream tasks. It encompasses a broad spectrum, covering multiple domains and a variety of heterogeneous tasks. To validate this method, we constructed a diverse set of protein-specific pre-training tasks. The resulting protein representations were then evaluated across several downstream biological tasks. Our experimental results demonstrate a robust correlation between the transferability scores obtained using our method and the actual transfer performance observed. This significant correlation highlights the potential of our method as a more comprehensive and efficient tool for evaluating protein representation learning.

NeurIPS Conference 2024 Conference Paper

ActSort: An active-learning accelerated cell sorting algorithm for large-scale calcium imaging datasets

  • Yiqi Jiang
  • Hakki O. Akengin
  • Ji Zhou
  • Mehmet A. Aslihak
  • Yang Li
  • Radosław Chrapkiewicz
  • Oscar Hernandez
  • Sadegh Ebrahimi

Recent advances in calcium imaging enable simultaneous recordings of up to a million neurons in behaving animals, producing datasets of unprecedented scales. Although individual neurons and their activity traces can be extracted from these videos with automated algorithms, the results often require human curation to remove false positives, a laborious process called "cell sorting". To address this challenge, we introduce ActSort, an active-learning algorithm for sorting large-scale datasets that integrates features engineered by domain experts together with data formats with minimal memory requirements. By strategically bringing outlier cell candidates near the decision boundary up for annotation, ActSort reduces human labor to about 1–3% of cell candidates and improves curation accuracy by mitigating annotator bias. To facilitate the algorithm's widespread adoption among experimental neuroscientists, we created user-friendly software and conducted a first-of-its-kind benchmarking study involving about 160,000 annotations. Our tests validated ActSort's performance across different experimental conditions and datasets from multiple animals. Overall, ActSort addresses a crucial bottleneck in processing large-scale calcium videos of neural activity and thereby facilitates systems neuroscience experiments at previously inaccessible scales. (https://github.com/schnitzer-lab/ActSort-public)

NeurIPS Conference 2024 Conference Paper

Aligning Individual and Collective Objectives in Multi-Agent Cooperation

  • Yang Li
  • Wenhao Zhang
  • Jianhong Wang
  • Shao Zhang
  • Yali Du
  • Ying Wen
  • Wei Pan

Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. Cutting-edge research focuses on incorporating domain knowledge into rewards and introducing additional mechanisms to incentivize cooperation. However, these approaches often face shortcomings such as reliance on manual design and the absence of theoretical grounding. To close this gap, we model the mixed-motive game as a differentiable game to illuminate the learning dynamics towards cooperation. In more detail, we introduce a novel optimization method named Altruistic Gradient Adjustment (AgA) that employs gradient adjustments to progressively align individual and collective objectives. Furthermore, we theoretically prove that AgA effectively attracts gradients to stable fixed points of the collective objective while considering individual interests, and we validate these claims with empirical evidence. We evaluate the effectiveness of AgA through benchmark environments for testing mixed-motive collaboration with small-scale agents, such as the two-player public goods game and the sequential social dilemma games Cleanup and Harvest, as well as our self-developed large-scale environment in the game StarCraft II.
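As rough intuition for aligning individual and collective objectives in a differentiable game, one can blend each agent's individual gradient with the collective gradient and descend the blended direction. This is a simplified stand-in for the idea, not the paper's exact AgA update rule:

```python
def blended_step(x, grad_ind, grad_col, lam=0.5, lr=0.1):
    """One gradient step on a mix of individual and collective objectives.
    lam = 0 is pure self-interest; lam = 1 is pure collective optimization."""
    g = (1 - lam) * grad_ind(x) + lam * grad_col(x)
    return x - lr * g

# Toy differentiable game: the agent's own loss is x^2 (pulls x toward 0),
# while the collective loss is (x - 1)^2 (pulls x toward 1).
grad_ind = lambda x: 2 * x
grad_col = lambda x: 2 * (x - 1)

x = 2.0
for _ in range(200):
    x = blended_step(x, grad_ind, grad_col, lam=0.5)
print(round(x, 3))  # 0.5 -- a compromise between the two objectives
```

The toy run converges to a fixed point between the two objectives' optima; AgA's contribution is choosing the adjustment so that such fixed points are stable for the collective objective while still respecting individual interests.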

AAAI Conference 2024 Conference Paper

An Efficient Knowledge Transfer Strategy for Spiking Neural Networks from Static to Event Domain

  • Xiang He
  • Dongcheng Zhao
  • Yang Li
  • Guobin Shen
  • Qingqun Kong
  • Yi Zeng

Spiking neural networks (SNNs) are rich in spatio-temporal dynamics and are suitable for processing event-based neuromorphic data. However, event-based datasets are usually less annotated than static datasets. This small data scale makes SNNs prone to overfitting and limits their performance. In order to improve the generalization ability of SNNs on event-based datasets, we use static images to assist SNN training on event data. In this paper, we first discuss the domain mismatch problem encountered when directly transferring networks trained on static datasets to event data. We argue that the inconsistency of feature distributions becomes a major factor hindering the effective transfer of knowledge from static images to event data. To address this problem, we propose solutions in terms of two aspects: feature distribution and training strategy. Firstly, we propose a knowledge transfer loss, which consists of domain alignment loss and spatio-temporal regularization. The domain alignment loss learns domain-invariant spatial features by reducing the marginal distribution distance between the static image and the event data. Spatio-temporal regularization provides dynamically learnable coefficients for domain alignment loss by using the output features of the event data at each time step as a regularization term. In addition, we propose a sliding training strategy, which gradually replaces static image inputs probabilistically with event data, resulting in a smoother and more stable training for the network. We validate our method on neuromorphic datasets, including N-Caltech101, CEP-DVS, and N-Omniglot. The experimental results show that our proposed method achieves better performance on all datasets compared to the current state-of-the-art methods. Code is available at https://github.com/Brain-Cog-Lab/Transfer-for-DVS.

AAAI Conference 2024 Conference Paper

Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection

  • Beizhe Hu
  • Qiang Sheng
  • Juan Cao
  • Yuhui Shi
  • Yang Li
  • Danding Wang
  • Peng Qi

Detecting fake news requires both a delicate sense of diverse clues and a profound understanding of the real-world background, which remains challenging for detectors based on small language models (SLMs) due to their knowledge and capability limitations. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with fake news detection remains underexplored. In this paper, we investigate the potential of LLMs in fake news detection. First, we conduct an empirical study and find that a sophisticated LLM such as GPT-3.5 can generally expose fake news and provide desirable multi-perspective rationales but still underperforms the basic SLM, a fine-tuned BERT. Our subsequent analysis attributes this gap to the LLM's inability to select and integrate rationales properly to reach a conclusion. Based on these findings, we propose that current LLMs may not substitute fine-tuned SLMs in fake news detection but can be a good advisor for SLMs by providing multi-perspective instructive rationales. To instantiate this proposal, we design an adaptive rationale guidance network for fake news detection (ARG), in which SLMs selectively acquire insights on news analysis from the LLMs' rationales. We further derive a rationale-free version of ARG by distillation, namely ARG-D, which serves cost-sensitive scenarios without querying LLMs. Experiments on two real-world datasets demonstrate that ARG and ARG-D outperform three types of baseline methods, including SLM-based, LLM-based, and combinations of small and large language models.

NeurIPS Conference 2024 Conference Paper

Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime

  • Haoyu Geng
  • Hang Ruan
  • Runzhong Wang
  • Yang Li
  • Yang Wang
  • Lei Chen
  • Junchi Yan

Predictive combinatorial optimization, where the parameters of the combinatorial optimization (CO) problem are unknown at decision-making time, precisely models many real-world applications, including energy-cost-aware scheduling and budget allocation in advertising. Tackling such a problem usually involves a prediction model and a CO solver. These two modules are integrated into the predictive CO pipeline following one of two design principles: "Predict-then-Optimize (PtO)", which learns predictions by supervised training and subsequently solves the CO problem using the predicted coefficients, and "Predict-and-Optimize (PnO)", which directly optimizes towards the ultimate decision quality and claims to yield better decisions than traditional PtO approaches. However, there is no systematic benchmark of both approaches covering the specific design choices at the module level, nor an evaluation dataset that covers representative real-world scenarios. To this end, we develop a modular framework to benchmark 11 existing PtO/PnO methods on 8 problems, including a new industrial dataset for combinatorial advertising that will be released. Our study shows that PnO approaches are better than PtO on 7 of 8 benchmarks, but there is no silver bullet for the specific design choices of PnO. A comprehensive categorization of current approaches and integration of typical scenarios are provided under a unified benchmark. This paper can therefore serve as a comprehensive benchmark for future PnO approach development and also offer fast prototyping for application-focused development. The code is available at https://github.com/Thinklab-SJTU/PredictiveCO-Benchmark.

NeurIPS Conference 2024 Conference Paper

ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model

  • Yiming Sun
  • Fan Yu
  • Shaoxiang Chen
  • Yu Zhang
  • Junwei Huang
  • Yang Li
  • Chenhui Li
  • Changbo Wang

Visual object tracking aims to locate a target object in a video sequence based on an initial bounding box. Recently, Vision-Language (VL) trackers have been proposed that utilize additional natural language descriptions to enhance versatility in various applications. However, VL trackers remain inferior to state-of-the-art (SoTA) visual trackers in tracking performance. We found that this inferiority primarily results from their heavy reliance on manual textual annotations, which frequently include ambiguous language descriptions. In this paper, we propose ChatTracker, which leverages the wealth of world knowledge in a Multimodal Large Language Model (MLLM) to generate high-quality language descriptions and enhance tracking performance. To this end, we propose a novel reflection-based prompt optimization module that iteratively refines ambiguous and inaccurate descriptions of the target using tracking feedback. To further utilize the semantic information produced by the MLLM, we propose a simple yet effective VL tracking framework that can be easily integrated as a plug-and-play module to boost the performance of both VL and visual trackers. Experimental results show that our proposed ChatTracker achieves performance comparable to existing methods.

IJCAI Conference 2024 Conference Paper

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces

  • Libing Yang
  • Yang Li
  • Long Chen

Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on large language models has shown that policy gradient algorithms can enhance a policy over a huge action space. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on an actor-critic architecture to enhance a pre-trained model with a huge, observation-aligned action space of 10^6 actions in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pre-training stage is employed to train a baseline version of our policy. In the second stage, Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment's surface area during cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods. Our project is available at https://vpx-ecnu.github.io/ClothPPO-website/.

AAAI Conference 2024 Conference Paper

Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation

  • Jiyong Li
  • Dilshod Azizov
  • Yang Li
  • Shangsong Liang

Recently, owing to the high-quality representations produced by contrastive learning methods, rehearsal-based contrastive continual learning has been proposed to explore how to continually learn transferable representation embeddings and avoid the catastrophic forgetting issue of traditional continual settings. Based on this framework, we propose Contrastive Continual Learning via Importance Sampling (CCLIS) to preserve knowledge by recovering previous data distributions with a new strategy for Replay Buffer Selection (RBS), which minimizes the estimated variance to save high-quality hard negative samples for representation learning. Furthermore, we present the Prototype-instance Relation Distillation (PRD) loss, a technique designed to maintain the relationship between prototypes and sample representations using a self-distillation process. Experiments on standard continual learning benchmarks reveal that our method notably outperforms existing baselines in knowledge preservation and thereby effectively counteracts catastrophic forgetting in online contexts. The code is available at https://github.com/lijy373/CCLIS.

JBHI Journal 2024 Journal Article

De-Biased Disentanglement Learning for Pulmonary Embolism Survival Prediction on Multimodal Data

  • Zhusi Zhong
  • Jie Li
  • Shreyas Kulkarni
  • Helen Zhang
  • Fayez H. Fayad
  • Yang Li
  • Scott Collins
  • Harrison Bai

Health disparities among marginalized populations with lower socioeconomic status significantly impact the fairness and effectiveness of healthcare delivery. The increasing integration of artificial intelligence (AI) into healthcare presents an opportunity to address these inequalities, provided that AI models are free from bias. This paper addresses the bias challenges posed by population disparities within healthcare systems, which are present in both the data and the development of algorithms and lead to inequitable medical care for conditions such as pulmonary embolism (PE) prognosis. In this study, we explore the diverse biases in healthcare systems, which highlight the demand for a holistic framework that reduces bias through complementary aggregation. Leveraging de-biasing deep survival prediction models, we propose a framework that disentangles identifiable information from images, text reports, and clinical variables to mitigate potential biases within multimodal datasets. Our study offers several advantages over traditional clinical survival prediction methods, including richer survival-related characteristics and bias-complementary predicted results. By improving the robustness of survival analysis through this framework, we aim to benefit patients, clinicians, and researchers by enhancing fairness and accuracy in healthcare AI systems.

JBHI Journal 2024 Journal Article

Developing Deep LSTMs With Later Temporal Attention for Predicting COVID-19 Severity, Clinical Outcome, and Antibody Level by Screening Serological Indicators Over Time

  • Jiaxin Cai
  • Yang Li
  • Baichen Liu
  • Zhixi Wu
  • Shengjun Zhu
  • Qiliang Chen
  • Qing Lei
  • Hongyan Hou

Objective: The clinical course of COVID-19, as well as the immunological reaction, is notable for its extreme variability. Identifying the main associated factors might help in understanding the disease progression and physiological status of COVID-19 patients. The dynamic changes of the antibody against the Spike protein are crucial for understanding the immune response. This work explores a temporal attention (TA) mechanism in deep learning to predict COVID-19 disease severity, clinical outcomes, and Spike antibody levels by screening serological indicators over time. Methods: We use feature selection techniques to filter feature subsets that are highly correlated with the target. Specific deep Long Short-Term Memory (LSTM) models are employed to capture the dynamic changes of disease severity, clinical outcome, and Spike antibody level. We also propose deep LSTMs with a TA mechanism to emphasize the later blood test records, because later records often attract more attention from doctors. Results: Risk factors highly correlated with COVID-19 are revealed. LSTM achieves the highest classification accuracy for disease severity prediction. Temporal Attention Long Short-Term Memory (TA-LSTM) achieves the best performance for clinical outcome prediction. For Spike antibody level prediction, LSTM achieves the best performance. Conclusion: The experimental results demonstrate the effectiveness of the proposed models. The proposed models can provide a computer-aided medical diagnostics system by simply using time series of serological indicators.
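The later-record emphasis can be illustrated with simple attention pooling over per-visit hidden states, where a recency bonus is added to the content scores before the softmax. This is an illustrative sketch under stated assumptions, not the paper's exact TA-LSTM formulation, and all names are hypothetical:

```python
import numpy as np

def temporal_attention_pool(hidden_states, w, recency_bias=0.5):
    """Pool LSTM hidden states of shape (T, d) into one vector,
    biasing the attention weights toward later time steps."""
    T = hidden_states.shape[0]
    scores = hidden_states @ w                      # content-based score per record
    scores = scores + recency_bias * np.arange(T)   # later blood tests get a bonus
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()               # softmax over time steps
    return weights @ hidden_states, weights

# With identical content at every step, the recency bonus alone drives attention.
h = np.ones((4, 3))
context, weights = temporal_attention_pool(h, w=np.zeros(3))
print(weights)  # strictly increasing over the four time steps
```

Because the bonus grows linearly with the step index, the pooled vector is dominated by the most recent records, matching the stated clinical intuition.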

NeurIPS Conference 2024 Conference Paper

Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization

  • Yang Li
  • Jinpei Guo
  • Runzhong Wang
  • Hongyuan Zha
  • Junchi Yan

Diffusion models have recently advanced Combinatorial Optimization (CO) as a powerful backbone for neural solvers. However, their iterative sampling process, which requires denoising across multiple noise levels, incurs substantial overhead. We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots. This is achieved through an optimization consistency training protocol, which, for a given instance, minimizes the difference among samples originating from varying generative trajectories and time steps relative to the optimal solution. The proposed model enables fast single-step solution generation while retaining the option of multi-step sampling to trade computation for sampling quality, which offers a more effective and efficient alternative backbone for neural solvers. In addition, within the training-to-testing (T2T) framework, to bridge the gap between training on historical instances and solving new instances, we introduce a novel consistency-based gradient search scheme during the test stage, enabling more effective exploration of the solution space learned during training. It is achieved by updating the latent solution probabilities under objective gradient guidance during the alternation of noise injection and denoising steps. We refer to this model as Fast T2T. Extensive experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency, even outperforming LKH given limited time budgets. Notably, Fast T2T with merely one-step generation and one-step gradient search can mostly outperform the SOTA diffusion-based counterparts that require hundreds of steps, while achieving tens of times speedup.

IS Journal 2024 Journal Article

Few-Shot Object Detection Based on Self-Knowledge Distillation

  • Yang Li
  • Yicheng Gong
  • Zhuo Zhang

In many fields, the lack of large-scale training data prevents traditional object detection methods from performing well in practice, mainly because of overfitting and a lack of generalization ability. In this work, we propose a general method to alleviate the overfitting problem in few-shot object detection. Our work extends Faster R-CNN with a self-knowledge distillation algorithm and designs a loss function with an attention mechanism, which improves true detections in the foreground. In this way, the object detector can learn an approximate mapping relationship from few samples, giving the network a stronger generalization ability when tackling few images. Through numerous comparative experiments, we demonstrate that our method is general and feasible on the VOC and COCO benchmark datasets with different settings. We provide a new idea for solving the problem of few-shot object detection and achieve excellent recall on few-shot object detection.
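As background, the core term in self-knowledge distillation is typically a temperature-softened KL divergence between the current model's logits and those of an earlier (or EMA) copy of itself. The sketch below shows only that generic term, not this paper's full attention-weighted loss:

```python
import numpy as np

def softmax(x, t=1.0):
    e = np.exp((x - x.max()) / t)
    return e / e.sum()

def self_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) at a softened temperature, scaled by T^2 as usual.
    In self-distillation the teacher is a frozen earlier copy of the same model."""
    p = softmax(teacher_logits, temperature)   # teacher's softened distribution
    q = softmax(student_logits, temperature)   # student's softened distribution
    return float(np.sum(p * (np.log(p) - np.log(q))) * temperature ** 2)

l1 = np.array([2.0, 0.5, -1.0])
l2 = np.array([0.5, 2.0, -1.0])
print(self_distill_loss(l1, l1))  # 0.0 when student and teacher agree
print(self_distill_loss(l2, l1) > 0)
```

The temperature softens both distributions so the student also learns the teacher's relative ranking of non-maximal classes, which is what regularizes training when only a few samples are available.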

AAAI Conference 2024 Conference Paper

H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer

  • Yanru Wu
  • Jianning Wang
  • Weida Wang
  • Yang Li

Multi-source transfer learning is an effective solution to data scarcity, utilizing multiple source tasks for the learning of the target task. However, access to source data and model details is limited in the era of commercial models, giving rise to the setting of multi-source-free (MSF) transfer learning, which aims to leverage source domain knowledge without such access. As a newly defined problem paradigm, MSF transfer learning remains largely underexplored and not clearly formulated. In this work, we adopt an information-theoretic perspective on it and propose a framework named H-ensemble, which dynamically learns the optimal linear combination, or ensemble, of source models for the target task, using a generalization of maximal correlation regression. The ensemble weights are optimized by maximizing an information-theoretic metric for transferability. Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel and realistic MSF setting for few-shot target tasks, 2) theoretical reliability, and 3) a lightweight structure that is easy to interpret and adapt. Our method is empirically validated by ablation studies, along with extensive comparative analysis with other task ensemble and transfer learning methods. We show that H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior art.
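The idea of learning a linear ensemble of frozen source models can be illustrated with an ordinary least-squares stand-in objective. The paper optimizes an information-theoretic transferability metric instead, so this is only a structural sketch:

```python
import numpy as np

def fit_ensemble_weights(source_preds, y):
    """Fit weights for a linear combination of frozen source-model predictions."""
    A = np.stack(source_preds, axis=1)   # shape (n_samples, n_sources)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Two frozen "source models" whose outputs mix linearly into the target signal.
rng = np.random.default_rng(0)
s1, s2 = rng.normal(size=100), rng.normal(size=100)
y = 0.3 * s1 + 0.7 * s2
print(np.round(fit_ensemble_weights([s1, s2], y), 3))  # [0.3 0.7]
```

Only the combination weights are learned; the source models themselves stay frozen, which is exactly what makes the approach applicable when source data and model internals are inaccessible.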

NeurIPS Conference 2024 Conference Paper

LAM3D: Large Image-Point Clouds Alignment Model for 3D Reconstruction from Single Image

  • Ruikai Cui
  • Xibin Song
  • Weixuan Sun
  • Senbo Wang
  • Weizhe Liu
  • Shenzhou Chen
  • Taizhang Shang
  • Yang Li

Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Alignment Model (LAM3D), which utilizes 3D point cloud data to enhance the fidelity of generated 3D meshes. Our methodology begins with the development of a point-cloud-based network that effectively generates precise and meaningful latent tri-planes, laying the groundwork for accurate 3D mesh reconstruction. Building upon this, our Image-Point-Cloud Feature Alignment technique processes a single input image, aligning to the latent tri-planes to imbue image features with robust 3D information. This process not only enriches the image features but also facilitates the production of high-fidelity 3D meshes without the need for multi-view input, significantly reducing geometric distortions. Our approach achieves state-of-the-art high-fidelity 3D mesh reconstruction from a single image in just 6 seconds, and experiments on various datasets demonstrate its effectiveness.

ICML Conference 2024 Conference Paper

Language Models as Semantic Indexers

  • Bowen Jin
  • Hansi Zeng
  • Guoyin Wang 0001
  • Xiusi Chen
  • Tianxin Wei
  • Ruirui Li 0002
  • Zhengyang Wang
  • Zheng Li 0018

Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss, and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. It is non-trivial to design a method that can learn the document’s semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval on five datasets from various domains. Code is available at https://github.com/PeterGriffinJin/LMIndexer.

AAAI Conference 2024 Conference Paper

Learning Persistent Community Structures in Dynamic Networks via Topological Data Analysis

  • Dexu Kong
  • Anping Zhang
  • Yang Li

Dynamic community detection methods often lack effective mechanisms to ensure temporal consistency, hindering the analysis of network evolution. In this paper, we propose a novel deep graph clustering framework with temporal consistency regularization on inter-community structures, inspired by the concept of minimal network topological changes within short intervals. Specifically, to address the representation collapse problem, we first introduce MFC, a matrix factorization-based deep graph clustering algorithm that preserves node embedding. Based on static clustering results, we construct probabilistic community networks and compute their persistent homology, a robust topological measure, to assess structural similarity between them. Moreover, a novel neural network regularization, TopoReg, is introduced to ensure the preservation of topological similarity between inter-community structures over time intervals. Our approach enhances temporal consistency and clustering accuracy on real-world datasets with both fixed and varying numbers of communities. It is also a pioneering application of TDA in temporally persistent community detection, offering an insightful contribution to the field of network analysis. Code and data are available at the public git repository: https://github.com/kundtx/MFC-TopoReg.

NeurIPS Conference 2024 Conference Paper

Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

  • Xinhao Zheng
  • Yang Li
  • Cunxin Fan
  • Huaijin Wu
  • Xinhao Song
  • Junchi Yan

Cryptographic problems, operating within binary variable spaces, can be routinely transformed into Boolean Satisfiability (SAT) problems regarding specific cryptographic conditions like plaintext-ciphertext matching. With the fast development of learning for discrete data, this SAT representation also facilitates the utilization of machine-learning approaches, with the hope of automatically capturing patterns and strategies inherent in cryptographic structures in a data-driven manner. Existing neural SAT solvers consistently adopt conjunctive normal form (CNF) for instance representation, which in the cryptographic context can lead to scale explosion and a loss of high-level semantics. In particular, the XOR operations used extensively in cryptographic problems can incur an exponential number of clauses. In this paper, we propose a graph structure based on Algebraic Normal Form (ANF) to efficiently handle the XOR operation bottleneck. Additionally, we design an encoding method for AND operations in these ANF-based graphs, demonstrating improved efficiency over alternative general graph forms for SAT. We then propose CryptoANFNet, a graph learning approach that trains a classifier based on a message-passing scheme to predict plaintext-ciphertext satisfiability. Using ANF-based SAT instances, CryptoANFNet demonstrates superior scalability and can naturally capture higher-order operational information. Empirically, CryptoANFNet achieves a 50x speedup over heuristic solvers and outperforms the SOTA learning-based SAT solver NeuroSAT, with 96% vs. 91% accuracy on small-scale and 72% vs. 55% on large-scale datasets from real encryption algorithms. We also introduce a key-solving algorithm that simplifies ANF-based SAT instances from plaintext and ciphertext, enhancing key decryption accuracy from 76.5% to 82% and from 72% to 75% for datasets generated from two real encryption algorithms.
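The XOR clause explosion mentioned in the abstract is easy to see concretely: a direct CNF encoding of an n-variable XOR constraint (without auxiliary variables) needs one clause per even-parity assignment, i.e. 2^(n-1) clauses. A minimal demonstration:

```python
from itertools import product

def xor_to_cnf(vars_):
    """Direct CNF encoding of v1 XOR ... XOR vn = 1, with no auxiliary variables.
    One blocking clause is added for every assignment that falsifies the XOR."""
    n = len(vars_)
    clauses = []
    for bits in product([0, 1], repeat=n):
        if sum(bits) % 2 == 0:  # even-parity assignments violate XOR = 1
            # Forbid this assignment: at least one literal must differ from it.
            clauses.append([v if b == 0 else -v for v, b in zip(vars_, bits)])
    return clauses

for n in (3, 6, 9):
    print(n, len(xor_to_cnf(list(range(1, n + 1)))))  # 4, 32, 256 == 2^(n-1)
```

An ANF representation instead keeps each XOR as a single polynomial term, which is the bottleneck the proposed graph structure avoids.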

NeurIPS Conference 2024 Conference Paper

Mixtures of Experts for Audio-Visual Learning

  • Ying Cheng
  • Yang Li
  • Junjie He
  • Rui Feng

With the rapid development of multimedia technology, audio-visual learning has emerged as a promising research topic within the field of multimodal analysis. In this paper, we explore parameter-efficient transfer learning for audio-visual learning and propose the Audio-Visual Mixture of Experts (AVMoE) to inject adapters into pre-trained models flexibly. Specifically, we introduce unimodal and cross-modal adapters as multiple experts to specialize in intra-modal and inter-modal information, respectively, and employ a lightweight router to dynamically allocate the weights of each expert according to the specific demands of each task. Extensive experiments demonstrate that our proposed approach AVMoE achieves superior performance across multiple audio-visual tasks, including AVE, AVVP, AVS, and AVQA. Furthermore, visual-only experimental results also indicate that our approach can tackle challenging scenes where modality information is missing. The source code is available at https://github.com/yingchengy/AVMOE.
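
The routing step described above, a lightweight router dynamically weighting adapter experts, can be sketched as a softmax-weighted combination of expert outputs. All sizes and names here are illustrative, not the paper's implementation:

```python
import numpy as np

def route(expert_outputs, router_logits):
    """Combine adapter ("expert") outputs with softmax router weights.

    expert_outputs: (n_experts, dim) array of adapter outputs
    router_logits:  (n_experts,) scores from a lightweight router
    """
    w = np.exp(router_logits - router_logits.max())
    w /= w.sum()                 # softmax over experts
    return w @ expert_outputs    # convex combination per feature dim

# Two hypothetical experts (e.g. one unimodal, one cross-modal adapter):
out = route(np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.0, 0.0]))
print(out)  # equal logits -> plain average of the two experts
```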

AAAI Conference 2024 Conference Paper

Multi-Prototype Space Learning for Commonsense-Based Scene Graph Generation

  • Lianggangxu Chen
  • Youqi Song
  • Yiqing Cai
  • Jiale Lu
  • Yang Li
  • Yuan Xie
  • Changbo Wang
  • Gaoqi He

In the domain of scene graph generation, modeling commonsense as a single-prototype representation has been typically employed to facilitate the recognition of infrequent predicates. However, a fundamental challenge lies in the large intra-class variations of the visual appearance of predicates, resulting in subclasses within a predicate class. Such a challenge typically leads to the problem of misclassifying diverse predicates due to the rough predicate space clustering. In this paper, inspired by cognitive science, we maintain multi-prototype representations for each predicate class, which can accurately find the multiple class centers of the predicate space. Technically, we propose a novel multi-prototype learning framework consisting of three main steps: prototype-predicate matching, prototype updating, and prototype space optimization. We first design a triple-level optimal transport to match each predicate feature within the same class to a specific prototype. In addition, the prototypes are updated using momentum updating to find the class centers according to the matching results. Finally, we enhance the inter-class separability of the prototype space through iterations of the inter-class separability loss and intra-class compactness loss. Extensive evaluations demonstrate that our approach significantly outperforms state-of-the-art methods on the Visual Genome dataset.

JMLR Journal 2024 Journal Article

OpenBox: A Python Toolkit for Generalized Black-box Optimization

  • Huaijun Jiang
  • Yu Shen
  • Yang Li
  • Beicheng Xu
  • Sixian Du
  • Wentao Zhang
  • Ce Zhang
  • Bin Cui

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.

IJCAI Conference 2024 Conference Paper

OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images

  • Yang Li
  • Qiuyi Huang
  • Chong Zhong
  • Danjuan Yang
  • Meiyan Li
  • A. H. Welsh
  • Aiyi Liu
  • Bo Fu

Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN which can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs. The code and the supplementary materials are available at: github.com/Charley-HUANG/OUCopula.

ICLR Conference 2024 Conference Paper

Perceptual Group Tokenizer: Building Perception with Iterative Grouping

  • Zhiwei Deng
  • Ting Chen
  • Yang Li

The human visual recognition system shows an astonishing capability of compressing visual information into a set of tokens containing rich representations without label supervision. One critical driving principle behind it is perceptual grouping. Despite being widely used in computer vision in the early 2010s, it remains a mystery whether perceptual grouping can be leveraged to derive a neural visual recognition backbone that generates equally powerful representations. In this paper, we propose the Perceptual Group Tokenizer, a model that entirely relies on grouping operations to extract visual features and perform self-supervised representation learning, where a series of grouping operations are used to iteratively hypothesize the context for pixels or superpixels to refine feature representations. We show that the proposed model can achieve competitive performance compared to state-of-the-art vision architectures, and inherits desirable properties including adaptive computation without re-training, and interpretability. Specifically, Perceptual Group Tokenizer achieves 79.7% on the ImageNet-1K self-supervised learning benchmark with linear probe evaluation, marking new progress under this paradigm.

AAAI Conference 2024 Conference Paper

Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion

  • Siyuan Shan
  • Yang Li
  • Amartya Banerjee
  • Junier B. Oliva

Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method Phoneme Hallucinator that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based just on a short target speaker voice (e.g. 3 seconds). The hallucinated phonemes are then exploited to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Quantitative and qualitative evaluations show that Phoneme Hallucinator outperforms existing VC methods for both intelligibility and speaker similarity.
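
The neighbor-based conversion step mentioned above can be illustrated with a plain k-nearest-neighbour feature replacement: each source frame feature is swapped for the average of its closest target-speaker features. The hallucination model that expands the target set is omitted, and all names and sizes here are illustrative:

```python
import numpy as np

def knn_convert(src, tgt, k=2):
    """Replace each source frame feature with the mean of its k
    nearest neighbours among target-speaker features.

    src: (n_frames, dim) source features
    tgt: (n_tgt, dim) target-speaker features (possibly hallucinated)
    """
    # Pairwise squared distances between source and target frames.
    d = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]   # k nearest targets per frame
    return tgt[idx].mean(axis=1)         # average the selected features

src = np.array([[0.0, 0.0], [1.0, 1.0]])
tgt = np.array([[0.0, 0.1], [0.0, -0.1], [1.0, 0.9], [1.0, 1.1]])
print(knn_convert(src, tgt))
```

With more hallucinated target features to draw from, the nearest-neighbour pool covers the phoneme space better, which is the motivation for the set-expansion step.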

JAIR Journal 2024 Journal Article

Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

  • Yang Li
  • Shao Zhang
  • Jichen Sun
  • Wenhao Zhang
  • Yali Du
  • Ying Wen
  • Xinbing Wang
  • Wei Pan

Securing coordination between an AI agent and teammates (human players or AI agents) in contexts involving unfamiliar humans continues to pose a significant challenge in Zero-Shot Coordination (ZSC). The issue of cooperative incompatibility becomes particularly prominent when an AI agent is unsuccessful in synchronizing with certain previously unknown partners. Traditional algorithms have aimed to collaborate with partners by optimizing fixed objectives within a population, fostering diversity in strategies and behaviors. However, these techniques may lead to learning loss and an inability to cooperate with specific strategies within the population, a phenomenon named cooperative incompatibility in learning. In order to solve cooperative incompatibility in learning and effectively address the problem in the context of ZSC, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy. We present two practical algorithms, specifically COLESV and COLER, which incorporate insights from game theory and graph theory. We also show that COLE could effectively overcome the cooperative incompatibility from theoretical and empirical analysis. Subsequently, we created an online Overcooked human-AI experiment platform, the COLE platform, which enables easy customization of questionnaires, model weights, and other aspects. Utilizing the COLE platform, we enlist 130 participants for human experiments. Our findings reveal a preference for our approach over state-of-the-art methods using a variety of subjective metrics. Moreover, objective experimental outcomes in the Overcooked game environment indicate that our method surpasses existing ones when coordinating with previously unencountered AI agents and the human proxy model. Our code and demo are publicly available at https://sites.google.com/view/cole-2023.

NeurIPS Conference 2024 Conference Paper

UniAR: A Unified model for predicting human Attention and Responses on visual content

  • Peizhao Li
  • Junfeng He
  • Gang Li
  • Rachit Bhargava
  • Shaolei Shen
  • Nachiappan Valliappan
  • Youwei Liang
  • Hongxiang Gu

Progress in human behavior modeling involves understanding both implicit, early-stage perceptual behavior, such as human attention, and explicit, later-stage behavior, such as subjective preferences or likes. Yet most prior research has focused on modeling implicit and explicit human behavior in isolation; and often limited to a specific type of visual content. We propose UniAR -- a unified model of human attention and preference behavior across diverse visual content. UniAR leverages a multimodal transformer to predict subjective feedback, such as satisfaction or aesthetic quality, along with the underlying human attention or interaction heatmaps and viewing order. We train UniAR on diverse public datasets spanning natural images, webpages, and graphic designs, and achieve SOTA performance on multiple benchmarks across various image domains and behavior modeling tasks. Potential applications include providing instant feedback on the effectiveness of UIs/visual content, and enabling designers and content-creation models to optimize their creation for human-centric improvements.

NeurIPS Conference 2024 Conference Paper

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

  • Yang Li
  • Shaobo Han
  • Shihao Ji

As the adoption of large language models increases and the need for per-user or per-task model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules, and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm to LoRA, our proposed VB-LoRA composites all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-$k$ admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, instruction tuning, and mathematical reasoning tasks. When fine-tuning the Llama2-13B model, VB-LoRA only uses 0.4% of LoRA's stored parameters, yet achieves superior results. Our source code is available at https://github.com/leo-yangli/VB-LoRA. This method has been merged into the Hugging Face PEFT package.
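
The vector-bank composition can be sketched in a few lines: each low-rank sub-vector is a softmax mixture of its top-k bank entries, so only the bank and the selection logits need storing. All sizes below are made up for illustration; this is a sketch of the idea, not the PEFT-package implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a bank of 32 shared vectors of length 16, and six
# sub-vector "slots" that would be reassembled into LoRA's A/B factors.
bank = rng.normal(size=(32, 16))
logits = rng.normal(size=(6, 32))

def topk_admixture(logits, bank, k=2):
    """For each slot, keep the k largest logits, softmax over them,
    and mix the corresponding bank vectors."""
    out = np.zeros((logits.shape[0], bank.shape[1]))
    for i, row in enumerate(logits):
        idx = np.argsort(row)[-k:]             # indices of the top-k entries
        w = np.exp(row[idx] - row[idx].max())
        w /= w.sum()                           # softmax over the top-k only
        out[i] = w @ bank[idx]                 # mixture of bank vectors
    return out

sub_vectors = topk_admixture(logits, bank)
# Only `bank` and `logits` are stored; the low-rank matrices are
# reassembled from `sub_vectors` when the adapter is loaded.
print(sub_vectors.shape)
```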

NeurIPS Conference 2024 Conference Paper

Wings: Learning Multimodal LLMs without Text-only Forgetting

  • Yi-Kai Zhang
  • Shiyin Lu
  • Yang Li
  • Yanqing Ma
  • Qing-Guo Chen
  • Zhao Xu
  • Weihua Luo
  • Kaifu Zhang

Multimodal large language models (MLLMs), initiated with a trained LLM, first align images with text and then fine-tune on multimodal mixed inputs. However, during the continued training, the MLLM catastrophically forgets the text-only instructions that the initial LLM masters. In this paper, we present Wings, a novel MLLM that excels in both text-only and multimodal instructions. By examining attention across layers of MLLM, we find that text-only forgetting is related to the attention shifts from pre-image to post-image text. From that, we construct an additional Low-Rank Residual Attention (LoRRA) block that acts as the "modality learner" to expand the learnable space and compensate for the attention shift. The complementary learners, like "wings" on either side, are connected in parallel to each layer's attention block. The LoRRA mirrors the structure of attention but utilizes low-rank connections to ensure efficiency. Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements. Later, textual learners are integrated with token-wise routing, blending the outputs of both modality learners collaboratively. Our experimental results demonstrate that Wings outperforms equally-scaled MLLMs in both text-only and visual question-answering tasks. Wings with compensation of learners addresses text-only forgetting during visual modality expansion in general MLLMs.

JBHI Journal 2023 Journal Article

AC-E Network: Attentive Context-Enhanced Network for Liver Segmentation

  • Yang Li
  • Beiji Zou
  • Peishan Dai
  • Miao Liao
  • Harrison X. Bai
  • Zhicheng Jiao

Segmentation of the liver from CT scans is essential in computer-aided liver disease diagnosis and treatment. However, 2D CNNs ignore the 3D context, while 3D CNNs suffer from numerous learnable parameters and high computational cost. To overcome this limitation, we propose an Attentive Context-Enhanced Network (AC-E Network) consisting of 1) an attentive context encoding module (ACEM) that can be integrated into a 2D backbone to extract 3D context without a sharp increase in the number of learnable parameters; 2) a dual segmentation branch with a complemental loss that makes the network attend to both the liver region and its boundary, so that the segmented liver surface is obtained with high accuracy. Extensive experiments on the LiTS and 3D-IRCADb datasets demonstrate that our method outperforms existing approaches and is competitive with the state-of-the-art 2D-3D hybrid method in balancing segmentation precision and the number of model parameters.

NeurIPS Conference 2023 Conference Paper

Bullying10K: A Large-Scale Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

  • Yiting Dong
  • Yang Li
  • Dongcheng Zhao
  • Guobin Shen
  • Yi Zeng

The prevalence of violence in daily life poses significant threats to individuals' physical and mental well-being. Using surveillance cameras in public spaces has proven effective in proactively deterring and preventing such incidents. However, concerns regarding privacy invasion have emerged due to their widespread deployment. To address the problem, we leverage Dynamic Vision Sensor (DVS) cameras to detect violent incidents while preserving privacy, since they capture pixel brightness variations instead of static imagery. We introduce the Bullying10K dataset, encompassing various actions, complex movements, and occlusions from real-life scenarios. It provides three benchmarks for evaluating different tasks: action recognition, temporal action localization, and pose estimation. With 10,000 event segments, totaling 12 billion events and 255 GB of data, Bullying10K contributes significantly by balancing violence detection and personal privacy preservation. It also poses a new challenge for neuromorphic datasets. It will serve as a valuable resource for training and developing privacy-protecting video systems. The Bullying10K opens new possibilities for innovative approaches in these domains.

JBHI Journal 2023 Journal Article

HDL: Hybrid Deep Learning for the Synthesis of Myocardial Velocity Maps in Digital Twins for Cardiac Analysis

  • Xiaodan Xing
  • Javier Del Ser
  • Yinzhe Wu
  • Yang Li
  • Jun Xia
  • Lei Xu
  • David Firmin
  • Peter Gatehouse

Synthetic digital twins based on medical data accelerate the acquisition, labelling and decision making procedure in digital healthcare. A core part of digital healthcare twins is model-based data synthesis, which permits the generation of realistic medical signals without the need to cope with the modelling complexity of the anatomical and biochemical phenomena producing them in reality. Unfortunately, algorithms for cardiac data synthesis have been so far scarcely studied in the literature. An important imaging modality in the cardiac examination is three-directional CINE multi-slice myocardial velocity mapping (3Dir MVM), which provides a quantitative assessment of cardiac motion in three orthogonal directions of the left ventricle. The long acquisition time and complex acquisition procedure make it more urgent to produce synthetic digital twins of this imaging modality. In this study, we propose a hybrid deep learning (HDL) network, especially for synthetic 3Dir MVM data. Our algorithm is featured by a hybrid UNet and a Generative Adversarial Network with a foreground-background generation scheme. The experimental results show that from temporally down-sampled magnitude CINE images (six times), our proposed algorithm can still successfully synthesise high temporal resolution 3Dir MVM CMR data (PSNR=42.32) with precise left ventricle segmentation (DICE=0.92). These performance scores indicate that our proposed HDL algorithm can be implemented in real-world digital twins for myocardial velocity mapping data simulation. To the best of our knowledge, this work is the first one investigating digital twins of the 3Dir MVM CMR, which has shown great potential for improving the efficiency of clinical studies via synthesised cardiac data.

IJCAI Conference 2023 Conference Paper

IID-GAN: an IID Sampling Perspective for Regularizing Mode Collapse

  • Yang Li
  • Liangliang Shi
  • Junchi Yan

Despite its success, generative adversarial networks (GANs) still suffer from mode collapse, i.e., the generator can only map latent variables to a partial set of modes in the target distribution. In this paper, we analyze and seek to regularize this issue with an independent and identically distributed (IID) sampling perspective and emphasize that holding the IID property referring to the target distribution for generation can naturally avoid mode collapse. This is based on the basic IID assumption for real data in machine learning. However, though the source samples {z} obey IID, the generations {G(z)} may not necessarily be IID sampling from the target distribution. Based on this observation, considering a necessary condition of IID generation, we propose a new loss to encourage the closeness between the inverse samples of real data and the Gaussian source in the latent space to regularize the generation to be IID from the target distribution. The logic is that the inverse samples from target data should also be IID in the source distribution. Experiments on both synthetic and real-world data show the effectiveness of our model.

TMLR Journal 2023 Journal Article

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

  • Yang Li
  • Kun Xiong
  • Yingping Zhang
  • Jiangcheng Zhu
  • Stephen Marcus McAleer
  • Wei Pan
  • Jun Wang
  • Zonghong Dai

This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game’s strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm’s effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at https://sites.google.com/view/jiangjun-site/.

AAAI Conference 2023 Conference Paper

Learn from Yesterday: A Semi-supervised Continual Learning Method for Supervision-Limited Text-to-SQL Task Streams

  • Yongrui Chen
  • Xinnan Guo
  • Tongtong Wu
  • Guilin Qi
  • Yang Li
  • Yang Dong

Conventional text-to-SQL studies are limited to a single task with a fixed-size training and test set. When confronted with a stream of tasks common in real-world applications, existing methods struggle with the problems of insufficient supervised data and high retraining costs. The former tends to cause overfitting on unseen databases for the new task, while the latter makes a full review of instances from past tasks impractical for the model, resulting in forgetting of learned SQL structures and database schemas. To address the problems, this paper proposes integrating semi-supervised learning (SSL) and continual learning (CL) in a stream of text-to-SQL tasks and offers two promising solutions in turn. The first solution, Vanilla, is to perform self-training, augmenting the supervised training data with predicted pseudo-labeled instances of the current task, while replacing full-volume retraining with episodic memory replay to balance training efficiency against the performance on previous tasks. The improved solution, SFNet, takes advantage of the intrinsic connection between CL and SSL. It uses in-memory past information to help current SSL, while adding high-quality pseudo instances to memory to improve future replay. Experiments on two datasets show that SFNet outperforms the widely-used SSL-only and CL-only baselines on multiple metrics.

UAI Conference 2023 Conference Paper

Modified Retrace for Off-Policy Temporal Difference Learning

  • Xingguo Chen
  • Xingzhou Ma
  • Yang Li
  • Guang Yang 0066
  • Shangdong Yang
  • Yang Gao 0001

Off-policy learning is key to extending reinforcement learning, as it allows learning a target policy from data generated by a different behavior policy. However, when combined with bootstrapping and function approximation, it forms the well-known "deadly triad". Retrace is an efficient and convergent off-policy algorithm with tabular value functions which employs truncated importance sampling ratios. Unfortunately, Retrace is known to be unstable with linear function approximation. In this paper, we propose modified Retrace to correct the off-policy return, derive a new off-policy temporal difference learning algorithm (TD-MRetrace) with linear function approximation, and obtain a convergence guarantee under standard assumptions. Experimental results on counterexamples and control tasks validate the effectiveness of the proposed algorithm compared with traditional algorithms.
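
For context, standard tabular Retrace truncates the importance sampling ratio at 1 (the trace coefficient is lambda * min(1, pi/mu)) when forming the off-policy return correction. A minimal sketch of that tabular baseline (not the proposed TD-MRetrace, which operates with linear function approximation):

```python
def retrace_correction(q, traj, pi, mu, actions, gamma=0.99, lam=1.0):
    """Tabular Retrace correction for the first (state, action) of an
    off-policy trajectory.

    q:      dict (s, a) -> value estimate (missing entries treated as 0)
    traj:   list of (s, a, r, s_next) generated by behavior policy mu
    pi, mu: dicts (s, a) -> action probability under target/behavior
    """
    total, discount, trace = 0.0, 1.0, 1.0
    for t, (s, a, r, s_next) in enumerate(traj):
        if t > 0:
            trace *= lam * min(1.0, pi[(s, a)] / mu[(s, a)])  # truncated ratio
        ev = sum(pi[(s_next, b)] * q.get((s_next, b), 0.0) for b in actions)
        delta = r + gamma * ev - q.get((s, a), 0.0)  # TD error under pi
        total += discount * trace * delta
        discount *= gamma
    return total

# Toy two-action example with all value estimates at zero: the
# correction for a single transition reduces to the reward itself.
actions = [0, 1]
pi = {("s0", 0): 0.9, ("s0", 1): 0.1, ("s1", 0): 0.5, ("s1", 1): 0.5}
mu = {("s0", 0): 0.5, ("s0", 1): 0.5, ("s1", 0): 0.5, ("s1", 1): 0.5}
corr = retrace_correction({}, [("s0", 0, 1.0, "s1")], pi, mu, actions)
print(corr)  # 1.0
```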

NeurIPS Conference 2023 Conference Paper

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping

  • Huijie Wang
  • Tianyu Li
  • Yang Li
  • Li Chen
  • Chonghao Sima
  • Zhenbo Liu
  • Bangjun Wang
  • Peijin Jia

Accurately depicting the complex traffic scene is a vital component for autonomous vehicles to execute correct judgments. However, existing benchmarks tend to oversimplify the scene by solely focusing on lane perception tasks. Observing that human drivers rely on both lanes and traffic signals to operate their vehicles safely, we present OpenLane-V2, the first dataset on topology reasoning for traffic scene structure. The objective of the presented dataset is to advance research in understanding the structure of road scenes by examining the relationship between perceived entities, such as traffic elements and lanes. Leveraging existing datasets, OpenLane-V2 consists of 2,000 annotated road scenes that describe traffic elements and their correlation to the lanes. It comprises three primary sub-tasks, including the 3D lane detection inherited from OpenLane, accompanied by corresponding metrics to evaluate the model’s performance. We evaluate various state-of-the-art methods, and present their quantitative and qualitative results on OpenLane-V2 to indicate future avenues for investigating topology reasoning in traffic scenes.

AAAI Conference 2023 Conference Paper

ProxyBO: Accelerating Neural Architecture Search via Bayesian Optimization with Zero-Cost Proxies

  • Yu Shen
  • Yang Li
  • Jian Zheng
  • Wentao Zhang
  • Peng Yao
  • Jixiang Li
  • Sen Yang
  • Ji Liu

Designing neural architectures requires immense manual effort. This has promoted the development of neural architecture search (NAS) to automate the design. While previous NAS methods achieve promising results, they run slowly; zero-cost proxies, in contrast, run extremely fast but are less promising. Therefore, there is great potential to accelerate NAS via those zero-cost proxies. The existing method has two limitations: unforeseeable reliability and one-shot usage. To address these limitations, we present ProxyBO, an efficient Bayesian optimization (BO) framework that utilizes the zero-cost proxies to accelerate neural architecture search. We apply the generalization ability measurement to estimate the fitness of proxies on the task during each iteration and design a novel acquisition function to combine BO with zero-cost proxies based on their dynamic influence. Extensive empirical studies show that ProxyBO consistently outperforms competitive baselines on five tasks from three public benchmarks. Concretely, ProxyBO achieves up to 5.41× and 3.86× speedups over the state-of-the-art approaches REA and BRP-NAS.

NeurIPS Conference 2023 Conference Paper

T2T: From Distribution Learning in Training to Gradient Search in Testing for Combinatorial Optimization

  • Yang Li
  • Jinpei Guo
  • Runzhong Wang
  • Junchi Yan

Extensive experiments have gradually revealed the potential performance bottleneck of modeling Combinatorial Optimization (CO) solving as neural solution prediction tasks. The neural networks, in their pursuit of minimizing the average objective score across the distribution of historical problem instances, diverge from the core target of CO of seeking optimal solutions for every test instance. This calls for an effective search on each problem instance, while the model should serve to provide supporting knowledge that benefits the search. To this end, we propose the T2T (Training to Testing) framework that first leverages generative modeling to estimate the high-quality solution distribution for each instance during training, and then conducts a gradient-based search within the solution space during testing. The proposed neural search paradigm consistently leverages generative modeling, specifically diffusion, for graduated solution improvement. It disrupts the local structure of the given solution by introducing noise and reconstructs a lower-cost solution guided by the optimization objective. Experimental results on the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS) show the significant superiority of T2T, demonstrating an average performance gain of 49.15% for TSP solving and 17.27% for MIS solving compared to the previous state-of-the-art.

IJCAI Conference 2022 Conference Paper

A Universal PINNs Method for Solving Partial Differential Equations with a Point Source

  • Xiang Huang
  • Hongsheng Liu
  • Beiji Shi
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang
  • Haotian Chu

In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) method has emerged as a promising approach for solving both forward and inverse PDE problems. PDEs with a point source, expressed as a Dirac delta function in the governing equations, are mathematical models of many physical processes. However, they cannot be solved directly by the conventional PINNs method due to the singularity brought by the Dirac delta function. In this paper, we propose a universal solution to this problem based on three novel techniques. Firstly, the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity at the point source; secondly, a lower bound constrained uncertainty weighting algorithm is proposed to balance the physics-informed loss terms between the point source area and the remaining areas; and thirdly, a multi-scale deep neural network with a periodic activation function is used to improve the accuracy and convergence speed. We evaluate the proposed method on three representative PDEs, and the experimental results show that our method outperforms existing deep learning based methods with respect to accuracy, efficiency and versatility.
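
The first technique, replacing the Dirac delta with a continuous density, can be illustrated in a few lines: a narrow Gaussian integrates to one and concentrates at the source location, so the PDE residual at the source stays finite. The width `eps` below is an arbitrary illustrative choice:

```python
import numpy as np

def smoothed_delta(x, x0, eps):
    """Dirac delta at x0 approximated by a narrow Gaussian density,
    which removes the point-source singularity from the PDE residual."""
    return np.exp(-((x - x0) ** 2) / (2 * eps ** 2)) / (eps * np.sqrt(2 * np.pi))

# The approximation integrates to ~1 and concentrates at x0 as eps shrinks.
x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]
total = smoothed_delta(x, 0.0, 0.05).sum() * dx  # Riemann-sum integral
print(total)
```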

NeurIPS Conference 2022 Conference Paper

DivBO: Diversity-aware CASH for Ensemble Learning

  • Yu Shen
  • Yupeng Lu
  • Yang Li
  • Yaofeng Tu
  • Wentao Zhang
  • Bin Cui

The Combined Algorithm Selection and Hyperparameters optimization (CASH) problem is one of the fundamental problems in Automated Machine Learning (AutoML). Motivated by the success of ensemble learning, recent AutoML systems build post-hoc ensembles to output the final predictions instead of using the best single learner. However, while most CASH methods focus on searching for a single learner with the best performance, they neglect the diversity among base learners (i.e., they may suggest similar configurations to previously evaluated ones), which is also a crucial consideration when building an ensemble. To tackle this issue and further enhance the ensemble performance, we propose DivBO, a diversity-aware framework to inject explicit search of diversity into the CASH problems. In the framework, we propose to use a diversity surrogate to predict the pair-wise diversity of two unseen configurations. Furthermore, we introduce a temporary pool and a weighted acquisition function to guide the search of both performance and diversity based on Bayesian optimization. Empirical results on 15 public datasets show that DivBO achieves the best average ranks (1.82 and 1.73) on both validation and test errors among 10 compared methods, including post-hoc designs in recent AutoML systems and state-of-the-art baselines for ensemble learning on CASH problems.

IJCAI Conference 2022 Conference Paper

Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes

  • Yang Li
  • Yi Zeng

Spiking neural network (SNN), as a brain-inspired energy-efficient neural network, has attracted the interest of researchers, but the training of spiking neural networks remains an open problem. One effective way is to map the weights of a trained ANN to an SNN to achieve high reasoning ability. However, the converted spiking neural network often suffers from performance degradation and a considerable time delay. To speed up the inference process and obtain higher accuracy, we theoretically analyze the errors in the conversion process from three perspectives: the differences between IF and ReLU, the time dimension, and the pooling operation. We propose a neuron model for releasing burst spikes, a cheap but highly efficient method to handle residual information. In addition, Lateral Inhibition Pooling (LIPooling) is proposed to solve the inaccuracy problem caused by MaxPooling in the conversion process. Experimental results on CIFAR and ImageNet demonstrate that our algorithm is efficient and accurate. For example, our method can ensure nearly lossless conversion of SNN using only about 1/10 (less than 100) of the simulation time under 0.693× the energy consumption of the typical method. Our code is available at https://github.com/Brain-Inspired-Cognitive-Engine/Conversion_Burst.

AAAI Conference 2022 Conference Paper

Homography Decomposition Networks for Planar Object Tracking

  • Xinrui Zhan
  • Yueran Liu
  • Jianke Zhu
  • Yang Li

Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM. Although previous planar trackers work well in most scenarios, it is still a challenging task due to the rapid motion and large transformation between two consecutive frames. The essential reason behind this problem is that the condition number of such a non-linear system changes unstably when the searching range of the homography parameter space becomes larger. To this end, we propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups. Specifically, a similarity transformation estimator is designed to predict the first group robustly by a deep convolution equivariant network. By taking advantage of the scale and rotation estimation with high confidence, a residual transformation is estimated by a simple regression model. Furthermore, the proposed end-to-end network is trained in a semi-supervised fashion. Extensive experiments show that our proposed approach outperforms the state-of-the-art planar tracking methods by a large margin on the challenging POT, UCSB and POIC datasets. Codes and models are available at https://github.com/zhanxinrui/HDN.

NeurIPS Conference 2022 Conference Paper

Improving Generative Adversarial Networks via Adversarial Learning in Latent Space

  • Yang Li
  • Yichuan Mo
  • Liangliang Shi
  • Junchi Yan

For Generative Adversarial Networks which map a latent distribution to the target distribution, in this paper, we study how the sampling in latent space can affect the generation performance, especially for images. We observe that, as the neural generator is a continuous function, two close samples in latent space would be mapped into two nearby images, while their quality can differ greatly, as quality generally does not exhibit a continuous nature in pixel space. From such a continuous mapping function perspective, it is also possible that two distant latent samples can be mapped into two close images (if not exactly the same). In particular, if the latent samples are mapped in aggregation into a single mode, mode collapse occurs. Accordingly, we propose adding an implicit latent transform before the mapping function to improve latent $z$ from its initial distribution, e.g., Gaussian. This is achieved using well-developed adversarial sample mining techniques, e.g., the iterative fast gradient sign method (I-FGSM). We further propose new GAN training pipelines to obtain better generative mappings w.r.t. quality and diversity by introducing targeted latent transforms into the bi-level optimization of GAN. Experimental results on visual data show that our method can effectively achieve improvement in both quality and diversity.
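In spirit, the latent transform applies I-FGSM-style signed-gradient steps to $z$ before it enters the generator. Below is a toy NumPy sketch with a hand-written quadratic score standing in for a discriminator; the step size and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def ifgsm_latent(z, grad_fn, step=0.05, iters=5):
    """I-FGSM-style transform of a latent code before generation.

    Takes a few signed-gradient ascent steps on a score (in the paper,
    e.g. a discriminator's output; here supplied as grad_fn) so the
    improved latent lands in a higher-quality region.
    """
    z = z.copy()
    for _ in range(iters):
        z = z + step * np.sign(grad_fn(z))
    return z

# Toy score s(z) = -||z - mu||^2 with gradient 2 * (mu - z): the
# transform should move the latent toward the high-score point mu.
mu = np.array([1.0, -1.0])
z0 = np.zeros(2)
z1 = ifgsm_latent(z0, lambda z: 2.0 * (mu - z), step=0.05, iters=5)
```

With a real discriminator the gradient would come from backpropagation through the generator-discriminator pair, but the signed-step update itself has this shape.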

NeurIPS Conference 2022 Conference Paper

Meta-Auto-Decoder for Solving Parametric Partial Differential Equations

  • Xiang Huang
  • Zhanhong Ye
  • Hongsheng Liu
  • Shi Ji
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang

Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods, such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs), aims to approximate the solution of the PDEs. They are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of PDE parameters. Another category of methods, such as the Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet), tries to approximate the solution mapping directly. While fast, requiring only one forward inference for each PDE parameter without retraining, they often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables the pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogeneous) PDE parameters as latent vectors. The proposed MAD method can be interpreted through manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that the MAD method converges faster than other deep learning-based methods without losing accuracy.

NeurIPS Conference 2022 Conference Paper

Non-rigid Point Cloud Registration with Neural Deformation Pyramid

  • Yang Li
  • Tatsuya Harada

Non-rigid point cloud registration is a key component in many computer vision and computer graphics applications. The high complexity of the unknown non-rigid motion makes this task a challenging problem. In this paper, we break down this problem via hierarchical motion decomposition. Our method, called Neural Deformation Pyramid (NDP), represents non-rigid motion using a pyramid architecture. Each pyramid level, denoted by a Multi-Layer Perceptron (MLP), takes as input a sinusoidally encoded 3D point and outputs its motion increments from the previous level. The sinusoidal function starts with a low input frequency and gradually increases as the pyramid level goes down. This allows a multi-level rigid-to-nonrigid motion decomposition and also speeds up the solving by 50× compared to the existing MLP-based approach. Our method achieves advanced partial-to-partial non-rigid point cloud registration results on the 4DMatch/4DLoMatch benchmarks under both non-learned and supervised settings.
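The level-dependent sinusoidal encoding can be sketched as follows. The exact frequency schedule and encoding width used in NDP may differ; treat `base_freq` and `growth` here as illustrative assumptions.

```python
import numpy as np

def sinusoidal_encoding(p, level, base_freq=1.0, growth=2.0):
    """Encode a 3D point with a frequency tied to the pyramid level.

    Shallow levels see low input frequencies and so capture coarse,
    near-rigid motion; deeper levels see higher frequencies and can fit
    finer deformation. base_freq and growth are illustrative choices.
    """
    freq = base_freq * growth ** level
    return np.concatenate([np.sin(freq * p), np.cos(freq * p)])

p = np.array([0.1, 0.2, 0.3])
coarse = sinusoidal_encoding(p, level=0)   # fed to the top pyramid level
fine = sinusoidal_encoding(p, level=4)     # 16x the input frequency
```

Each level's MLP would consume such an encoding and emit a motion increment on top of the previous level's estimate.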

NeurIPS Conference 2022 Conference Paper

The Policy-gradient Placement and Generative Routing Neural Networks for Chip Design

  • Ruoyu Cheng
  • Xianglong Lyu
  • Yang Li
  • Junjie Ye
  • Jianye Hao
  • Junchi Yan

Placement and routing are two critical yet time-consuming steps of chip design in modern VLSI systems. Distinct from traditional heuristic solvers, this paper on the one hand proposes an RL-based model for mixed-size macro placement, which differs from existing learning-based placers that often consider the macros via coarse grid-based masks, while the standard cells are placed via gradient-based GPU acceleration. On the other hand, a one-shot conditional generative routing model, composed of a specially designed input-size-adapting generator and a bi-discriminator, is devised to perform one-shot routing to the pins within each net, and the order of nets to route is adaptively learned. Combining these techniques, we develop a flexible and efficient neural pipeline, which, to the best of our knowledge, is the first joint placement and routing network that does not involve any traditional heuristic solver. Experimental results on chip design benchmarks showcase the effectiveness of our approach; the code will be made publicly available.

JBHI Journal 2021 Journal Article

Deep Learning-Based End-to-End Diagnosis System for Avascular Necrosis of Femoral Head

  • Yang Li
  • Yan Li
  • Hua Tian

As the first diagnostic imaging modality of avascular necrosis of the femoral head (AVNFH), accurately staging AVNFH from a plain radiograph is critical yet challenging for orthopedists. Thus, we propose a deep learning-based AVNFH diagnosis system (AVN-net). The proposed AVN-net reads plain radiographs of the pelvis, conducts diagnosis, and visualizes results automatically. Deep convolutional neural networks are trained to provide an end-to-end diagnosis solution, covering tasks of femoral head detection, exam-view identification, side classification, AVNFH diagnosis, and key clinical notes generation. AVN-net is able to obtain a state-of-the-art testing AUC of 0.97 (95% CI: 0.97-0.98) in AVNFH detection and significantly greater F1 scores than less-to-moderately experienced orthopedists in all diagnostic tests (p < 0.01). Furthermore, two real-world pilot studies were conducted for diagnosis support and education assistance, respectively, to assess the utility of AVN-net. The experimental results are promising. With the AVN-net diagnosis as a reference, the diagnostic accuracy and consistency of all orthopedists considerably improved while requiring only 1/4 of the time. Students self-studying the AVNFH diagnosis using AVN-net can learn better and faster than the control group. To the best of our knowledge, this study is the first research on the prospective use of a deep learning-based diagnosis system for AVNFH by conducting two pilot studies representing real-world application scenarios. We have demonstrated that the proposed AVN-net achieves expert-level AVNFH diagnosis performance, provides efficient support in clinical decision-making, and effectively passes clinical experience to students.

IJCAI Conference 2021 Conference Paper

Discovering Collaborative Signals for Next POI Recommendation with Iterative Seq2Graph Augmentation

  • Yang Li
  • Tong Chen
  • Yadan Luo
  • Hongzhi Yin
  • Zi Huang

Being an indispensable component in location-based social networks, next point-of-interest (POI) recommendation suggests unexplored POIs to users based on their recent visiting histories. However, existing work mainly models check-in data as isolated POI sequences, neglecting the crucial collaborative signals from cross-sequence check-in information. Furthermore, the sparse POI-POI transitions restrict the ability of a model to learn effective sequential patterns for recommendation. In this paper, we propose Sequence-to-Graph (Seq2Graph) augmentation for each POI sequence, allowing collaborative signals to be propagated from correlated POIs belonging to other sequences. We then devise a novel Sequence-to-Graph POI Recommender (SGRec), which jointly learns POI embeddings and infers a user's temporal preferences from the graph-augmented POI sequence. To overcome the sparsity of POI-level interactions, we further infuse category-awareness into SGRec with a multi-task learning scheme that captures the denser category-wise transitions. As such, SGRec makes full use of the collaborative signals for learning expressive POI representations, and also comprehensively uncovers multi-level sequential patterns for user preference modelling. Extensive experiments on two real-world datasets demonstrate the superiority of SGRec against state-of-the-art methods in next POI recommendation.

JAIR Journal 2021 Journal Article

Hybrid-order Network Consensus for Distributed Multi-agent Systems

  • Guangqiang Xie
  • Junyu Chen
  • Yang Li

As an important field of distributed artificial intelligence (DAI), multi-agent systems (MASs) have attracted the attention of many researchers. Although much progress has been made in studying consensus control, the most important issue in MAS, some problems remain largely unaddressed, causing the MAS to lose useful network structure information. First, multi-agent consensus protocols usually operate on the low-order structure by only considering the direct edges between agents, ignoring the higher-order structure of the whole topology network. Second, existing work assumes all the edges in a topology network have the same weight without exploring the potential diversity of the connections. In this way, multi-agent systems fail to reach consensus, resulting in fragmentation into multiple clusters. To address these issues, this paper proposes a Motif-aware Weighted Multi-agent System (MWMS) method for consensus control. We focus on the triangle motif in the network, but the method can be extended to other kinds of motifs as well. First, a novel weighted network is used that combines the edge-based lower-order structure and the motif-based higher-order structure, i.e., a hybrid-order structure. Subsequently, by simultaneously considering the quantity and the quality of the connections in the network, a novel consensus framework for MAS is designed to update agents. Then, two baseline consensus algorithms are used in MWMS. In our experiments, we use ten topologies of different shapes, densities and ranges to comprehensively analyze the performance of our proposed algorithms. The simulation results show that the hybrid higher-order network can effectively enhance the consensus of the multi-agent system in different network topologies.

NeurIPS Conference 2021 Conference Paper

Iterative Connecting Probability Estimation for Networks

  • Yichen Qin
  • Linhan Yu
  • Yang Li

Estimating the probabilities of connections between vertices in a random network using an observed adjacency matrix is an important task for network data analysis. Many existing estimation methods are based on certain assumptions on network structure, which limit their applicability in practice. Without making strong assumptions, we develop an iterative connecting probability estimation method based on neighborhood averaging. Starting at a random initial point or an existing estimate, our method iteratively updates the pairwise vertex distances, the sets of similar vertices, and connecting probabilities to improve the precision of the estimate. We propose a two-stage neighborhood selection procedure to achieve the trade-off between smoothness of the estimate and the ability to discover local structure. The tuning parameters can be selected by cross-validation. We establish desirable theoretical properties for our method, and further justify its superior performance by comparing with existing methods in simulation and real data analysis.

NeurIPS Conference 2021 Conference Paper

Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding

  • Yang Li
  • Si Si
  • Gang Li
  • Cho-Jui Hsieh
  • Samy Bengio

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
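The core idea, a trainable linear projection of the position followed by sine and cosine, can be sketched as below. Here the projection matrix is randomly initialized as a stand-in for the learned parameters, and the MLP modulation described in the abstract is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(pos, B):
    """Map M-dimensional positions to Fourier features via projection B.

    B plays the role of the trainable (M x D/2) projection; here it is
    randomly initialized as a stand-in for the learned parameters. In
    the paper the result is further modulated by an MLP (omitted here).
    """
    proj = pos @ B                                     # (..., D/2)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

# Encode 2-D pixel positions into 16-dimensional vectors.
B = rng.normal(scale=1.0, size=(2, 8))
positions = np.array([[3.0, 7.0], [3.0, 8.0], [40.0, 7.0]])
enc = fourier_features(positions, B)
```

Because each (cos, sin) pair has unit norm, inner products between encodings depend only on position differences, which is what lets the representation capture $L_2$-style spatial relationships.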

NeurIPS Conference 2021 Conference Paper

Learning to Adapt via Latent Domains for Adaptive Semantic Segmentation

  • Yunan Liu
  • Shanshan Zhang
  • Yang Li
  • Jian Yang

Domain adaptive semantic segmentation aims to transfer knowledge learned from labeled source domain to unlabeled target domain. To narrow down the domain gap and ease adaptation difficulty, some recent methods translate source images to target-like images (latent domains), which are used as supplement or substitute to the original source data. Nevertheless, these methods neglect to explicitly model the relationship of knowledge transferring across different domains. Alternatively, in this work we break through the standard “source-target” one-pair adaptation framework and construct multiple adaptation pairs (e.g., “source-latent” and “latent-target”). The purpose is to use the meta-knowledge (how to adapt) learned from one pair as guidance to assist the adaptation of another pair under a meta-learning framework. Furthermore, we extend our method to a more practical setting of open compound domain adaptation (a.k.a. multiple-target domain adaptation), where the target is a compound of multiple domains without domain labels. In this setting, we embed an additional pair of “latent-latent” to reduce the domain gap between the source and different latent domains, allowing the model to adapt well on multiple target domains simultaneously. When evaluated on standard benchmarks, our method is superior to the state-of-the-art methods in both the single target and multiple-target domain adaptation settings.

AAAI Conference 2021 Conference Paper

MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements

  • Yang Li
  • Yu Shen
  • Jiawei Jiang
  • Jinyang Gao
  • Ce Zhang
  • Bin Cui

Hyperparameter optimization (HPO) is a fundamental problem in automatic machine learning (AutoML). However, due to the expensive evaluation cost of models (e.g., training deep learning models or training models on large datasets), vanilla Bayesian optimization (BO) is typically computationally infeasible. To alleviate this issue, Hyperband (HB) utilizes the early stopping mechanism to speed up configuration evaluations by terminating badly-performing configurations in advance. This leads to two kinds of quality measurements: (1) many low-fidelity measurements for configurations that get early-stopped, and (2) few high-fidelity measurements for configurations that are evaluated without being early-stopped. The state-of-the-art HB-style method, BOHB, aims to combine the benefits of both BO and HB. Instead of sampling configurations randomly as in HB, BOHB samples configurations based on a BO surrogate model, which is constructed with the high-fidelity measurements only. However, the scarcity of high-fidelity measurements greatly hampers the efficiency of BO in guiding the configuration search. In this paper, we present MFES-HB, an efficient Hyperband method that is capable of utilizing both the high-fidelity and low-fidelity measurements to accelerate the convergence of HPO tasks. Designing MFES-HB is not trivial, as the low-fidelity measurements can be biased yet informative for guiding the configuration search. Thus we propose to build a Multi-Fidelity Ensemble Surrogate (MFES) based on the generalized Product of Experts framework, which can integrate useful information from multi-fidelity measurements effectively. The empirical studies on real-world AutoML tasks demonstrate that MFES-HB can achieve 3.3-8.9× speedups over the state-of-the-art approach, BOHB.
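The generalized Product of Experts framework mentioned in the abstract fuses the Gaussian predictions of per-fidelity surrogates by precision weighting. A minimal sketch of that fusion rule follows; the uniform weights here are a simple illustrative choice, not the paper's weighting scheme.

```python
import numpy as np

def gpoe_combine(mus, sigmas2, weights):
    """Fuse Gaussian predictions with a generalized Product of Experts.

    Each fidelity's surrogate predicts a mean mu_i and variance
    sigma_i^2 at a configuration; weights w_i scale how much each
    expert's precision counts. Returns the fused mean and variance.
    """
    mus, sigmas2, weights = map(np.asarray, (mus, sigmas2, weights))
    precision = np.sum(weights / sigmas2)
    mean = np.sum(weights * mus / sigmas2) / precision
    return mean, 1.0 / precision

# The confident (low-variance) high-fidelity expert dominates the fusion.
mean, var = gpoe_combine(mus=[0.2, 0.8], sigmas2=[1.0, 0.1], weights=[0.5, 0.5])
```

Precision weighting is what lets biased but plentiful low-fidelity measurements contribute without drowning out the scarce high-fidelity ones.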

NeurIPS Conference 2021 Conference Paper

Node Dependent Local Smoothing for Scalable Graph Learning

  • Wentao Zhang
  • Mingyu Yang
  • Zeang Sheng
  • Yang Li
  • Wen Ouyang
  • Yangyu Tao
  • Zhi Yang
  • Bin Cui

Recent works reveal that feature or label smoothing lies at the core of Graph Neural Networks (GNNs). Concretely, they show feature smoothing combined with simple linear regression achieves comparable performance with the carefully designed GNNs, and a simple MLP model with label smoothing of its prediction can outperform the vanilla GCN. Though an interesting finding, smoothing has not been well understood, especially regarding how to control the extent of smoothness. Intuitively, too few or too many smoothing iterations may cause under-smoothing or over-smoothing and can lead to sub-optimal performance. Moreover, the extent of smoothness is node-specific, depending on a node's degree and local structure. To this end, we propose a novel algorithm called node-dependent local smoothing (NDLS), which aims to control the smoothness of every node by setting a node-specific smoothing iteration. Specifically, NDLS computes influence scores based on the adjacency matrix and selects the iteration number by setting a threshold on the scores. Once selected, the iteration number can be applied to both feature smoothing and label smoothing. Experimental results demonstrate that NDLS enjoys high accuracy (state-of-the-art performance on node classification tasks), flexibility (it can be incorporated with any model), and scalability and efficiency (it can support large-scale graphs with fast training).
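A simplified reading of the selection rule: propagate with a normalized adjacency and stop each node once its influence row is close to the propagation's stationary limit. The sketch below uses a random-walk normalization and an L2 threshold as illustrative assumptions; the paper's exact influence score and threshold rule may differ.

```python
import numpy as np

def ndls_iterations(adj, eps=0.05, max_k=50):
    """Pick a node-specific smoothing depth (a simplified sketch).

    Propagates with the row-normalized adjacency (plus self-loops) and
    stops each node once its influence row is within eps (in L2) of the
    random walk's stationary distribution, so well-connected nodes get
    fewer smoothing iterations than peripheral ones.
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)               # add self-loops
    deg = a_hat.sum(axis=1)
    a_norm = a_hat / deg[:, None]         # random-walk normalization
    stationary = deg / deg.sum()          # limit of the propagation
    iters = np.full(n, max_k)
    power = np.eye(n)
    for k in range(1, max_k + 1):
        power = power @ a_norm
        dist = np.linalg.norm(power - stationary[None, :], axis=1)
        done = (dist <= eps) & (iters == max_k)
        iters[done] = k
        if (iters < max_k).all():
            break
    return iters

# Toy input: a path graph on 4 nodes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
k_per_node = ndls_iterations(adj)
```

The per-node depth returned here would then cap how many rounds of feature (or label) smoothing each node receives.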

AAAI Conference 2021 Conference Paper

Savable but Lost Lives when ICU Is Overloaded: a Model from 733 Patients in Epicenter Wuhan, China

  • Tingting Dan
  • Yang Li
  • Ziwei Zhu
  • Xijie Chen
  • Wuxiu Quan
  • Yu Hu
  • Guihua Tao
  • Lei Zhu

Coronavirus Disease 2019 (COVID-19) can deteriorate suddenly at certain checkpoints and thus requires intervention from the intensive care unit (ICU). The resulting urgent, large-scale demand for ICUs posed great risks to the medical system. Estimating the mortality of critical in-patients who were not admitted into the ICU is valuable for optimizing the management and assignment of ICU resources. We retrospectively studied 733 in-patients diagnosed with COVID-19 at a local hospital (Wuhan, China) as of March 18, 2020. Demographic, clinical and laboratory results were collected and analyzed using machine learning to build a predictive model. Considering the shortage of ICU beds at the beginning of the disease's emergence, we defined the mortality of those patients who were predicted to need ICU care yet did not receive it as the Missing-ICU (MI) mortality. To estimate MI-mortality, a prognostic classification model was built to identify the in-patients who may need ICU care. Its predictive accuracy was 0.8288, with an AUC of 0.9119. In our cohort of 733 patients, 25 in-patients were predicted by our model to need ICU care yet did not enter the ICU due to the shortage of ICU wards. Our analysis showed that the MI-mortality is 41%, while the mortality of ICU patients is 32%, implying that sufficient ICU beds could save more patients in critical condition.

AAAI Conference 2020 Conference Paper

A Forest from the Trees: Generation through Neighborhoods

  • Yang Li
  • Tianxiang Gao
  • Junier Oliva

In this work, we propose to learn a generative model using both learned features (through a latent space) and memories (through neighbors). Although human learning makes seamless use of both learned perceptual features and instance recall, current generative learning paradigms only make use of one of these two components. Take, for instance, flow models, which learn a latent space that follows a simple distribution. Conversely, kernel density techniques use instances to shift a simple distribution into an aggregate mixture model. Here we propose multiple methods to enhance the latent space of a flow model with neighborhood information. Not only does our proposed framework represent a more human-like approach by leveraging both learned features and memories, but it may also be viewed as a step forward in non-parametric methods. In addition, our proposed framework allows the user to easily control the properties of generated samples by targeting samples based on neighbors. The efficacy of our model is shown empirically with standard image datasets. We observe compelling results and a significant improvement over baselines. Combined further with a contrastive training mechanism, our proposed methods can effectively perform non-parametric novelty detection.

AAAI Conference 2020 Conference Paper

Efficient Automatic CASH via Rising Bandits

  • Yang Li
  • Jiawei Jiang
  • Jinyang Gao
  • Yingxia Shao
  • Ce Zhang
  • Bin Cui

The Combined Algorithm Selection and Hyperparameter optimization (CASH) is one of the most fundamental problems in Automatic Machine Learning (AutoML). The existing Bayesian optimization (BO) based solutions turn the CASH problem into a Hyperparameter Optimization (HPO) problem by combining the hyperparameters of all machine learning (ML) algorithms, and use BO methods to solve it. As a result, these methods suffer from the low-efficiency problem due to the huge hyperparameter space in CASH. To alleviate this issue, we propose the alternating optimization framework, where the HPO problem for each ML algorithm and the algorithm selection problem are optimized alternately. In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods. Furthermore, we introduce Rising Bandits, a CASH-oriented Multi-Armed Bandits (MAB) variant, to model the algorithm selection in CASH. This framework can take advantage of both BO in solving the HPO problem with a relatively small hyperparameter space and the MABs in accelerating the algorithm selection. Moreover, we further develop an efficient online algorithm to solve the Rising Bandits with provable theoretical guarantees. The extensive experiments on 30 OpenML datasets demonstrate the superiority of the proposed approach over the competitive baselines.

AAAI Conference 2020 Conference Paper

Exchangeable Generative Models with Flow Scans

  • Christopher Bender
  • Kevin O'Connor
  • Yang Li
  • Juan Garcia
  • Junier Oliva
  • Manzil Zaheer

In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling.

NeurIPS Conference 2020 Conference Paper

Exchangeable Neural ODE for Set Modeling

  • Yang Li
  • Haidong Yi
  • Christopher Bender
  • Siyuan Shan
  • Junier B. Oliva

Reasoning over an instance composed of a set of vectors, like a point cloud, requires that one accounts for intra-set dependent features among elements. However, since such instances are unordered, the elements' features should remain unchanged when the input's order is permuted. This property, permutation equivariance, is a challenging constraint for most neural architectures. While recent work has proposed global pooling and attention-based solutions, these may be limited in the way that intradependencies are captured in practice. In this work we propose a more general formulation to achieve permutation equivariance through ordinary differential equations (ODE). Our proposed module, Exchangeable Neural ODE (ExNODE), can be seamlessly applied for both discriminative and generative tasks. We also extend set modeling in the temporal dimension and propose a VAE based model for temporal set modeling. Extensive experiments demonstrate the efficacy of our method over strong baselines.

AAAI Conference 2020 Conference Paper

Geometry-Driven Self-Supervised Method for 3D Human Pose Estimation

  • Yang Li
  • Kan Li
  • Shuai Jiang
  • Ziyue Zhang
  • Congzhentao Huang
  • Richard Yi Da Xu

The neural network based approach for 3D human pose estimation from monocular images has attracted growing interest. However, annotating 3D poses is a labor-intensive and expensive process. In this paper, we propose a novel self-supervised approach to avoid the need of manual annotations. Different from existing weakly/self-supervised methods that require extra unpaired 3D ground-truth data to alleviate the depth ambiguity problem, our method trains the network only relying on geometric knowledge without any additional 3D pose annotations. The proposed method follows the two-stage pipeline: 2D pose estimation and 2D-to-3D pose lifting. We design the transform re-projection loss that is an effective way to explore multi-view consistency for training the 2D-to-3D lifting network. Besides, we adopt the confidences of 2D joints to integrate losses from different views to alleviate the influence of noises caused by the self-occlusion problem. Finally, we design a two-branch training architecture, which helps to preserve the scale information of re-projected 2D poses during training, resulting in accurate 3D pose predictions. We demonstrate the effectiveness of our method on two popular 3D human pose datasets, Human3.6M and MPI-INF-3DHP. The results show that our method significantly outperforms recent weakly/self-supervised approaches.

NeurIPS Conference 2020 Conference Paper

Meta-Neighborhoods

  • Siyuan Shan
  • Yang Li
  • Junier B. Oliva

Making an adaptive prediction based on input is an important ability for general artificial intelligence. In this work, we step forward in this direction and propose a semi-parametric method, Meta-Neighborhoods, where predictions are made adaptively to the neighborhood of the input. We show that Meta-Neighborhoods is a generalization of k-nearest-neighbors. Due to the simpler manifold structure around a local neighborhood, Meta-Neighborhoods represent the predictive distribution p(y | x) more accurately. To reduce memory and computation overheads, we propose induced neighborhoods that summarize the training data into a much smaller dictionary. A meta-learning based training mechanism is then exploited to jointly learn the induced neighborhoods and the model. Extensive studies demonstrate the superiority of our method.

AAAI Conference 2020 Conference Paper

Multi-Point Semantic Representation for Intent Classification

  • Jinghan Zhang
  • Yuxiao Ye
  • Yue Zhang
  • Likun Qiu
  • Bin Fu
  • Yang Li
  • Zhenglu Yang
  • Jian Sun

Detecting user intents from utterances is the basis of the natural language understanding (NLU) task. To understand the meaning of utterances, some work focuses on fully representing utterances via semantic parsing, in which annotation is labor-intensive. While some researchers simply view this as intent classification or frequently asked questions (FAQs) retrieval, they do not leverage the shared utterances among different intents. We propose a simple and novel multi-point semantic representation framework with relatively low annotation cost to leverage the fine-grained factor information, decomposing queries into four factors, i.e., topic, predicate, object/condition, and query type. Besides, we propose a compositional intent bi-attention model under multi-task learning with three kinds of attention mechanisms among queries, labels and factors, which jointly combines coarse-grained intent and fine-grained factor information. Extensive experiments show that our framework and model significantly outperform several state-of-the-art approaches with an improvement of 1.35%-2.47% in terms of accuracy.

NeurIPS Conference 2020 Conference Paper

Multi-Stage Influence Function

  • Hongge Chen
  • Si Si
  • Yang Li
  • Ciprian Chelba
  • Sanjiv Kumar
  • Duane Boning
  • Cho-Jui Hsieh

Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In this paper, we develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task. The proposed multi-stage influence function generalizes the original influence function for a single model in (Koh & Liang, 2017), thereby enabling influence computation through both pretrained and finetuned models. We study two different scenarios with the pretrained embedding fixed or updated in the finetuning tasks. We test our proposed method in various experiments to show its effectiveness and potential applications.

AAAI Conference 2020 Conference Paper

Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction

  • Yang Li
  • Guodong Long
  • Tao Shen
  • Tianyi Zhou
  • Lina Yao
  • Huan Huo
  • Jing Jiang

Distantly supervised relation extraction intrinsically suffers from noisy labels due to the strong assumption of distant supervision. Most prior works adopt a selective attention mechanism over sentences in a bag to denoise from wrongly labeled data, which however could be incompetent when there is only one sentence in a bag. In this paper, we propose a brand-new light-weight neural framework to address the distantly supervised relation extraction problem and alleviate the defects in previous selective attention framework. Specifically, in the proposed framework, 1) we use an entity-aware word embedding method to integrate both relative position information and head/tail entity embeddings, aiming to highlight the essence of entities for this task; 2) we develop a self-attention mechanism to capture the rich contextual dependencies as a complement for local dependencies captured by piecewise CNN; and 3) instead of using selective attention, we design a pooling-equipped gate, which is based on rich contextual representations, as an aggregator to generate bag-level representation for final relation classification. Compared to selective attention, one major advantage of the proposed gating mechanism is that it performs stably and promisingly even if only one sentence appears in a bag and thus keeps the consistency across all training examples. The experiments on the NYT dataset demonstrate that our approach achieves a new state-of-the-art performance in terms of both AUC and top-n precision metrics.

NeurIPS Conference 2019 Conference Paper

A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

  • Xuanqing Liu
  • Si Si
  • Jerry Zhu
  • Yang Li
  • Cho-Jui Hsieh

In this paper, we propose a general framework for data poisoning attacks to graph-based semi-supervised learning (G-SSL). In this framework, we first unify different tasks, goals and constraints into a single formula for data poisoning attack in G-SSL, then we propose two specialized algorithms to efficiently solve two important cases --- poisoning regression tasks under $\ell_2$-norm constraint and classification tasks under $\ell_0$-norm constraint. In the former case, we transform it into a non-convex trust region problem and show that our gradient-based algorithm with delicate initialization and update scheme finds the (globally) optimal perturbation. For the latter case, although it is an NP-hard integer programming problem, we propose a probabilistic solver that works much better than the classical greedy method. Lastly, we test our framework on real datasets and evaluate the robustness of G-SSL algorithms. For instance, on the MNIST binary classification problem (50000 training data with 50 labeled), flipping two labeled examples is enough to make the model perform like random guessing (around 50% error).

AAAI Conference 2019 Conference Paper

Robust Estimation of Similarity Transformation for Visual Object Tracking

  • Yang Li
  • Jianke Zhu
  • Steven C.H. Hoi
  • Wenjie Song
  • Zhefeng Wang
  • Hantang Liu

Most existing correlation filter-based tracking approaches only estimate simple axis-aligned bounding boxes, and very few of them are capable of recovering the underlying similarity transformation. To tackle this challenging problem, in this paper, we propose a new correlation filter-based tracker with a novel robust estimation of similarity transformation on the large displacements. In order to efficiently search in such a large 4-DoF space in real-time, we formulate the problem into two 2-DoF sub-problems and apply an efficient Block Coordinate Descent solver to optimize the estimation result. Specifically, we employ an efficient phase correlation scheme to deal with both scale and rotation changes simultaneously in log-polar coordinates. Moreover, a variant of correlation filter is used to predict the translational motion individually. Our experimental results demonstrate that the proposed tracker achieves very promising prediction performance compared with the state-of-the-art visual object tracking methods while still retaining the advantages of high efficiency and simplicity in conventional correlation filter-based tracking methods.
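The phase-correlation primitive at the heart of this scheme can be sketched in a few lines: the normalized cross-power spectrum of two images has an inverse FFT that peaks at their relative shift. In log-polar coordinates (not implemented here), such shifts correspond to scale and rotation; the plain translational version below is an illustrative sketch, not the paper's tracker.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the circular shift that maps image b onto image a via the
    normalized cross-power spectrum. Applied in log-polar coordinates,
    the same primitive recovers scale and rotation."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12      # keep only the phase difference
    response = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return dy, dx

rng = np.random.default_rng(1)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 9), axis=(0, 1))
print(phase_correlation(shifted, img))  # → (5, 9)
```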

NeurIPS Conference 2019 Conference Paper

Robustness Verification of Tree-based Models

  • Hongge Chen
  • Huan Zhang
  • Si Si
  • Yang Li
  • Duane Boning
  • Cho-Jui Hsieh

We study the robustness verification problem of tree-based models, including random forest (RF) and gradient boosted decision tree (GBDT). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches cast this verification problem into a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time and is thus impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite boxicity graph. For low dimensional problems when boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we devise an efficient verification algorithm that can give tight lower bounds on robustness of decision tree ensembles, and allows iterative improvement and any-time termination. On RF/GBDT models trained on a variety of datasets, we significantly outperform the lower bounds obtained by relaxing the MILP formulation into a linear program (LP), and are hundreds of times faster than solving MILPs to get the exact minimal adversarial distortion. Our proposed method is capable of giving tight robustness verification bounds on large GBDTs with hundreds of deep trees.
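The single-tree linear-time case admits a compact sketch: each leaf of a decision tree is an axis-aligned box, so the exact minimal l-inf perturbation is the minimum, over leaves with a different label, of the distance from the input to the leaf's box. The leaf representation below (closed boxes as interval lists) is an illustrative assumption; the paper's algorithm and graph construction are more general.

```python
import math

def linf_distance_to_box(x, box):
    """l_inf distance from point x to an axis-aligned box given as
    [(lo_0, hi_0), ...]; 0 if x lies inside the box."""
    return max(max(lo - xi, xi - hi, 0.0) for xi, (lo, hi) in zip(x, box))

def verify_single_tree(x, leaves, y):
    """Exact minimal l_inf adversarial perturbation for one decision tree.
    `leaves` is a list of (box, label); since each leaf is a box, the
    minimum over differently-labeled leaves is found in one linear pass."""
    return min(
        (linf_distance_to_box(x, box) for box, label in leaves if label != y),
        default=math.inf,
    )

# Tiny stump on one feature: x0 <= 2 -> class 0, x0 > 2 -> class 1.
leaves = [([(-math.inf, 2.0)], 0), ([(2.0, math.inf)], 1)]
print(verify_single_tree([0.5], leaves, y=0))  # distance to the x0 > 2 leaf: 1.5
```

For an ensemble, the prediction depends on a sum over trees, which is why the problem becomes a max-clique search rather than a single pass.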

AAAI Conference 2019 Conference Paper

SADIH: Semantic-Aware DIscrete Hashing

  • Zheng Zhang
  • Guo-Sen Xie
  • Yang Li
  • Sheng Li
  • Zi Huang

Due to its low storage cost and fast query speed, hashing has been recognized to accomplish similarity search in large-scale multimedia retrieval applications. Particularly, supervised hashing has recently received considerable research attention by leveraging the label information to preserve the pairwise similarities of data points in the Hamming space. However, there still remain two crucial bottlenecks: 1) the learning process of the full pairwise similarity preservation is computationally unaffordable and unscalable to deal with big data; 2) the available category information of the data is not well explored for learning discriminative hash functions. To overcome these challenges, we propose a unified Semantic-Aware DIscrete Hashing (SADIH) framework, which aims to directly embed the transformed semantic information into the asymmetric similarity approximation and discriminative hashing function learning. Specifically, a semantic-aware latent embedding is introduced to asymmetrically preserve the full pairwise similarities while skillfully handling the cumbersome n × n pairwise similarity matrix. Meanwhile, a semantic-aware autoencoder is developed to jointly preserve the data structures in the discriminative latent semantic space and perform data reconstruction. Moreover, an efficient alternating optimization algorithm is proposed to solve the resulting discrete optimization problem. Extensive experimental results on multiple large-scale datasets demonstrate that our SADIH can clearly outperform the state-of-the-art baselines with the additional benefit of lower computational costs.

IJCAI Conference 2018 Conference Paper

A Novel Neural Network Model based on Cerebral Hemispheric Asymmetry for EEG Emotion Recognition

  • Yang Li
  • Wenming Zheng
  • Zhen Cui
  • Tong Zhang
  • Yuan Zong

In this paper, we propose a novel neural network model, called bi-hemispheres domain adversarial neural network (BiDANN), for EEG emotion recognition. BiDANN is motivated by the neuroscience findings, i.e., the emotional brain's asymmetries between left and right hemispheres. The basic idea of BiDANN is to map the EEG feature data of both left and right hemispheres into discriminative feature spaces separately, in which the data representations can be classified easily. To predict the class labels of testing data more precisely, we narrow the distribution shift between training and testing data by using a global and two local domain discriminators, which work adversarially to the classifier to encourage domain-invariant data representations to emerge. After that, the learned classifier from labeled training data can be applied to unlabeled testing data naturally. We conduct two experiments to verify the performance of our BiDANN model on the SEED database. The experimental results show that the proposed model achieves the state-of-the-art performance.

IJCAI Conference 2018 Conference Paper

Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation

  • Yang Li
  • Kan Li
  • Xinxin Wang

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model integrating the powerful aggregation module provides a promising solution to recognize actions in videos. We conduct experiments on two action recognition datasets: HMDB51 and UCF101. Results show that our model outperforms the state-of-the-art methods.

JBHI Journal 2018 Journal Article

Epileptic Seizure Classification of EEGs Using Time–Frequency Analysis Based Multiscale Radial Basis Functions

  • Yang Li
  • Xu-Dong Wang
  • Mei-Lin Luo
  • Ke Li
  • Xiao-Feng Yang
  • Qi Guo

The automatic detection of epileptic seizures from electroencephalography (EEG) signals is crucial for the localization and classification of epileptic seizure activity. However, seizure processes are typically dynamic and nonstationary, and thus, distinguishing rhythmic discharges from nonstationary processes is one of the challenging problems. In this paper, an adaptive and localized time–frequency representation in EEG signals is proposed by means of multiscale radial basis functions (MRBF) and a modified particle swarm optimization (MPSO) to improve both time and frequency resolution simultaneously, which is a novel MRBF-MPSO framework of the time–frequency feature extraction for epileptic EEG signals. The dimensionality of extracted features can be greatly reduced by the principal component analysis algorithm before the selected most discriminative features are fed into a support vector machine (SVM) classifier with the radial basis function (RBF) in order to separate epileptic seizure from seizure-free EEG signals. The classification performance of the proposed method has been evaluated against several state-of-the-art feature extraction algorithms and five other classifiers, such as linear discriminant analysis and logistic regression. The experimental results indicate that the proposed MRBF-MPSO-SVM classification method outperforms competing techniques in terms of classification accuracy, and shows the effectiveness of the proposed method for classification of seizure epochs and seizure-free epochs.

NeurIPS Conference 2018 Conference Paper

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

  • Patrick Chen
  • Si Si
  • Yang Li
  • Ciprian Chelba
  • Cho-Jui Hsieh

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. For advanced NLP problems, a neural language model usually consists of recurrent layers (e.g., using LSTM cells), an embedding matrix for representing input tokens, and a softmax layer for generating output tokens. For problems with a very large vocabulary size, the embedding and the softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of-the-art performance on the One-Billion-Word (OBW) dataset with around 800k vocabulary, and its word embedding and softmax matrices use more than 6 GB of space, and are responsible for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models, based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words). We start by grouping words into $c$ blocks based on their frequency, and then refine the clustering iteratively by constructing weighted low-rank approximation for each block, where the weights are based on the frequencies of the words in the block. The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieved a 6.6x compression rate for the embedding and softmax matrices, and when combined with quantization, our method can achieve a 26x compression rate without losing prediction accuracy.
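The starting point of the method can be sketched as follows: sort embedding rows by token frequency, split them into blocks, and compress each block with its own truncated SVD. This is a hedged sketch of the initial grouping step only; the paper's iterative, frequency-weighted refinement of the blocks is omitted, and all names and the synthetic data are assumptions.

```python
import numpy as np

def group_reduce(E, freqs, n_blocks=2, rank=4):
    """Block-wise low-rank approximation of an embedding matrix E:
    sort rows by token frequency, split into blocks, and compress each
    block with a truncated SVD (sketch of GroupReduce's first step)."""
    order = np.argsort(-np.asarray(freqs))          # frequent tokens first
    blocks = np.array_split(order, n_blocks)
    E_hat = np.empty_like(E)
    for idx in blocks:
        U, s, Vt = np.linalg.svd(E[idx], full_matrices=False)
        E_hat[idx] = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return E_hat

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 16))
freqs = rng.zipf(2.0, size=100)        # power-law token frequencies
E_hat = group_reduce(E, freqs, n_blocks=4, rank=8)
```

Because each block gets its own rank-8 factors, the block-wise reconstruction error is never worse than a single global rank-8 SVD of the whole matrix, at the cost of storing one factor pair per block.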

IJCAI Conference 2017 Conference Paper

CFNN: Correlation Filter Neural Network for Visual Object Tracking

  • Yang Li
  • Zhan Xu
  • Jianke Zhu

Although convolutional neural networks (CNNs) have shown promising capacity in many computer vision tasks, applying them to visual tracking remains far from solved. Existing methods either employ a large external dataset to undertake exhaustive pre-training or suffer from less satisfactory results in terms of accuracy and robustness. To track a single target in a wide range of videos, we present a novel Correlation Filter Neural Network architecture, as well as a complete visual tracking pipeline. The proposed approach is a special case of CNN, whose initialization does not need any pre-training on the external dataset. The initialization of the network enjoys the merits of cyclic sampling to achieve the appealing discriminative capability, while the network updating scheme adopts advantages from back-propagation in order to capture new appearance variations. The tracking pipeline integrates both aspects well by making them complementary to each other. We validate our tracker on the OTB-2013 benchmark. The proposed tracker obtains promising results compared to most existing representative trackers.
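The classical closed-form correlation filter that such trackers build on can be sketched in a few lines: train a filter in the Fourier domain (MOSSE-style, H* = G ⊙ conj(F) / (F ⊙ conj(F) + λ)) so that correlation with the template produces a peak at the target location. This is a sketch of the underlying primitive on assumed toy data, not the CFNN architecture itself.

```python
import numpy as np

def train_filter(f, g, lam=0.01):
    """Closed-form correlation filter in the Fourier domain (MOSSE-style):
    H* = (G * conj(F)) / (F * conj(F) + lambda)."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(Hconj, patch):
    """Correlation response of the filter on a new patch."""
    return np.fft.ifft2(Hconj * np.fft.fft2(patch)).real

rng = np.random.default_rng(2)
f = rng.random((32, 32))
# Desired response: a peak at the origin (a soft Gaussian in practice).
g = np.zeros((32, 32))
g[0, 0] = 1.0
Hc = train_filter(f, g)
r = respond(Hc, np.roll(f, (3, 4), axis=(0, 1)))
print(np.unravel_index(np.argmax(r), r.shape))  # peak near (3, 4)
```

Cyclic sampling makes this training exact for all circular shifts of the template, which is the discriminative initialization the abstract refers to.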

TIST Journal 2017 Journal Article

Personalized Microtopic Recommendation on Microblogs

  • Yang Li
  • Jing Jiang
  • Ting Liu
  • Minghui Qiu
  • Xiaofei Sun

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not give explicit ratings to microtopics. Second, there exists rich information about users and microtopics, for example, users' published content and biographical information, but it is not clear how to best utilize such information. To address the above two challenges, we propose a joint probabilistic latent factor model to integrate rich information into a matrix factorization-based solution to microtopic recommendation. Our model builds on top of collaborative filtering, content analysis, and feature regression. Using two real-world datasets, we evaluate our model with different kinds of content and contextual information. Experimental results show that our model significantly outperforms a few competitive baseline methods, especially in the circumstance where users have few adoption behaviors.
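The collaborative-filtering core that the joint model builds on can be sketched as plain matrix factorization trained by SGD on observed user–microtopic adoptions. The content-analysis and feature-regression components of the paper are omitted; the toy adoption data and all names below are illustrative assumptions.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=200):
    """Plain matrix factorization by SGD: learn user factors P and item
    factors Q so that P[u] . Q[i] approximates the observed rating r."""
    random.seed(0)
    P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Implicit adoptions: 1.0 = user adopted the microtopic.
ratings = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0), (1, 2, 1.0)]
P, Q = train_mf(ratings, n_users=2, n_items=3)
score = sum(p * q for p, q in zip(P[0], Q[0]))
```

The paper's model extends this by regressing the latent factors on user and microtopic content features, which is what helps users with few adoption behaviors.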

IS Journal 2015 Journal Article

Knowledge Engineering with Big Data

  • Xindong Wu
  • Huanhuan Chen
  • Gongqing Wu
  • Jun Liu
  • Qinghua Zheng
  • Xiaofeng He
  • Aoying Zhou
  • Zhong-Qiu Zhao

In the era of big data, knowledge engineering faces fundamental challenges induced by fragmented knowledge from heterogeneous, autonomous sources with complex and evolving relationships. The knowledge representation, acquisition, and inference techniques developed in the 1970s and 1980s, driven by research and development of expert systems, must be updated to cope with both fragmented knowledge from multiple sources in the big data revolution and in-depth knowledge from domain experts. This article presents BigKE, a knowledge engineering framework that handles fragmented knowledge modeling and online learning from multiple information sources, nonlinear fusion on fragmented knowledge, and automated demand-driven knowledge navigation.

IJCAI Conference 2013 Conference Paper

Automatic Name-Face Alignment to Enable Cross-Media News Retrieval

  • Yuejie Zhang
  • Wei Wu
  • Yang Li
  • Cheng Jin
  • Xiangyang Xue
  • Jianping Fan

A new algorithm is developed in this paper to support automatic name-face alignment for achieving more accurate cross-media news retrieval. We focus on extracting valuable information from large amounts of news images and their captions, where multi-level image-caption pairs are constructed for characterizing both significant names with higher salience and their cohesion with human faces extracted from news images. To remedy the issue of lacking enough related information for rare names, Web mining is introduced to acquire the extra multimodal information. We also emphasize an optimization mechanism by our Improved Self-Adaptive Simulated Annealing Genetic Algorithm to verify the feasibility of alignment combinations. Our experiments have obtained very positive results.

IROS Conference 2009 Conference Paper

SUEFUL-7: A 7DOF upper-limb exoskeleton robot with muscle-model-oriented EMG-based control

  • Ranathunga Arachchilage Ruwan Chandra Gopura
  • Kazuo Kiguchi
  • Yang Li

This paper proposes an electromyography (EMG) signal based control method for a seven degrees of freedom (7DOF) upper-limb motion assist exoskeleton robot (SUEFUL-7). The SUEFUL-7 is able to assist the motions of shoulder vertical and horizontal flexion/extension, shoulder internal/external rotation, elbow flexion/extension, forearm supination/pronation, wrist flexion/extension, and wrist radial/ulnar deviation of physically weak individuals. In the proposed control method, an impedance controller is applied to the muscle-model-oriented control method by considering the end effector force vector. Impedance parameters are adjusted in real time by considering the upper-limb posture and EMG activity levels. Experiments have been performed to evaluate the effectiveness of the proposed robotic system.