Arrow Research search

Author name cluster

Fan Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

112 papers
2 author rows

Possible papers (112)

AAAI Conference 2026 Conference Paper

Beyond Euclidean Assumptions: Geometry-Aware Adaptive Routing for Remote Sensing Segmentation

  • Jie Qiu
  • Dizuo Cao
  • Linwei Dai
  • Xin Li
  • Fan Yang
  • Dong Yu
  • Changying Wang
  • Zongheng Wen

Remote sensing imagery poses a distinct challenge for semantic segmentation due to its inherent fractal complexity and the diversity of geometric structures present in real-world geospatial scenes. Euclidean-based models typically assume spatial uniformity; however, such assumptions often break down when confronted with objects exhibiting markedly different structural characteristics—such as roads versus vegetation—thereby complicating the feature representation process. Hyperbolic space offers a theoretically grounded alternative for modeling such hierarchical and heterogeneous patterns, yet fully replacing Euclidean geometry incurs significant computational overhead. We therefore introduce Geometry-Aware Adaptive Routing (GAAR), a novel module that facilitates geometry-aware routing by dynamically allocating high-level features to either Euclidean or Hyperbolic subspaces through a learnable binary gating mechanism, informed by structural priors learned during training. To further promote routing stability and geometric consistency, we introduce Geometry-Aware Deterministic Regularization (GADR), a regularization strategy that encourages confident, structure-aligned assignments. GAAR is plug-and-play and integrates seamlessly into existing segmentation architectures. Experiments on three challenging Remote Sensing Image Semantic Segmentation (RSISS) benchmarks demonstrate that our approach consistently outperforms state-of-the-art (SOTA) methods, particularly in geometrically complex regions, offering a scalable and effective solution to the limitations of purely Euclidean modeling.

AAAI Conference 2026 Conference Paper

Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization

  • Binyan Xu
  • Fan Yang
  • Di Tang
  • Xilin Dai
  • Kehuan Zhang

Clean-image backdoor attacks, which use only label manipulation in training datasets to compromise deep neural networks, pose a significant threat to security-critical applications. A critical flaw in existing methods is that the poison rate required for a successful attack induces a proportional, and thus noticeable, drop in Clean Accuracy (CA), undermining their stealthiness. This paper presents a new paradigm for clean-image attacks that minimizes this accuracy degradation by optimizing the trigger itself. We introduce Generative Clean-Image Backdoors (GCB), a framework that uses a conditional InfoGAN to identify naturally occurring image features that can serve as potent and stealthy triggers. By ensuring these triggers are easily separable from benign task-related features, GCB enables a victim model to learn the backdoor from an extremely small set of poisoned examples, resulting in a CA drop of less than 1%. Our experiments demonstrate GCB's remarkable versatility, successfully adapting to six datasets, five architectures, and four tasks, including the first demonstration of clean-image backdoors in regression and segmentation. GCB also exhibits resilience against most of the existing backdoor defenses.

AAAI Conference 2026 Conference Paper

Catastrophic Forgetting in Kolmogorov-Arnold Networks

  • Mohammad Marufur Rahman
  • Guanchu Wang
  • Kaixiong Zhou
  • Minghan Chen
  • Fan Yang

Catastrophic forgetting is a longstanding challenge in continual learning, where models lose knowledge from earlier tasks when learning new ones. While various mitigation strategies have been proposed for Multi-Layer Perceptrons (MLPs), recent architectural advances like Kolmogorov-Arnold Networks (KANs) have been suggested to offer intrinsic resistance to forgetting by leveraging localized spline-based activations. However, the practical behavior of KANs under continual learning remains unclear, and their limitations are not well understood. To address this, we present a comprehensive study of catastrophic forgetting in KANs and develop a theoretical framework that links forgetting to activation support overlap and intrinsic data dimension. We validate these analyses through systematic experiments on synthetic and vision tasks, measuring forgetting dynamics under varying model configurations and data complexity. Further, we introduce KAN-LoRA, a novel adapter design for parameter-efficient continual fine-tuning of language models, and evaluate its effectiveness in knowledge editing tasks. Our findings reveal that while KANs exhibit promising retention in low-dimensional algorithmic settings, they remain vulnerable to forgetting in high-dimensional domains such as image classification and language modeling. These results advance the understanding of KANs’ strengths and limitations, offering practical insights for continual learning system design.

AAAI Conference 2026 Conference Paper

FilmSceneDesigner: Chaining Set Design for Procedural Film Scene Generation

  • Zhifeng Xie
  • Keyi Zhang
  • Yiye Yan
  • Yuling Guo
  • Fan Yang
  • Jiting Zhou
  • Mengtian Li

Film set design plays a pivotal role in cinematic storytelling and shaping the visual atmosphere. However, the traditional process depends on expert-driven manual modeling, which is labor-intensive and time-consuming. To address this issue, we introduce FilmSceneDesigner, an automated scene generation system that emulates the professional film set design workflow. Given a natural language description, including scene type, historical period, and style, we design an agent-based chaining framework to generate structured parameters aligned with the film set design workflow, guided by prompt strategies that ensure parameter accuracy and coherence. In parallel, we propose a procedural generation pipeline which executes a series of dedicated functions with the structured parameters for floorplan and structure generation, material assignment, door and window placement, and object retrieval and layout, ultimately constructing a complete film scene from scratch. Moreover, to enhance cinematic realism and asset diversity, we construct SetDepot-Pro, a curated dataset of 6,862 film-specific 3D assets and 733 materials. Experimental results and human evaluations demonstrate that our system produces structurally sound scenes with strong cinematic fidelity, supporting downstream tasks such as virtual previs, construction drawing and mood board creation.

AAAI Conference 2026 Conference Paper

GeM-VG: Towards Generalized Multi-image Visual Grounding with Multimodal Large Language Models

  • Shurong Zheng
  • Yousong Zhu
  • Hongyin Zhao
  • Fan Yang
  • Yufei Zhan
  • Ming Tang
  • Jinqiao Wang

Multimodal Large Language Models (MLLMs) have demonstrated impressive progress in single-image grounding and general multi-image understanding. Recently, some methods begin to address multi-image grounding. However, they are constrained by single-target localization and limited types of practical tasks, due to the lack of unified modeling for generalized grounding tasks. Therefore, we propose GeM-VG, an MLLM capable of Generalized Multi-image Visual Grounding. To support this, we systematically categorize and organize existing multi-image grounding tasks according to cognitive demands and introduce the MG-Data-240K dataset, addressing the limitations of existing datasets regarding target quantity and image relation. To tackle the challenges of robustly handling diverse multi-image grounding tasks, we further propose a hybrid reinforcement finetuning strategy that integrates chain-of-thought (CoT) reasoning and direct answering, considering their complementary strengths. This strategy adopts an R1-like algorithm guided by a carefully designed rule-based reward, effectively enhancing the model’s overall perception and reasoning capabilities. Extensive experiments demonstrate the superior generalized grounding capabilities of our model. For multi-image grounding, it outperforms the previous leading MLLMs by 2.0% and 9.7% on MIG-Bench and MC-Bench, respectively. In single-image grounding, it achieves a 9.1% improvement over the base model on ODINW. Furthermore, our model retains strong capabilities in general multi-image understanding.

AAAI Conference 2026 System Paper

KnowThyself: An Agentic Assistant for LLM Interpretability

  • Suraj Prasai
  • Mengnan Du
  • Ying Zhang
  • Fan Yang

We develop KnowThyself, an agentic assistant that advances large language model (LLM) interpretability. Existing tools provide useful insights but remain fragmented and code-intensive. KnowThyself consolidates these capabilities into a chat-based interface, where users can upload models, pose natural language questions, and obtain interactive visualizations with guided explanations. At its core, an orchestrator LLM first reformulates user queries, an agent router further directs them to specialized modules, and the outputs are finally contextualized into coherent explanations. This design lowers technical barriers and provides an extensible platform for LLM inspection. By embedding the whole process into a conversational workflow, KnowThyself offers a robust foundation for accessible LLM interpretability.

JBHI Journal 2026 Journal Article

MoACNN-XGNet: Interpretable Multi-Omics Convolutional Network for Breast Cancer Subtyping and Prognostic Genes Identification

  • Qian Li
  • Lei Liu
  • Qing Zhang
  • Xiaobin Zhang
  • Na Li
  • Yaoyao Zhao
  • Jiayi Teng
  • Fuzhong Xue

Breast cancer, a highly heterogeneous disease at both the phenotypic and molecular levels, presents significant challenges for prognosis and treatment. Accurate subtyping of breast cancer is critical due to its complex biological characteristics, which directly influence disease progression and therapeutic outcomes. In this study, we integrate multi-omics data, including copy number variation, RNA sequencing, and DNA methylation, to generate two-dimensional representations of each sample using Uniform Manifold Approximation and Projection. This transformation enhances data interpretability and supports subsequent learning tasks. Traditional convolutional neural networks have demonstrated potential in medical image analysis but often struggle with high-dimensional omics data. To address this limitation, we propose MoACNN-XGNet, an attention-based convolutional neural network framework that prioritizes key features within image-transformed multi-omics data. Our method significantly improves the precision of subtype classification and effectively overcomes the challenges posed by the high dimensionality and structural complexity of multi-omics data. Furthermore, we employ the Guided Grad-CAM method to enhance model interpretability, enabling the identification of subtype-specific explainable genes. Subsequent enrichment and survival analyses of these genes reveal critical biological pathways and potential therapeutic targets. This study offers a novel approach to refining breast cancer subtyping and highlights the potential for personalized treatment strategies, ultimately aiming to improve patient survival outcomes.

AAAI Conference 2026 Conference Paper

TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs

  • Yunxiao Wang
  • Meng Liu
  • Wenqi Liu
  • Xuemeng Song
  • Bin Wen
  • Fan Yang
  • Tingting Gao
  • Di Zhang

Video large language models have achieved remarkable performance in tasks such as video question answering; however, their temporal understanding remains suboptimal. To address this limitation, we curate a dedicated instruction fine-tuning dataset that focuses on enhancing temporal comprehension across five key dimensions. In order to reduce reliance on costly temporal annotations, we introduce a multi-task prompt fine-tuning approach that seamlessly integrates temporal-sensitive tasks into existing instruction datasets without requiring additional annotations. Furthermore, we develop a novel benchmark for temporal-sensitive video understanding that not only fills the gaps in dimension coverage left by existing benchmarks but also rigorously filters out potential shortcuts, ensuring a more accurate evaluation. Extensive experimental results demonstrate that our approach significantly enhances the temporal understanding of video-LLMs while avoiding reliance on shortcuts.

AAAI Conference 2026 Conference Paper

UDCH: Unsupervised Dynamic Weighted Cluster-cooperative Hashing for Cross-modal Retrieval

  • Yuanzhi Zhao
  • Fan Yang
  • Yudong Zhao
  • Xiaoyu Li

In cross-modal retrieval tasks, unsupervised hash code learning still faces key challenges, including the difficulty of modeling shared semantic structures across modalities and the inability to adaptively balance multiple supervision objectives during optimization. To address these issues, we propose a novel Unsupervised Dynamic Weighted Cluster-Cooperative Hashing (UDCH) framework, which jointly models feature-level alignment and cluster-level semantic structure to guide consistency learning across modalities under label-free conditions. Specifically, we design an instance-level contrastive loss in the feature branch to align the embedding spaces of images and texts, while employing K-Means clustering to generate pseudo-labels and construct a cluster-center contrast mechanism that captures semantic grouping information. Furthermore, we integrate cross-modal feature similarity to construct a high-order structure matrix, enabling fine-grained structural supervision. To enhance the synergy of multi-objective optimization, we introduce a dynamic weighting strategy that adaptively adjusts the contributions of the feature and cluster branches based on the degree of modal alignment and semantic compactness. Extensive experiments on multiple cross-modal retrieval benchmarks demonstrate that UDCH achieves superior semantic alignment and retrieval performance under unsupervised settings, validating the effectiveness of multi-level semantic modeling and adaptive collaboration mechanisms in unsupervised hashing tasks.

AAAI Conference 2025 Conference Paper

3DHumanEdit: Multi-modal Body Part-aware Conditioning Information Integration for 3D Human Manipulation

  • FeiFan Xu
  • Tianyi Chen
  • Fan Yang
  • Yunfei Zhang
  • Si Wu

The rapid advancement of 3D Generative Adversarial Networks (GANs) has significantly enhanced the diversity and quality of generated 3D images. Despite these breakthroughs, the manipulation capabilities of 3D GANs remain unexplored, presenting substantial challenges for practical applications where user interaction and modification are essential. Current manipulation methods often lack the precision needed for fine-grained attribute manipulation, and struggle to maintain multi-view consistency during the editing process. To address these limitations, we propose 3DHumanEdit, a novel approach for 3D human body part-aware manipulation. 3DHumanEdit leverages multi-modal feature fusion and body part-aware feature alignment to achieve precise manipulation of individual body parts based on detailed text inputs and segmentation images. By exploring 3D prior for accurate editing and enforcing correspondence in latent space, 3DHumanEdit ensures coherence across multiple views. Experiments demonstrate that 3DHumanEdit outperforms existing methods in both editing fidelity and multi-view consistency, offering a robust solution for fine-grained 3D manipulation.

AAAI Conference 2025 Conference Paper

Contrasting Adversarial Perturbations: The Space of Harmless Perturbations

  • Lu Chen
  • Shaofeng Li
  • Benhao Huang
  • Fan Yang
  • Zheng Li
  • Jie Li
  • Yuan Luo

Existing works have extensively studied adversarial examples, which are minimal perturbations that can mislead the output of deep neural networks (DNNs) while remaining imperceptible to humans. However, in this work, we reveal the existence of a harmless perturbation space, in which perturbations drawn from this space, regardless of their magnitudes, leave the network output unchanged when applied to inputs. Essentially, the harmless perturbation space emerges from the usage of non-injective functions (linear or non-linear layers) within DNNs, enabling multiple distinct inputs to be mapped to the same output. For linear layers with input dimensions exceeding output dimensions, any linear combination of the orthogonal bases of the nullspace of the parameter consistently yields no change in their output. For non-linear layers, the harmless perturbation space may expand, depending on the properties of the layers and input samples. Inspired by this property of DNNs, we solve for a family of general perturbation spaces that are redundant for the DNN's decision, and can be used to hide sensitive data and serve as a means of model identification. Our work highlights the distinctive robustness of DNNs (i.e., consistency under large magnitude perturbations) in contrast to adversarial examples (vulnerability for small noises).

NeurIPS Conference 2025 Conference Paper

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

  • Wei Chen
  • Xin Yan
  • Bin Wen
  • Fan Yang
  • Tingting Gao
  • Di Zhang
  • Long Chen

Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious 'hallucination' issue: generating outputs misaligned with obvious visual or factual evidence. Currently, training-based solutions, like direct preference optimization (DPO), leverage paired preference data to suppress hallucinations. However, they risk sacrificing general reasoning capabilities due to likelihood displacement. Meanwhile, training-free solutions, like contrastive decoding, achieve this goal by subtracting the estimated hallucination pattern from a distorted input. Yet, these handcrafted perturbations (e.g., adding noise to images) may poorly capture authentic hallucination patterns. To avoid these weaknesses of existing methods, and realize "robust" hallucination mitigation (i.e., maintaining general reasoning performance), we propose a novel framework: Decoupling Contrastive Decoding (DCD). Specifically, DCD decouples the learning of positive and negative samples in preference datasets, and trains separate positive and negative image projections within the MLLM. The negative projection implicitly models real hallucination patterns, which enables vision-aware negative images in the contrastive decoding inference stage. Our DCD alleviates likelihood displacement by avoiding pairwise optimization and generalizes robustly without handcrafted degradation. Extensive ablations across hallucination benchmarks and general reasoning tasks demonstrate the effectiveness of DCD, i.e., it matches DPO's hallucination suppression while preserving general capabilities and outperforms handcrafted contrastive decoding methods.

IJCAI Conference 2025 Conference Paper

Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms

  • Kangning Cui
  • Rongkun Zhu
  • Manqi Wang
  • Wei Tang
  • Gregory D. Larsen
  • Victor P. Pauca
  • Sarra Alqahtani
  • Fan Yang

Palms are ecologically and economically important indicators of tropical forest health, biodiversity, and human impact that support local economies and global forest product supply chains. While palm detection in plantations is well-studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and can span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV-derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state-of-the-art object detectors based on efficiency and performance, integrating zero-shot SAM 2 as the segmentation backbone, and refining the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM is adaptable for identifying other natural objects, such as eastern white pines. Future work will explore transfer learning for lower-resolution datasets (0.5–1 m). Data and code can be found at github.com/Zippppo/PRISM.

IJCAI Conference 2025 Conference Paper

Dynamic Multiple High-order Correlations Fusion with Noise Filtering for Incomplete Multi-view Noisy-label Learning

  • Kaixiang Wang
  • Xiaojian Ding
  • Fan Yang

Multi-view multi-label data often suffers from incomplete feature views and label noise. This paper is the first to address both challenges simultaneously, rectifying critical deficiencies in existing methodologies that inadequately extract and fuse high-order structural correlations across views while lacking robust solutions to mitigate label noise. We introduce a dynamic multiple high-order correlations fusion with noise filtering, specifically designed for incomplete multi-view noisy-label learning. By capitalizing on a dynamic multi-hypergraph neural network, inspired by the principles of ensemble learning, we adeptly capture and integrate high-order correlations among samples from different views. The model's capability is further augmented through an innovative hypergraph fusion technique based on random walk theory, which empowers it to seamlessly amalgamate both structural and feature information. Moreover, we propose sophisticated noise-filtering matrices that are tightly embedded within the hypergraph neural network, devised to counteract the detrimental impact of label noise. Recognizing that label noise perturbs the data distribution in the label space, these filtering matrices exploit the distributional disparities between feature and label spaces. The high-order structural information derived from both domains underpins the learning and efficacy of the noise-filtering matrices. Empirical evaluations on benchmark datasets unequivocally demonstrate that our method significantly outperforms contemporary state-of-the-art techniques.

JBHI Journal 2025 Journal Article

Effectiveness Evaluation for Clinical Depression Detection Using Deep Learning Based Synthetic House-Tree-Person Test

  • Zhuolong Chen
  • Xiaoqing Yin
  • Fan Yang
  • Xiaofan Li
  • Zixuan Zhao
  • Xueying Li
  • Jianghu Liu
  • Yubin Zhao

Depression is one of the most common mood disorders, and the number of patients has increased significantly in recent years. Due to the lack of biomarkers, conversation between patients and psychiatrists is still the main clinical diagnostic method, which is easily influenced by the subjectivity of both patients and psychiatrists. The Synthetic House-Tree-Person test (S-HTP), a convenient and efficient mental assessment tool, minimizes subjective influences from patients, but its effectiveness is limited by the professional ability of the analyst. Here we introduce DeHTP, a flexible and convenient deep learning model for depression detection based on the S-HTP that requires no interaction between people. Experimental results demonstrate that DeHTP achieves 0.963 AUC and 0.9 accuracy, outperforming conventional manual analysis of the S-HTP conducted according to a guideline of 50 depression-related conclusions from previous studies. In addition, from the perspective of our proposed model, it reveals 22 depression-correlated drawing features aligned with those conclusions. Leveraging the advantages of deep learning and the S-HTP, this approach has the potential for widespread adoption as a tool for daily mental self-monitoring, as well as a promising auxiliary diagnostic method in clinical practice.

JBHI Journal 2025 Journal Article

EGA-Ploc: An Efficient Global-Local Attention Model for Multi-Label Protein Subcellular Localization Prediction on the Immunohistochemistry Images

  • Boyang Wan
  • Xiaoyang Huang
  • Yang Qiao
  • Jiajie Peng
  • Fan Yang

Protein subcellular localization (PSL) is central to unraveling protein functions and disease mechanisms in bioinformatics. Immunohistochemistry (IHC) images serve as rich sources of high-resolution visual cues for PSL prediction. However, conventional deep learning approaches face critical limitations: whole-image models suffer irreversible fine-grained detail loss during downsampling, while patch-based methods lack effective global context integration. Additionally, the long-tailed class distribution in PSL datasets exacerbates performance degradation for underrepresented classes. To address these challenges, we present EGA-Ploc, a framework employing a linear attention mechanism optimized for high-resolution IHC images. This mechanism enables efficient global and local feature modeling with near-linear computational complexity, facilitating end-to-end processing of original images without resolution loss. Moreover, we propose an adaptive multi-label loss function that integrates zero-bounded log-sum-exp constraints with dynamic class-weighted compensation to mitigate dataset imbalance. Consequently, our EGA-Ploc achieves competitive performance across multiple PSL benchmarks while maintaining computational efficiency superior to existing methods. Through extensive visualization analysis, we further investigate the generalizability of off-the-shelf computer vision models in PSL, uncovering interpretable insights into their subcellular localization mechanisms.

NeurIPS Conference 2025 Conference Paper

FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation

  • Fan Yang
  • Yousong Zhu
  • Xin Li
  • Yufei Zhan
  • Hongyin Zhao
  • Shurong Zheng
  • Yaowei Wang
  • Ming Tang

Recent Large Vision Language Models (LVLMs) demonstrate promising capabilities in unifying visual understanding and generative modeling, enabling both accurate content understanding and flexible editing. However, current approaches treat "what to see" and "how to edit" separately: they either perform isolated object segmentation or utilize segmentation masks merely as conditional prompts for local edit generation tasks, often relying on multiple disjointed models. To bridge these gaps, we introduce FOCUS, a unified LVLM that integrates segmentation-aware perception and controllable object-centric generation within an end-to-end framework. FOCUS employs a dual-branch visual encoder to simultaneously capture global semantic context and fine-grained spatial details. In addition, we leverage a MoVQGAN-based visual tokenizer to produce discrete visual tokens that enhance generation quality. To enable accurate and controllable image editing, we propose a progressive multi-stage training pipeline, where segmentation masks are jointly optimized and used as spatial condition prompts to guide the diffusion decoder. This strategy aligns visual encoding, segmentation, and generation modules, effectively bridging segmentation-aware perception with fine-grained visual synthesis. Extensive experiments across three core tasks, including multimodal understanding, referring segmentation accuracy, and controllable image generation, demonstrate that FOCUS achieves strong performance by jointly optimizing visual perception and generative capabilities.

ICLR Conference 2025 Conference Paper

In vivo cell-type and brain region classification via multimodal contrastive learning

  • Han Yu
  • Hanrui Lyu
  • YiXun Xu
  • Charlie Windolf
  • Eric Kenji Lee
  • Fan Yang
  • Andrew M. Shelton
  • Olivier Winter

Current electrophysiological approaches can track the activity of many neurons, yet it is usually unknown which cell-types or brain areas are being recorded without further molecular or histological analysis. Developing accurate and scalable algorithms for identifying the cell-type and brain region of recorded neurons is thus crucial for improving our understanding of neural computation. In this work, we develop a multimodal contrastive learning approach for neural data that can be fine-tuned for different downstream tasks, including inference of cell-type and brain location. We utilize multimodal contrastive learning to jointly embed the activity autocorrelations and extracellular waveforms of individual neurons. We demonstrate that our embedding approach, Neuronal Embeddings via MultimOdal Contrastive Learning (NEMO), paired with supervised fine-tuning, achieves state-of-the-art cell-type classification for two opto-tagged datasets and brain region classification for the public International Brain Laboratory Brain-wide Map dataset. Our method represents a promising step towards accurate cell-type and brain region classification from electrophysiological recordings.

AAAI Conference 2025 Conference Paper

Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages

  • Zihao Li
  • Yucheng Shi
  • Zirui Liu
  • Fan Yang
  • Ali Payani
  • Ninghao Liu
  • Mengnan Du

The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance results in LLMs performing significantly better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate. Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations. By comparing the LLM's internal representation of various languages against a baseline derived from English, we can assess the model's multilingual capabilities in a robust and language-agnostic manner. Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores, underscoring the effectiveness of our metric in assessing language-specific capabilities. Besides, the experiments show that there is a strong correlation between the LLM’s performance in different languages and the proportion of those languages in its pre-training corpus. These insights underscore the efficacy of the Language Ranker as a tool for evaluating LLM performance across different languages, particularly those with limited resources.

NeurIPS Conference 2025 Conference Paper

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

  • Zhenyu Yang
  • Kairui Zhang
  • Yuhang Hu
  • Bing Wang
  • Shengsheng Qian
  • Bin Wen
  • Fan Yang
  • Tingting Gao

Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous frame-by-frame inputs and determine optimal response timing, often compromising real-time responsiveness and narrative coherence. To address these limitations, we introduce LiveStar, a pioneering live streaming assistant that achieves always-on proactive responses through adaptive streaming decoding. Specifically, LiveStar incorporates: (1) a training strategy enabling incremental video-language alignment for variable-length video streams, preserving temporal consistency across dynamically evolving frame sequences; (2) a response-silence decoding framework that determines optimal proactive response timing via a single forward pass verification; (3) memory-aware acceleration via peak-end memory compression for online inference on 10+ minute videos, combined with a streaming key-value cache to achieve 1.53× faster inference. We also construct OmniStar, a comprehensive dataset for training and benchmarking that encompasses 15 diverse real-world scenarios and 5 evaluation tasks for online video understanding. Extensive experiments across three benchmarks demonstrate LiveStar's state-of-the-art performance, achieving an average 19.5% improvement in semantic correctness with 18.1% reduced timing difference compared to existing online Video-LLMs, while improving FPS by 12.0% across all five OmniStar tasks. Our model and dataset can be accessed at https://github.com/yzy-bupt/LiveStar.

JBHI Journal 2025 Journal Article

MADRNet: Morphology-Aware Dual-Path Reversible Network for Sperm Classification

  • Fan Yang
  • Jingzhang Sun
  • Honglan Huang
  • Liang Zhang
  • Jiheng Zhang

Sperm morphology analysis plays a crucial role in the clinical diagnosis of male infertility. However, manual evaluation is inherently subjective, and inconsistencies in diagnostic criteria may compromise accuracy. Existing sperm image classification models have been introduced but often require manual intervention, and most do not consider the alignment between computational classification and WHO sperm morphology standards. To address these challenges, we propose an innovative morphology-aware dual-path reversible network (MADRNet). We integrate key biomarkers, such as head aspect ratio and acrosomal integrity, both of which are crucial for clinical sperm assessment, into the network. In particular, the network utilizes a dual-path attention mechanism, incorporating both parallel spatial and channel attention, while embedding the acrosome anatomical constraint within the channel attention. To further enhance the alignment of our model with the WHO standards, we develop a dynamic loss function incorporating a head aspect ratio constraint. Furthermore, we employ a reversible architecture that preserves more microscopic detail while reducing GPU memory consumption. Experiments on the HuSHeM dataset demonstrate that the model achieves an accuracy of 96.3% and an F1 score of 96.8%. Meanwhile, the model maintains a real-time processing speed of 32 ms per image, providing a precise and efficient solution for clinical sperm screening. The implementation source code and the underlying dataset are available at https://github.com/fanyangZK/MADRNet.
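A loss with a head aspect ratio constraint, as described above, can be illustrated as a classification loss plus a morphology penalty. The bounds `lo`/`hi` and the weight `lam` below are illustrative assumptions, not the paper's actual constants or loss form.

```python
import numpy as np

def aspect_ratio_penalty(ratios, lo=1.3, hi=1.8):
    """Hinge penalty for head aspect ratios outside an assumed
    WHO-style normal range [lo, hi] (illustrative bounds)."""
    r = np.asarray(ratios, dtype=float)
    return float(np.mean(np.maximum(0.0, lo - r) + np.maximum(0.0, r - hi)))

def total_loss(ce_loss, ratios, lam=0.1):
    # Combined objective: classification loss + morphology constraint.
    return ce_loss + lam * aspect_ratio_penalty(ratios)

# A ratio inside the range adds no penalty; ratios outside it do.
print(total_loss(0.5, [1.5, 2.0, 1.1]))
```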

AAAI Conference 2025 Conference Paper

NaFV-Net: An Adversarial Four-view Network for Mammogram Classification

  • Feng Lu
  • Yuxiang Hou
  • Wei Li
  • Xiangying Yang
  • Haibo Zheng
  • Wenxi Luo
  • Leqing Chen
  • Yuyang Cao

Breast cancer remains a leading cause of mortality among women, with millions of new cases diagnosed annually. Early detection through screening is crucial. Using neural networks to improve the accuracy of breast cancer screening has become increasingly important. In accordance with radiologists' practices, we proposed using images from the unaffected side to create adversarial samples with critical medical implications in our adversarial learning process. By introducing beneficial perturbations, this method aims to reduce overconfidence and improve the precision and robustness of breast cancer classification. Our proposed framework is an adversarial quadruple-view classification network (NaFV-Net) incorporating images from both affected and unaffected perspectives. By comprehensively capturing local and global information and implementing adversarial learning from four mammography views, this framework allows for the fusion of features and the integration of medical principles and radiologist evaluation techniques, thus facilitating the accurate identification and characterization of breast tissues. Extensive experiments have shown the high effectiveness of our model in accurately distinguishing between benign and malignant findings, demonstrating state-of-the-art classification performance on both internal and public datasets.

JMLR Journal 2025 Journal Article

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

  • Fan Yang
  • Hongyang R. Zhang
  • Sen Wu
  • Christopher Re
  • Weijie J. Su

The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following question: when does combining the samples from two related tasks perform better than learning with one target task alone? This question is motivated by an empirical phenomenon known as negative transfer, often observed in transfer learning practice. While the transfer effect from one task to another depends on factors such as their sample sizes and the spectrum of their covariance matrices, precisely quantifying this dependence has remained a challenging problem. In order to compare a transfer learning estimator to single-task learning, one needs to compare the risks of the two estimators precisely. Further, the comparison depends on the distribution shifts between the two tasks. This paper applies recent developments in random matrix theory to tackle this challenge in a high-dimensional linear regression setting with two tasks. We provide precise high-dimensional asymptotics for the bias and variance of a classical hard parameter sharing (HPS) estimator in the proportional limit, where the sample sizes of both tasks increase proportionally with dimension at fixed ratios. The precise asymptotics apply to various types of distribution shifts, including covariate shifts, model shifts, and combinations of both. We illustrate these results in a random-effects model to mathematically prove a phase transition from positive to negative transfer as the number of source task samples increases. One insight from the analysis is that a rebalanced HPS estimator, which downsizes the source task when the model shift is high, achieves the minimax optimal rate. The finding regarding the phase transition also applies to multiple tasks when feature covariates are shared across all tasks. Simulations validate the accuracy of the high-dimensional asymptotics for finite dimensions.
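Under an assumed two-task linear regression setup with designs $X_1, X_2$ and responses $y_1, y_2$ (target task first), the hard parameter sharing estimator analyzed above is standardly written as the pooled least-squares fit; this is a common formulation, not necessarily the paper's exact notation:

```latex
\hat{\beta}_{\mathrm{HPS}}
  = \arg\min_{\beta}\; \|y_1 - X_1\beta\|_2^2 + \|y_2 - X_2\beta\|_2^2
  = \left(X_1^{\top}X_1 + X_2^{\top}X_2\right)^{-1}
    \left(X_1^{\top}y_1 + X_2^{\top}y_2\right),
\qquad
\hat{\beta}_{\lambda}
  = \arg\min_{\beta}\; \|y_1 - X_1\beta\|_2^2 + \lambda\,\|y_2 - X_2\beta\|_2^2 .
```

The rebalanced variant $\hat{\beta}_{\lambda}$ with $\lambda \in (0,1]$ downweights the source task, matching the abstract's remark that downsizing the source task when the model shift is large achieves the minimax optimal rate.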

ICLR Conference 2025 Conference Paper

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

  • Zenan Li
  • Zhaoyu Li
  • Wen Tang
  • Xian Zhang
  • Yuan Yao 0001
  • Xujie Si
  • Fan Yang
  • Kaiyu Yang

Large language models (LLMs) can prove mathematical theorems formally by generating proof steps (a.k.a. tactics) within a proof system. However, the space of possible tactics is vast and complex, while the available training data for formal proofs is limited, posing a significant challenge to LLM-based tactic generation. To address this, we introduce a neuro-symbolic tactic generator that synergizes the mathematical intuition learned by LLMs with domain-specific insights encoded by symbolic methods. The key aspect of this integration is identifying which parts of mathematical reasoning are best suited to LLMs and which to symbolic methods. While the high-level idea of neuro-symbolic integration is broadly applicable to various mathematical problems, in this paper, we focus specifically on Olympiad inequalities (Figure 1). We analyze how humans solve these problems and distill the techniques into two types of tactics: (1) scaling, handled by symbolic methods, and (2) rewriting, handled by LLMs. In addition, we combine symbolic tools with LLMs to prune and rank the proof goals for efficient proof search. We evaluate our framework on 161 challenging inequalities from multiple mathematics competitions, achieving state-of-the-art performance and significantly outperforming existing LLM and symbolic approaches without requiring additional training data.

NeurIPS Conference 2025 Conference Paper

Puppeteer: Rig and Animate Your 3D Models

  • Chaoyue Song
  • Xiu Li
  • Fan Yang
  • Zhongcong XU
  • Jiacheng Wei
  • Fayao Liu
  • Jiashi Feng
  • Guosheng Lin

Modern interactive applications increasingly demand dynamic 3D content, yet the transformation of static 3D models into animated assets constitutes a significant bottleneck in content creation pipelines. While recent advances in generative AI have revolutionized static 3D model creation, rigging and animation continue to depend heavily on expert intervention. We present Puppeteer, a comprehensive framework that addresses both automatic rigging and animation for diverse 3D objects. Our system first predicts plausible skeletal structures via an auto-regressive transformer that introduces a joint-based tokenization strategy for compact representation and a hierarchical ordering methodology with stochastic perturbation that enhances bidirectional learning capabilities. It then infers skinning weights via an attention-based architecture incorporating topology-aware joint attention that explicitly encodes inter-joint relationships based on skeletal graph distances. Finally, we complement these rigging advances with a differentiable optimization-based animation pipeline that generates stable, high-fidelity animations while being computationally more efficient than existing approaches. Extensive evaluations across multiple benchmarks demonstrate that our method significantly outperforms state-of-the-art techniques in both skeletal prediction accuracy and skinning quality. The system robustly processes diverse 3D content, ranging from professionally designed game assets to AI-generated shapes, producing temporally coherent animations that eliminate the jittering issues common in existing methods.

NeurIPS Conference 2025 Conference Paper

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

  • Mingyang Chen
  • Linzhuang Sun
  • Tianpeng Li
  • Haoze Sun
  • Chenzheng Zhu
  • Haofen Wang
  • Jeff Pan
  • Wen Zhang

Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction during the reinforcement learning process.

NeurIPS Conference 2025 Conference Paper

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

  • Di Liu
  • Meng Chen
  • Baotong Lu
  • Huiqiang Jiang
  • Zhenhua Han
  • Qianxi Zhang
  • Qi Chen
  • Chengruidong Zhang

Transformer-based Large Language Models (LLMs) have become increasingly important. However, scaling LLMs to longer contexts incurs slow inference speed and high GPU memory consumption for caching key-value (KV) vectors. This paper presents RetrievalAttention, a training-free approach that both accelerates the decoding phase and reduces GPU memory consumption by pre-building KV vector indexes for fixed contexts and maintaining them in CPU memory for efficient retrieval. Unlike conventional KV cache methods, RetrievalAttention integrates approximate nearest neighbor search (ANNS) indexes into the attention computation. We observe that off-the-shelf ANNS techniques often fail due to the out-of-distribution (OOD) nature of query and key vectors in attention mechanisms. RetrievalAttention overcomes this with an attention-aware vector index. Our evaluation shows that RetrievalAttention achieves near-full-attention accuracy while accessing only 1-3% of the data, significantly reducing inference costs. Remarkably, RetrievalAttention enables LLMs with 8B parameters to handle 128K tokens on a single NVIDIA RTX 4090 (24GB), achieving a decoding speed of 0.107 seconds per token.
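The retrieval idea can be sketched in a few lines: score the cached keys, keep only the nearest few, and run exact attention on that subset. This toy version uses a brute-force `argsort` where RetrievalAttention would query its attention-aware ANNS index, so it illustrates the computation pattern, not the paper's system.

```python
import numpy as np

def retrieval_attention(q, K, V, k=8):
    """Sparse attention via retrieval: score every cached key, keep the
    top-k, and softmax only over that subset. A real system replaces
    the argsort with an ANNS index lookup."""
    scores = K @ q / np.sqrt(q.shape[0])
    top = np.argsort(scores)[-k:]            # stand-in for index retrieval
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    return w @ V[top]

rng = np.random.default_rng(0)
K = rng.normal(size=(1024, 64))              # cached key vectors
V = rng.normal(size=(1024, 64))              # cached value vectors
q = 8.0 * K[123]                             # query strongly aligned with one key
out = retrieval_attention(q, K, V)
print(out.shape)
```

Because the query is nearly parallel to key 123, the retrieved-subset attention concentrates almost all weight on that entry while touching only 8 of the 1024 cached vectors.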

NeurIPS Conference 2025 Conference Paper

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

  • Chenrui Cao
  • Liangcheng Song
  • Zenan Li
  • Xinyi Le
  • Xian Zhang
  • Hui Xue
  • Fan Yang

Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces DSP+, an improved version of the Draft, Sketch, and Prove framework, featuring a fine-grained and integrated neuro-symbolic enhancement for each phase: (1) In the draft phase, we prompt reasoning models to generate concise natural-language subgoals to benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) In the sketch phase, subgoals are autoformalized with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) In the proving phase, we tightly integrate symbolic search methods like Aesop with step provers to establish proofs for the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7% of miniF2F, 32.8% of ProofNet, and 24 out of 644 PutnamBench problems, while requiring a smaller budget than state-of-the-art methods. DSP+ proves IMO 2019 P1, an IMO problem in miniF2F that is not solved by any prior work. Additionally, DSP+ generates proof patterns comprehensible to human experts, facilitating the identification of formalization errors; for example, eight incorrectly formalized statements in miniF2F are discovered. Our results highlight the potential of classical reasoning patterns beyond RL-based training. All components will be open-sourced.

NeurIPS Conference 2025 Conference Paper

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

  • Yifei Liu
  • Li Lyna Zhang
  • Yi Zhu
  • Bingcheng Dong
  • Xudong Zhou
  • Ning Shang
  • Fan Yang
  • Cheng Li

Advancing code reasoning in large language models (LLMs) is fundamentally limited by the scarcity of high-difficulty datasets, especially those with verifiable input-output test cases necessary for rigorous solution validation at scale. We introduce rStar-Coder, which significantly improves LLM code reasoning capabilities by constructing a large-scale, verified dataset of 418K competition-level code problems and 580K long-reasoning solutions, along with rich test cases of varying difficulty. This is achieved through three core contributions: (1) we curate competitive programming code problems and solutions to synthesize new, solvable problems; (2) we introduce a reliable input-output test case synthesis pipeline that decouples the generation into a three-step input generation method and a mutual verification mechanism for effective output labeling; (3) we augment problems with high-quality, test-case-verified long-reasoning solutions. Extensive experiments on Qwen models (1.5B-14B) across various code reasoning benchmarks demonstrate the superiority of the rStar-Coder dataset, achieving performance comparable to frontier reasoning LLMs with significantly smaller model sizes. On LiveCodeBench, rStar-Coder improves Qwen2.5-7B from 17.4% to an impressive 57.3%, and Qwen2.5-14B from 23.3% to 62.5%, surpassing o3-mini (low) by 3.1%. On the more challenging USA Computing Olympiad, our 7B model achieves an average pass@1 accuracy of 16.15%, outperforming the frontier-level QWQ-32B. The rStar-Coder dataset is publicly available at https://huggingface.co/datasets/microsoft/rStar-Coder.
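A mutual verification step for output labeling, as mentioned above, might look like the following majority-vote sketch; the function name and agreement threshold are hypothetical, and the paper's exact mechanism may differ.

```python
from collections import Counter

def mutual_verify(outputs, min_agree=2):
    """Mutual verification for test-case labeling: run several independent
    candidate solutions on the same generated input and accept the
    majority output as the label only if enough of them agree."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= min_agree else None

# Hypothetical outputs of three candidate solutions on one generated input.
print(mutual_verify(["42", "42", "41"]))  # consensus reached
print(mutual_verify(["1", "2", "3"]))     # no consensus, input discarded
```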

NeurIPS Conference 2025 Conference Paper

SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling

  • Yizhao Gao
  • Zhichen Zeng
  • DaYou Du
  • Shijie Cao
  • Peiyuan Zhou
  • Jiaxing Qi
  • Junjie Lai
  • Hayden So

Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic complexity hinders efficiency and scalability, especially for long-context processing. A promising approach is to leverage sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics at the attention head level, struggling to adapt dynamically to different contexts efficiently. We propose SeerAttention, a simple yet effective attention mechanism that directly learns the block-level attention sparsity from the LLM itself. Inspired by the gating mechanism in Mixture of Experts (MoE), SeerAttention augments the conventional attention with a learnable gate that selectively activates important blocks within the attention map. Specifically, the gate first pools the query (Q) and key (K) tensors along the sequence dimension and processes them through learnable linear layers. The resulting matrices are then multiplied together to produce the gating scores, which are used to predict block-level attention sparsity. Combined with our block-sparse FlashAttention kernel, SeerAttention can achieve significant speedup on GPUs. When applied to pre-trained LLMs, SeerAttention only requires training the gate parameters in a lightweight self-distillation manner, allowing rapid convergence. Our evaluation results demonstrate that SeerAttention achieves better model accuracy and lower latency for long-context pre-filling compared to prior methods. Code is available at: https://github.com/microsoft/SeerAttention.
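The gating computation described above (pool Q and K along the sequence dimension, map through learnable linear layers, multiply to get block-level scores) can be sketched with NumPy. The random weights and the top-half threshold below are stand-ins for the learned gate and the kernel's actual block selection.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d, block = 16, 8, 4
n_blocks = seq // block

Q = rng.normal(size=(seq, d))
K = rng.normal(size=(seq, d))
Wq = rng.normal(size=(d, d)) * 0.1   # "learnable" gate weights (random here)
Wk = rng.normal(size=(d, d)) * 0.1

# Pool Q and K along the sequence dimension: one vector per block,
# then project through the gate's linear layers.
Qp = Q.reshape(n_blocks, block, d).mean(axis=1) @ Wq
Kp = K.reshape(n_blocks, block, d).mean(axis=1) @ Wk

gate = Qp @ Kp.T                       # block-level gating scores
keep = gate >= np.quantile(gate, 0.5)  # activate the top half of blocks
print(gate.shape, keep.sum())
```

A block-sparse kernel would then compute attention only for the (query-block, key-block) pairs marked in `keep`, skipping the rest of the attention map.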

AAMAS Conference 2025 Conference Paper

Self-Interpretable Reinforcement Learning via Rule Ensembles

  • Yue Yang
  • Fan Yang
  • Yu Bai
  • Hao Wang

Current reinforcement learning (RL) models, often functioning as complex 'black boxes', obscure their decision-making processes. This lack of transparency limits their applicability in critical real-world applications where clear reasoning behind algorithmic choices is crucial. To tackle this issue, we suggest moving from neural network or tabular approaches to a rule ensemble model, which improves decision-making clarity and adapts dynamically to environmental interactions. Specifically, our method constructs additive rule ensembles to approximate the Q-value in reinforcement learning using orthogonal gradient boosting (OGB) combined with a post-processing rule replacement technique. This method enables the model to provide inherent explanations through the use of rules. Our study sets a theoretical foundation for rule ensembles within the reinforcement learning framework, emphasizing their capacity to boost interpretability and facilitate the analysis of rule impacts. Experimental results from seven classic environments demonstrate that our proposed rule ensembles match or exceed the performance of representative RL models such as DQN, A2C, and PPO, while also providing self-interpretability and transparency.
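An additive rule ensemble Q-function of the kind described can be sketched as a sum of weighted rules whose conditions the state satisfies. The rules and weights below are hypothetical illustrations, not ones learned by OGB.

```python
def q_value(state, rules):
    """Additive rule ensemble: Q(s, a) is the sum of the weights of all
    rules whose conditions the state satisfies."""
    return sum(w for cond, w in rules if cond(state))

# Hypothetical rules for the action "push right" in a cart-pole-like
# environment with state {"angle": ..., "vel": ...}.
rules_push_right = [
    (lambda s: s["angle"] > 0.0, 1.5),                  # pole leaning right
    (lambda s: s["angle"] > 0.1 and s["vel"] > 0, 0.8), # leaning and moving right
    (lambda s: True, 0.2),                              # base value
]
print(q_value({"angle": 0.2, "vel": 0.5}, rules_push_right))   # all three rules fire
print(q_value({"angle": -0.1, "vel": 0.0}, rules_push_right))  # only the base rule fires
```

Each prediction decomposes into the list of fired rules, which is exactly the inherent explanation the abstract refers to.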

ICLR Conference 2025 Conference Paper

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

  • Longrong Yang
  • Dong Shen
  • Chaoxiang Cai
  • Fan Yang
  • Tingting Gao
  • Di Zhang
  • Xi Li

The Mixture-of-Experts (MoE) approach has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose Solving Token Gradient Conflict (STGC), which uses token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a tailored regularization loss that encourages conflicting tokens to route from their current experts to other experts, reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
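A minimal sketch of the conflict-detection step, assuming a token counts as conflicting when its gradient has negative cosine similarity with the expert's mean gradient (one plausible reading of the abstract, not necessarily the paper's exact criterion):

```python
import numpy as np

def conflicting_tokens(token_grads, thresh=0.0):
    """Flag tokens whose gradient points against the expert's average
    gradient direction (cosine similarity below `thresh` = conflict)."""
    g = np.asarray(token_grads, dtype=float)
    mean = g.mean(axis=0)
    cos = g @ mean / (np.linalg.norm(g, axis=1) * np.linalg.norm(mean) + 1e-12)
    return np.where(cos < thresh)[0]

# Toy per-token gradients inside one expert; the third token pulls the
# expert's parameters in the opposite direction of the other two.
grads = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [-1.0, 0.05]])
print(conflicting_tokens(grads))
```

The flagged indices would then receive the regularization loss that pushes their routing toward other experts.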

NeurIPS Conference 2025 Conference Paper

Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation

  • Qing Yu
  • Xiaobei Wang
  • Shuchang Liu
  • Xiaoyu Yang
  • Xueliang Wang
  • Chang Meng
  • Shanshan Wu
  • Bin Wen

Recommender systems filter contents/items valuable to users by inferring preferences from user features and historical behaviors. Mainstream approaches follow the learning-to-rank paradigm, which focuses on discovering and modeling item topics (e.g., categories) and capturing user preferences on these topics based on historical interactions. However, this paradigm often neglects the modeling of user characteristics and their social roles, which are logical confounders influencing the correlated interest and user preference transition. To bridge this gap, we introduce the user role identification task and the behavioral logic modeling task, which aim to explicitly model user roles and learn the logical relations between item topics and user social roles. We show that it is possible to explicitly solve these tasks through an efficient integration framework of Large Language Models (LLMs) and recommendation systems, for which we propose TagCF. On the one hand, TagCF exploits the (multi-modal) LLM's world knowledge and logic inference ability to extract realistic tag-based virtual logic graphs that reveal dynamic and expressive knowledge of users, refining our understanding of user behaviors. On the other hand, TagCF presents empirically effective integration modules that take advantage of the extracted tag-logic information, augmenting the recommendation performance. We conduct both online and offline experiments with industrial and public datasets to verify TagCF's effectiveness, and we empirically show that the user role modeling strategy is potentially a better choice than the modeling of item topics. Additionally, we provide evidence that the extracted logic graphs are empirically general and transferable knowledge that can benefit a wide range of recommendation tasks. Our code is available at https://github.com/Code2Q/TagCF.

AAAI Conference 2024 Conference Paper

An Effective Augmented Lagrangian Method for Fine-Grained Multi-View Optimization

  • Yuze Tan
  • Hecheng Cai
  • Shudong Huang
  • Shuping Wei
  • Fan Yang
  • Jiancheng Lv

The significance of multi-view learning in effectively mitigating the intricacies entrenched within heterogeneous data has garnered substantial attention in recent years. Notwithstanding the favorable achievements showcased by recent strides in this area, a confluence of noteworthy challenges endures. To be specific, a majority of extant methodologies unceremoniously assign weights to data points view-wisely. This ineluctably disregards the intrinsic reality that disparate views confer diverse contributions to each individual sample, consequently neglecting the rich wellspring of sample-level structural insights harbored within the dataset. In this paper, we propose an effective Augmented Lagrangian MethOd for fiNe-graineD (ALMOND) multi-view optimization. This innovative approach scrutinizes the interplay among multiple views at the granularity of individual samples, thereby fostering the enhanced preservation of local structural coherence. The Augmented Lagrangian Method (ALM) is elaborately incorporated into our framework, which enables us to achieve an optimal solution without involving an inexplicable intermediate variable as previous methods do. Empirical experiments on multi-view clustering tasks across heterogeneous datasets incontrovertibly showcase the effectiveness of our proposed methodology, corroborating its preeminence over incumbent state-of-the-art alternatives.

NeurIPS Conference 2024 Conference Paper

Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency

  • Zenan Li
  • Yifan Wu
  • Zhaoyu Li
  • Xinming Wei
  • Fan Yang
  • Xian Zhang
  • Xiaoxing Ma

Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Specifically, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving 0.22-1.35x relative improvements across various LLMs and baseline methods.
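The scoring-and-selection step can be sketched as below, with string equality standing in for the theorem prover and a lookup table for the embedder; the 50/50 weighting `alpha` and the candidate names are assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_best(candidates, equiv, embed, original_emb, alpha=0.5):
    """Score each formalization candidate by (1) the fraction of peers a
    prover certifies as symbolically equivalent and (2) the embedding
    similarity between the original statement and the candidate's
    informalization; return the top-scoring candidate."""
    n = len(candidates)
    best, best_score = None, -np.inf
    for i, c in enumerate(candidates):
        sym = sum(equiv(c, candidates[j]) for j in range(n) if j != i) / max(n - 1, 1)
        sem = cosine(embed(c), original_emb)
        score = alpha * sym + (1 - alpha) * sem
        if score > best_score:
            best, best_score = c, score
    return best

# Toy stand-ins: string equality as the "prover", a lookup table as the "embedder".
equiv = lambda a, b: a == b
emb_table = {"thm_P": np.array([1.0, 0.0]), "thm_Q": np.array([0.0, 1.0])}
embed = lambda c: emb_table[c]
original_emb = np.array([1.0, 0.1])   # embedding of the original statement

print(select_best(["thm_P", "thm_P", "thm_Q"], equiv, embed, original_emb))
```

The majority candidate wins on both criteria here, which is the self-consistency effect the framework exploits.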

AAAI Conference 2024 Conference Paper

Causal-Driven Skill Prerequisite Structure Discovery

  • Shenbao Yu
  • Yifeng Zeng
  • Fan Yang
  • Yinghui Pan

Knowing a prerequisite structure among skills in a subject domain effectively enables several educational applications, including intelligent tutoring systems and curriculum planning. Traditionally, educators or domain experts use intuition to determine the skills' prerequisite relationships, which is time-consuming and prone to blind spots. In this paper, we focus on inferring the prerequisite structure given access to students' performance on exercises in a subject. Nevertheless, this is challenging since students' mastery of skills cannot be directly observed, but can only be estimated, i.e., it is latent in nature. To tackle this problem, we propose a causal-driven skill prerequisite structure discovery (CSPS) method in a two-stage learning framework. In the first stage, we learn the skills' correlation relationships presented in the covariance matrix from the student performance data, while, through the predicted covariance matrix in the second stage, we consider a heuristic method based on conditional independence tests and standardized partial variance to discover the prerequisite structure. We demonstrate the performance of the new approach with both simulated and real-world data. The experimental results show the effectiveness of the proposed model for identifying the skills' prerequisite structure.

TIST Journal 2024 Journal Article

Demand-driven Urban Facility Visit Prediction

  • Yunke Zhang
  • Tong Li
  • Yuan Yuan
  • Fengli Xu
  • Fan Yang
  • Funing Sun
  • Yong Li

Predicting citizens’ visiting behaviors to urban facilities is instrumental for city governors and planners to detect inequalities in urban opportunities and optimize the distribution of facilities and resources. Previous works predict facility visits simply using observed visit behavior, yet citizens’ intrinsic demands for facilities are not characterized explicitly, causing potentially incorrect learned relations in the prediction results. In this article, to make up for this deficiency, we present a demand-driven urban facility visit prediction method that decomposes citizens’ visits to facilities into their unobservable demands and their capability to fulfill them. Demands are expressed as a function of regional demographic attributes by a neural network, and the fulfillment capability is determined by the urban region’s spatial accessibility to facilities. Extensive evaluations on datasets from three large cities confirm the efficiency and rationality of our model. Our method outperforms the best state-of-the-art model by 8.28% on average in facility visit prediction tasks. Further analyses demonstrate the reasonableness of recovered facility demands and their relationship with citizen demographics. For instance, senior citizens tend to have higher medical demands but lower shopping demands. Meanwhile, estimated capabilities and accessibilities provide deeper insights into the decaying accessibility with respect to spatial distance and facilities’ diverse functions in the urban environment. Our findings shed light on demand-driven urban data mining and demand-based urban facility planning.
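The decomposition above can be sketched as visits = demand(demographics) × accessibility; the weights and numbers below are illustrative assumptions, not the paper's learned neural network or data.

```python
import numpy as np

def predicted_visits(demographics, demand_weights, accessibility):
    """Decompose visits into an unobserved demand term driven by regional
    demographics and a spatial-accessibility (fulfillment) term."""
    demand = np.maximum(0.0, demographics @ demand_weights)  # demand is non-negative
    return demand * accessibility

# Two regions x demographic shares [share_senior, share_young];
# one facility type (medical), with seniors weighting medical demand more.
demo = np.array([[0.4, 0.1],
                 [0.1, 0.5]])
w_medical = np.array([2.0, 0.3])   # hypothetical demand weights
access = np.array([0.8, 0.8])      # equal accessibility for both regions
print(predicted_visits(demo, w_medical, access))
```

With equal accessibility, the region with more seniors gets the higher predicted medical visits, matching the abstract's observation that senior citizens tend to have higher medical demands.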

IROS Conference 2024 Conference Paper

EMBOSR: Embodied Spatial Reasoning for Enhanced Situated Question Answering in 3D Scenes

  • Yu Hao
  • Fan Yang
  • Nicholas Fang
  • Yu-Shen Liu

3D Embodied Spatial Reasoning, emphasizing an agent’s interaction with its surroundings for spatial information inference, is adeptly facilitated by the process of Situated Question Answering in 3D Scenes (SQA3D). SQA3D requires an agent to comprehend its position and orientation within a 3D scene based on a textual situation and then utilize this understanding to answer questions about the surrounding environment in that context. Previous methods in this field face substantial challenges, including a dependency on constant retraining on limited datasets, which leads to poor performance in unseen scenarios, limited expandability, and inadequate generalization. To address these challenges, we present a new embodied spatial reasoning paradigm for enhanced SQA3D, fusing the capabilities of foundation models with the chain of thought methodology. This approach is designed to elevate adaptability and scalability in a wide array of 3D environments. A new aspect of our model is the integration of a chain of thought reasoning process, which significantly augments the model’s capability for spatial reasoning and complex query handling in diverse 3D environments. In our structured experiments, we compare our approach against other methods with varying architectures, demonstrating its efficacy in multiple tasks including SQA3D and 3D captioning. We also assess the informativeness contained in the generated answers for complex queries. Ablation studies further delineate the individual contributions of our method to its overall performance. The results consistently affirm our proposed method’s effectiveness and efficiency.

NeurIPS Conference 2024 Conference Paper

Empowering and Assessing the Utility of Large Language Models in Crop Science

  • Hang Zhang
  • Jiawei Sun
  • Renqi Chen
  • Wei Liu
  • Zhonghang Yuan
  • Xinzhe Zheng
  • Zhefan Wang
  • Zhiyuan Yang

Large language models (LLMs) have demonstrated remarkable efficacy across knowledge-intensive tasks. Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge. The CROP dataset is curated through a task-oriented and LLM-human integrated pipeline, comprising 210,038 single-turn and 1,871 multi-turn dialogues related to crop science scenarios. The CROP benchmark includes 5,045 multiple-choice questions covering three difficulty levels. Our experiments based on the CROP benchmark demonstrate notable enhancements in crop science-related tasks when LLMs are fine-tuned with the CROP dataset. To the best of our knowledge, the CROP dataset is the first-ever instruction tuning dataset in the crop science domain. We anticipate that CROP will accelerate the adoption of LLMs in the domain of crop science, ultimately contributing to global food production.

TIST Journal 2024 Journal Article

Explainability for Large Language Models: A Survey

  • Haiyan Zhao
  • Hanjie Chen
  • Fan Yang
  • Ninghao Liu
  • Huiqi Deng
  • Hengyi Cai
  • Shuaiqiang Wang
  • Dawei Yin

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.

AAAI Conference 2024 Conference Paper

Geometry-Guided Domain Generalization for Monocular 3D Object Detection

  • Fan Yang
  • Hui Chen
  • Yuwei He
  • Sicheng Zhao
  • Chenghao Zhang
  • Kai Ni
  • Guiguang Ding

Monocular 3D object detection (M3OD) is important for autonomous driving. However, existing deep learning-based methods easily suffer from performance degradation in real-world scenarios due to the substantial domain gap between training and testing. M3OD's domain gaps are complex, including camera intrinsic parameters, extrinsic parameters, image appearance, etc. Existing works primarily focus on the domain gaps of camera intrinsic parameters, ignoring other key factors. Moreover, at the feature level, conventional domain invariant learning methods generally cause the negative transfer issue, due to the ignorance of dependency between geometry tasks and domains. To tackle these issues, in this paper, we propose MonoGDG, a geometry-guided domain generalization framework for M3OD, which effectively addresses the domain gap at both camera and feature levels. Specifically, MonoGDG consists of two major components. One is geometry-based image reprojection, which mitigates the impact of camera discrepancy by unifying intrinsic parameters, randomizing camera orientations, and unifying the field of view range. The other is geometry-dependent feature disentanglement, which overcomes the negative transfer problems by incorporating domain-shared and domain-specific features. Additionally, we leverage a depth-disentangled domain discriminator and a domain-aware geometry regression attention mechanism to account for the geometry-domain dependency. Extensive experiments on multiple autonomous driving benchmarks demonstrate that our method achieves state-of-the-art performance in domain generalization for M3OD.

AAAI Conference 2024 Conference Paper

Implicit Modeling of Non-rigid Objects with Cross-Category Signals

  • Yuchun Liu
  • Benjamin Planche
  • Meng Zheng
  • Zhongpai Gao
  • Pierre Sibut-Bourde
  • Fan Yang
  • Terrence Chen
  • Ziyan Wu

Deep implicit functions (DIFs) have emerged as a potent and articulate means of representing 3D shapes. However, methods modeling object categories or non-rigid entities have mainly focused on single-object scenarios. In this work, we propose MODIF, a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. Our emphasis is on non-rigid, non-interpenetrating entities such as organs. To effectively capture the interrelation between these entities and ensure precise, collision-free representations, our approach facilitates signaling between category-specific fields to adequately rectify shapes. We also introduce novel inter-object supervision: an attraction-repulsion loss is formulated to refine contact regions between objects. Our approach is demonstrated on various medical benchmarks, involving modeling different groups of intricate anatomical entities. Experimental results illustrate that our model can proficiently learn the shape representation of each organ and their relations to others, to the point that shapes missing from unseen instances can be consistently recovered by our method. Finally, MODIF can also propagate semantic information throughout the population via accurate point correspondences.

AAAI Conference 2024 Conference Paper

Multi-Modal Disordered Representation Learning Network for Description-Based Person Search

  • Fan Yang
  • Wei Li
  • Menglong Yang
  • Binbin Liang
  • Jianwei Zhang

Description-based person search aims to retrieve images of the target identity via textual descriptions. One of the challenges for this task is to extract discriminative representations from images and descriptions. Most existing methods apply part-based split methods or external models to explore the fine-grained details of local features, which ignores the global relationship between partial information and causes network instability. To overcome these issues, we propose a Multi-modal Disordered Representation Learning Network (MDRL) for description-based person search to fully extract the visual and textual representations. Specifically, we design a Cross-modality Global Feature Learning Architecture to learn the global features from the two modalities and meet the demands of the task. Based on our global network, we introduce a Disorder Local Learning Module to explore local features by a disordered reorganization strategy from both visual and textual aspects and enhance the robustness of the whole network. Besides, we introduce a Cross-modality Interaction Module to guide the two streams to extract visual or textual representations considering the correlation between modalities. Extensive experiments are conducted on two public datasets, and the results show that our method outperforms the state-of-the-art methods on the CUHK-PEDES and ICFG-PEDES datasets.

AAAI Conference 2024 Conference Paper

Multi-View Randomized Kernel Classification via Nonconvex Optimization

  • Xiaojian Ding
  • Fan Yang

Multiple kernel learning (MKL) is a representative supervised multi-view learning method widely applied in multi-modal and multi-view applications. MKL aims to classify data by integrating complementary information from predefined kernels. Although existing MKL methods achieve promising performance, they fail to consider the tradeoff between diversity and classification accuracy of kernels, preventing further improvement of classification performance. In this paper, we tackle this problem by generating a number of high-quality base learning kernels and selecting a kernel subset with maximum pairwise diversity and minimum generalization errors. We first formulate this idea as a nonconvex quadratic integer programming problem. Then we transform this nonconvex problem into a convex optimization problem and prove it is equivalent to a semidefinite relaxation problem, which a semidefinite-based branch-and-bound algorithm can quickly solve. Experimental results on real-world datasets demonstrate the superiority of the proposed method. The results also show that our method works for the support vector machine (SVM) classifier and other state-of-the-art kernel classifiers.

NeurIPS Conference 2024 Conference Paper

Neuro-Symbolic Data Generation for Math Reasoning

  • Zenan Li
  • Zhi Zhou
  • Yuan Yao
  • Yu-Feng Li
  • Chun Cao
  • Fan Yang
  • Xian Zhang
  • Xiaoxing Ma

A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both diversity and validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework combining the intuitive informalization strengths of LLMs, and the precise symbolic reasoning of math solvers along with projected Markov chain Monte Carlo sampling in the highly-irregular symbolic space. Empirical experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data, surpass their state-of-the-art counterparts.

NeurIPS Conference 2024 Conference Paper

Once Read is Enough: Domain-specific Pretraining-free Language Models with Cluster-guided Sparse Experts for Long-tail Domain Knowledge

  • Fang Dong
  • Mengyi Chen
  • Jixian Zhou
  • Yubin Shi
  • Yixuan Chen
  • Mingzhi Dong
  • Yujiang Wang
  • Dongsheng Li

Language models (LMs) only pretrained on a general and massive corpus usually cannot attain satisfying performance on domain-specific downstream tasks, and hence, applying domain-specific pretraining to LMs is a common and indispensable practice. However, domain-specific pretraining can be costly and time-consuming, hindering LMs' deployment in real-world applications. In this work, we identify the inability to memorize domain-specific knowledge, which appears in the general corpus only rarely and with long-tail distributions, as the leading cause of pretrained LMs' inferior downstream performance. Analysis of Neural Tangent Kernels (NTKs) reveals that those long-tail data are commonly overlooked in the model's gradient updates and, consequently, are not effectively memorized, leading to poor domain-specific downstream performance. Based on the intuition that data with similar semantic meaning are closer in the embedding space, we devise a Cluster-guided Sparse Expert (CSE) layer to actively learn long-tail domain knowledge typically neglected in previous pretrained LMs. During pretraining, a CSE layer efficiently clusters domain knowledge together and assigns long-tail knowledge to designated extra experts. CSE is also a lightweight structure that only needs to be incorporated in several deep layers. With our training strategy, we found that during pretraining, data of long-tail knowledge gradually form isolated, outlier clusters in an LM's representation spaces, especially in deeper layers. Our experimental results show that pretraining CSE-based LMs alone is enough to achieve performance superior to regularly pretrained-and-finetuned LMs on various downstream tasks, implying the prospects of domain-specific-pretraining-free language models.
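The cluster-then-route idea behind CSE can be illustrated standalone. The sketch below is a toy illustration only, not the paper's architecture: the embeddings, cluster count, and expert mapping are all made up. It clusters token embeddings and sends members of the rare (long-tail) cluster to a designated extra expert, keeping the rest on the shared expert.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy token embeddings: a large "head" cluster and a small "long-tail" one.
head = rng.normal(loc=0.0, scale=0.5, size=(95, 8))
tail = rng.normal(loc=5.0, scale=0.5, size=(5, 8))
emb = np.vstack([head, tail])

# Cluster the embedding space, then treat the smallest cluster as long-tail.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)
sizes = np.bincount(km.labels_, minlength=2)
rare = np.argmin(sizes)

# Route: expert 1 is the designated extra expert for the rare cluster,
# expert 0 is the shared expert for everything else.
expert_id = np.where(km.labels_ == rare, 1, 0)
```

In the actual CSE layer the clustering is learned online during pretraining rather than run once with k-means; this sketch only shows the routing principle.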

AAAI Conference 2024 Conference Paper

Sparse Bayesian Deep Learning for Cross Domain Medical Image Reconstruction

  • Jiaxin Huang
  • Qi Wu
  • Yazhou Ren
  • Fan Yang
  • Aodi Yang
  • Qianqian Yang
  • Xiaorong Pu

Cross domain medical image reconstruction aims to address the issue that deep learning models trained solely on one source dataset might not generalize effectively to unseen target datasets from different hospitals. Some recent methods achieve satisfactory reconstruction performance, but often at the expense of extensive parameters and time consumption. To strike a balance between cross domain image reconstruction quality and model computational efficiency, we propose a lightweight sparse Bayesian deep learning method. Notably, we apply a fixed-form variational Bayes (FFVB) approach to quantify pixel-wise uncertainty priors derived from degradation distribution of the source domain. Furthermore, by integrating the uncertainty prior into the posterior sampled through stochastic gradient Langevin dynamics (SGLD), we develop a training strategy that dynamically generates and optimizes the prior distribution on the network weights for each unseen domain. This strategy enhances generalizability and ensures robust reconstruction performance. When evaluated on medical image reconstruction tasks, our proposed approach demonstrates impressive performance across various previously unseen domains.

JBHI Journal 2024 Journal Article

Spatio-Temporal Classification of Lung Ventilation Patterns Using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

  • Shuzhe Chen
  • Li Li
  • Zhichao Lin
  • Ke Zhang
  • Ying Gong
  • Lu Wang
  • Xu Wu
  • Maokun Li

The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for evaluating lung function, serving as a comprehensive diagnostic tool for lung conditions. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder (VAE) with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then stacked to create a feature map for the exhibition of temporal features. A simple convolutional neural network is used for classification. Data from 137 subjects were utilized for the training phase. Initially, the model underwent validation through a leave-one-out cross-validation process. During this validation, the model achieved an accuracy and sensitivity of 0.96 and 1.00, respectively, with an f1-score of 0.98 when identifying the normal subjects. To assess pipeline reliability and feasibility, we tested it on 9 newly recruited subjects, with accurate ventilation mode predictions for 8 out of 9. In addition, we included 2D EIT results for comparison and conducted ablation experiments to validate the effectiveness of the VAE. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.

AAAI Conference 2023 Conference Paper

Exploring Stochastic Autoregressive Image Modeling for Visual Representation

  • Yu Qi
  • Fan Yang
  • Yousong Zhu
  • Yufei Liu
  • Liwei Wu
  • Rui Zhao
  • Wei Li

Autoregressive language modeling (ALM) has been successfully used in self-supervised pre-training in Natural language processing (NLP). However, this paradigm has not achieved comparable results with other self-supervised approaches in computer vision (e.g., contrastive learning, masked image modeling). In this paper, we try to find the reason why autoregressive modeling does not work well on vision tasks. To tackle this problem, we fully analyze the limitations of visual autoregressive methods and propose a novel stochastic autoregressive image modeling method (named SAIM) built on two simple designs. First, we serialize the image into patches. Second, we employ a stochastic permutation strategy to generate an effective and robust image context, which is critical for vision tasks. To realize this task, we create a parallel encoder-decoder training process in which the encoder serves a similar role to the standard vision transformer, focusing on learning the whole contextual information, while the decoder predicts the content of the current position, so that the encoder and decoder can reinforce each other. Our method significantly improves the performance of autoregressive image modeling and achieves the best accuracy (83.9%) on the vanilla ViT-Base model among methods using only ImageNet-1K data. Transfer performance in downstream tasks also shows that our model achieves competitive performance. Code is available at https://github.com/qiy20/SAIM.
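The stochastic permutation strategy can be pictured as a permuted causal attention mask over patches. The toy NumPy sketch below is a simplified illustration of that factorization only; the paper's parallel encoder-decoder details are omitted, and the grid size is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches = 6                       # e.g. patches of a tiny 2x3 image grid
order = rng.permutation(num_patches)  # one stochastic prediction order

# Under this permutation, patch order[i] may attend only to itself and to
# the patches that precede it in the sampled order (a permuted causal mask).
mask = np.zeros((num_patches, num_patches), dtype=bool)
for i, q in enumerate(order):
    mask[q, order[:i + 1]] = True

print(order)
print(mask.astype(int))
```

Resampling `order` for every training image is what makes the context "stochastic": each image is modeled autoregressively under a different patch ordering.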

IJCAI Conference 2023 Conference Paper

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

  • Xiaodong Wang
  • Chenfei Wu
  • Shengming Yin
  • Minheng Ni
  • Jianfeng Wang
  • Linjie Li
  • Zhengyuan Yang
  • Fan Yang

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle rendering. The constructed training samples are closely aligned to the testing instances, without the need for data annotation. To make full use of the masked images, we designed a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.

NeurIPS Conference 2023 Conference Paper

Model-enhanced Vector Index

  • Hailin Zhang
  • Yujing Wang
  • Qi Chen
  • Ruiheng Chang
  • Ting Zhang
  • Ziming Miao
  • Yingyan Hou
  • Yang Ding

Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to support document updates. In this paper, we aim to enhance the vector index with end-to-end deep generative models, leveraging the differentiable advantages of deep retrieval models while maintaining desirable serving efficiency. We propose Model-enhanced Vector Index (MEVI), a differentiable model-enhanced index empowered by a twin-tower representation model. MEVI leverages a Residual Quantization (RQ) codebook to bridge the sequence-to-sequence deep retrieval and embedding-based models. To substantially reduce the inference time, instead of decoding the unique document ids in long sequential steps, we first generate some semantic virtual cluster ids of candidate documents in a small number of steps, and then leverage the well-adapted embedding vectors to further perform a fine-grained search for the relevant documents in the candidate virtual clusters. We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.

NeurIPS Conference 2023 Conference Paper

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

  • Yubin Shi
  • Yixuan Chen
  • Mingzhi Dong
  • Xiaochen Yang
  • Dongsheng Li
  • Yujiang Wang
  • Robert Dick
  • Qin Lv

Despite their prevalence in deep-learning communities, over-parameterized models impose high computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$. A large $\lambda_{\max}$ indicates that the module learns features with better convergence, while modules with small values may impact generalization negatively. Inspired by this discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) that selectively updates those modules whose $\lambda_{\max}$ exceeds a dynamic threshold, concentrating the model on learning common features and ignoring inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT can significantly save computations by its partially-updating strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and outperforms the accuracy of baselines.

AAAI Conference 2022 Conference Paper

Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

  • Yantao Gong
  • Cao Liu
  • Fan Yang
  • Xunliang Cai
  • Guanglu Wan
  • Jiansong Chen
  • Weipeng Zhang
  • Houfeng Wang

Data-driven methods have achieved notable performance on intent detection, which is a task to comprehend user queries. Nonetheless, they have been criticized for over-confident predictions. In some scenarios, users care not only about the accuracy but also about the confidence of the model. Unfortunately, mainstream neural networks are poorly calibrated, with a large gap between accuracy and confidence. To handle this problem, defined as confidence calibration, we propose a model using the hyperspherical space and a rebalanced accuracy-uncertainty loss. Specifically, we project the label vector onto hyperspherical space uniformly to generate a dense label representation matrix, which mitigates over-confident predictions caused by overfitting the sparse one-hot label matrix. Besides, we rebalance samples of different accuracy and uncertainty to better guide model training. Experiments on the open datasets verify that our model outperforms the existing calibration methods and achieves a significant improvement on the calibration metric.

AAAI Conference 2022 Conference Paper

DeTarNet: Decoupling Translation and Rotation by Siamese Network for Point Cloud Registration

  • Zhi Chen
  • Fan Yang
  • Wenbing Tao

Point cloud registration is a fundamental step for many tasks. In this paper, we propose a neural network named DetarNet to decouple the translation t and rotation R, so as to overcome the performance degradation due to their mutual interference in point cloud registration. First, a Siamese Network based Progressive and Coherent Feature Drift (PCFD) module is proposed to align the source and target points in high-dimensional feature space, and accurately recover translation from the alignment process. Then we propose a Consensus Encoding Unit (CEU) to construct more distinguishable features for a set of putative correspondences. After that, a Spatial and Channel Attention (SCA) block is adopted to build a classification network for finding good correspondences. Finally, the rotation is obtained by Singular Value Decomposition (SVD). In this way, the proposed network decouples the estimation of translation and rotation, resulting in better performance for both of them. Experimental results demonstrate that the proposed DetarNet improves registration performance on both indoor and outdoor scenes. Our code will be available at https://github.com/ZhiChen902/DetarNet.
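The final SVD step is the classic Kabsch/Procrustes solution for the rotation once correspondences are fixed. A minimal NumPy sketch of that step alone (independent of the paper's network; the data and helper name are illustrative):

```python
import numpy as np

def kabsch_rotation(src, dst):
    """Least-squares rotation R with R @ src_i ≈ dst_i (Kabsch algorithm)."""
    src_c = src - src.mean(axis=0)          # remove translation first,
    dst_c = dst - dst.mean(axis=0)          # mirroring the t/R decoupling
    H = src_c.T @ dst_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Sanity check: recover a known rotation about the z-axis plus a translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
moved = pts @ R_true.T + np.array([1.0, -2.0, 0.5])
R_est = kabsch_rotation(pts, moved)
assert np.allclose(R_est, R_true, atol=1e-6)
```

In the paper the correspondences fed to this step come from the learned classification network rather than being given exactly, but the closed-form rotation recovery is the same.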

NeurIPS Conference 2022 Conference Paper

Forecasting Human Trajectory from Scene History

  • Mancheng Meng
  • Ziyan Wu
  • Terrence Chen
  • Xiran Cai
  • Xiang Zhou
  • Fan Yang
  • Dinggang Shen

Predicting the future trajectory of a person remains a challenging problem, due to randomness and subjectivity. However, the moving patterns of humans in a constrained scenario typically conform, to a certain extent, to a limited number of regularities, because of the scenario restrictions (e.g., floor plan, roads, and obstacles) and person-person or person-object interactivity. Thus, an individual person in this scenario should follow one of the regularities as well. In other words, a person's subsequent trajectory has likely been traveled by others. Based on this hypothesis, we propose to forecast a person's future trajectory by learning from the implicit scene regularities. We call these regularities, inherently derived from the past dynamics of the people and the environment in the scene, the scene history. We categorize scene history information into two types: historical group trajectories and individual-surroundings interaction. To exploit this information for trajectory prediction, we propose a novel framework, Scene History Excavating Network (SHENet), where the scene history is leveraged in a simple yet effective approach. In particular, we design two components: a group trajectory bank module that extracts representative group trajectories as candidates for the future path, and a cross-modal interaction module that models the interaction between an individual's past trajectory and its surroundings for trajectory refinement. In addition, to mitigate the uncertainty in the evaluation, caused by the aforementioned randomness and subjectivity, we propose to include smoothness in the evaluation metrics. We conduct extensive evaluations to validate the efficacy of the proposed framework on ETH, UCY, as well as a new, challenging benchmark dataset PAV, demonstrating superior performance compared to state-of-the-art methods.

AAAI Conference 2022 Conference Paper

Learning Optical Flow with Adaptive Graph Reasoning

  • Ao Luo
  • Fan Yang
  • Kunming Luo
  • Xin Li
  • Haoqiang Fan
  • Shuaicheng Liu

Estimating per-pixel motion between video frames, known as optical flow, is a long-standing problem in video understanding and analysis. Most contemporary optical flow techniques largely focus on addressing the cross-image matching with feature similarity, with few methods considering how to explicitly reason over the given scene for achieving a holistic motion understanding. In this work, taking a fresh perspective, we introduce a novel graph-based approach, called adaptive graph reasoning for optical flow (AGFlow), to emphasize the value of scene/context information in optical flow. Our key idea is to decouple the context reasoning from the matching procedure, and exploit scene information to effectively assist motion estimation by learning to reason over the adaptive graph. The proposed AGFlow can effectively exploit the context information and incorporate it within the matching procedure, producing more robust and accurate results. On both Sintel clean and final passes, our AGFlow achieves the best accuracy with EPE of 1.43 and 2.47 pixels, outperforming state-of-the-art approaches by 11.2% and 13.6%, respectively. Code is publicly available at https://github.com/megvii-research/AGFlow.

NeurIPS Conference 2022 Conference Paper

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

  • Zhiyang Chen
  • Yousong Zhu
  • Zhaowen Li
  • Fan Yang
  • Wei Li
  • Haixin Wang
  • Chaoyang Zhao
  • Liwei Wu

Visual tasks vary widely in their output formats and the contents they concern; therefore, it is hard to process them with an identical structure. One main obstacle lies in the high-dimensional outputs in object-level visual tasks. In this paper, we propose an object-centric vision framework, Obj2Seq. Obj2Seq takes objects as basic units, and regards most object-level visual tasks as sequence generation problems of objects. Therefore, these visual tasks can be decoupled into two steps: first, recognize objects of given categories; then, generate a sequence for each of these objects. The definition of the output sequences varies for different tasks, and the model is supervised by matching these sequences with ground-truth targets. Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and can be easily extended to different visual tasks. When experimenting on MS COCO, Obj2Seq achieves 45.7% AP on object detection, 89.0% AP on multi-label classification and 65.0% AP on human pose estimation. These results demonstrate its potential to be generally applied to different visual tasks. Code has been made available at: https://github.com/CASIA-IVA-Lab/Obj2Seq.

NeurIPS Conference 2022 Conference Paper

One-Inlier is First: Towards Efficient Position Encoding for Point Cloud Registration

  • Fan Yang
  • Lin Guo
  • Zhi Chen
  • Wenbing Tao

Transformer architecture has shown great potential for many visual tasks, including point cloud registration. As an order-aware module, position encoding plays an important role in Transformer architecture applied to point cloud registration task. In this paper, we propose OIF-PCR, a one-inlier based position encoding method for point cloud registration network. Specifically, we first find one correspondence by a differentiable optimal transport layer, and use it to normalize each point for position encoding. It can eliminate the challenges brought by the different reference frames of two point clouds, and mitigate the feature ambiguity by learning the spatial consistency. Then, we propose a joint approach for establishing correspondence and position encoding, presenting an iterative optimization process. Finally, we design a progressive way for point cloud alignment and feature learning to gradually optimize the rigid transformation. The proposed position encoding is very efficient, requiring only a small addition of memory and computing overhead. Extensive experiments demonstrate the proposed method can achieve competitive performance with the state-of-the-art methods in both indoor and outdoor scenes.

NeurIPS Conference 2022 Conference Paper

UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup

  • Zongbo Han
  • Zhipeng Liang
  • Fan Yang
  • Liu Liu
  • Lanqing Li
  • Yatao Bian
  • Peilin Zhao
  • Bingzhe Wu

Subpopulation shift widely exists in many real-world machine learning applications, referring to the training and test distributions containing the same subpopulation groups but varying in subpopulation frequencies. Importance reweighting is a common way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset. However, some recent studies have recognized that most of these approaches fail to improve the performance over empirical risk minimization, especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (UMIX), to mitigate the overfitting issue in over-parameterized models by reweighting the ''mixed'' samples according to the sample uncertainty. UMIX is equipped with training-trajectory-based uncertainty estimation for each sample to flexibly characterize the subpopulation distribution. We also provide insightful theoretical analysis to verify that UMIX achieves better generalization bounds over prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method both qualitatively and quantitatively. Code is available at https://github.com/TencentAILabHealthcare/UMIX.
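The reweighted-mixup idea can be sketched in a few lines of NumPy. This is a loose illustration, not the paper's exact scheme: here `uncertainty` is just an arbitrary per-sample score in [0, 1], whereas the paper derives it from training trajectories, and the weight normalization is an assumption for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def umix_batch(x, y, uncertainty, alpha=1.0):
    """Toy sketch: mixup a batch, then weight each mixed sample by the
    mixed uncertainty so that uncertain (likely minority-group) samples
    contribute more to the loss."""
    n = len(x)
    lam = rng.beta(alpha, alpha)      # mixup coefficient
    perm = rng.permutation(n)         # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]          # soft labels
    w = lam * uncertainty + (1 - lam) * uncertainty[perm]
    w = w * n / w.sum()               # normalize weights to mean 1
    return x_mix, y_mix, w

x = rng.normal(size=(8, 4))
y = np.eye(2)[rng.integers(0, 2, size=8)]
u = rng.uniform(size=8)               # stand-in uncertainty scores
x_mix, y_mix, w = umix_batch(x, y, u)
```

The weights `w` would then multiply the per-sample losses of `(x_mix, y_mix)` during training.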

AAAI Conference 2021 Conference Paper

Cascade Network with Guided Loss and Hybrid Attention for Finding Good Correspondences

  • Zhi Chen
  • Fan Yang
  • Wenbing Tao

Finding good correspondences is a critical prerequisite in many feature based tasks. Given a putative correspondence set of an image pair, we propose a neural network which finds correct correspondences via a binary classifier and estimates relative pose through the classified correspondences. We first observe that, due to the imbalance between correct and wrong correspondences, the loss function has a great impact on the classification results. Thus, we propose a new Guided Loss that can directly use an evaluation criterion (Fn-measure) as guidance to dynamically adjust the objective function during training. We theoretically prove a perfect negative correlation between the Guided Loss and the Fn-measure, so that the network is always trained in the direction of increasing Fn-measure to maximize it. We then propose a hybrid attention block to extract features, which integrates Bayesian attentive context normalization (BACN) and channel-wise attention (CA). BACN can mine prior information to better exploit global context, and CA can capture complex channel context to enhance the channel awareness of the network. Finally, based on our Guided Loss and hybrid attention block, a cascade network is designed to gradually optimize the result for more superior performance. Experiments have shown that our network achieves state-of-the-art performance on benchmark datasets. Our code will be available at https://github.com/wenbingtao/GLHA.

ICRA Conference 2021 Conference Paper

Chance Constrained Simultaneous Path Planning and Task Assignment with Bottleneck Objective

  • Fan Yang
  • Nilanjan Chakraborty

We present a novel algorithm for combined task assignment and path planning on a roadmap with stochastic costs. In this problem, the initially unassigned robots and tasks are located at known positions in a roadmap. We want to assign a unique task to each robot and compute a path for the robot to go to the task location. Given the means and variances of travel cost, our goal is to develop algorithms that guarantee that, for each robot, with high probability, the total travel cost is below a minimum value in any realization of the stochastic travel costs. We prove that the solution can be obtained by solving (a) chance-constrained shortest path problems for all robot-task pairs and (b) a linear bottleneck assignment problem in which the cost of an assignment equals the optimal objective value of the former problem. We propose algorithms for solving the chance-constrained shortest path problem either optimally or approximately by solving a number of deterministic shortest path problems that minimize linear combinations of the means and variances of edge costs. We present simulation results on randomly generated networks and data to demonstrate that our algorithm is scalable with the number of robots (or tasks) and the size of the network.
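The second step, the linear bottleneck assignment, minimizes the worst robot-task cost rather than the sum. A brute-force sketch for a toy instance (the cost matrix below is invented and stands in for precomputed chance-constrained shortest-path costs):

```python
from itertools import permutations

def bottleneck_assignment(cost):
    """Linear bottleneck assignment by brute force: choose the
    permutation minimizing the maximum robot-task cost (toy-sized only;
    real instances use polynomial-time threshold algorithms)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: max(cost[i][p[i]] for i in range(n)))
    return list(best)

# Entry [i][j]: assumed chance-constrained shortest-path cost for
# robot i to reach task j.
cost = [[4.0, 9.0, 7.0],
        [8.0, 5.0, 3.0],
        [6.0, 2.0, 8.0]]
perm = bottleneck_assignment(cost)   # perm[i] = task assigned to robot i
```

Here the bottleneck objective picks the assignment whose most expensive robot-task pair is as cheap as possible, matching the "with high probability, for each robot" guarantee in the abstract.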

UAI Conference 2021 Conference Paper

Defending SVMs against poisoning attacks: the hardness and DBSCAN approach

  • Hu Ding 0003
  • Fan Yang
  • Jiawei Huang 0009

Adversarial machine learning has attracted a great amount of attention in recent years. Due to the great importance of support vector machines (SVM) in machine learning, we consider defending SVM against poisoning attacks in this paper. We study two commonly used defense strategies: designing robust SVM algorithms and data sanitization. Though several robust SVM algorithms have been proposed before, most of them either lack adversarial resilience or rely on strong assumptions about the data distribution or the attacker's behavior. Moreover, research on the hardness of designing a quality-guaranteed, adversarially resilient SVM algorithm is still quite limited. We are the first, to the best of our knowledge, to prove that even the simplest hard-margin one-class SVM with adversarial outliers problem is NP-complete and admits no fully polynomial-time approximation scheme unless P=NP. For data sanitization, we explain the effectiveness of DBSCAN (as a density-based outlier removal method) for defending against poisoning attacks. In particular, we link it to intrinsic dimensionality by proving a sampling theorem in doubling metrics. In our empirical experiments, we systematically compare several defenses, including DBSCAN and robust SVM methods, and investigate how the intrinsic dimensionality and the poisoned fraction influence their performance.

JBHI Journal 2021 Journal Article

Disease Prediction via Graph Neural Networks

  • Zhenchao Sun
  • Hongzhi Yin
  • Hongxu Chen
  • Tong Chen
  • Lizhen Cui
  • Fan Yang

With the increasingly available electronic medical records (EMRs), disease prediction has recently gained immense research attention, where an accurate classifier needs to be trained to map the input prediction signals (e.g., symptoms, patient demographics, etc.) to the estimated diseases for each patient. However, existing machine learning-based solutions heavily rely on abundant manually labeled EMR training data to ensure satisfactory prediction results, impeding their performance in the presence of rare diseases that are subject to severe data scarcity. For each rare disease, the limited EMR data can hardly offer sufficient information for a model to correctly distinguish its identity from other diseases with similar clinical symptoms. Furthermore, most existing disease prediction approaches are based on the sequential EMRs collected for every patient and are unable to handle new patients without historical EMRs, reducing their real-life practicality. In this paper, we introduce an innovative model based on Graph Neural Networks (GNNs) for disease prediction, which utilizes external knowledge bases to augment the insufficient EMR data, and learns highly representative node embeddings for patients, diseases and symptoms from the medical concept graph and patient record graph respectively constructed from the medical knowledge base and EMRs. By aggregating information from directly connected neighbor nodes, the proposed neural graph encoder can effectively generate embeddings that capture knowledge from both data sources, and is able to inductively infer the embeddings for a new patient based on the symptoms reported in her/his EMRs to allow for accurate prediction on both general diseases and rare diseases. Extensive experiments on a real-world EMR dataset have demonstrated the state-of-the-art performance of our proposed model.

IS Journal 2021 Journal Article

Fairness in Deep Learning: A Computational Perspective

  • Mengnan Du
  • Fan Yang
  • Na Zou
  • Xia Hu

Fairness in deep learning has attracted tremendous attention recently, as deep learning is increasingly being used in high-stakes decision-making applications that affect individual lives. We provide a review covering recent progress in tackling algorithmic fairness problems of deep learning from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient to diagnose the reasons that lead to algorithmic discrimination. We also discuss fairness mitigation approaches categorized according to three stages of the deep learning life-cycle, aiming to push forward the area of fairness in deep learning and build genuinely fair and reliable deep learning systems.

NeurIPS Conference 2021 Conference Paper

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

  • Fan Yang
  • Kai He
  • Linxiao Yang
  • Hongxia Du
  • Jingbang Yang
  • Bo Yang
  • Liang Sun

Rule sets are highly interpretable logical models in which the predicates for decision are expressed in disjunctive normal form (DNF, OR-of-ANDs), or, equivalently, the overall model comprises an unordered collection of if-then decision rules. In this paper, we consider a submodular optimization based approach for learning rule sets. The learning problem is framed as a subset selection task in which a subset of all possible rules needs to be selected to form an accurate and interpretable rule set. We employ an objective function that exhibits submodularity and thus is amenable to submodular optimization techniques. To overcome the difficulty arising from the exponential-sized ground set of rules, the subproblem of searching for a rule is cast as another subset selection task that asks for a subset of features. We show it is possible to write the induced objective function for the subproblem as a difference of two submodular (DS) functions, making it approximately solvable by DS optimization algorithms. Overall, the proposed approach is simple, scalable, and likely to benefit from further research on submodular optimization. Experiments on real datasets demonstrate the effectiveness of our method.
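The subset-selection framing can be illustrated with the classic greedy algorithm for monotone submodular maximization. Plain example coverage stands in for the paper's actual accuracy/interpretability objective, and the toy rules below are invented for illustration:

```python
def greedy_select(candidates, covered_by, k):
    """Greedily pick up to k rules maximizing example coverage.

    Coverage is a monotone submodular set function, so this greedy
    enjoys the standard (1 - 1/e) approximation guarantee.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for r in candidates:
            if r in chosen:
                continue
            gain = len(covered_by[r] - covered)  # marginal gain of adding r
            if gain > best_gain:
                best, best_gain = r, gain
        if best is None:          # no remaining rule adds new coverage
            break
        chosen.append(best)
        covered |= covered_by[best]
    return chosen, covered

# Toy ground set: each candidate rule covers some example ids.
covered_by = {
    "r1": {1, 2, 3},
    "r2": {3, 4},
    "r3": {4, 5, 6, 7},
    "r4": {1, 7},
}
rules, covered = greedy_select(list(covered_by), covered_by, k=2)
```

The paper's contribution is making this tractable when the ground set of rules is exponentially large, by searching for each rule via a second (DS-optimized) subset-selection problem over features.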

NeurIPS Conference 2021 Conference Paper

MST: Masked Self-Supervised Transformer for Visual Representation

  • Zhaowen Li
  • Zhiyang Chen
  • Fan Yang
  • Wei Li
  • Yousong Zhu
  • Chaoyang Zhao
  • Rui Deng
  • Liwei Wu

Transformer has been widely used for self-supervised pre-training in Natural Language Processing (NLP) and achieved great success. However, it has not been fully explored in visual self-supervised learning. Meanwhile, previous methods only consider high-level features and learn representations from a global perspective, which may fail to transfer to downstream dense prediction tasks focusing on local features. In this paper, we present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image while preserving the global semantic information. Specifically, inspired by Masked Language Modeling (MLM) in NLP, we propose a masked token strategy based on the multi-head self-attention map, which dynamically masks some tokens of local patches without damaging the crucial structure for self-supervised learning. More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks. Experiments on multiple datasets demonstrate the effectiveness and generality of the proposed method. For instance, MST achieves a Top-1 accuracy of 76.9% with DeiT-S using only 300-epoch pre-training by linear evaluation, which outperforms supervised methods with the same epochs by 0.4% and its comparable variant DINO by 1.0%. For dense prediction tasks, MST also achieves 42.7% mAP on MS COCO object detection and 74.04% mIoU on Cityscapes segmentation with only 100-epoch pre-training.

IJCAI Conference 2021 Conference Paper

Time Series Data Augmentation for Deep Learning: A Survey

  • Qingsong Wen
  • Liang Sun
  • Fan Yang
  • Xiaomin Song
  • Jingkun Gao
  • Xue Wang
  • Huan Xu

Deep learning performs remarkably well on many time series analysis tasks recently. The superior performance of deep neural networks relies heavily on a large number of training data to avoid overfitting. However, the labeled data of many real-world time series applications may be limited such as classification in medical time series and anomaly detection in AIOps. As an effective way to enhance the size and quality of the training data, data augmentation is crucial to the successful application of deep learning models on time series data. In this paper, we systematically review different data augmentation methods for time series. We propose a taxonomy for the reviewed methods, and then provide a structured review for these methods by highlighting their strengths and limitations. We also empirically compare different data augmentation methods for different tasks including time series classification, anomaly detection, and forecasting. Finally, we discuss and highlight five future directions to provide useful research guidance.

IROS Conference 2020 Conference Paper

Algorithm for Multi-Robot Chance-Constrained Generalized Assignment Problem with Stochastic Resource Consumption

  • Fan Yang
  • Nilanjan Chakraborty

We present a novel algorithm for the multi-robot generalized assignment problem (GAP) with stochastic resource consumption. In this problem, each robot has a resource (e.g., battery life) constraint and it consumes a certain amount of resource to perform a task. In practice, the resource consumed in performing a task can be uncertain. Therefore, we assume that the resource consumption is a random variable with known mean and variance. The objective is to find an assignment of the robots to tasks that maximizes the team payoff. Each task is assigned to at most one robot, and the resource constraint for each robot has to be satisfied with very high probability. We formulate the problem as a chance-constrained combinatorial optimization problem and call it the chance-constrained generalized assignment problem (CC-GAP). This problem is an extension of the deterministic generalized assignment problem, which is NP-hard. We design an iterative algorithm for solving CC-GAP in which each robot maximizes its own objective by solving a chance-constrained knapsack problem in an iterative manner. The approximation ratio of our algorithm is (1+α), assuming that the deterministic knapsack problem is solved by an α-approximation algorithm. We present simulation results to demonstrate that our algorithm is scalable with the number of robots and tasks.

NeurIPS Conference 2020 Conference Paper

Bayesian Multi-type Mean Field Multi-agent Imitation Learning

  • Fan Yang
  • Alina Vereshchaka
  • Changyou Chen
  • Wen Dong

Multi-agent imitation learning (MAIL) refers to the problem in which agents learn to perform a task interactively in a multi-agent system by observing and mimicking expert demonstrations, without any knowledge of a reward function from the environment. MAIL has received a lot of attention due to promising results achieved on synthesized tasks, with the potential to be applied to complex real-world multi-agent tasks. Key challenges for MAIL include sample efficiency and scalability. In this paper, we propose Bayesian multi-type mean field multi-agent imitation learning (BM3IL). Our method improves sample efficiency by establishing a Bayesian formulation for MAIL, and enhances scalability by introducing a new multi-type mean field approximation. We demonstrate the performance of our algorithm by benchmarking against three state-of-the-art multi-agent imitation learning algorithms on several tasks, including solving a multi-agent traffic optimization problem in a real-world transportation network. Experimental results indicate that our algorithm significantly outperforms all other algorithms in all scenarios.

ICRA Conference 2020 Conference Paper

Chance Constrained Simultaneous Path Planning and Task Assignment for Multiple Robots with Stochastic Path Costs

  • Fan Yang
  • Nilanjan Chakraborty

We present a novel algorithm for simultaneous task assignment and path planning on a graph (or roadmap) with stochastic edge costs. In this problem, the initially unassigned robots and tasks are located at known positions in a roadmap. We want to assign a unique task to each robot and compute a path for the robot to go to its assigned task location. Given the mean and variance of the travel cost of each edge, our goal is to develop algorithms that guarantee that, with high probability, the total path cost of the robot team is below a minimum value in any realization of the stochastic travel costs. We formulate the problem as a chance-constrained simultaneous task assignment and path planning problem (CC-STAP). We prove that the optimal solution of CC-STAP can be obtained by solving a sequence of deterministic simultaneous task assignment and path planning problems in which the travel cost is a linear combination of the mean and variance of the edge cost. We show that the deterministic problem can be solved in two steps: in the first step, the robots compute shortest paths to the task locations, and in the second step, the robots solve a linear assignment problem with the costs obtained in the first step. We also propose a distributed algorithm that solves CC-STAP near-optimally. We present simulation results on randomly generated networks and data to demonstrate that our algorithm is scalable with the number of robots (or tasks) and the size of the network.
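A minimal sketch of the two-step deterministic subproblem: Dijkstra over effective edge costs mean + λ·variance, then an assignment over the resulting robot-task cost matrix. The tiny roadmap, the value of λ, and the brute-force assignment are illustrative simplifications, not the paper's algorithm:

```python
import heapq
from itertools import permutations

def shortest_path_cost(adj, src, dst, lam):
    """Dijkstra on effective edge cost mean + lam * variance, one member
    of the family of deterministic problems the abstract describes."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, mean, var in adj.get(u, []):
            nd = d + mean + lam * var
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def assign(cost):
    """Brute-force linear (sum) assignment, fine for a toy instance."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# Toy roadmap: adj[u] = [(v, mean, variance), ...]; robots at a, b; tasks at c, d.
adj = {
    "a": [("c", 2.0, 1.0), ("d", 5.0, 0.1)],
    "b": [("c", 4.0, 0.5), ("d", 1.0, 2.0)],
}
lam = 0.5
cost = [[shortest_path_cost(adj, r, t, lam) for t in ("c", "d")]
        for r in ("a", "b")]
assignment = assign(cost)   # assignment[i] = task index for robot i
```

Sweeping λ and re-solving, as the abstract describes, traces out the candidate solutions among which the chance-constrained optimum lies.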

NeurIPS Conference 2020 Conference Paper

EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning

  • Jiachen Li
  • Fan Yang
  • Masayoshi Tomizuka
  • Chiho Choi

Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamic systems. In many applications, effective understanding of the situation and accurate trajectory prediction of interactive agents play a significant role in downstream tasks, such as decision making and planning. In this paper, we propose a generic trajectory forecasting framework (named EvolveGraph) with explicit relational structure recognition and prediction via latent interaction graphs among multiple heterogeneous, interactive agents. Considering the uncertainty of future behaviors, the model is designed to provide multi-modal prediction hypotheses. Since the underlying interactions may evolve even with abrupt changes, and different modalities of evolution may lead to different outcomes, we address the necessity of dynamic relational reasoning and adaptively evolving the interaction graphs. We also introduce a double-stage training pipeline which not only improves training efficiency and accelerates convergence, but also enhances model performance. The proposed framework is evaluated on both synthetic physics simulations and multiple real-world benchmark datasets in various areas. The experimental results illustrate that our approach achieves state-of-the-art performance in terms of prediction accuracy.

AAAI Conference 2020 Conference Paper

Hybrid Graph Neural Networks for Crowd Counting

  • Ao Luo
  • Fan Yang
  • Xin Li
  • Dong Nie
  • Zhicheng Jiao
  • Shangchen Zhou
  • Hong Cheng

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn) which aims to relieve the problem by interweaving the multi-scale features for crowd density and its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. HyGnn performs strongly on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF-QNRF, outperforming state-of-the-art algorithms by a large margin.

AAAI Conference 2020 Conference Paper

Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval

  • Fan Yang
  • Zheng Wang
  • Jing Xiao
  • Shin'ichi Satoh

Most recent approaches for zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve a significant improvement in the performance of the thermal vs. visible image retrieval task. The code of this paper: https://github.com/fyang93/cross-modal-retrieval

IJCAI Conference 2020 Conference Paper

On Metric DBSCAN with Low Doubling Dimension

  • Hu Ding
  • Fan Yang
  • Mingyue Wang

The density-based clustering method Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular method for outlier recognition and has received tremendous attention from many different areas. A major issue of the original DBSCAN is that its time complexity can be as large as quadratic. Most existing DBSCAN algorithms focus on developing efficient index structures to speed up the procedure in low-dimensional Euclidean space. However, research on DBSCAN in high-dimensional Euclidean space or general metric spaces is still quite limited, to the best of our knowledge. In this paper, we consider the metric DBSCAN problem under the assumption that the inliers (excluding the outliers) have a low doubling dimension. We apply a novel randomized k-center clustering idea to reduce the complexity of the range query, which is the most time-consuming step in the whole DBSCAN procedure. Our proposed algorithms do not need to build any complicated data structures and are easy to implement in practice. The experimental results show that our algorithms can significantly outperform existing DBSCAN algorithms in terms of running time.

ICLR Conference 2020 Conference Paper

Relational State-Space Model for Stochastic Multi-Object Systems

  • Fan Yang
  • Ling Chen 0001
  • Fan Zhou 0012
  • Yusong Gao
  • Wei Cao 0006

Real-world dynamical systems often consist of multiple stochastic subsystems that interact with each other. Modeling and forecasting the behavior of such dynamics are generally not easy, due to the inherent hardness in understanding the complicated interactions and evolutions of their constituents. This paper introduces the relational state-space model (R-SSM), a sequential hierarchical latent variable model that makes use of graph neural networks (GNNs) to simulate the joint state transitions of multiple correlated objects. By letting GNNs cooperate with SSM, R-SSM provides a flexible way to incorporate relational information into the modeling of multi-object dynamics. We further suggest augmenting the model with normalizing flows instantiated for vertex-indexed random variables and propose two auxiliary contrastive objectives to facilitate the learning. The utility of R-SSM is empirically evaluated on synthetic and real time series datasets.

JAIR Journal 2020 Journal Article

TensorLog: A Probabilistic Database Implemented Using Deep-Learning Infrastructure

  • William Cohen
  • Fan Yang
  • Kathryn Rivard Mazaitis

We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. The integration with these frameworks enables use of GPU-based parallel processors for inference and learning, making TensorLog the first highly parallelizable probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.

AAAI Conference 2020 Conference Paper

Variational Adversarial Kernel Learned Imitation Learning

  • Fan Yang
  • Alina Vereshchaka
  • Yufan Zhou
  • Changyou Chen
  • Wen Dong

Imitation learning refers to the problem where an agent learns to perform a task through observing and mimicking expert demonstrations, without knowledge of the cost function. State-of-the-art imitation learning algorithms reduce imitation learning to distribution-matching problems by minimizing some distance measures. However, the distance measure may not always provide informative signals for a policy update. To this end, we propose the variational adversarial kernel learned imitation learning (VAKLIL), which measures the distance using the maximum mean discrepancy with variational kernel learning. Our method optimizes over a large cost-function space and is sample efficient and robust to overfitting. We demonstrate the performance of our algorithm through benchmarking with four state-of-the-art imitation learning algorithms over five high-dimensional control tasks, and a complex transportation control task. Experimental results indicate that our algorithm significantly outperforms related algorithms in all scenarios.

IJCAI Conference 2019 Conference Paper

Decoding EEG by Visual-guided Deep Neural Networks

  • Zhicheng Jiao
  • Haoxuan You
  • Fan Yang
  • Xin Li
  • Han Zhang
  • Dinggang Shen

Decoding visual stimuli from brain activities is an interdisciplinary study of neuroscience and computer vision. With the emergence of Human-AI Collaboration and Human-Computer Interaction, and the development of advanced machine learning models, brain decoding based on deep learning is attracting more attention. Electroencephalogram (EEG) is a widely used neurophysiology tool. Inspired by the success of deep learning on image representation and neural decoding, we propose a visual-guided EEG decoding method that contains a classification stage and a generation stage. In the classification stage, we design a visual-guided convolutional neural network (CNN) to obtain more discriminative representations from EEG, which are applied to achieve the classification results. In the generation stage, the visual-guided EEG features are input to our improved deep generative model with a visual consistence module to generate the corresponding visual stimuli. With the help of our visual-guided strategies, the proposed method outperforms traditional machine learning methods and deep learning models in the EEG decoding task.

AAAI Conference 2019 Conference Paper

Efficient Image Retrieval via Decoupling Diffusion into Online and Offline Processing

  • Fan Yang
  • Ryota Hinami
  • Yusuke Matsui
  • Steven Ly
  • Shin’ichi Satoh

Diffusion is commonly used as a ranking or re-ranking method in retrieval tasks to achieve higher retrieval performance, and has attracted much attention in recent years. A downside of diffusion is that it is slow compared to the naive k-NN search, incurring a non-trivial online computational cost on large datasets. To overcome this weakness, we propose a novel diffusion technique in this paper. In our work, instead of applying diffusion to the query, we precompute the diffusion results of each element in the database, making the online search a simple linear combination on top of the k-NN search process. Our proposed method becomes ∼10 times faster in terms of online search speed. Moreover, we propose to use late truncation instead of the early truncation of previous works to achieve better retrieval performance.
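The offline/online split can be sketched as follows. The simple random-walk iteration on a dense affinity matrix below is a generic diffusion stand-in, not the paper's exact scheme, and all data are synthetic:

```python
import numpy as np

def diffuse_offline(A, alpha=0.85, iters=30):
    """Offline: propagate each database element's identity over the
    affinity graph A. Row i of the result is the precomputed diffusion
    of database element i."""
    n = A.shape[0]
    S = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    F = np.eye(n)
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * np.eye(n)
    return F

def search_online(F, knn_ids, knn_sims):
    """Online: a linear combination of the precomputed diffusion rows of
    the query's k nearest neighbours -- no iteration at query time."""
    w = knn_sims / knn_sims.sum()
    return w @ F[knn_ids]

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
A = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # dense Gaussian affinities
F = diffuse_offline(A)
# Suppose k-NN returned database items 0 and 2 with these similarities:
scores = search_online(F, np.array([0, 2]), np.array([0.7, 0.3]))
```

The expensive iterative part runs entirely offline; each query costs one k-NN lookup plus a weighted sum of precomputed rows, which is the source of the speedup the abstract claims.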

NeurIPS Conference 2019 Conference Paper

Game Design for Eliciting Distinguishable Behavior

  • Fan Yang
  • Liu Leqi
  • Yifan Wu
  • Zachary Lipton
  • Pradeep Ravikumar
  • Tom Mitchell
  • William Cohen

The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to inferring such traits range from surveys to manually constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing behavior-diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually designed ones by a large margin.

AAAI Conference 2019 Conference Paper

Large-Scale Heterogeneous Feature Embedding

  • Xiao Huang
  • Qingquan Song
  • Fan Yang
  • Xia Hu

Feature embedding aims to learn a low-dimensional vector representation for each instance to preserve the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of features have been well-studied, real-world instances often contain multiple types of correlated features or even information within a different modality such as networks. Existing studies such as multiview learning show that it is promising to learn unified vector representations from all sources. However, high computational costs of incorporating heterogeneous information limit the applications of existing algorithms. The number of instances and dimensions of features in practice are often large. To bridge the gap, we propose a scalable framework FeatWalk, which can model and incorporate instance similarities in terms of different types of features into a unified embedding representation. To enable the scalability, FeatWalk does not directly calculate any similarity measure, but provides an alternative way to simulate the similarity-based random walks among instances to extract the local instance proximity and preserve it in a set of instance index sequences. These sequences are homogeneous with each other. A scalable word embedding algorithm is applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.

AAMAS Conference 2019 Conference Paper

Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

  • Fan Yang
  • Bo Liu
  • Wen Dong

Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopt simulation modeling to specify the complex dynamics of a complex system, and develop analytical solutions for searching for optimal strategies in a complex network with a high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision-making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state-action spaces in complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce a Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher expected system rewards, faster convergence, and lower variance of the value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

AAAI Conference 2019 Conference Paper

Understanding Pictograph with Facial Features: End-to-End Sentence-Level Lip Reading of Chinese

  • Xiaobing Zhang
  • Haigang Gong
  • Xili Dai
  • Fan Yang
  • Nianbo Liu
  • Ming Liu

With the breakthrough of deep learning, lip reading technologies are under extraordinarily rapid progress. It is well known that Chinese is the most widely spoken language in the world. Unlike alphabetic languages, it involves more than 1,000 pronunciations as Pinyin and nearly 90,000 pictographic characters as Hanzi, which makes lip reading of Chinese very challenging. In this paper, we implement visual-only Chinese lip reading of unconstrained sentences in a two-step end-to-end architecture (LipCH-Net), in which two deep neural network models are employed to perform the recognition of Picture-to-Pinyin (mouth motion pictures to pronunciations) and the recognition of Pinyin-to-Hanzi (pronunciations to texts) respectively, followed by a joint optimization to improve the overall performance. In addition, two modules in the Pinyin-to-Hanzi model are pre-trained separately with large auxiliary data in advance of sequence-to-sequence training to make the best of long sequence matches for avoiding ambiguity. We collect 6 months of daily news broadcasts from the China Central Television (CCTV) website, and semi-automatically label them into a 20.95 GB dataset with 20,495 natural Chinese sentences. When trained on the CCTV dataset, the LipCH-Net model outperforms all state-of-the-art lip reading frameworks. According to the results, our scheme not only accelerates training and reduces overfitting, but also overcomes the syntactic ambiguity of Chinese, which provides a baseline for future relevant work.

ICRA Conference 2018 Conference Paper

Algorithm for Optimal Chance Constrained Knapsack Problem with Applications to Multi-Robot Teaming

  • Fan Yang
  • Nilanjan Chakraborty

Motivated by applications in multi-robot team selection, in this paper we present a novel algorithm for computing an optimal solution of the chance-constrained 0-1 knapsack problem. In this variation of the knapsack problem, the objective function is deterministic but the weights of the items are stochastic, and therefore the knapsack constraint is stochastic. We convert the chance-constrained knapsack problem to a two-dimensional discrete optimization problem on the variance-mean plane, where each point on the plane can be identified with an assignment of items to the knapsack. By exploiting the geometry of the non-convex feasible region of the chance-constrained knapsack problem in the variance-mean plane, we present a novel deterministic technique that finds an optimal solution by solving a sequence of deterministic knapsack problems (called the risk-averse knapsack problem). We apply our algorithm to a multi-robot team selection problem for covering a given route, where the length of the route is much larger than the distance each individual robot can fly, and the distance an individual robot can fly is a random variable (with known mean and variance). We present simulation results on randomly generated data to demonstrate that our approach scales with both the number of robots and increasing uncertainty in the distance an individual robot can travel.

IJCAI Conference 2018 Conference Paper

Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-identification

  • Zheng Wang
  • Mang Ye
  • Fan Yang
  • Xiang Bai
  • Shin'ichi Satoh

Person re-identification (REID) is an important task in video surveillance and forensics applications. Most previous approaches are based on the key assumption that all person images have uniform and sufficiently high resolutions. In practice, various low resolutions and scale mismatches are common in open-world REID. We name this problem Scale-Adaptive Low Resolution Person Re-identification (SALR-REID). The most intuitive way to address it is to upscale various low resolutions (not only low, but also at different scales) to a uniform high resolution. SR-GAN is one of the most competitive image super-resolution deep networks, but it is designed with a fixed upscaling factor and is therefore not suitable for the SALR-REID task, which requires a network that not only synthesizes high-resolution images with different upscaling factors, but also extracts discriminative image features for judging a person’s identity. (1) To promote the ability of scale-adaptive upscaling, we cascade multiple SR-GANs in series. (2) To supplement the ability of image feature representation, we plug in a re-identification network. With a unified formulation, a Cascaded Super-Resolution GAN (CSR-GAN) framework is proposed. Extensive evaluations on two simulated datasets and one public dataset demonstrate the advantages of our method over related state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Multi-Scale Bidirectional FCN for Object Skeleton Extraction

  • Fan Yang
  • Xin Li
  • Hong Cheng
  • Yuxiao Guo
  • Leiting Chen
  • Jianping Li

Object skeleton detection is a challenging problem with wide applications. Recently, deep Convolutional Neural Networks (CNNs) have substantially improved the state-of-the-art performance on this task. However, most existing CNN-based methods rely on a skip-layer structure in which low-level and high-level features are combined and learned so as to gather multi-level contextual information. Because shallow features are noisy and lack semantic knowledge, they may cause errors and inaccuracy. We therefore propose a novel network architecture, the Multi-Scale Bidirectional Fully Convolutional Network (MSB-FCN), to better capture and consolidate multi-scale high-level contextual information for object skeleton detection. Our network uses only deep features to build multi-scale feature representations, and employs a bidirectional structure to collect contextual knowledge. Hence the proposed MSB-FCN has the ability to learn semantic-level information from different sub-regions. Furthermore, we introduce dense connections into the bidirectional structure of our MSB-FCN so that the learning process at each scale can directly encode information from all other scales. Extensive experiments on various commonly used benchmarks demonstrate that the proposed MSB-FCN achieves significant improvements over state-of-the-art algorithms.

ICRA Conference 2017 Conference Paper

Algorithm for optimal chance constrained linear assignment

  • Fan Yang
  • Nilanjan Chakraborty

In this paper, we design provably-good algorithms for task allocation in multi-robot systems in the presence of payoff uncertainty. We consider a group of robots that has to perform a given set of tasks, where each robot performs at most one task. The payoffs of the robots doing the tasks are assumed to be Gaussian random variables with known means and variances. The total payoff of the robots is the sum of the individual payoffs of all the robots. The goal is to find an assignment with maximum payoff that can be achieved with a specified probability irrespective of the realization of the random variables. This problem can be formulated as a chance-constrained combinatorial optimization problem. We develop a novel deterministic technique to solve this chance-constrained optimization problem that ensures the chance constraints are always satisfied. Adopting the notion of risk-aversion from the economics literature, we formulate a risk-averse task allocation problem, which is a deterministic integer optimization problem. We prove that by repeatedly solving the risk-averse task allocation problem using a one-dimensional search on the risk-aversion parameter, we find a solution to the chance-constrained optimization formulation of the linear assignment problem with uncertain payoffs. We provide simulation results on randomly generated data to demonstrate our approach and also compare our method to existing approaches.

NeurIPS Conference 2017 Conference Paper

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

  • Fan Yang
  • Zhilin Yang
  • William Cohen

We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently-developed differentiable logic called TensorLog [5], where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.

NeurIPS Conference 2017 Conference Paper

Expectation Propagation with Stochastic Kinetic Model in Complex Interaction Systems

  • Le Fang
  • Fan Yang
  • Wen Dong
  • Tong Guan
  • Chunming Qiao

Technological breakthroughs allow us to collect data with increasing spatio-temporal resolution from complex interaction systems. The combination of high-resolution observations, expressive dynamic models, and efficient machine learning algorithms can lead to crucial insights into complex interaction dynamics and the functions of these systems. In this paper, we formulate the dynamics of a complex interacting network as a stochastic process driven by a sequence of events, and develop expectation propagation algorithms to make inferences from noisy observations. To avoid getting stuck at a local optimum, we formulate the problem of minimizing Bethe free energy as a constrained primal problem and take advantage of the concavity of the dual problem in the feasible domain of the dual variables, guaranteed by the duality theorem. Our expectation propagation algorithms demonstrate better performance in inferring the interaction dynamics in complex transportation networks than competing models such as the particle filter, the extended Kalman filter, and deep neural networks.

NeurIPS Conference 2017 Conference Paper

Good Semi-supervised Learning That Requires a Bad GAN

  • Zihang Dai
  • Zhilin Yang
  • Fan Yang
  • William Cohen
  • Russ Salakhutdinov

Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and we propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.

ECAI Conference 2016 Conference Paper

On Stochastic Primal-Dual Hybrid Gradient Approach for Compositely Regularized Minimization

  • Linbo Qiao
  • Tianyi Lin
  • Yu-Gang Jiang 0001
  • Fan Yang
  • Wei Liu 0005
  • Xicheng Lu

We consider a wide spectrum of regularized stochastic minimization problems, where the regularization term is composite with a linear function. Examples of this formulation include graph-guided regularized minimization, the generalized Lasso, and a class of ℓ1-regularized problems. The computational challenge is that the closed-form solution of the proximal mapping associated with the regularization term is not available due to the imposed linear composition. Fortunately, the structure of the regularization term allows us to reformulate it as a new convex-concave saddle point problem, which can be solved using the Primal-Dual Hybrid Gradient (PDHG) approach. However, this approach may be inefficient in realistic applications, as computing the full gradient of the expected objective function can be very expensive when the number of input data samples is considerably large. To address this issue, we propose a Stochastic PDHG (SPDHG) algorithm with either uniformly or non-uniformly averaged iterates. With uniformly averaged iterates, the SPDHG algorithm converges in expectation at an O(1/√t) rate for general convex objectives and an O(log(t)/t) rate for strongly convex objectives, respectively. With non-uniformly averaged iterates, the SPDHG algorithm is expected to converge at an O(1/t) rate for strongly convex objectives. Numerical experiments on different genres of datasets demonstrate that our proposed algorithm outperforms competing algorithms.

IJCAI Conference 2016 Conference Paper

Saliency Transfer: An Example-Based Method for Salient Object Detection

  • Xin Li
  • Fan Yang
  • Leiting Chen
  • Hongbin Cai

Over the past decades, numerous theories and studies have demonstrated that salient objects in different scenes often share properties that make them visually stand out from their surroundings and thus allow them to be processed in finer detail. In this paper, we propose a novel method for salient object detection that transfers the annotations from an existing example onto an input image. Our method, which is based on the low-level saliency features of each pixel, estimates dense pixel-wise correspondences between the input image and an example image, and then integrates high-level concepts to produce an initial saliency map. Finally, a coarse-to-fine optimization framework is proposed to generate uniformly highlighted salient objects. Qualitative and quantitative experiments on six popular benchmark datasets validate that our approach greatly outperforms state-of-the-art algorithms and recently published works.

NeurIPS Conference 2016 Conference Paper

Selective inference for group-sparse linear models

  • Fan Yang
  • Rina Foygel Barber
  • Prateek Jain
  • John Lafferty

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of the magnitude of the projection of the data onto a given subspace, and enables us to develop inference procedures for a broad class of group-sparse selection methods, including the group lasso, iterative hard thresholding, and forward stepwise regression. We give numerical results to illustrate these tools on simulated data and on health record data.

ICRA Conference 2005 Conference Paper

Achieving Desired Contact State Transitions of Polyhedral Parts with Compliant Motions

  • Fan Yang
  • Michael M. Marefat

A new approach to motion planning for achieving contact state transitions in robotic assembly of polyhedral parts is presented. The contact state of a pair of spatial polyhedra is represented by qualitative contact models, for which a Feature Interaction Matrix (FIM) representation is adopted. Given the desired contact transition (i.e., the current contact state and the next desired contact state) and the current configuration of the moving object, we want to generate the compliant motion parameters for the robot system to guide the workpiece to the next desired contact state. In this work, an optimization method is used to derive the compliant motion parameters. Four motion control conditions are defined to provide constraints, and a cost function representing the moving distance is also defined. By minimizing this cost function, the compliant motion parameters can be generated. The method is demonstrated with both translation and rotation examples.