Author name cluster

Cheng Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

55 papers

2 author rows

EAAI Journal 2026 Journal Article

A trinity-branch parallel fusion and supervised enhancement network: A multimodal celiac disease diagnosis network based on transformer and dual-tower supervision

Jiahe Li
Tian Shi
Chen Chen
Xuguang Zhou
Wei Liu
Xiaoyi Lv
Feng Gao
Cheng Chen

Celiac disease (CD) is a complex autoimmune disorder where accurate diagnosis is crucial for improving patients' quality of life. Spectroscopic analysis, with its high sensitivity and non-invasive nature, can reveal subtle molecular-level changes in samples, providing an objective and reliable basis for diagnosing complex diseases like CD. Integrating multi-source omics data from techniques such as Raman spectroscopy, infrared spectroscopy, and metabolomics promises a comprehensive view for disease diagnosis. The key challenge, however, lies in how to effectively fuse this multi-modal data to fully leverage their complementary information. To address this challenge, we introduce a novel multi-modal deep fusion framework called Trinity Branch Parallel Fusion and Supervised Enhancement Net (TFS-Net). This framework employs an efficient multi-stage architecture to systematically process and fuse three types of omics data. First, dedicated modality-specific feature encoders, such as multi-scale dynamic convolutions for spectroscopic data and a self-attention-enhanced MLP for metabolomics data, are used for efficient intra-modal feature encoding. Next, a cross-modal attention mechanism explores the pairwise interaction relationships between modalities. Building on this, we introduce a dual-tower similarity supervision auxiliary task to enhance the consistency of feature representations across different modalities. Finally, a Transformer encoder performs global contextual modeling on all features to output the final diagnostic prediction. On a celiac disease dataset, the TFS-Net model demonstrates superior diagnostic performance. It achieves 95.82% accuracy under five-fold cross-validation, significantly outperforming existing single-modal baseline models and state-of-the-art multi-modal fusion methods.
Furthermore, systematic ablation studies validate the necessity and effectiveness of our proposed multi-modal strategy and key model components, including dynamic convolution, cross-modal attention, and dual-tower supervision.
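
The cross-modal attention step described above can be illustrated with generic single-head scaled dot-product attention, in which tokens from one modality act as queries over tokens from another. This is a minimal sketch, not the TFS-Net implementation: the function names, the toy feature vectors, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, context):
    """Single-head scaled dot-product attention: every token of modality A
    (queries) attends over all tokens of modality B (context).
    Learned query/key/value projections are omitted for brevity."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Affinity between this modality-A token and each modality-B token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        w = softmax(scores)
        # Attention-weighted sum of modality-B tokens: the cross-modal context for q.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, context)) for i in range(d)])
        weights.append(w)
    return outputs, weights


# Toy feature tokens; real encoders would produce these from spectra and metabolite profiles.
spectral_tokens = [[1.0, 0.0], [0.0, 1.0]]
metabolomic_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_modal_attention(spectral_tokens, metabolomic_tokens)
print(fused, attn)
```

Each row of `attn` sums to 1, so every fused spectral token is a convex combination of metabolomic tokens; running the function in both directions would give the "pairwise" interactions the abstract mentions.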

AAAI Conference 2026 Conference Paper

Beyond Adapter Retrieval: Latent Geometry-Preserving Composition via Sparse Task Projection

Pengfei Jin
Peng Shu
Sifan Song
Sekeun Kim
Qing Xiao
Cheng Chen
Tianming Liu
Xiang Li

Recent advances in parameter-efficient transfer learning have demonstrated the utility of composing LoRA adapters from libraries of pretrained modules. However, most existing approaches rely on simple retrieval heuristics or uniform averaging, which overlook the latent structure of task relationships in representation space. We propose a new framework for adapter reuse that moves beyond retrieval, formulating adapter composition as a geometry-aware sparse reconstruction problem. Specifically, we represent each task by a latent prototype vector derived from the base model’s encoder and aim to approximate the target task prototype as a sparse linear combination of retrieved reference prototypes, under an L1-regularized optimization objective. The resulting combination weights are then used to blend the corresponding LoRA adapters, yielding a composite adapter tailored to the target task. This formulation not only preserves the local geometric structure of the task representation manifold, but also promotes interpretability and efficient reuse by selecting a minimal set of relevant adapters. We demonstrate the effectiveness of our approach across multiple domains—including medical image segmentation, medical report generation and image synthesis. Our results highlight the benefit of coupling retrieval with latent geometry-aware optimization for improved zero-shot generalization.
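
The sparse reconstruction step can be sketched as an L1-regularized least-squares (Lasso) problem solved with ISTA (iterative shrinkage-thresholding), followed by a weighted sum of adapter updates. This is a minimal sketch under stated assumptions: the solver choice, the hypothetical helper names `sparse_weights` and `blend_adapters`, the toy prototypes, and blending flattened weight updates (rather than the low-rank factors) are not taken from the paper.

```python
def soft_threshold(x, tau):
    # Proximal operator of tau * |x|: shrink toward zero and zero out small entries.
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_weights(refs, target, lam=0.1, iters=500):
    """ISTA for  min_w 0.5 * ||sum_k w[k] * refs[k] - target||^2 + lam * ||w||_1.
    `refs` holds the retrieved reference prototypes, `target` the target prototype."""
    n, dim = len(refs), len(target)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so 1 / lip is a safe (if conservative) step size.
    lip = sum(x * x for r in refs for x in r)
    step = 1.0 / lip
    w = [0.0] * n
    for _ in range(iters):
        recon = [sum(w[k] * refs[k][i] for k in range(n)) for i in range(dim)]
        resid = [recon[i] - target[i] for i in range(dim)]
        grad = [sum(refs[k][i] * resid[i] for i in range(dim)) for k in range(n)]
        w = [soft_threshold(w[k] - step * grad[k], step * lam) for k in range(n)]
    return w


def blend_adapters(adapter_deltas, weights):
    # Weighted sum of flattened LoRA weight updates (delta W = B @ A), one list per adapter.
    size = len(adapter_deltas[0])
    return [sum(wt * a[i] for wt, a in zip(weights, adapter_deltas)) for i in range(size)]


reference_prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target_prototype = [0.9, 0.1, 0.0]
w = sparse_weights(reference_prototypes, target_prototype, lam=0.15)
adapter_deltas = [[1.0, 2.0], [5.0, 5.0], [-3.0, 4.0]]  # toy, already-flattened updates
composite = blend_adapters(adapter_deltas, w)
print(w, composite)
```

With orthonormal reference prototypes the Lasso solution is simply the soft-thresholded target, so the second prototype's small coefficient (0.1) falls below the penalty and is dropped entirely. This is the "minimal set of relevant adapters" behavior in miniature.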

AAAI Conference 2026 Conference Paper

DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

Yuan Li
Jun Hu
Bryan Hooi
Bingsheng He
Cheng Chen

Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features—often rich in textual data—and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs' ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information into prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods associated with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities—bi-level semantic abstraction for textual fields and statistical aggregation for numerical features—enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industry datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.
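
The statistical-aggregation branch can be pictured as replacing a long list of neighbor values with a one-line summary while keeping the target node's own text verbatim. This is a hypothetical sketch: the function names, the field names, the chosen statistics (mean, min, max), and the prompt template are assumptions, and the bi-level semantic abstraction for textual fields is not shown.

```python
import statistics


def summarize_numeric(neighbors, field):
    """Collapse one numeric field across many neighbors into a short text line."""
    values = [n[field] for n in neighbors if field in n]
    if not values:
        return f"{field}: no data"
    return (f"{field} over {len(values)} neighbors: "
            f"mean={statistics.mean(values):.2f}, "
            f"min={min(values):.2f}, max={max(values):.2f}")


def build_prompt(target_text, neighbors, numeric_fields):
    # Fine-grained: the target node's own text is kept verbatim.
    lines = [f"Target node: {target_text}"]
    # Coarse-grained: neighbor values are summarized instead of listed one by one.
    lines.append("Neighborhood summary:")
    lines += [summarize_numeric(neighbors, f) for f in numeric_fields]
    lines.append("Question: is the target node fraudulent?")
    return "\n".join(lines)


neighbors = [{"amount": 120.0, "age_days": 3},
             {"amount": 80.0, "age_days": 400},
             {"amount": 100.0, "age_days": 12}]
prompt = build_prompt("Review: Great seller, fast shipping!!!", neighbors,
                      ["amount", "age_days"])
print(prompt)
```

The prompt length now grows with the number of fields rather than the number of neighbors, which is the property that keeps multi-hop neighborhoods within a token budget.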

EAAI Journal 2026 Journal Article

Research on aero-engine health monitoring method based on interpretable deep neural networks

Cheng Chen
Qiangang Zheng
Siyuan Hu
Haibo Zhang

Reliable health monitoring of aero-engines hinges on modeling accuracy, which has direct implications for flight safety. However, traditional baseline models (TBMs) often suffer from inaccuracies in similarity conversion coefficients, making it difficult to distinguish between modeling and degradation-related errors. To address this issue, a novel health monitoring method grounded in Interpretable Deep Neural Networks (IDNN) is proposed. This method adjusts the conventional similarity conversion coefficient to mitigate modeling errors. Leveraging the Smoothed Distribution-aware Integrated Gradients (SDAIG) method, the approach quantitatively distinguishes physically insignificant features and neurons by comparing their attributions to noise, while the distribution-aware analysis further reveals trends and variations in feature importance across different operating regimes. This provides an effective foundation for network pruning and interpreting physical relationships in the aero-engine system. Validation on real flight data using ablation experiments demonstrates that integrating the interpretability module achieves a 24% reduction in prediction error. Further case studies illustrate the method's potential to support fault diagnosis.
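
Integrated gradients, on which SDAIG appears to build given its name, attributes a prediction to each input by integrating the gradient along a straight path from a baseline to the input. The sketch below uses a midpoint Riemann sum, finite-difference gradients, a toy surrogate model, and a designated noise channel as the reference for flagging insignificant features. The smoothing and distribution-aware parts of SDAIG are not reproduced, and all names (`spool_speed`, `flag_insignificant`, and so on) are hypothetical.

```python
def numerical_grad(f, x, h=1e-5):
    # Central finite differences stand in for autodiff in this sketch.
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads


def integrated_gradients(f, x, baseline, steps=100):
    """Midpoint-rule approximation of
    IG_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + a * (x - b)) da."""
    acc = [0.0] * len(x)
    for s in range(steps):
        a = (s + 0.5) / steps
        point = [bi + a * (xi - bi) for xi, bi in zip(x, baseline)]
        acc = [ai + gi for ai, gi in zip(acc, numerical_grad(f, point))]
    return [(xi - bi) * ai / steps for xi, bi, ai in zip(x, baseline, acc)]


def flag_insignificant(names, attributions, noise_name):
    # A feature whose |attribution| does not exceed the noise channel's is flagged.
    noise_level = abs(attributions[names.index(noise_name)])
    return [n for n, a in zip(names, attributions)
            if n != noise_name and abs(a) <= noise_level]


def model(v):
    # Toy surrogate model; the last input is a pure-noise reference channel.
    return 2.0 * v[0] ** 2 + 0.5 * v[1] + 0.0005 * v[2] + 0.001 * v[3]


names = ["spool_speed", "fuel_flow", "humidity", "noise"]
x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
attr = integrated_gradients(model, x, baseline)
print(attr, flag_insignificant(names, attr, "noise"))
```

The attributions sum to `model(x) - model(baseline)` (the completeness property of integrated gradients), and `humidity` is flagged because its attribution is smaller than that of the noise channel.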

AAAI Conference 2026 Conference Paper

Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience

Zicheng Hu
Yuchen Wang
Cheng Chen

The decentralized cooperative multi-agent multi-armed bandit problem (DeCMA2B) concerns how multiple agents collaborate in a decentralized multi-armed bandit setting. Although this problem has been extensively studied in previous work, most existing methods remain susceptible to various adversarial attacks. In this paper, we first study DeCMA2B with adversarial corruption, where an adversary can corrupt reward observations of all agents with a limited corruption budget. We propose a robust algorithm, called DeMABAR, which ensures that each agent’s individual regret suffers only an additive term proportional to the corruption budget. Then we consider a more realistic scenario where the adversary can only attack a small number of agents. Our theoretical analysis shows that the DeMABAR algorithm can also almost completely eliminate the influence of adversarial attacks and is inherently robust in the Byzantine setting, where an unknown fraction of the agents can be Byzantine, i.e., may arbitrarily select arms and communicate wrong information. We also conduct numerical experiments to illustrate the robustness and effectiveness of the proposed method.

AAAI Conference 2026 Conference Paper

Tackling Dual-stage Missing Modalities in Brain Tumor Segmentation via Robust Modality Reconstruction and Prompt-guided Modality Adaptation

Yunpeng Zhao
Cheng Chen
Qing You Pang
Yibing Fu
Quanzheng Li
Carol Tang
Beng Ti Ang
Yueming Jin

Addressing missing modalities is a critical challenge in multimodal brain tumor segmentation. Most existing approaches merely handle modality-incomplete inputs during inference, assuming a full set of modalities for all training samples. However, this unrealistic assumption limits the usage of abundant modality-incomplete data commonly observed in clinical practice. In this paper, we explore a more practical task of tackling missing modalities during both training and inference. We propose a universal model featuring robust modality reconstruction and prompt-guided modality adaptation. Our mask-reconstruction pre-training enables robust modality-invariant representation learning, during which we design a novel distribution approximation method that supervises the reconstruction of absent modalities without requiring full-modal training data. Afterwards, when adapting our model to the segmentation task, we introduce the complete-then-distill (CTD) paradigm, which first estimates missing modalities in training samples from the available ones, and then distills the knowledge from the reconstructed full-modal representations to enhance learning from modality-incomplete data. Moreover, we propose prompt-guided modality adaptation to personalize a subset of model parameters during CTD, enabling the model to adapt to each distinct modality input scenario by using prompts with rich visual-textual information. Extensive experiments on two brain tumor segmentation benchmarks show our method consistently surpasses previous state-of-the-art approaches under dual-stage missing modality settings across various missing ratios.

EAAI Journal 2026 Journal Article

The research on the diagnostic technology for aortic dissection and acute myocardial infarction based on Raman and infrared spectroscopy combined with multimodal deep learning

Lei Yan
Guangyao Ma
Cheng Chen
Chen Chen
Jing Tao
Xuguang Zhou
Ting Tian
Hao Liu

Background: Aortic dissection and myocardial infarction are two common and life-threatening cardiovascular emergencies characterized by sudden onset, high mortality, and overlapping clinical symptoms such as chest pain and respiratory distress, which make accurate and timely clinical differentiation particularly challenging. Current mainstream diagnostic techniques, including computed tomography and transesophageal echocardiography, provide valuable anatomical and functional information but are often costly, time-consuming, and insensitive to early-stage biochemical alterations, which may result in missed or incorrect diagnoses in emergency settings. Aortic dissection often requires immediate repair of the damaged vessel to prevent further expansion or rupture, whereas myocardial infarction requires rapid restoration of blood flow to the myocardium. Because the treatments for the two conditions differ, misdiagnosis can have severe consequences. Therefore, more convenient, rapid, and efficient diagnostic methods are urgently needed. Methods: Vibrational spectroscopy is a noninvasive analytical technique with high sensitivity to molecular and biochemical changes in biological samples; Raman and infrared spectroscopy probe distinct molecular vibrational modes and therefore provide complementary pathological information. In this study, a multimodal attention fusion network was developed to integrate Raman and infrared spectroscopy data for rapid disease classification. Results: The proposed method achieved a diagnostic accuracy of 94.06% and a specificity of 97.03% in distinguishing aortic dissection, myocardial infarction, and non-critical cases. Conclusion: This method provides an innovative and efficient decision-support tool for the clinical differentiation of aortic dissection and myocardial infarction, offering significant clinical value.

AAAI Conference 2026 Conference Paper

Unleashing the Power of Image-Tabular Self-Supervised Learning via Breaking Cross-Tabular Barriers

Yibing Fu
Yunpeng Zhao
Zhitao Zeng
Cheng Chen
Yueming Jin

Multi-modal learning integrating medical images and tabular data has significantly advanced clinical decision-making in recent years. Self-Supervised Learning (SSL) has emerged as a powerful paradigm for pretraining these models on large-scale unlabeled image-tabular data, aiming to learn discriminative representations. However, existing SSL methods for image-tabular representation learning are often confined to specific data cohorts, mainly due to their rigid mechanisms for modeling heterogeneous tabular data. This inter-tabular barrier hinders multi-modal SSL methods from effectively learning transferable medical knowledge shared across diverse cohorts. In this paper, we propose a novel SSL framework, namely CITab, designed to learn powerful multi-modal feature representations in a cross-tabular manner. We design the tabular modeling mechanism from a semantic-awareness perspective by integrating column headers as semantic cues, which facilitates transferable knowledge learning and scalability in utilizing multiple data sources for pretraining. Additionally, we propose a prototype-guided mixture-of-linear layer (P-MoLin) module for tabular feature specialization, empowering the model to effectively handle the heterogeneity of tabular data and explore the underlying medical concepts. We conduct comprehensive evaluations on the Alzheimer's disease diagnosis task across three publicly available data cohorts containing 4,461 subjects. Experimental results demonstrate that CITab outperforms state-of-the-art approaches, paving the way for effective and scalable cross-tabular multi-modal learning.

AAAI Conference 2026 Conference Paper

Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction

Cheng Chen
Hao Huang
Saurabh Bagchi

Collaborative perception enables connected vehicles to share information, overcoming occlusions and extending the limited sensing range inherent in single-agent (non-collaborative) systems. Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels, which incur high communication costs, or 2D planar features, which require accurate depth estimation or additional supervision, limiting their applicability to collaborative scenarios. To address these challenges, we propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction. By sharing and fusing intermediate Gaussian primitives, our method provides three benefits: a neighborhood-based cross-agent fusion that removes duplicates and suppresses noisy or inconsistent Gaussians; a joint encoding of geometry and semantics in each primitive, which reduces reliance on depth supervision and allows simple rigid alignment; and sparse, object-centric messages that preserve structural information while reducing communication volume. Extensive experiments demonstrate that our approach outperforms single-agent perception and baseline collaborative methods by +8.42 and +3.28 points in mIoU, and +5.11 and +22.41 points in IoU, respectively. When the number of transmitted Gaussians is further reduced, our method still achieves a +1.9-point improvement in mIoU while using only 34.6% of the communication volume, highlighting robust performance under limited communication budgets.
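
A neighborhood-based cross-agent fusion of Gaussian primitives could look roughly like the following: another agent's Gaussians are rigidly transformed into the ego frame, then merged with any ego Gaussian within a fixed radius. This is an illustrative sketch only. The function names, the yaw-plus-translation transform, the radius, opacity-weighted merging, and the keep-the-more-opaque rule for label conflicts are assumptions, not the paper's method.

```python
import math


def to_ego_frame(mean, yaw, translation):
    # Rigid transform: rotate about the z-axis by `yaw` (radians), then translate.
    x, y, z = mean
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def fuse_gaussians(ego, other, yaw, translation, radius=0.5):
    """Merge another agent's Gaussians into the ego set.
    Each Gaussian is a dict with 'mean' (x, y, z), 'label' and 'opacity'."""
    fused = [dict(g) for g in ego]
    for g in other:
        m = to_ego_frame(g["mean"], yaw, translation)
        # Greedy neighborhood search: first fused Gaussian within `radius`.
        match = next((f for f in fused if math.dist(f["mean"], m) <= radius), None)
        if match is None:
            fused.append({"mean": m, "label": g["label"], "opacity": g["opacity"]})
        elif match["label"] == g["label"]:
            # Duplicate observation: opacity-weighted average of the two means.
            wa, wb = match["opacity"], g["opacity"]
            match["mean"] = tuple((wa * a + wb * b) / (wa + wb)
                                  for a, b in zip(match["mean"], m))
            match["opacity"] = max(wa, wb)
        elif g["opacity"] > match["opacity"]:
            # Conflicting labels: keep the more confident primitive.
            match.update(mean=m, label=g["label"], opacity=g["opacity"])
    return fused


ego = [{"mean": (0.0, 0.0, 0.0), "label": "car", "opacity": 0.8}]
other = [{"mean": (1.0, 0.0, 0.0), "label": "car", "opacity": 0.8},
         {"mean": (5.0, 0.0, 0.0), "label": "road", "opacity": 0.6}]
fused = fuse_gaussians(ego, other, yaw=0.0, translation=(-1.0, 0.0, 0.0))
print(fused)
```

The greedy first-match search is quadratic in the number of primitives; a spatial index such as a voxel hash would be the natural replacement at realistic scene sizes.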

NeurIPS Conference 2025 Conference Paper

A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

Zicheng Hu
Cheng Chen

We investigate various stochastic bandit problems in the presence of adversarial corruptions. A seminal work on this problem is the BARBAR algorithm (Gupta et al., 2019), which achieves both robustness and efficiency. However, it suffers from a regret of $O(KC)$, which does not match the lower bound of $\Omega(C)$, where $K$ denotes the number of arms and $C$ denotes the corruption level. In this paper, we first improve the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ to achieve an optimal regret bound up to a logarithmic factor. We also extend BARBAT to various settings, including multi-agent bandits, graph bandits, combinatorial semi-bandits and batched bandits. Compared with the Follow-the-Regularized-Leader framework, our methods are more amenable to parallelization, making them suitable for multi-agent and batched bandit settings, and they incur lower computational costs, particularly in semi-bandit problems. Numerical experiments verify the efficiency of the proposed methods.