Arrow Research search

Author name cluster

Lei Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

57 papers
2 author rows

Possible papers (57)

EAAI Journal 2026 Journal Article

A semi-supervised multimodal fusion framework with adversarial contrastive learning for Alzheimer's disease diagnosis

  • Haowen Liu
  • Lei Shi
  • Yucheng Shi
  • Yameng Zhang
  • Guohua Zhao
  • Yufei Gao

Multimodal neuroimaging data provide complementary structural and functional information for Alzheimer's disease (AD) diagnosis. However, acquiring a large-scale multimodal dataset with precise annotations is resource-intensive and incurs substantial costs. Effectively fusing relevant information across modalities also remains a significant challenge. Existing methods are often constrained by modeling intra-modality specific features and inter-modal shared information in isolation, overlooking their critical interactions. Additionally, the heterogeneity among modalities introduces a major obstacle to achieving reliable cross-modality feature alignment. To tackle these problems, we propose an end-to-end semi-supervised multimodal fusion framework with adversarial contrastive learning. Specifically, a pseudo-labeling strategy is designed to make full use of unlabeled data by regulating the quality of pseudo-labels with a threshold. To adaptively capture inter-modal shared characteristics while preserving the unique properties of each modality, we design a Dual-phase Inter–Intra Attention Fusion Unit that effectively exploits the interactions between different modalities. Moreover, to achieve efficient alignment of multimodal data at both the feature and subject levels, we develop a hierarchical alignment strategy based on adversarial contrastive learning. This strategy maps features into a shared latent space and promotes the proximity of inter-modal paired samples within that space, thereby simultaneously mitigating distributional discrepancies and resolving semantic inconsistencies across modalities. Extensive experiments conducted on two independent public datasets demonstrate that the proposed framework performs strongly in AD diagnosis compared with existing approaches, notably achieving an accuracy of 92.00 (±1.00)% on ADNI1 with only 40% labels.

AAAI Conference 2026 Conference Paper

Adaptive Graph Attention Based Discrete Hashing for Incomplete Cross-modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Huilong Jin
  • Feifei Kou
  • Pengfei Zhang
  • Mingying Xu
  • Pengtao Lv

Cross-modal hashing has emerged as a pivotal solution for efficient retrieval across diverse modalities, such as images and texts, by mapping them into compact binary hash spaces. However, in real-world scenarios, modality data are often missing or misaligned. Existing methods mostly rely on fully paired training data and ignore missing or misaligned modality data, resulting in semantic inconsistencies. To address these challenges, we propose an Adaptive Graph Attention-Based Discrete Hashing (AGADH) method, which consists of three parts. First, to solve the problem of missing modalities, AGADH employs a masked completion strategy to reconstruct missing modalities. Second, to mitigate semantic misalignment, AGADH leverages a Graph Attention Network (GAT) encoder-decoder architecture with an alignment module to construct features from different modalities. Additionally, to enhance fusion performance, an adaptive fusion module that dynamically adjusts the contributions of image and text modalities with learnable weighting coefficients is proposed. Extensive experiments on three benchmark datasets, MS-COCO, NUS-WIDE, and MIRFlickr-25K, demonstrate that AGADH outperforms state-of-the-art methods in both fully paired and incompletely paired scenarios, showing its robustness and effectiveness in cross-modal retrieval tasks.

AAAI Conference 2026 Short Paper

Constraint-Augmented Mongolian-Chinese Neural Machine Translation Based on Dynamic Feedback Alignment (Student Abstract)

  • Shuting Dai
  • Yatu Ji
  • Yanli Wang
  • Lei Shi
  • Qing-Dao-Er-Ji Ren
  • Nier Wu
  • Na Liu

The scarcity of parallel corpora for Mongolian and Chinese constrains the performance of Mongolian-Chinese neural machine translation (NMT), particularly manifesting in inadequate accuracy in translating specialized terminology. To address this limitation, this study adopts a lexically constrained augmentation strategy that constructs pseudo-source sentences by appending Chinese constraint words to Mongolian source texts, while enforcing the inclusion of these constraints in the output to improve translation accuracy. However, this approach presents two inherent drawbacks: processing pseudo-sentences with a single encoder tends to induce semantic interference, and the introduced constraint words may exacerbate alignment errors during decoding. To overcome these limitations, this paper proposes a Constraint-Augmented Mongolian-Chinese NMT method (CANMT) based on dynamic feedback alignment. The method employs a dual-encoder architecture to isolate bilingual representations, coupled with a dynamic feedback alignment module that progressively reduces alignment errors through iterative refinement, thereby enhancing overall translation performance.

AAAI Conference 2026 Conference Paper

MoE-Guided Graph Diffusion for Oriented Molecule Design

  • Shuochen Li
  • Xiangqi Guo
  • Huobin Tan
  • Lei Shi

Designing molecules with desired properties, a.k.a. oRiented molEcule Design (RED), is a fundamental task in chemistry and materials science. While graph diffusion models (GDMs) and reinforcement learning (RL) techniques individually show promise in the molecule structure generation and property optimization stages, their integration in the unified RED task often suffers from poor compatibility. The large variance among candidate molecular structures generated by GDMs can be amplified in the iterative optimization process of RL, leading to slow and unstable convergence. In this work, motivated by the adaptive and divide-and-conquer characteristics of the Mixture of Experts (MoE) architecture, we propose a novel framework called the MoE-Guided Graph Diffusion Model (MEGD), which incorporates the MoE architecture to guide the orchestration of GDM and RL, promoting faster and more stable convergence in the design process. MEGD is evaluated on benchmark datasets by optimizing the physical and chemical properties of AI-generated molecular structures. On all three datasets, our method outperforms the best of 9 alternative models by 7.73% on the target structural properties, without penalizing other important application-level quality metrics of the generated molecules. A real-world case study on an emerging class of materials, i.e., metal-organic frameworks, is also conducted, which further demonstrates the effectiveness of our method in accomplishing the RED task.

EAAI Journal 2026 Journal Article

Municipal solid waste gasification predictions using hybrid-data physics-informed neural networks

  • Vincentius Surya Kurnia Adi
  • Lei Shi
  • Wei Wu

Municipal solid waste (MSW) gasification presents a viable thermochemical pathway for waste valorization and sustainable energy generation. Complicated mathematical modeling, nonlinear behavior, and heterogeneous feedstocks make reliable prediction of outlet compositions difficult. This study presents a hybrid-data physics-informed neural network (PINN), a physics-guided machine learning (ML) approach within artificial intelligence (AI), to predict syngas composition and lower heating value (LHV). The framework integrates experimental measurements, Aspen Plus simulation, and thermodynamic monotonicity constraints with respect to temperature, moisture content, and equivalence ratio. Aspen Plus models were constructed using equilibrium chemistry and the Peng-Robinson equation of state (EOS) with the Boston-Mathias modifications (PR-BM) property method, and monotonicity constraints concerning temperature, moisture content, and equivalence ratio are embedded into the PINN loss function to enforce physical consistency. To identify a feasible and effective PINN framework, three model scenarios are proposed: a pure data-driven PINN with a monotonicity loss term, a hybrid data-driven PINN with a monotonicity loss term, and a hybrid data-driven PINN with three physics-informed loss terms. In benchmark comparisons, the hybrid data-driven PINN with three physics-informed loss terms outperforms conventional ML models such as Support Vector Machines (SVM) and Extreme Gradient Boosting (XGB), as well as the other PINN scenarios, achieving the highest coefficient of determination (R2) values for carbon monoxide (CO) (0.9600), LHV (0.9535), and nitrogen (N2) (0.9650). This shows that the proposed approach offers enhanced accuracy, generalizability, and interpretability, supporting data-driven decision-making in MSW gasification and sustainable energy system design.

AAAI Conference 2026 Conference Paper

MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation

  • Yuqiu Zhao
  • Lei Shi
  • Yan Zhong
  • Feifei Kou
  • Pengfei Zhang
  • Jiwei Zhang
  • Mingying Xu
  • Yanchao Liu

Generative recommendation as a new paradigm is influencing the current development of recommender systems. It aims to assign identifiers that capture richer semantic and collaborative information to items, and subsequently predict item identifiers via autoregressive generation using Large Language Models (LLMs). Existing approaches primarily tokenize item text into codebooks with preserved semantic IDs through RQ-VAE, or separately tokenize different modality features of items. However, existing tokenization methods face two major challenges: (1) Learning decoupled multi-modal features limits the quality of the semantic representation. (2) Ignoring collaborative signals from interaction history limits the comprehensiveness of identifiers. To address these limitations, we propose a multi-modal semantic-enhanced identifier with collaborative signals for generative recommendation, named MusicRec. In MusicRec, we propose a tokenization approach based on shared-specific modal fusion, enabling the generated identifiers to preserve semantic information more comprehensively from all modalities. In addition, we incorporate collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared to existing baseline methods.

ICRA Conference 2025 Conference Paper

A Fairness-Oriented Control Framework for Safety-Critical Multi-Robot Systems: Alternative Authority Control

  • Lei Shi
  • Qichao Liu
  • Cheng Zhou
  • Xiong Li 0001

This paper proposes a fair control framework for multi-robot systems, which integrates the newly introduced Alternative Authority Control (AAC) and Flexible Control Barrier Function (F-CBF). Control authority refers to the single robot that plans its trajectory while treating the others as moving obstacles; robots without authority do not plan their own paths. The AAC method dynamically distributes control authority, enabling fair and coordinated movement across the system. This approach significantly improves computational efficiency, scalability, and robustness in complex environments. The proposed F-CBF extends traditional CBFs by incorporating obstacle shape, velocity, and orientation. F-CBF enhances safety through accurate dynamic obstacle avoidance. The framework is validated through simulations in multi-robot scenarios, demonstrating its safety, robustness, and computational efficiency.

NeurIPS Conference 2025 Conference Paper

Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Yingxue Zhang
  • Feifei Kou
  • Huilong Jin
  • Pengfei Zhang
  • Meiyu Liang

The demand for multimodal data processing drives the development of information technology. Cross-modal hash retrieval has attracted much attention because it can overcome modal differences and achieve efficient retrieval, and it has shown great application potential in many practical scenarios. Existing cross-modal hashing methods have difficulty fully capturing the semantic information of different modal data, which leads to a significant semantic gap between modalities. Moreover, these methods often ignore differences in the importance of channels, and due to the limitation of a single objective, the matching between hash codes is also affected to a certain extent. To address these issues, we propose a Dynamic Masking and Auxiliary Hash Learning (AHLR) method for enhanced cross-modal retrieval. By jointly leveraging the dynamic masking and auxiliary hash learning mechanisms, our approach effectively resolves the problems of channel information imbalance and insufficient key-information capture, thereby significantly improving retrieval accuracy. Specifically, we introduce a dynamic masking mechanism that automatically screens and weights the key information in images and texts during training, enhancing the accuracy of feature matching. We further construct an auxiliary hash layer to adaptively balance the weights of features across channels, compensating for the deficiencies of traditional methods in key-information capture and channel processing. In addition, we design a contrastive loss function to optimize the generation of hash codes and enhance their discriminative power, further improving cross-modal retrieval performance. Comprehensive experimental results on the NUS-WIDE, MIRFlickr-25K, and MS-COCO benchmark datasets show that the proposed AHLR algorithm outperforms several existing algorithms.

EAAI Journal 2025 Journal Article

Enable importance-aware model cacheability for inference serving

  • Hao Mo
  • Didier El Baz
  • Ligu Zhu
  • Suping Wang
  • Songfu Tan
  • Hongning Zhao
  • Lei Shi

Inference serving systems are leveraged to deploy deep learning (DL) models as services. Accelerators such as Graphics Processing Units (GPUs) have been extensively used in these systems to reduce model execution time. As accelerators become more powerful and expensive, GPU sharing among DL models across different inference requests is a common practice. However, GPU memory capacity becomes a bottleneck as the number of collocated models increases, making this approach unsustainable. At the same time, collocated models may vary in popularity, with some accessed frequently and others rarely, leading to low resource efficiency and system performance. While some existing inference serving systems offer the capability to dynamically load and cache models in memory, they are typically locality-aware and exhibit poor performance for DL inference serving. To this end, we present mCache, a novel inference-serving-oriented caching system that dynamically manages a set of collocated models with diverse popularity for efficient use of memory. mCache treats each DL model as a cacheable object, loading models on demand and unloading them when not in use. Rather than using recency or frequency, we manage models in GPU memory based on rank-based importance scores, which jointly consider cache access patterns and model-specific factors, serving as a unified metric to compare different models. During model serving, the importance scores of cached models are dynamically updated, and the least important model is evicted to make room for a newly targeted model when the cache is full. Evaluation with representative DL models shows that mCache reduces memory footprint by nearly half with a modest increase in inference latency. Compared to an existing serving system using the Least Frequently Used (LFU) caching algorithm, mCache improves throughput by up to 1.5× and 2.39× at 40% and 80% GPU memory capacity, respectively.

IJCAI Conference 2025 Conference Paper

EVICheck: Evidence-Driven Independent Reasoning and Combined Verification Method for Fact-Checking

  • Lingxiao Wang
  • Lei Shi
  • Feifei Kou
  • Ligu Zhu
  • Chen Ma
  • Pengfei Zhang
  • Mingying Xu
  • Zeyu Li

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have demonstrated significant potential in automated fact-checking. However, existing methods face limitations in insufficient evidence utilization and a lack of explicit verification criteria. Specifically, these approaches aggregate evidence for collective reasoning without independently analyzing each piece, hindering their ability to leverage the available information thoroughly. Additionally, they rely on simple prompts or few-shot learning for verification, which makes truthfulness judgments less reliable, especially for complex claims. To address these limitations, we propose a novel method, named EVICheck, that enhances evidence utilization and introduces explicit verification criteria. Our approach reasons over each piece of evidence independently and synthesizes the results to enable more thorough exploration and enhance interpretability. Additionally, by incorporating fine-grained truthfulness criteria, we make the model's verification process more structured and reliable, especially when handling complex claims. Experimental results on the public RAWFC dataset demonstrate that EVICheck achieves state-of-the-art performance across all evaluation metrics. Our method shows strong potential in fake news verification, significantly improving accuracy.

AAAI Conference 2025 Conference Paper

IWRN: A Robust Blind Watermarking Method for Artwork Image Copyright Protection Against Noise Attack

  • Feifei Kou
  • Yuhan Yao
  • Siyuan Yao
  • Jiahao Wang
  • Lei Shi
  • Yawen Li
  • Xuejing Kang

Adding imperceptible watermarks to artwork images, such as paintings and photographs, can effectively safeguard the copyright of these images without compromising their usability. However, existing blind watermarking techniques encounter two major challenges in addressing this task: imperceptibility and robustness, particularly when subjected to various noise attacks. In this paper, we propose a blind watermarking method for artwork image copyright protection, IWRN, which ensures both the Imperceptibility of the Watermark and Robustness against Noise attacks. For imperceptibility, we design a Learnable Wavelet Network (LWN) to adaptively embed the watermark into the high-frequency region, where the watermark has better invisibility. For robustness, we establish a Deform-Attention based Invertible Neural Network (DA-INN) with decoding optimization, which offers the advantage of computational reversibility and combines the deform-attention mechanism and decoding optimization to enhance the model's resistance to noise. Additionally, we design a Joint Contrast Learning (JCL) mechanism to improve imperceptibility and robustness simultaneously. Experiments show that our IWRN outperforms other state-of-the-art blind watermarking methods, achieving an average performance of 41.55 PSNR and 99.57% accuracy on the Coco2017, Wikiart, and Div2k datasets when facing 12 kinds of noise attacks.

NeurIPS Conference 2025 Conference Paper

Learning to Rank for In-Context Example Retrieval

  • Yuwen Ji
  • Luodan Zhang
  • Ambyer han
  • Haoran Que
  • Lei Shi
  • Wang Chao
  • Yue Zhang

Recent advances in retrieval-based in-context learning (ICL) train the retriever using a classification objective, which categorizes in-context examples (ICEs) into the most useful and the rest based on absolute scores. However, during inference, ICEs are retrieved by score ranking rather than classification; the classification training objective thus deviates from this test scenario. Hence, in this paper, we propose a novel algorithm that trains a retrieval model with a ranking formulation, where the preference rankings between ICEs are given by comparing the likelihood of the LLM generating the correct answer conditioned on each exemplar. By learning to rank, we motivate the retriever to automatically learn diverse rationales for why specific examples are more useful for ICL decisions. This addresses the issue that classification models poorly capture broader utility. Experimental results demonstrate the top-1 performance of our proposal across 9 NLP tasks, with ablation studies and case studies further validating the effectiveness of our design. The code can be found at: https://github.com/2022neo/SeDPO_NIPS25

NeurIPS Conference 2025 Conference Paper

Leveraging semantic similarity for experimentation with AI-generated treatments

  • Lei Shi
  • David Arbour
  • Raghavendra Addanki
  • Ritwik Sinha
  • Avi Feller

Large Language Models (LLMs) enable a new form of digital experimentation where treatments combine human and model-generated content in increasingly sophisticated ways. The main methodological challenge in this setting is representing these high-dimensional treatments without losing their semantic meaning or rendering analysis intractable. Here we address this problem by focusing on learning low-dimensional representations that capture the underlying structure of such treatments. These representations enable downstream applications such as guiding generative models to produce meaningful treatment variants and facilitating adaptive assignment in online experiments. We propose double kernel representation learning, which models the causal effect through the inner product of kernel-based representations of treatments and user covariates. We develop an alternating-minimization algorithm that learns these representations efficiently from data and provide convergence guarantees under a low-rank factor model. As an application of this framework, we introduce an adaptive design strategy for online experimentation and demonstrate the method's effectiveness through numerical experiments.

AAAI Conference 2025 Conference Paper

Leveraging the Dual Capabilities of LLM: LLM-Enhanced Text Mapping Model for Personality Detection

  • Weihong Bi
  • Feifei Kou
  • Lei Shi
  • Yawen Li
  • Haisheng Li
  • Jinpeng Chen
  • Mingying Xu

Personality detection aims to deduce a user's personality from their published posts, mapping posts to specific personality types. Existing methods encode post information to obtain user vectors, which are then mapped to personality labels. However, these methods face two main issues: first, using only small models makes it hard to accurately extract semantic features from multiple long documents; second, the relationship between user vectors and personality labels is not fully considered. To address the issue of poor user representation, we utilize the text embedding capabilities of LLMs; to address the insufficient consideration of the relationship between user vectors and personality labels, we leverage the text generation capabilities of LLMs. We therefore propose the LLM-Enhanced Text Mapping Model (ETM) for Personality Detection. The model applies an LLM's text embedding capability to enhance user vector representations. Additionally, it uses an LLM's text generation capability to create multi-perspective interpretations of the labels, which are then used within a contrastive learning framework to strengthen the mapping of user vectors to personality labels. Experimental results show that our model achieves state-of-the-art performance on benchmark datasets.

TIST Journal 2025 Journal Article

Open Spatio-Temporal Foundation Models for Traffic Prediction

  • Zhonghang Li
  • Long Xia
  • Lei Shi
  • Yong Xu
  • Dawei Yin
  • Chao Huang

Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in handling the spatial and temporal heterogeneity of traffic data, coupled with the significant distribution shift across time and space. In this work, we aim to unlock new possibilities for building versatile, resilient and adaptive spatio-temporal foundation models for traffic prediction. We introduce OpenCity, a foundation model that captures underlying spatio-temporal patterns from diverse data, facilitating zero-shot generalization across urban environments. OpenCity integrates Transformers with graph neural networks to capture complex spatio-temporal dependencies in traffic data. By pre-training OpenCity on large-scale, heterogeneous traffic data from web platforms, we enable the model to learn rich, generalizable representations that can be seamlessly applied to a wide range of traffic forecasting scenarios. Experiments show OpenCity excels in zero-shot prediction and exhibits scaling laws, highlighting its potential as a universal one-for-all traffic prediction solution adaptable to new urban contexts with minimal overhead. Source codes are available at: https://github.com/HKUDS/OpenCity

NeurIPS Conference 2025 Conference Paper

OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance

  • Yuhan Yao
  • Feifei Kou
  • Lei Shi
  • Xiao Yang
  • Zhongbao Zhang
  • Suguo Zhu
  • Jiwei Zhang
  • Lirong Qiu

The advancements in generative models and real-world attacks on machine-generated text (MGT) create a demand for more robust detection methods. Existing MGT detection methods for adversarial environments primarily consist of manually designed statistical-based methods and fine-tuned classifier-based approaches. Statistical-based methods extract intrinsic features but suffer from rigid decision boundaries vulnerable to adaptive attacks, while fine-tuned classifiers achieve outstanding performance at the cost of overfitting to superficial textual features. We argue that the key to detection in current adversarial environments lies in extracting intrinsic invariant features and ensuring that the classifier possesses dynamic adaptability. We therefore propose OSTAR, a novel MGT detection framework designed for adversarial environments, composed of a statistically enhanced classifier and Multi-Faceted Contrastive Learning (MFCL). On the classifier side, our Multi-Dimensional Statistical Profiling (MDSP) module extracts intrinsic differences between human and machine texts, complementing classifiers with useful, stable features. On the model optimization side, the MFCL strategy enhances robustness by contrasting feature variations before and after text attacks, jointly optimizing the statistical feature mapping and baseline pre-trained models. Experimental results on three public datasets under various adversarial scenarios demonstrate that our framework outperforms existing MGT detection methods, achieving state-of-the-art performance and robustness against attacks. The code is available at https://github.com/BUPT-SN/OSTAR.

AAAI Conference 2025 Conference Paper

Radiology Report Generation via Multi-objective Preference Optimization

  • Ting Xiao
  • Lei Shi
  • Peng Liu
  • Zhe Wang
  • Chenjia Bai

Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on supervised regression based on different architectures or additional knowledge injection, yet the generated report may not align optimally with radiologists' preferences. This is especially problematic because radiologists' preferences are inherently heterogeneous and multi-dimensional: some may prioritize report fluency, while others emphasize clinical accuracy. To address this problem, we propose a new RRG method via Multi-objective Preference Optimization (MPO) to align the pre-trained RRG model with multiple human preferences, which can be formulated by multi-dimensional reward functions and optimized by multi-objective reinforcement learning (RL). Specifically, we use a preference vector to represent the weight of preferences and use it as a condition for the RRG model. Then, a linearly weighted reward is obtained via a dot product between the preference vector and the multi-dimensional reward. Next, the RRG model is optimized to align with the preference vector by optimizing this reward via RL. In the training stage, we randomly sample diverse preference vectors from the preference space and align the model by optimizing the weighted multi-objective rewards, which leads to an optimal policy over the entire preference space. At inference time, our model can generate reports aligned with specific preferences without further fine-tuning. Extensive experiments on two public datasets show that the proposed method can generate reports catering to different preferences in a single model and achieves state-of-the-art performance.

AAAI Conference 2025 Conference Paper

THGNets: Constrained Temporal Hypergraphs and Graph Neural Networks in Hyperbolic Space for Information Diffusion Prediction

  • Yanchao Liu
  • Pengzhou Zhang
  • Wenchao Song
  • Yao Zheng
  • Deyu Li
  • Lei Shi
  • Junpeng Gong

Information diffusion prediction aims to predict the next infected user in an information diffusion process, a critical task for understanding how information spreads on social platforms. Existing methods mainly focus on sequences or topology structure in Euclidean space. However, they fail to sufficiently consider the hierarchical or power-law structure of the underlying topology of information cascade graphs and social networks, resulting in distortion of user features. To tackle this issue, we propose an innovative Constrained Temporal Hypergraphs and Graph Neural Networks (THGNets) framework tailored for information diffusion prediction. Specifically, we introduce a hyperbolic temporal hypergraph neural network to alleviate the distortion of user features via hyperbolic hierarchical learning in information cascades. It also captures high-order dynamic interaction patterns between users and further integrates a time-consistency constraint mechanism to mitigate the instability and non-smoothness of user features in the latent space. In parallel, we apply a hyperbolic graph neural network to investigate the hierarchical structure and user homogeneity of social networks, enhancing our understanding of social relationships. Moreover, hyperbolic gated recurrent units are employed to capture potential dependency relationships between contextual users. Experiments conducted on four public datasets demonstrate that the proposed THGNets significantly outperform existing methods, validating the superiority and rationality of our approach.

AAAI Conference 2025 Conference Paper

Unaligned Message-Passing and Contextualized-Pretraining for Robust Geo-Entity Resolution

  • Yuwen Ji
  • Wenbo Xie
  • Jiaqi Zhang
  • Chao Wang
  • Ning Guo
  • Lei Shi
  • Yue Zhang

Geo-entity resolution involves linking records that refer to the same entities across different spatial datasets, which underpins location-based services. Given the varying quality of geo-data, this task is known to be challenging, as directly comparing the semantic-centric representations of two entities is no longer reliable. To robustify geo-entity resolution in this context, the main research question is how to effectively extend current semantics-centric representations of a geo-entity with geographical context from its spatial neighbors. Existing methods consider names from neighbors, but they struggle to fully utilize unaligned neighbor attributes. In this paper, we study the representation of geo-context for robust geo-entity resolution and propose two adaptations that efficiently leverage unaligned geo-entity attributes across spatial neighbors: (1) a plugin module, namely Unaligned Message-Passing (UMP), that propagates unaligned neighbor features to integrate geo-context into the token embeddings output by the language model; and (2) a contextualized pretraining framework (CP) that allows the former to leverage unlabelled geo-entity data. Experiments show that our method surpasses the baselines, achieving higher F1 scores on 8 real-world geo-datasets in terms of robustness, with an improvement of up to 7.9%. The ablation study further justifies our proposal.

ICML Conference 2025 Conference Paper

Zero-Inflated Bandits

  • Haoyu Wei
  • Runzhe Wan
  • Lei Shi
  • Rui Song 0006

Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
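The zero-inflated reward structure described above is concrete enough to sketch. Below is a minimal, illustrative Thompson Sampling loop for zero-inflated rewards, assuming a Bernoulli "is the reward non-zero?" component and a Gaussian magnitude component; the function, priors, and arm parameterization are hypothetical simplifications, not the authors' implementation.

```python
import random

def zi_thompson_sampling(arms, horizon, seed=0):
    """Thompson Sampling sketch for zero-inflated rewards.

    Each arm's reward is B * Y with B ~ Bernoulli(p) (is the reward
    non-zero?) and Y ~ Normal(mu, 1) (magnitude when non-zero). We keep
    a Beta posterior on p and a running Normal posterior on mu, and
    rank arms by the sampled product p_hat * mu_hat.
    """
    rng = random.Random(seed)
    k = len(arms)
    alpha = [1.0] * k   # Beta prior pseudo-counts for "non-zero" events
    beta = [1.0] * k    # ... and for "zero" events
    mu0 = [0.0] * k     # posterior mean of the non-zero part Y
    n_nz = [0] * k      # number of non-zero observations per arm
    pulls = [0] * k
    for _ in range(horizon):
        # sample a plausible mean reward p * mu for every arm
        scores = []
        for i in range(k):
            p_hat = rng.betavariate(alpha[i], beta[i])
            mu_hat = rng.gauss(mu0[i], 1.0 / (n_nz[i] + 1) ** 0.5)
            scores.append(p_hat * mu_hat)
        i = max(range(k), key=scores.__getitem__)
        p, mu = arms[i]
        reward = rng.gauss(mu, 1.0) if rng.random() < p else 0.0
        pulls[i] += 1
        if reward != 0.0:
            alpha[i] += 1
            n_nz[i] += 1
            mu0[i] += (reward - mu0[i]) / n_nz[i]  # running mean of Y
        else:
            beta[i] += 1
    return pulls

# arm 1 has the larger mean reward p * mu (0.5 * 2.0 vs 0.2 * 1.0)
counts = zi_thompson_sampling([(0.2, 1.0), (0.5, 2.0)], horizon=500)
```

The point of the zero-inflated decomposition is visible here: the zero/non-zero indicator and the non-zero magnitude get separate, well-matched posteriors instead of one misspecified reward model.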

NeurIPS Conference 2024 Conference Paper

An End-To-End Graph Attention Network Hashing for Cross-Modal Retrieval

  • Huilong Jin
  • Yingxue Zhang
  • Lei Shi
  • Shuang Zhang
  • Feifei Kou
  • Jiapeng Yang
  • Chuangying Zhu
  • Jia Luo

Due to its low storage cost and fast search speed, cross-modal retrieval based on hashing has attracted widespread attention and is widely used in real-world social media search applications. However, most existing hashing methods are limited by incomplete feature representations and semantic associations, which greatly restricts their performance and applicability in practice. To deal with this challenge, in this paper we propose an end-to-end graph attention network hashing (EGATH) method for cross-modal retrieval, which can not only capture direct semantic associations between images and texts but also match semantic content between different modalities. We adopt contrastive language-image pretraining (CLIP) combined with a Transformer to improve understanding and generalization in semantic consistency across different data modalities. A classifier based on a graph attention network is applied to obtain predicted labels that enhance cross-modal feature representation. We construct hash codes using an optimization strategy and loss function that preserve the semantic information and compactness of the hash codes. Comprehensive experiments on the NUS-WIDE, MIRFlickr25K, and MS-COCO benchmark datasets show that our EGATH significantly outperforms several state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification

  • Yuankai Luo
  • Lei Shi
  • Xiao-ming Wu

Graph Transformers (GTs) have recently emerged as popular alternatives to traditional message-passing Graph Neural Networks (GNNs), due to their theoretically superior expressiveness and impressive performance reported on standard node classification benchmarks, often significantly outperforming GNNs. In this paper, we conduct a thorough empirical analysis to reevaluate the performance of three classic GNN models (GCN, GAT, and GraphSAGE) against GTs. Our findings suggest that the previously reported superiority of GTs may have been overstated due to suboptimal hyperparameter configurations in GNNs. Remarkably, with slight hyperparameter tuning, these classic GNN models achieve state-of-the-art performance, matching or even exceeding that of recent GTs across 17 out of the 18 diverse datasets examined. Additionally, we conduct detailed ablation studies to investigate the influence of various GNN configurations, such as normalization, dropout, residual connections, and network depth, on node classification performance. Our study aims to promote a higher standard of empirical rigor in the field of graph machine learning, encouraging more accurate comparisons and evaluations of model capabilities. Our implementation is available at https://github.com/LUOyk1999/tunedGNN.
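For context, the "classic GNN" update being tuned here is the standard GCN propagation rule. A dependency-free sketch of one layer (dropout omitted for brevity; the `residual` flag illustrates one of the configuration knobs the ablations vary) might look like:

```python
def gcn_layer(adj, x, w, residual=False):
    """One dense GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).

    `adj` is an n x n 0/1 adjacency matrix (list of lists), `x` an
    n x d feature matrix, `w` a d x d weight matrix. `residual`
    optionally adds the input features back after the update.
    """
    n = len(adj)
    # add self-loops: A + I
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # symmetric normalization: a_hat[i][j] = a[i][j] / sqrt(deg_i * deg_j)
    a_hat = [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
             for i in range(n)]
    d = len(x[0])
    # aggregate neighbor features, then apply the linear transform
    agg = [[sum(a_hat[i][k] * x[k][j] for k in range(n)) for j in range(d)]
           for i in range(n)]
    out = [[sum(agg[i][k] * w[k][j] for k in range(d)) for j in range(len(w[0]))]
           for i in range(n)]
    h = [[max(0.0, v) for v in row] for row in out]  # ReLU
    if residual:
        h = [[h[i][j] + x[i][j] for j in range(d)] for i in range(n)]
    return h

# two connected nodes with one-hot features and identity weights:
# each node's output becomes the normalized mix of both nodes
h = gcn_layer([[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]])
```

The paper's point is that choices around this rule (normalization, dropout, residuals, depth), rather than the rule itself, drive much of the benchmark gap.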

JMLR Journal 2024 Journal Article

Classification with Deep Neural Networks and Logistic Loss

  • Zihan Zhang
  • Lei Shi
  • Ding-Xuan Zhou

Deep neural networks (DNNs) trained with the logistic loss (also known as the cross entropy loss) have made impressive advancements in various binary classification tasks. Despite the considerable success in practice, generalization analysis for binary classification with deep neural networks and the logistic loss remains scarce. The unboundedness of the target function for the logistic loss in binary classification is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by developing a novel theoretical analysis and using it to establish tight generalization bounds for training fully connected ReLU DNNs with logistic loss in binary classification. Our generalization analysis is based on an elegant oracle-type inequality which enables us to deal with the boundedness restriction of the target function. Using this oracle-type inequality, we establish generalization bounds for fully connected ReLU DNN classifiers $\hat{f}^{\text{FNN}}_n$ trained by empirical logistic risk minimization with respect to i.i.d. samples of size $n$, which lead to sharp rates of convergence as $n\to\infty$. In particular, we obtain optimal convergence rates for $\hat{f}^{\text{FNN}}_n$ (up to some logarithmic factor) only requiring the Hölder smoothness of the conditional class probability $\eta$ of data. Moreover, we consider a compositional assumption that requires $\eta$ to be the composition of several vector-valued multivariate functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we can even derive optimal convergence rates for $\hat{f}^{\text{FNN}}_n$ (up to some logarithmic factor) which are independent of the input dimension of data. This result explains why in practice DNN classifiers can overcome the curse of dimensionality and perform well in high-dimensional classification problems. 
Furthermore, we establish dimension-free rates of convergence under other circumstances, such as when the decision boundary is piecewise smooth and the input data are bounded away from it. Besides the novel oracle-type inequality, the sharp convergence rates presented in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of the rates by proving corresponding minimax lower bounds. All these results are new in the literature and deepen our theoretical understanding of classification with deep neural networks.
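For orientation (standard definitions, not spelled out in the abstract): the logistic loss on the margin $yf(x)$, with labels $y \in \{-1, +1\}$, and its population risk minimizer are

$$\phi\bigl(y f(x)\bigr) = \log\bigl(1 + e^{-y f(x)}\bigr), \qquad f^{*}(x) = \log\frac{\eta(x)}{1 - \eta(x)}, \quad \eta(x) = \mathbb{P}(Y = 1 \mid X = x).$$

Since $f^{*}(x)$ diverges as $\eta(x) \to 0$ or $\eta(x) \to 1$, the target function is unbounded whenever the conditional class probability approaches $0$ or $1$; this is precisely the obstacle to standard generalization bounds that the paper's oracle-type inequality is built to handle.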

NeurIPS Conference 2024 Conference Paper

Enhancing Graph Transformers with Hierarchical Distance Structural Encoding

  • Yuankai Luo
  • Hongkang Li
  • Lei Shi
  • Xiao-ming Wu

Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in various graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method to model node distances in a graph, focusing on its multi-level, hierarchical nature. We introduce a novel framework to seamlessly integrate HDSE into the attention mechanism of existing graph transformers, allowing for simultaneous application with other positional encodings. To apply graph transformers with HDSE to large-scale graphs, we further propose a high-level HDSE that effectively biases the linear transformers towards graph hierarchies. We theoretically prove the superiority of HDSE in terms of expressivity and generalization. Empirically, we demonstrate that graph transformers with HDSE excel in graph classification, regression on 7 graph-level datasets, and node classification on 11 large-scale graphs.

JMLR Journal 2024 Journal Article

Low-Rank Matrix Estimation in the Presence of Change-Points

  • Lei Shi
  • Guanghui Wang
  • Changliang Zou

We consider a general trace regression model with multiple structural changes and propose a universal approach for simultaneous exact or near-low-rank matrix recovery and change-point detection. It incorporates nuclear norm penalized least-squares minimization into a grid search scheme that determines the potential structural break. Under a set of general conditions, we establish the non-asymptotic error bounds with a nearly-oracle rate for the matrix estimators as well as the super-consistency rate for the change-point localization. We use concrete random design instances to justify the appropriateness of the proposed conditions. Numerical results demonstrate the validity and effectiveness of the proposed scheme.

AAAI Conference 2024 Conference Paper

Neural Reasoning about Agents’ Goals, Preferences, and Actions

  • Matteo Bortoletto
  • Lei Shi
  • Andreas Bulling

We propose the Intuitive Reasoning Network (IRENE) - a novel neural model for intuitive psychological reasoning about agents' goals, preferences, and actions that can generalise previous experiences to new situations. IRENE combines a graph neural network for learning agent and world state representations with a transformer to encode the task context. When evaluated on the challenging Baby Intuitions Benchmark, IRENE achieves new state-of-the-art performance on three out of its five tasks - with up to 48.9% improvement. In contrast to existing methods, IRENE is able to bind preferences to specific agents, to better distinguish between rational and irrational agents, and to better understand the role of blocking obstacles. We also investigate, for the first time, the influence of the training tasks on test performance. Our analyses demonstrate the effectiveness of IRENE in combining prior knowledge gained during training for unseen evaluation tasks.

AAAI Conference 2024 Conference Paper

Self-Supervised Representation Learning with Meta Comprehensive Regularization

  • Huijie Guo
  • Ying Ba
  • Jie Hu
  • Lingyu Si
  • Wenwen Qiang
  • Lei Shi

Self-Supervised Learning (SSL) methods harness the concept of semantic invariance by utilizing data augmentation strategies to produce similar representations for different deformations of the same input. Essentially, the model captures the information shared among multiple augmented views of a sample, while disregarding the non-shared information that may be beneficial for downstream tasks. To address this issue, we introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks, to make the learned representations more comprehensive. Specifically, we update the proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features. Additionally, guided by the constrained extraction of features using maximum entropy coding, the self-supervised learning model learns more comprehensive features on top of learning consistent features. We also provide theoretical support for the proposed method from information-theoretic and causal counterfactual perspectives. Experimental results show that our method achieves significant improvements in classification, object detection, and semantic segmentation tasks on multiple benchmark datasets.

JMLR Journal 2024 Journal Article

Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression

  • Jiading Liu
  • Lei Shi

Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory. We develop an integral operator approach to establish sharp finite sample upper bounds for prediction with divide-and-conquer estimators under various regularity conditions on the explanatory variables and target function. We also prove the asymptotic optimality of the derived rates by establishing minimax lower bounds. Finally, we consider the convergence of noiseless estimators and show that the rates can be arbitrarily fast under mild conditions.

AAAI Conference 2024 Conference Paper

Stratified GNN Explanations through Sufficient Expansion

  • Yuwen Ji
  • Lei Shi
  • Zhimeng Liu
  • Ge Wang

Explaining the decisions made by Graph Neural Networks (GNNs) is vital for establishing trust and ensuring fairness in critical applications such as medicine and science. The prevalence of hierarchical structure in real-world graphs/networks raises an important question on GNN interpretability: "On each level of the graph structure, which specific fraction imposes the highest influence over the prediction?" The two prevailing categories of methods are incapable of multi-level GNN explanation due to their flat or motif-centric nature. In this work, we formulate the problem of learning multi-level explanations from GNN models and introduce a stratified explainer module, STFExplainer, that utilizes the concept of sufficient expansion to generate explanations on each stratum. Specifically, we learn a higher-level subgraph generator by leveraging both hierarchical structure and GNN-encoded input features. Experiment results on both synthetic and real-world datasets demonstrate the superiority of our stratified explainer on standard interpretability tasks and metrics such as fidelity and explanation recall, with average improvements of 11% and 8% over the best alternative on each data type. A case study on material domains further confirms the value of our approach: the detected multi-level graph patterns accurately reconstruct the knowledge-based ground truth.

AAAI Conference 2024 Conference Paper

Structural Information Enhanced Graph Representation for Link Prediction

  • Lei Shi
  • Bin Hu
  • Deng Zhao
  • Jianshan He
  • Zhiqiang Zhang
  • Jun Zhou

Link prediction is a fundamental task of graph machine learning, and Graph Neural Network (GNN) based methods have become the mainstream approach due to their good performance. However, the typical practice learns node representations through neighborhood aggregation, lacking awareness of the structural relationships between target nodes. Recently, some methods have attempted to address this issue through node labeling tricks. However, they still rely on the node-centric neighborhood message passing of GNNs, which we believe involves two limitations in terms of information perception and transmission for link prediction. First, it cannot perceive long-range structural information due to restricted receptive fields. Second, information may be lost when a node-centric model is applied to a link-centric task. In addition, we empirically find that neighbor node features can introduce noise for link prediction. To address these issues, we propose a structural information enhanced link prediction framework, which removes neighbor node features so that the GNN can focus on fitting neighborhood graph structures. Furthermore, we introduce the Binary Structural Transformer (BST) to encode the structural relationships between target nodes, complementing the deficiency of the GNN. Our approach achieves remarkable results on multiple popular benchmarks, including ranking first on ogbl-ppa, ogbl-citation2, and Pubmed.

IJCAI Conference 2024 Conference Paper

Unsupervised Deep Graph Structure and Embedding Learning

  • Xiaobo Shen
  • Lei Shi
  • Xiuwen Gong
  • Shirui Pan

Graph Neural Networks (GNNs) are powerful for graph embedding learning, but their performance has been shown to degrade heavily under adversarial attacks. Deep graph structure learning (GSL) has been proposed to defend against attacks by jointly learning the graph structure and graph embedding, typically for node classification. Label supervision is expensive in real-world applications, so unsupervised GSL is more challenging and remains less studied. To fill this gap, this paper proposes a new unsupervised GSL method, the unsupervised property GNN (UPGNN). UPGNN first refines the graph structure by exploiting the properties of low rank, sparsity, and feature smoothness. It then employs a graph mutual information loss to learn graph embeddings by maximizing their correlation with the refined graph. The proposed UPGNN learns graph structure and embeddings without label supervision and can thus be applied to various downstream tasks. We further propose Accelerated UPGNN (AUPGNN) to reduce computational complexity, providing an efficient alternative to UPGNN. Our extensive experiments on node classification and clustering demonstrate the effectiveness of the proposed method over the state of the art, especially under heavy perturbation.

JBHI Journal 2024 Journal Article

Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images

  • Yameng Zhang
  • Yufei Gao
  • Jing Xu
  • Guohua Zhao
  • Lei Shi
  • Lingfei Kong

Recent advances in large models and neuroscience have enabled exploration of the mechanisms of brain activity using neuroimaging data. Brain decoding is one of the most promising research directions for further understanding human cognitive function. However, current methods depend excessively on high-quality labeled data, which entails the enormous expense of collection and annotation of neural images by experts. Besides, the performance of cross-individual decoding suffers from inconsistency in data distribution caused by individual variation and different collection equipment. To address the above issues, a Joint Domain Adaptive Decoding (JDAD) framework is proposed for unsupervised decoding of specific brain cognitive states related to behavioral tasks. Based on volumetric feature extraction from task-based functional Magnetic Resonance Imaging (tfMRI) data, a novel objective loss function is designed with a joint distribution regularizer, which aims to restrict the distance between both the conditional and marginal probability distributions of labeled and unlabeled samples. Experimental results on the public Human Connectome Project (HCP) S1200 dataset show that JDAD achieves superior performance over other prevalent methods, especially for fine-grained tasks, with 11.5%–21.6% improvements in decoding accuracy. The learned 3D features are visualized by Grad-CAM and linked to brain functional regions, which provides a novel path to studying the function of brain cortex regions related to specific cognitive tasks at the group level.

NeurIPS Conference 2024 Conference Paper

Using Surrogates in Covariate-adjusted Response-adaptive Randomization Experiments with Delayed Outcomes

  • Lei Shi
  • Waverly Wei
  • Jingshen Wang

Covariate-adjusted response-adaptive randomization (CARA) designs are gaining increasing attention. These designs combine the advantages of randomized experiments with the ability to adaptively revise treatment allocations based on data collected across multiple stages, enhancing estimation efficiency. Yet, CARA designs often assume that primary outcomes are immediately observable, which is not the case in many clinical scenarios where there is a delay in observing primary outcomes. This assumption can lead to significant missingness and inefficient estimation of treatment effects. To tackle this practical challenge, we propose a CARA experimental strategy integrating delayed primary outcomes with immediately observed surrogate outcomes. Surrogate outcomes are intermediate clinical outcomes that are predictive of or correlated with the primary outcome of interest. Our design goal is to improve the estimation efficiency of the average treatment effect (ATE) of the primary outcome by utilizing surrogate outcomes. From a methodological perspective, our approach offers two benefits: First, we accommodate arm- and covariate-dependent delay mechanisms without imposing any parametric modeling assumptions on the distribution of outcomes. Second, when primary outcomes are not fully observed, surrogate outcomes can guide the adaptive treatment allocation rule. From a theoretical standpoint, we prove the semiparametric efficiency bound of estimating the ATE under delayed primary outcomes while incorporating surrogate outcomes. We show that the ATE estimator under our proposed design strategy attains this semiparametric efficiency bound and achieves asymptotic normality. Through theoretical investigations and a synthetic HIV study, we show that our design is more efficient than the design without incorporating any surrogate information.

ICRA Conference 2023 Conference Paper

Differential Dynamic Programming based Hybrid Manipulation Strategy for Dynamic Grasping

  • Cheng Zhou
  • Yanbo Long
  • Lei Shi
  • Longfei Zhao
  • Yu Zheng 0001

To fully explore the potential of robots for dexterous manipulation, this paper presents a whole dynamic grasping process that achieves fluent grasping of a target object by the robot end-effector. The process starts with the phase of approaching the object, proceeds through the phases of colliding with the object and letting it roll about the collision point, and ends with the phase of catching it by the palm or grasping it by the fingers of the end-effector. We derive a unified model for this hybrid dynamic manipulation process, embodied as approaching-colliding-rolling-catching/grasping, from spatial-vector-based articulated body dynamics. The whole process is then formulated as a free-terminal constrained multi-phase optimal control problem (OCP). We extend traditional differential dynamic programming (DDP) to solve this free-terminal OCP, where the backward pass of DDP involves constrained quadratic programming (QP) problems that we solve with the primal-dual Augmented Lagrangian (PDAL) method. Simulations and real experiments show the effectiveness of the proposed method for robotic dynamic grasping.

NeurIPS Conference 2023 Conference Paper

Improving Self-supervised Molecular Representation Learning using Persistent Homology

  • Yuankai Luo
  • Lei Shi
  • Veronika Thost

Self-supervised learning (SSL) has great potential for molecular representation learning, given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and the consequently small training datasets often available. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently, most of which focus on designing views for contrastive learning. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. PH has several unique features that particularly suit SSL, naturally offering different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We (1) investigate an autoencoder, which shows the general representational power of PH, and (2) propose a contrastive loss that complements existing approaches. We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.

TIST Journal 2023 Journal Article

Mobility Inference on Long-Tailed Sparse Trajectory

  • Lei Shi
  • Yuankai Luo
  • Shuai Ma
  • Hanghang Tong
  • Zhetao Li
  • Xiatian Zhang
  • Zhiguang Shan

Analyzing urban trajectories in cities has become an important topic in data mining. How can we model human mobility, consisting of stay and travel states, from raw trajectory data? How can we infer these mobility states from a single user's trajectory information? How can we further generalize the mobility inference to real-world trajectory data that span multiple users and are sparsely sampled over time? In this article, based on formal and rigid definitions of stay/travel mobility, we propose a single-trajectory inference algorithm that utilizes a generic long-tailed sparsity pattern in large-scale trajectory data. The algorithm guarantees 100% precision in the stay/travel inference with a provable lower bound on the recall metric. Furthermore, we design a transformer-like deep learning architecture for the problem of mobility inference from multiple sparse trajectories. Several adaptations of the standard transformer network structure are introduced, including a singleton design to avoid the negative effect of sparse labels on the decoder side, a customized space-time embedding of location-record features, and a mask apparatus at the output side for loss-function correction. Evaluations on three trajectory datasets of 40 million urban users validate the performance guarantees of the proposed inference algorithm and demonstrate the superiority of our deep learning model in comparison to sequence learning methods in the literature. On extremely sparse trajectories, the deep learning model more than doubles the overall and F1 accuracy of the single-trajectory inference algorithm. The model also generalizes to large-scale trajectory data from different sources with good scalability.

JMLR Journal 2023 Journal Article

Robust High-Dimensional Low-Rank Matrix Estimation: Optimal Rate and Data-Adaptive Tuning

  • Xiaolong Cui
  • Lei Shi
  • Wei Zhong
  • Changliang Zou

The matrix lasso, which minimizes a least-squared loss function with the nuclear-norm regularization, offers a generally applicable paradigm for high-dimensional low-rank matrix estimation, but its efficiency is adversely affected by heavy-tailed distributions. This paper introduces a robust procedure by incorporating a Wilcoxon-type rank-based loss function with the nuclear-norm penalty for a unified high-dimensional low-rank matrix estimation framework. It includes matrix regression, multivariate regression and matrix completion as special examples. This procedure enjoys several appealing features. First, it relaxes the distributional conditions on random errors from sub-exponential or sub-Gaussian to more general distributions and thus it is robust with substantial efficiency gain for heavy-tailed random errors. Second, as the gradient function of the rank-based loss function is completely pivotal, it overcomes the challenge of tuning parameter selection and substantially saves the computation time by using an easily simulated tuning parameter. Third, we theoretically establish non-asymptotic error bounds with a nearly-oracle rate for the new estimator. Numerical results indicate that the new estimator can be highly competitive among existing methods, especially for heavy-tailed or skewed errors.

ICML Conference 2023 Conference Paper

Statistical Inference on Multi-armed Bandits with Delayed Feedback

  • Lei Shi
  • Jingshen Wang
  • Tianhao Wu 0002

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback. While the existing MAB literature often focuses on maximizing the expected cumulative reward (or, equivalently, regret minimization), few efforts have been devoted to establishing valid statistical inference approaches to quantify the uncertainty of learned policies. We attempt to fill this gap by providing a unified statistical inference framework for policy evaluation in which a target policy is allowed to differ from the data-collecting policy, and our framework allows the delay to be associated with the treatment arms. We present an adaptively weighted estimator that, on one hand, incorporates the arm-dependent delay mechanism to achieve consistency, and on the other hand mitigates the variance inflation across stages due to vanishing sampling probabilities. In particular, our estimator does not critically depend on the ability to estimate the unknown delay mechanism. Under appropriate conditions, we prove that our estimator converges to a normal distribution as the number of time points goes to infinity, which provides guarantees for large-sample statistical inference. We illustrate the finite-sample performance of our approach through Monte Carlo experiments.
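For orientation, the policy-evaluation problem studied here starts from the textbook inverse-propensity-weighted (IPW) value estimate. The sketch below shows only that baseline (all names are hypothetical), not the paper's adaptively weighted, delay-aware estimator:

```python
def ipw_policy_value(logged, target_policy):
    """Plain inverse-propensity-weighted off-policy value estimate:

        V_hat = (1/T) * sum_t  pi(A_t) / e_t(A_t) * R_t

    `logged` is a list of (arm, reward, propensity) tuples produced by
    the data-collecting policy, where `propensity` is the probability
    with which that policy chose the arm; `target_policy[arm]` is the
    probability the target policy plays that arm. The paper's estimator
    additionally reweights across stages and accounts for arm-dependent
    delays, which this baseline ignores.
    """
    total = 0.0
    for arm, reward, propensity in logged:
        total += target_policy[arm] / propensity * reward
    return total / len(logged)

# three logged rounds; the target policy always plays arm 1, so rounds
# where arm 0 was played contribute zero weight
logged = [(1, 1.0, 0.5), (0, 0.5, 0.5), (1, 0.0, 0.25)]
v = ipw_policy_value(logged, target_policy={0: 0.0, 1: 1.0})
```

The variance-inflation issue the abstract mentions is visible in the `1 / propensity` factor: as the sampling probability of an arm vanishes, individual terms blow up, which is what adaptive weighting is designed to mitigate.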

NeurIPS Conference 2023 Conference Paper

Transformers over Directed Acyclic Graphs

  • Yuankai Luo
  • Veronika Thost
  • Lei Shi

Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.

JBHI Journal 2022 Journal Article

A Graph Convolutional Multiple Instance Learning on a Hypersphere Manifold Approach for Diagnosing Chronic Obstructive Pulmonary Disease in CT Images

  • Ling Chen
  • Qixing Feng
  • Xi Yin
  • Xiangde Min
  • Lei Shi
  • Defu Yang
  • Yen-Wei Chen
  • Daoqiang Zhang

Chronic obstructive pulmonary disease (COPD) is a prevalent chronic disease with high morbidity and mortality. The early diagnosis of COPD is vital for clinical treatment and helps patients have a better quality of life. Because COPD can be ascribed to chronic bronchitis and emphysema, lesions in a computed tomography (CT) image can appear anywhere inside the lung with different types, shapes, and sizes. Multiple instance learning (MIL) is an effective tool for COPD discrimination. In this study, a novel graph convolutional MIL with adaptive additive margin loss (GCMIL-AAMS) approach is proposed to diagnose COPD from CT. Specifically, for early-stage patients, the selected instance-level features can be more discriminative when learned by our proposed graph convolution and pooling with a self-attention mechanism. The AAMS loss can utilize information on COPD severity on a hypersphere manifold by adaptively setting the angular margins to improve performance, as severity can be quantified into four grades by pulmonary function tests. The results show that the proposed GCMIL-AAMS method provides superior discrimination and generalization abilities, with areas under the receiver operating characteristic curve (AUCs) of 0.960 $\pm$ 0.014 and 0.862 $\pm$ 0.010 on the test set and external testing set, respectively, in 5-fold stratified cross-validation; moreover, it demonstrates that graph learning is applicable to MIL and suggests that MIL may be adaptable to graph learning.

NeurIPS Conference 2022 Conference Paper

Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

  • Zhuoqing Song
  • Weijian Li
  • Kexin Jin
  • Lei Shi
  • Ming Yan
  • Wotao Yin
  • Kun Yuan

Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without a central server. Since communication tends to be slower than computation, when each agent communicates with only a few neighboring agents per iteration, they can complete iterations faster than with more agents or a central server. However, the total number of iterations to reach a network-wide solution is affected by the speed at which the information of the agents is ``mixed'' by communication. We found that popular communication topologies either have large degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate, the quantity used to measure mixing efficiency. In the proposed family, EquiStatic has a degree of $\Theta(\ln(n))$, where $n$ is the network size, and a series of time-varying one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both achieve an $n$-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, both theoretically and empirically. Our code is implemented through BlueFog and available at https://github.com/kexinjinnn/EquiTopo.
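
The degree-1, time-varying idea can be sketched with a generic one-peer gossip round in which nodes are paired by a random cyclic shift and each pair averages its values. This is an illustrative stand-in, not the paper's exact EquiDyn sampling procedure, and all names are made up for the sketch:

```python
import numpy as np

def one_peer_gossip_round(x, rng):
    """One round of degree-1 gossip: each node i averages with the
    single peer (i + shift) % n for a random shift. The mixing matrix
    (0.5 on the diagonal, 0.5 at the shifted offset) is circulant and
    doubly stochastic, so the network average is preserved."""
    n = len(x)
    shift = rng.integers(1, n)           # peer of node i is (i + shift) % n
    peer = (np.arange(n) + shift) % n
    return 0.5 * (x + x[peer])

rng = np.random.default_rng(0)
x = np.arange(8, dtype=float)            # each of 8 agents holds one scalar
mean = x.mean()
for _ in range(50):
    x = one_peer_gossip_round(x, rng)
print(np.allclose(x, mean))              # all agents reach the average
```

Each round costs every agent exactly one message, which is the communication pattern the constant-degree EquiDyn topologies are designed around.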

JMLR Journal 2021 Journal Article

Generalization Properties of hyper-RKHS and its Applications

  • Fanghui Liu
  • Lei Shi
  • Xiaolin Huang
  • Jie Yang
  • Johan A.K. Suykens

This paper generalizes regularized regression problems in a hyper-reproducing kernel Hilbert space (hyper-RKHS), illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced regression models from an approximation theory view. Algorithmically, we consider two regularized regression models with bivariate forms in this space, including kernel ridge regression (KRR) and support vector regression (SVR) endowed with hyper-RKHS, and further combine divide-and-conquer with Nystr\"{o}m approximation for scalability in large-sample cases. This framework is general: the underlying kernel is learned from a broad class and can be positive definite or not, which adapts to various requirements in kernel learning. Theoretically, we study the convergence behavior of regularized regression algorithms in hyper-RKHS and derive the learning rates, which go beyond the classical analysis on RKHS due to the non-trivial independence of pairwise samples and the characterisation of hyper-RKHS. Experimentally, results on several benchmarks suggest that the employed framework is able to learn a general kernel function from an arbitrary similarity matrix, and thus achieves satisfactory performance on classification tasks.

JMLR Journal 2018 Journal Article

Convergence of Unregularized Online Learning Algorithms

  • Yunwen Lei
  • Lei Shi
  • Zheng-Chu Guo

In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess generalization errors in expectation. A sufficient condition for almost sure convergence is also given. With high probability, we provide explicit convergence rates of the excess generalization errors for both the averaged iterates and the last iterate, which in turn also imply convergence rates with probability one. To the best of our knowledge, this is the first high-probability convergence rate for the last iterate of online gradient descent algorithms in the general convex setting. Without any boundedness assumptions on the iterates, our results are derived by a novel use of two measures of the algorithm's one-step progress, respectively by generalization errors and by distances in RKHSs, where the variances of the involved martingales are cancelled out by the descent property of the algorithm.
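
For concreteness, unregularized online gradient descent in an RKHS for the least-squares loss takes the form $f_{t+1} = f_t - \eta_t (f_t(x_t) - y_t) K(x_t, \cdot)$. The sketch below implements this with a Gaussian kernel and an $\eta/\sqrt{t}$ step size; both are illustrative choices, and the paper's analysis covers general convex losses, which this toy does not:

```python
import numpy as np

def rbf(x, z, gamma=5.0):
    """Gaussian kernel K(x, z) on scalars (illustrative choice)."""
    return np.exp(-gamma * (x - z) ** 2)

def online_kernel_gd(X, y, eta=0.5):
    """Unregularized online gradient descent in an RKHS for the
    least-squares loss. Each step appends one kernel expansion term
    with coefficient -eta_t * (f_t(x_t) - y_t)."""
    centers, coefs = [], []
    for t, (x, yt) in enumerate(zip(X, y), start=1):
        pred = sum(c * rbf(z, x) for c, z in zip(coefs, centers))
        resid = pred - yt                # gradient of 0.5*(f(x)-y)^2 in f(x)
        centers.append(x)
        coefs.append(-eta / np.sqrt(t) * resid)
    def f(x):
        return sum(c * rbf(z, x) for c, z in zip(coefs, centers))
    return f

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 200)
y = np.sin(np.pi * X) + 0.1 * rng.normal(size=200)
f = online_kernel_gd(X, y)
err = np.mean([(f(x) - np.sin(np.pi * x)) ** 2 for x in np.linspace(-1, 1, 50)])
print(err < 0.5)                         # clearly beats the zero predictor
```

The growing kernel expansion is the textbook representation of the iterate $f_t$; the paper's contribution is the convergence analysis, not this implementation.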

IS Journal 2018 Journal Article

Two-Stage Road Terrain Identification Approach for Land Vehicles Using Feature-Based and Markov Random Field Algorithm

  • Shifeng Wang
  • Sarath Kodagoda
  • Lei Shi
  • Xiang Dai

Road terrain identification is one of the important tasks for driving-assistance systems and autonomous land vehicles. It plays a key role in improving driving strategy and enhancing fuel efficiency. In this paper, a two-stage approach using multiple sensors is presented. In the first stage, feature-based identification is performed using an accelerometer, a camera, and downward-looking and forward-looking laser range finders (LRFs). This produces four classification label sequences. In the second stage, a majority vote is implemented for each label sequence to match them onto synchronized road patches. Then a Markov Random Field (MRF) model is designed to generate the final optimized identification results, improving on the forward-looking LRF alone. This approach enables the vehicle to observe the upcoming road terrain before moving onto it by fusing all the classification results with the MRF algorithm. The experiments show that this approach improves terrain identification accuracy and robustness significantly on familiar road terrains.

IJCAI Conference 2018 Conference Paper

Uncertainty Sampling for Action Recognition via Maximizing Expected Average Precision

  • Hanmo Wang
  • Xiaojun Chang
  • Lei Shi
  • Yi Yang
  • Yi-Dong Shen

Recognizing human actions in video clips has been an important topic in computer vision. Sufficient labeled data is one of the prerequisites for the good performance of action recognition algorithms. However, while abundant videos can be collected from the Internet, categorizing each video clip is tedious and time-consuming. Active learning is one way to alleviate the labeling labor by allowing the classifier to choose the most informative unlabeled instances for manual annotation. Among various active learning algorithms, uncertainty sampling is arguably the most widely used strategy. Conventional uncertainty sampling strategies such as entropy-based methods are usually evaluated under accuracy. In action recognition, however, the acknowledged evaluation metric is Average Precision (AP), defined as the area under the precision-recall curve, which has largely been ignored in the active learning community. In this paper, we propose a novel uncertainty sampling algorithm for action recognition using expected AP. We conduct experiments on three real-world action recognition datasets and show that our algorithm outperforms other uncertainty-based active learning algorithms.
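
For reference, the AP metric the paper targets has a simple discrete form: the mean of precision@k taken over the positions k of the relevant items in the ranked list. A minimal sketch of the metric only, not of the paper's expected-AP estimator:

```python
def average_precision(labels_sorted):
    """AP for one ranked list: labels_sorted is the ground truth
    (1 = relevant, 0 = not) ordered by decreasing classifier score.
    AP = mean of precision@k over the ranks k of the relevant items."""
    hits, precisions = 0, []
    for k, rel in enumerate(labels_sorted, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 = 0.833...
```

Unlike accuracy, AP rewards putting relevant items early in the ranking, which is why an uncertainty criterion tuned for accuracy can be suboptimal under AP.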

JMLR Journal 2017 Journal Article

Learning Theory of Distributed Regression with Bias Corrected Regularization Kernel Network

  • Zheng-Chu Guo
  • Lei Shi
  • Qiang Wu

Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the output functions learnt from these blocks. Since averaging decreases the variance but not the bias, bias correction is expected to improve the learning performance if the base regression algorithm is a biased one. Regularization kernel network is an effective and widely used method for nonlinear regression analysis. In this paper we investigate a bias-corrected version of regularization kernel network. We derive the error bounds when it is applied to a single data set and when it is applied as a base algorithm in distributed regression. We show that, under certain appropriate conditions, the optimal learning rates can be reached in both situations.
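
The divide-and-average scheme described above can be sketched as follows, with kernel ridge regression standing in as the base algorithm. The bias-correction step that is the paper's contribution is omitted, and the kernel and regularization settings are illustrative:

```python
import numpy as np

def krr_fit(X, y, lam=1e-3, gamma=5.0):
    """Kernel ridge regression on one block: solve
    (K + lam * m * I) alpha = y for the expansion coefficients."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda x: np.exp(-gamma * (np.atleast_1d(x)[:, None] - X[None, :]) ** 2) @ alpha

def distributed_krr(X, y, n_blocks=4, lam=1e-3):
    """Divide the data into blocks, fit KRR on each block,
    and return the plain average of the block estimators."""
    blocks = np.array_split(np.arange(len(X)), n_blocks)
    fs = [krr_fit(X[b], y[b], lam) for b in blocks]
    return lambda x: np.mean([f(x) for f in fs], axis=0)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, 400)
y = X ** 2 + 0.05 * rng.normal(size=400)
fbar = distributed_krr(X, y)
grid = np.linspace(-0.9, 0.9, 25)
err = np.mean((fbar(grid) - grid ** 2) ** 2)
print(err < 0.01)                        # averaged estimator fits well
```

Each block solve is $O(m^3)$ for block size $m$ instead of $O(N^3)$ for the full data, which is the computational motivation for the divide-and-average approach.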

IJCAI Conference 2016 Conference Paper

Diversifying Convex Transductive Experimental Design for Active Learning

  • Lei Shi
  • Yi-Dong Shen

Convex Transductive Experimental Design (CTED) is one of the most representative active learning methods. It utilizes a data reconstruction framework to select informative samples for manual annotation. However, we observe that CTED cannot handle the diversity of selected samples well, and hence the set of selected samples may contain mutually similar samples which convey similar or overlapping information. This is undesirable: given a limited budget for data labeling, it is desired to select informative samples with complementary information, i.e., with similar samples excluded. To this end, we propose Diversified CTED by seamlessly incorporating a novel and effective diversity regularizer into CTED, ensuring the selected samples are diverse. The diversity regularizer makes the optimization problem hard to solve, so we derive an effective algorithm for an equivalent problem which is easier to optimize. Extensive experimental results on several benchmark data sets demonstrate that Diversified CTED significantly improves CTED and consistently outperforms the state-of-the-art methods, verifying the effectiveness and advantages of incorporating the proposed diversity regularizer into CTED.

AAAI Conference 2015 Conference Paper

Convex Batch Mode Active Sampling via α-Relative Pearson Divergence

  • Hanmo Wang
  • Liang Du
  • Peng Zhou
  • Lei Shi
  • Yi-Dong Shen

Active learning is a machine learning technique in which a classifier selects a subset of an unlabeled dataset for labeling and is then trained on the selected data. Recently, batch mode active learning, which selects a batch of samples to label in parallel, has attracted a lot of attention. Its challenge lies in the choice of criteria used for guiding the search for the optimal batch. In this paper, we propose a novel approach to selecting the optimal batch of queries by minimizing the α-relative Pearson divergence (RPE) between the labeled and the original datasets. This particular divergence is chosen since it can distinguish the optimal batch more easily than other measures, especially when the available candidates are similar. The proposed objective is a min-max optimization problem, which is difficult to solve due to the involvement of both minimization and maximization. We find that the objective has an equivalent convex form, and thus a global optimal solution can be obtained; the subgradient method can then be applied to solve the simplified convex problem. Our empirical studies on UCI datasets demonstrate the effectiveness of the proposed approach compared with the state-of-the-art batch mode active learning methods.

IJCAI Conference 2015 Conference Paper

Learning a Robust Consensus Matrix for Clustering Ensemble via Kullback-Leibler Divergence Minimization

  • Peng Zhou
  • Liang Du
  • Hanmo Wang
  • Lei Shi
  • Yi-Dong Shen

Clustering ensemble has emerged as an important extension of the classical clustering problem. It provides a framework for combining multiple base clusterings of a data set to generate a final consensus result. Most existing methods simply combine clustering results without taking noise into account, which may degrade the clustering performance. In this paper, we propose a novel robust clustering ensemble method. To improve the robustness, we capture the sparse and symmetric errors and integrate them into our robust consensus framework to learn a low-rank matrix. Since the objective function is difficult to optimize, we develop a block coordinate descent algorithm which is theoretically guaranteed to converge. Experimental results on real-world data sets demonstrate the effectiveness of our method.

JMLR Journal 2015 Journal Article

Learning with the Maximum Correntropy Criterion Induced Losses for Regression

  • Yunlong Feng
  • Xiaolin Huang
  • Lei Shi
  • Yuning Yang
  • Johan A.K. Suykens

Within the statistical learning framework, this paper studies the regression model associated with the correntropy-induced losses. The correntropy, as a similarity measure, has been frequently employed in signal processing and pattern recognition. Motivated by its empirical success, this paper aims at presenting some theoretical understanding of the maximum correntropy criterion in regression problems. Our focus is two-fold: first, we are concerned with the connections between the regression model associated with the correntropy-induced loss and the least squares regression model. Second, we study its convergence property. A learning theory analysis centered around these two aspects is conducted. From our analysis, we see that the scale parameter in the loss function balances the convergence rate of the regression model against its robustness. We then make some efforts to sketch a general view on robust loss functions when applied to learning for regression problems. Numerical experiments are also implemented to verify the effectiveness of the model.

IJCAI Conference 2015 Conference Paper

Recovery of Corrupted Multiple Kernels for Clustering

  • Peng Zhou
  • Liang Du
  • Lei Shi
  • Hanmo Wang
  • Yi-Dong Shen

Kernel-based methods, such as kernel k-means and kernel PCA, have been widely used in machine learning tasks. The performance of these methods critically depends on the selection of kernel functions; however, we usually do not know in advance what kind of kernel is suitable for the given data and task. This has led to research on multiple kernel learning, i.e., learning a consensus kernel from multiple candidate kernels. Existing multiple kernel learning methods have difficulty in dealing with noise. In this paper, we propose a novel method for learning a robust yet low-rank kernel for clustering tasks. We observe that the noise of each kernel has a specific structure, so we can make full use of it to clean the multiple input kernels and then aggregate them into a robust, low-rank consensus kernel. The underlying optimization problem is hard to solve, and we show that it can be solved via alternating minimization, whose convergence is theoretically guaranteed. Experimental results on several benchmark data sets further demonstrate the effectiveness of our method.

IJCAI Conference 2015 Conference Paper

Robust Multiple Kernel K-means Using L21-Norm

  • Liang Du
  • Peng Zhou
  • Lei Shi
  • Hanmo Wang
  • Mingyu Fan
  • Wenjian Wang
  • Yi-Dong Shen

The k-means algorithm is one of the most widely used methods for data clustering. However, the standard k-means can only be applied in the original feature space. The kernel k-means, which extends k-means into the kernel space, can be used to capture non-linear structure and identify arbitrarily shaped clusters. Since both the standard k-means and kernel k-means apply the squared error to measure the distances between data points and cluster centers, a few outliers will cause large errors and dominate the objective function. Besides, the performance of a kernel method is largely determined by the choice of kernel. Unfortunately, the most suitable kernel for a particular task is often unknown in advance. In this paper, we first present a robust k-means using the $\ell_{2,1}$-norm in the feature space and then extend it to the kernel space. To retain the power of kernel methods, we further propose a novel robust multiple kernel k-means (RMKKM) algorithm that simultaneously finds the best clustering label, the cluster membership and the optimal combination of multiple kernels. An alternating iterative scheme is developed to find the optimal value. Extensive experiments demonstrate the effectiveness of the proposed algorithms.
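
The robustness of the $\ell_{2,1}$-norm objective can be illustrated on a single cluster: minimizing $\sum_i \|x_i - c\|$ by iteratively reweighted least squares gives each point weight $1/\|x_i - c\|$, so far-away outliers are downweighted. This is a didactic single-cluster sketch, not the full RMKKM algorithm, and all settings are illustrative:

```python
import numpy as np

def l21_center(X, iters=20):
    """One cluster center under the l2,1 objective sum_i ||x_i - c||,
    solved by iteratively reweighted least squares: each point gets
    weight 1 / ||x_i - c||, shrinking the influence of outliers."""
    c = X.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(X - c, axis=1), 1e-8)
        c = (w[:, None] * X).sum(axis=0) / w.sum()
    return c

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), [[50.0, 50.0]]])  # inliers + one outlier
mean_c = X.mean(axis=0)    # squared-error center: dragged toward the outlier
robust_c = l21_center(X)   # stays within the inlier cloud near the origin
print(np.linalg.norm(mean_c) > 1.0, np.linalg.norm(robust_c) < 0.5)
```

Under the squared error a single outlier dominates the objective and pulls the mean; under the $\ell_{2,1}$ objective its influence is bounded, which is the property RMKKM exploits in the kernel space.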

JMLR Journal 2014 Journal Article

Ramp Loss Linear Programming Support Vector Machine

  • Xiaolin Huang
  • Lei Shi
  • Johan A.K. Suykens

The ramp loss is a robust but non-convex loss for classification. Compared with other non-convex losses, a local minimum of the ramp loss can be found effectively. The effectiveness of local search comes from the piecewise linearity of the ramp loss. Motivated by the fact that the $\ell_1$-penalty is piecewise linear as well, the $\ell_1$-penalty is applied to the ramp loss, resulting in a ramp loss linear programming support vector machine (ramp-LPSVM). The proposed ramp-LPSVM is a piecewise linear minimization problem to which the related optimization techniques are applicable. Moreover, the $\ell_1$-penalty can enhance sparsity. In this paper, the corresponding misclassification error and convergence behavior are discussed. Generally, the ramp loss is a truncated hinge loss, so ramp-LPSVM possesses properties similar to those of hinge loss SVMs. A local minimization algorithm and a global search strategy are discussed. The good optimization capability of the proposed algorithms makes ramp-LPSVM perform well in numerical experiments: the result of ramp-LPSVM is more robust than that of hinge SVMs and sparser than that of ramp-SVM, which consists of the $\|\cdot\|_{\mathcal{K}}$-penalty and the ramp loss.
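
As the abstract notes, the ramp loss is a truncated hinge loss; a minimal sketch follows (the truncation parameter `s` here is a generic modeling choice, not a value taken from the paper):

```python
import numpy as np

def hinge(z):
    """Hinge loss on the margin z = y * f(x): grows without bound."""
    return np.maximum(0.0, 1.0 - z)

def ramp(z, s=0.0):
    """Ramp loss: the hinge loss truncated at level 1 - s, so badly
    misclassified points (z << 0) contribute only a bounded amount.
    This bounded influence is the source of the robustness to outliers."""
    return np.minimum(1.0 - s, hinge(z))

z = np.array([2.0, 0.5, 0.0, -3.0])   # margins y * f(x)
print(hinge(z).tolist())               # [0.0, 0.5, 1.0, 4.0]
print(ramp(z).tolist())                # [0.0, 0.5, 1.0, 1.0]
```

The cap is what makes the loss non-convex, and its piecewise linearity is what makes both local search and the $\ell_1$-penalized LP formulation tractable.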

NeurIPS Conference 2013 Conference Paper

Sparse Additive Text Models with Low Rank Background

  • Lei Shi

The sparse additive model for text modeling involves sum-of-exp computations, which are costly at large scale. Moreover, the assumption of an equal background across all classes/topics may be too strong. This paper proposes the sparse additive model with low-rank background (SAM-LRB), together with simple yet efficient estimation. In particular, by employing a double majorization bound, we approximate the log-likelihood by a quadratic lower bound with the sum-of-exp terms absent. The constraints of low rank and sparsity are then simply embodied by nuclear-norm and $\ell_1$-norm regularizers. Interestingly, we find that the optimization task in this manner can be transformed into the same form as that in Robust PCA. Consequently, parameters of supervised SAM-LRB can be efficiently learned using an existing algorithm for Robust PCA based on accelerated proximal gradient. Beyond the supervised case, we extend SAM-LRB to unsupervised and multifaceted scenarios. Experiments on real-world data demonstrate the effectiveness and efficiency of SAM-LRB, showing state-of-the-art performance.

NeurIPS Conference 2012 Conference Paper

Bayesian Probabilistic Co-Subspace Addition

  • Lei Shi

For modeling data matrices, this paper introduces the Probabilistic Co-Subspace Addition (PCSA) model, which simultaneously captures the dependence structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of linear mappings of two features, which lie in row-wise and column-wise latent subspaces. Consequently, it captures the dependencies among entries intricately and is able to model non-Gaussian and heteroscedastic densities. Variational inference is proposed for PCSA for approximate Bayesian learning, where updating the posteriors is formulated as the problem of solving Sylvester equations. Furthermore, PCSA is extended to handle and fill missing values, to adapt its sparseness, and to model tensor data. In comparison with several state-of-the-art approaches, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.

NeurIPS Conference 2009 Conference Paper

Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

  • Lei Shi
  • Thomas Griffiths

The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference that involves averaging over a few feature-detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect.
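
The proposed mechanism corresponds to self-normalized importance sampling with the prior as proposal: samples drawn from the prior are weighted by their likelihood of the stimulus, the weights playing the role of the firing rates of the feature-detection neurons. A minimal sketch with a conjugate Gaussian model, so the answer can be checked in closed form (the model and all numbers are illustrative):

```python
import numpy as np

def importance_posterior_mean(x, prior_samples, likelihood):
    """Self-normalized importance sampling with the prior as proposal:
    E[theta | x] is approximated by a likelihood-weighted average of
    samples theta_i drawn from the prior."""
    w = likelihood(x, prior_samples)
    w = w / w.sum()
    return np.sum(w * prior_samples)

rng = np.random.default_rng(4)
prior_samples = rng.normal(0.0, 1.0, 100_000)                 # prior: N(0, 1)
lik = lambda x, th: np.exp(-0.5 * (x - th) ** 2 / 0.5 ** 2)   # x | theta ~ N(theta, 0.5^2)
est = importance_posterior_mean(1.0, prior_samples, lik)
exact = 1.0 * 1.0 / (1.0 + 0.5 ** 2)   # conjugate posterior mean = 0.8
print(abs(est - exact) < 0.05)
```

The same weighted-average form applied recursively, level by level, is the extension to hierarchical Bayesian inference described in the abstract.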