Arrow Research search

Author name cluster

Lei Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

AAAI 2026 Conference Paper

Equivariant Atomic and Lattice Modeling Using Geometric Deep Learning for Crystal Structure Optimization

  • Ziduo Yang
  • Yi-Ming Zhao
  • Xian Wang
  • Wei Zhuo
  • Xiaoqing Liu
  • Lei Shen

Structure optimization, which yields the relaxed structure (minimum‑energy state), is essential for reliable materials property calculations, yet traditional ab initio approaches such as density‑functional theory (DFT) are computationally intensive. Machine learning (ML) has emerged to alleviate this bottleneck but suffers from two major limitations: (i) existing models operate mainly on atoms, leaving lattice vectors implicit despite their critical role in structural optimization; and (ii) they often rely on multi-stage, non-end-to-end workflows that are prone to error accumulation. Here, we present E³Relax, an end-to-end equivariant graph neural network that maps an unrelaxed crystal directly to its relaxed structure. E³Relax promotes both atoms and lattice vectors to graph nodes endowed with dual scalar–vector features, enabling unified and symmetry‑preserving modeling of atomic displacements and lattice deformations. A layer‑wise supervision strategy forces every network depth to make a physically meaningful refinement, mimicking the incremental convergence of DFT while preserving a fully end‑to‑end pipeline. We evaluate E³Relax on four benchmark datasets and demonstrate that it achieves remarkable accuracy and efficiency. Through DFT validations, we show that the structures predicted by E³Relax are energetically favorable, making them suitable as high-quality initial configurations to accelerate DFT calculations.

AAAI 2026 Conference Paper

Temporal Calibrating and Distilling for Scene-Text Aware Text-Video Retrieval

  • Zhiqian Zhao
  • Liang Li
  • Lei Shen
  • Xichun Sheng
  • Yaoqi Sun
  • Fang Kang
  • Chenggang Yan

Existing text-video retrieval methods mainly focus on single-modal video content (i.e., visual entities), often overlooking the heterogeneous scene text that is ubiquitous in human environments. Although scene text in videos provides fine-grained semantics for cross-modal retrieval, effectively utilizing it presents two key challenges: (1) temporally dense scene text disrupts synchronization with sparse video frames, obstructing video understanding; (2) redundant scene text and irrelevant video frames hinder the learning of discriminative temporal clues for retrieval. To address these challenges, we propose a temporal scene-text calibrating and distilling (TCD) network for text-video retrieval. Specifically, we first design a window-OCR captioner that aggregates dense scene text into OCR captions to facilitate feature interaction. Next, we devise a heterogeneous semantics calibration module that leverages scene text as a self-supervised signal to temporally align window-level OCR captions and frame-level video features. Further, we introduce a context-guided temporal clue distillation module to learn the complementary and relevant details between the scene-text and video modalities, thereby obtaining discriminative temporal clues for retrieval. Extensive experiments show that our TCD achieves state-of-the-art performance on three scene-text-related benchmarks.

TMLR 2025 Journal Article

Federated Generalized Novel Category Discovery with Prompts Tuning

  • Lei Shen
  • Nan Pu
  • Zhun Zhong
  • Mingming Gong
  • Dianhai Yu
  • Chengqi Zhang
  • Bo Han

Generalized category discovery (GCD) handles categories with unseen labels at inference time by clustering them. Most GCD work addresses unseen classes in data-centralized settings. However, unlabeled categories held by clients, which are common in real-world federated learning (FL), have been largely ignored and degrade the performance of classic FL algorithms. To demonstrate and mitigate the harmful effect of unseen classes, we study a GCD problem setting for FL named FedGCD, analyze its overfitting problem in detail, establish a strong baseline built on the state-of-the-art GCD algorithm SimGCD, and design a prompt-tuning learning framework that tackles both the overfitting and communication-burden problems in FedGCD. In our method, clients first carry out prompt learning separately on local data. The server then aggregates the prompts from all clients into a global prompt that captures global knowledge, and sends this global prompt back to the clients, giving each access to broader knowledge from the others. This significantly reduces the number of parameters that must be uploaded in FedGCD, a common obstacle in the real-world application of most FL algorithms. We conduct experiments on both generic and fine-grained datasets such as CIFAR-100 and CUB-200, and show that our method is comparable to the FL version of SimGCD and surpasses other baselines while transmitting significantly fewer parameters.
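
The communication round the abstract describes (local prompt tuning, server-side aggregation, broadcast of the global prompt) can be sketched as a simple averaging step. This is an illustrative reconstruction, not the paper's code: the function name, the use of NumPy arrays for prompt parameters, and the optional data-size weighting are all assumptions.

```python
import numpy as np

def aggregate_prompts(client_prompts, client_sizes=None):
    """Average per-client prompt tensors into a single global prompt.

    client_prompts: list of arrays, one per client, all the same shape.
    client_sizes: optional per-client data counts for weighted averaging.
    """
    stacked = np.stack(client_prompts)           # (n_clients, *prompt_shape)
    if client_sizes is None:
        return stacked.mean(axis=0)              # plain federated average
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()                                 # normalize weights
    return np.tensordot(w, stacked, axes=1)      # weighted average

# One round: clients tune prompts locally (not shown), the server
# averages them, then broadcasts the global prompt back to all clients.
prompts = [np.ones((4, 16)) * i for i in range(3)]   # toy local prompts
global_prompt = aggregate_prompts(prompts)
```

Only the small prompt tensors travel between client and server, which is the source of the communication savings the abstract claims relative to exchanging full model weights.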

ICLR 2025 Conference Paper

Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection

  • Lei Shen
  • Zhenheng Tang
  • Lijun Wu
  • Yonggang Zhang 0003
  • Xiaowen Chu 0001
  • Tao Qin
  • Bo Han 0003

Personalized federated learning (PFL) achieves high performance by assuming clients only meet test data locally, an assumption that fails in many generic federated learning (GFL) scenarios. In this work, we theoretically show that personalized models (PMs) can be used to enhance GFL through a new learning problem named Selective FL (SFL), which jointly optimizes PFL and model selection. However, storing and selecting whole models incurs impractical computation and communication costs. To solve SFL practically, inspired by model-component approaches that edit a sub-model for a specific purpose, we design an efficient and effective framework named Hot-Pluggable Federated Learning (HPFL). Specifically, clients individually train personalized plug-in modules on top of a shared backbone and upload them, together with a plug-in marker, to a modular store on the server. At inference time, an accurate selection algorithm lets clients identify and retrieve suitable plug-in modules from the modular store to improve generalization on the target data distribution. Furthermore, we provide differential privacy protection during selection, with a theoretical guarantee. Our comprehensive experiments and ablation studies demonstrate that HPFL significantly outperforms state-of-the-art GFL and PFL algorithms. Additionally, we empirically show HPFL's remarkable potential for other practical FL problems such as continual federated learning, and discuss its possible applications in one-shot FL, anarchic FL, and an FL plug-in market. Our work is the first attempt to improve GFL performance through a selection mechanism with personalized plug-ins.
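
The retrieval step, where a client picks a plug-in from the server's modular store, can be sketched as a nearest-marker lookup. The marker representation, the L2 distance, and every name below are illustrative assumptions; the paper's actual selection algorithm (and its differential-privacy protection) is more involved.

```python
import numpy as np

def select_plugin(store, query_feature):
    """Pick the plug-in whose stored marker is closest to the query features.

    store: dict mapping plugin_id -> (marker_vector, plugin_params).
    query_feature: vector summarizing the client's target data distribution.
    Returns the chosen plugin_id and its parameters.
    """
    best_id, best_dist = None, float("inf")
    for pid, (marker, _) in store.items():
        dist = np.linalg.norm(marker - query_feature)  # L2 in marker space
        if dist < best_dist:
            best_id, best_dist = pid, dist
    return best_id, store[best_id][1]

# Toy store with two uploaded plug-ins and their markers.
store = {
    "client_a": (np.array([0.0, 0.0]), {"head": "a"}),
    "client_b": (np.array([1.0, 1.0]), {"head": "b"}),
}
chosen, params = select_plugin(store, np.array([0.9, 1.1]))
```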

AAAI 2025 Conference Paper

Towards Unbiased Information Extraction and Adaptation in Cross-Domain Recommendation

  • Yibo Wang
  • Yingchun Jian
  • Wenhao Yang
  • Shiyin Lu
  • Lei Shen
  • Bing Wang
  • Xiaoyi Zeng
  • Lijun Zhang

Cross-Domain Recommendation (CDR) leverages additional knowledge from auxiliary domains to address the long-standing data sparsity issue. However, existing methods typically acquire this knowledge by minimizing the average loss over all domains, overlooking the fact that different domains possess different user-preference distributions. As a result, the acquired knowledge may contain biased information from data-rich domains, leading to performance degradation in data-scarce domains. In this paper, we propose a novel CDR method that takes domain distinctions into consideration to extract and adapt unbiased information. Specifically, our method consists of two key components: Unbiased Information Extraction (UIE) and Unbiased Information Adaptation (UIA). In UIE, inspired by distributionally robust optimization, we optimize the worst-case performance across all domains to extract domain-invariant information, preventing potential bias from auxiliary domains. In UIA, we introduce a new user-item attention module, which employs domain-specific information from historically interacted items to guide the adaptation of domain-invariant information. To verify the effectiveness of our method, we conduct extensive experiments on three real-world datasets, each of which contains three extremely sparse domains. Experimental results demonstrate the considerable superiority of our proposed method over baselines.
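
The worst-case optimization in UIE can be illustrated with a minimal group-DRO-style loop that always takes a gradient step on the currently worst domain, so no single data-rich domain dominates the average. The scalar toy losses, learning rate, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def worst_case_step(params, domain_batches, loss_fn, grad_fn, lr=0.1):
    """One update: evaluate every domain, then step on the worst one.

    loss_fn(params, batch) -> scalar loss; grad_fn(params, batch) -> gradient.
    """
    losses = [loss_fn(params, b) for b in domain_batches]
    worst = int(np.argmax(losses))               # worst-performing domain
    return params - lr * grad_fn(params, domain_batches[worst]), worst

# Toy setup: one scalar parameter, quadratic loss per domain with
# different targets (1.0 and 3.0), standing in for two domains.
loss_fn = lambda p, t: (p - t) ** 2
grad_fn = lambda p, t: 2 * (p - t)
params = 0.0
for _ in range(50):
    params, _ = worst_case_step(params, [1.0, 3.0], loss_fn, grad_fn)
# params settles near 2.0, where both domain losses are balanced,
# rather than favoring whichever domain averaging would overweight.
```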

AAAI 2024 Conference Paper

HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting

  • Qihe Huang
  • Lei Shen
  • Ruixin Zhang
  • Jiahuan Cheng
  • Shouhong Ding
  • Zhengyang Zhou
  • Yang Wang

Multivariate time series (MTS) prediction has been widely adopted in various scenarios. Recently, some methods have employed patching to enhance local semantics and improve model performance. However, length-fixed patches are prone to losing temporal boundary information, such as complete peaks and periods. Moreover, existing methods mainly focus on modeling long-term dependencies across patches, while paying little attention to other dimensions (e.g., short-term dependencies within patches and complex interactions among cross-variable patches). To address these challenges, we propose HDMixer, a pure-MLP model that aims to acquire patches with richer semantic information and to efficiently model hierarchical interactions. Specifically, we design a Length-Extendable Patcher (LEP) tailored to MTS, which enriches the boundary information of patches and alleviates semantic incoherence in the series. Subsequently, we devise a Hierarchical Dependency Explorer (HDE) based on pure MLPs, which effectively models short-term dependencies within patches, long-term dependencies across patches, and complex interactions among variables. Extensive experiments on 9 real-world datasets demonstrate the superiority of our approach. The code is available at https://github.com/hqh0728/HDMixer.

AAAI 2024 Conference Paper

PCE-Palm: Palm Crease Energy Based Two-Stage Realistic Pseudo-Palmprint Generation

  • Jianlong Jin
  • Lei Shen
  • Ruixin Zhang
  • Chenglong Zhao
  • Ge Jin
  • Jingyun Zhang
  • Shouhong Ding
  • Yang Zhao

The lack of large-scale data seriously hinders the development of palmprint recognition. Recent approaches address this issue by generating large-scale, realistic pseudo-palmprints from Bézier curves. However, the significant difference between Bézier curves and real palmprints limits their effectiveness. In this paper, we divide the Bézier-to-real difference into crease and texture differences, thus reducing the generation difficulty. We introduce a new palm crease energy (PCE) domain as a bridge from Bézier curves to real palmprints and propose a two-stage generation model. The first stage generates PCE images (realistic creases) from Bézier curves, and the second stage outputs realistic palmprints (realistic texture) with the PCE images as input. In addition, we design a lightweight plug-and-play line feature enhancement block to facilitate domain transfer and improve recognition performance. Extensive experimental results demonstrate that the proposed method surpasses state-of-the-art methods. In extremely data-scarce settings, e.g., 40 IDs (only 2.5% of the total training set), our model achieves a 29% improvement over RPG-Palm and outperforms ArcFace trained with 100% of the training set by more than 6% in terms of TAR@FAR=1e-6.

NeurIPS 2023 Conference Paper

CrossGNN: Confronting Noisy Multivariate Time Series Via Cross Interaction Refinement

  • Qihe Huang
  • Lei Shen
  • Ruixin Zhang
  • Shouhong Ding
  • Binwu Wang
  • Zhengyang Zhou
  • Yang Wang

Recently, multivariate time series (MTS) forecasting techniques have seen rapid development and widespread application across various fields. Transformer-based and GNN-based methods have shown promising potential due to their strong ability to model interactions across time and variables. However, a comprehensive analysis of real-world data shows that temporal fluctuations and heterogeneity between variables are not well handled by existing methods. To address these issues, we propose CrossGNN, a linear-complexity GNN model that refines cross-scale and cross-variable interactions for MTS. To deal with unexpected noise in the time dimension, an adaptive multi-scale identifier (AMSI) is leveraged to construct multi-scale time series with reduced noise. A Cross-Scale GNN is proposed to extract the scales with clearer trends and weaker noise, and a Cross-Variable GNN is proposed to exploit the homogeneity and heterogeneity between different variables. By simultaneously focusing on edges with higher saliency scores and constraining those with lower scores, the time and space complexity of CrossGNN (i.e., $O(L)$) is linear in the input sequence length $L$. Extensive experimental results on 8 real-world MTS datasets demonstrate the effectiveness of CrossGNN compared with state-of-the-art methods.
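
The saliency-based edge selection described above (keep high-score edges, suppress low-score ones) can be sketched as a per-node top-k mask over a dense score matrix; with k fixed, the number of retained edges grows linearly with the number of nodes. This is a hedged sketch under that interpretation; the matrix layout and function name are assumptions, not the paper's code.

```python
import numpy as np

def sparsify_topk(scores, k):
    """Keep only the k highest-saliency edges per node; zero out the rest.

    scores: (N, N) dense edge-saliency matrix (row i = node i's neighbors).
    With k fixed, retained edges are O(N), i.e. linear in the node count.
    """
    idx = np.argsort(scores, axis=1)[:, -k:]       # top-k columns per row
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)     # mark kept edges
    return np.where(mask, scores, 0.0)

scores = np.array([[0.9, 0.1, 0.5],
                   [0.2, 0.8, 0.3],
                   [0.4, 0.6, 0.7]])
sparse = sparsify_topk(scores, k=2)                # 2 edges survive per node
```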

AAAI 2023 Conference Paper

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding

  • Meihuizi Jia
  • Lei Shen
  • Xin Shen
  • Lejian Liao
  • Meng Chen
  • Xiaodong He
  • Zhendong Chen
  • Jiaqi Li

Multimodal named entity recognition (MNER) is a critical step in information extraction that aims to detect entity spans and classify them into the corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with toolkits and then recognize named entities. However, they suffer from improper alignment between entity types and visual regions, or from error propagation in the two-stage manner, which ultimately introduces irrelevant visual information into the text. In this paper, we propose a novel end-to-end framework named MNER-QG that simultaneously performs MRC-based multimodal named entity recognition and query grounding. Specifically, with the assistance of queries, MNER-QG can provide prior knowledge of entity types and visual regions, and further enhance the representations of both text and image. For the query grounding task, we provide manual annotations and weak supervision obtained by training a highly flexible visual grounding model with transfer learning. We conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MNER-QG outperforms the current state-of-the-art models on the MNER task and also improves query grounding performance.

AAAI 2021 Conference Paper

Probing Product Description Generation via Posterior Distillation

  • Haolan Zhan
  • Hainan Zhang
  • Hongshen Chen
  • Lei Shen
  • Zhuoye Ding
  • Yongjun Bao
  • Weipeng Yan
  • Yanyan Lan

In product description generation (PDG), the aspects users care about are critical for the recommendation system: they not only improve the user experience but also attract more clicks. High-quality customer reviews are an ideal source for mining such user-cared aspects. In reality, however, a large number of new products (long-tailed commodities) cannot gather a sufficient number of customer reviews, which poses a major challenge for product description generation. Existing works tend to generate the product description solely from item information, i.e., product attributes or title words, which leads to tedious content that cannot attract customers effectively. To tackle this problem, we propose an adaptive posterior network based on the Transformer architecture that utilizes user-cared information from customer reviews. Specifically, we first extend the self-attentive Transformer encoder to encode product titles and attributes. Then, we apply an adaptive posterior distillation module to exploit useful review information, integrating user-cared aspects into the generation process. Finally, a Transformer-based decoder with a copy mechanism generates the product description. We also collect a large-scale Chinese product description dataset to support our work and further research in this field. Experimental results show that our model is superior to traditional generative models on both automatic metrics and human evaluation.

AAAI 2020 Conference Paper

Accelerating Primal Solution Findings for Mixed Integer Programs Based on Solution Prediction

  • Jian-Ya Ding
  • Chao Zhang
  • Lei Shen
  • Shengyin Li
  • Bing Wang
  • Yinghui Xu
  • Le Song

Mixed Integer Programming (MIP) is one of the most widely used modeling techniques for combinatorial optimization problems. In many applications, a similar MIP model is solved on a regular basis, maintaining remarkable similarities in model structure and solution appearance while differing in formulation coefficients. This offers machine learning methods the opportunity to explore the correlations between model structures and the resulting solution values. To exploit these correlations, we propose to represent a MIP instance as a tripartite graph, on which a Graph Convolutional Network (GCN) is built to predict solution values for the binary variables. The predicted solutions are used to generate a local-branching-type cut, which can either be treated as a global (invalid) inequality in the formulation, resulting in a heuristic approach to solving the MIP, or be used as a root branching rule, resulting in an exact approach. Computational evaluations on 8 distinct types of MIP problems show that the proposed framework significantly improves the primal solution-finding performance of a state-of-the-art open-source MIP solver.
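
The local branching cut built from a predicted binary assignment x̂ restricts the search to solutions within Hamming distance k of x̂: it takes the standard form sum over predicted-zero variables of x_j plus sum over predicted-one variables of (1 - x_j) <= k. A minimal sketch of constructing that row as coefficients and a right-hand side (solver integration is omitted; all names are illustrative):

```python
def local_branching_cut(predicted, k):
    """Build the cut  sum_{j: x̂_j=0} x_j + sum_{j: x̂_j=1} (1 - x_j) <= k
    as (coeffs, rhs) for a linear row  a·x <= rhs  over binary variables.
    """
    # A predicted 1 contributes (1 - x_j): coefficient -1 and a constant 1,
    # which is moved to the right-hand side below.
    coeffs = [(-1.0 if v == 1 else 1.0) for v in predicted]
    rhs = k - sum(predicted)
    return coeffs, rhs

def hamming_distance(x, predicted):
    """Distance the cut bounds: number of flipped binaries."""
    return sum(abs(a - b) for a, b in zip(x, predicted))

coeffs, rhs = local_branching_cut([1, 0, 1, 0], k=1)
# Any x satisfying coeffs·x <= rhs is within Hamming distance 1 of the
# prediction; adding the row as a global cut gives the heuristic variant.
```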