Arrow Research search

Author name cluster

Xin He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

AAAI Conference 2026 Conference Paper

Dual Mamba for Node-Specific Representation Learning: Tackling Over-Smoothing with Selective State Space Modeling

  • Xin He
  • Yili Wang
  • Yiwei Dai
  • Xin Wang

Over-smoothing remains a fundamental challenge in deep Graph Neural Networks (GNNs), where repeated message passing causes node representations to become indistinguishable. While existing solutions, such as residual connections and skip layers, alleviate this issue to some extent, they fail to explicitly model how node representations evolve in a node-specific and progressive manner across layers. Moreover, these methods do not take global information into account, which is also crucial for mitigating the over-smoothing problem. To address these issues, we propose the Dual Mamba-enhanced Graph Convolutional Network (DMbaGCN), a novel framework that integrates Mamba into GNNs to address over-smoothing from both local and global perspectives. DMbaGCN consists of two modules: the Local State-Evolution Mamba (LSEMba), which performs local neighborhood aggregation and utilizes Mamba's selective state space modeling to capture node-specific representation dynamics across layers, and the Global Context-Aware Mamba (GCAMba), which leverages Mamba's global attention capabilities to incorporate global context for each node. By combining these components, DMbaGCN enhances node discriminability in deep GNNs, thereby mitigating over-smoothing. Extensive experiments on multiple benchmarks demonstrate the effectiveness and efficiency of our method.

AAAI Conference 2025 Conference Paper

Boosting Segment Anything Model Towards Open-Vocabulary Learning

  • Xumeng Han
  • Longhui Wei
  • Xuehui Yu
  • Zhiyang Dou
  • Xin He
  • Kuiran Wang
  • Yingfei Sun
  • Zhenjun Han

The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model, showcasing potent zero-shot generalization and flexible prompting. Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics. In this paper, we present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework. While retaining all the remarkable capabilities inherent to SAM, we boost it to detect arbitrary objects from human inputs like category names or reference expressions. Building upon the SAM image encoder, we introduce a novel SideFormer module designed to acquire SAM features adept at perceiving objects and inject comprehensive semantic information for recognition. In addition, we devise an Open-set RPN that leverages SAM proposals to assist in finding potential objects. Consequently, Sambor enables the open-vocabulary detector to equally focus on generalizing both localization and classification sub-tasks. Our approach demonstrates superior zero-shot performance across benchmarks, including COCO and LVIS, proving highly competitive against previous state-of-the-art methods. We aspire for this work to serve as a meaningful endeavor in endowing SAM with the ability to recognize diverse object categories and advancing open-vocabulary learning with the support of vision foundation models.

JAIR Journal 2025 Journal Article

Graph Collaborative Filtering Model Combining Time Factor and Attention Mechanism

  • Xianglin Zuo
  • Xin He
  • Tianhao Jia
  • Ying Wang

Recently, with the triumph of deep learning, attention mechanisms, and graph convolutional networks in their respective fields, using new representation learning techniques or introducing auxiliary information to improve the representational ability of embeddings has become a core focus of recommendation algorithm research. Generally, most existing GNN-based recommendation methods recursively propagate embedding information over the graph structure and capture collaborative signals by exploring the high-order connectivity between users and items. Despite their great success, these methods neither consider the influence of temporal context on the propagation of user-preference embeddings nor distinguish the contributions of different neighbor nodes to the target node. To address these two problems, we propose TAGCF, a graph collaborative filtering model that combines a time factor and an attention mechanism on top of existing methods. The model uses the time factor to integrate temporal information into the embedding propagation process and uses the attention mechanism to distinguish the influence of embedding information from different neighbors. The effectiveness of TAGCF, the time information, and the attention mechanism is verified through comparative experiments against multiple baseline methods on two recommendation system datasets, MovieLens and Amazon-Books.

AAAI Conference 2025 Conference Paper

IMAGDressing-v1: Customizable Virtual Dressing

  • Fei Shen
  • Xin Jiang
  • Xin He
  • Hu Ye
  • Cong Wang
  • Xiaoyu Du
  • Zechao Li
  • Jinhui Tang

Existing virtual try-on (VTON) methods provide only limited user control over garment attributes and generally overlook essential factors such as face, pose, and scene context. To address these limitations, we introduce the virtual dressing (VD) task, which aims to synthesize freely editable human images conditioned on fixed garments and optional user-defined inputs. We further propose a comprehensive affinity metric index (CAMI) to quantify the consistency between generated outputs and reference garments. We present IMAGDressing-v1, which leverages a garment-specific U-Net to integrate semantic features from CLIP and texture features from a VAE. To incorporate these garment features into a frozen denoising U-Net for flexible text-driven scene control, we employ a hybrid attention mechanism composed of frozen self-attention and trainable cross-attention layers. IMAGDressing-v1 seamlessly integrates with extension modules, such as ControlNet and IP-Adapter, enabling enhanced diversity and controllability. To alleviate data constraints, we introduce the Interactive Garment Pairing (IGPair) dataset, comprising over 300,000 garment–image pairs and a standardized data assembly pipeline. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art performance in controlled human image synthesis. The code and model will be available at https://github.com/muzishen/IMAGDressing.

IJCAI Conference 2025 Conference Paper

Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models

  • Xin He
  • Longhui Wei
  • Lingxi Xie
  • Qi Tian

Multimodal Large Language Models (MLLMs) are experiencing rapid growth, yielding a plethora of novel works recently. The prevailing trend is to adopt data-driven methodologies in which diverse instruction-following datasets are collected. However, these approaches always face the challenge of limited visual perception capabilities, as they rely solely on CLIP-like encoders to extract visual information from inputs. Though these encoders are pre-trained on billions of image-text pairs, they still grapple with the information-loss dilemma, given that textual captions only partially capture the contents depicted in images. To address this limitation, this paper proposes to improve the visual perception ability of MLLMs through a mixture-of-experts knowledge enhancement mechanism. Specifically, this work introduces a novel method that incorporates multi-task encoders and existing visual tools into the MLLM training and inference pipeline, aiming to provide a more comprehensive summarization of visual inputs. Extensive experiments validate its effectiveness in advancing MLLMs, showcasing the improved visual perception capability achieved through the integration of visual experts.

IJCAI Conference 2025 Conference Paper

Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space

  • Xin He
  • Yili Wang
  • Wenqi Fan
  • Xu Shen
  • Xin Juan
  • Rui Miao
  • Xin Wang

Graph Neural Networks (GNNs) have shown great success in various graph-based learning tasks. However, they often face the issue of over-smoothing as model depth increases, which causes all node representations to converge to a single value and become indistinguishable. This issue stems from the inherent limitations of GNNs, which struggle to distinguish the importance of information from different neighborhoods. In this paper, we introduce MbaGCN, a novel graph convolutional architecture that draws inspiration from the Mamba paradigm—originally designed for sequence modeling. MbaGCN presents a new backbone for GNNs, consisting of three key components: the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components work in tandem to adaptively aggregate neighborhood information, providing greater flexibility and scalability for deep GNN models. While MbaGCN may not consistently outperform all existing methods on each dataset, it provides a foundational framework that demonstrates the effective integration of the Mamba paradigm into graph representation learning. Through extensive experiments on benchmark datasets, we demonstrate that MbaGCN paves the way for future advancements in graph neural network research. Our code is available at https://github.com/hexin5515/MbaGCN.

JBHI Journal 2025 Journal Article

pscAdapt: Pre-Trained Domain Adaptation Network Based on Structural Similarity for Cell Type Annotation in Single Cell RNA-seq Data

  • Yan Zhao
  • Junliang Shang
  • Baojuan Qin
  • Limin Zhang
  • Xin He
  • Daohui Ge
  • Qianqian Ren
  • Jin-Xing Liu

Cell type annotation refers to the process of categorizing and labeling cells to identify their specific cell types, which is crucial for understanding cell functions and biological processes. Although many methods have been developed for automated cell type annotation, they often encounter challenges such as batch effects due to variations in data distribution across platforms and species, thereby compromising their performance. To address batch effects, in this study, a pre-trained domain adaptation model based on structural similarity, named pscAdapt, is proposed for cell type annotation. Specifically, a pre-training strategy is employed to initialize the model parameters so as to learn the data distribution of the source domain. This strategy is also combined with an adversarial learning strategy to train the domain adaptation network, achieving domain-level alignment and reducing domain discrepancy. Furthermore, to better distinguish different types of cells, a structural similarity loss is designed, aiming to shorten distances between cells of the same type and increase distances between cells of different types in the feature space, thus achieving cell-level alignment and enhancing the discriminability of cell types. Comprehensive experiments were conducted on simulated datasets, cross-platform datasets, and cross-species datasets to validate the effectiveness of pscAdapt, and the results demonstrate that pscAdapt outperforms several popular cell type annotation methods.

JMLR Journal 2024 Journal Article

Deep Nonparametric Quantile Regression under Covariate Shift

  • Xingdong Feng
  • Xin He
  • Yuling Jiao
  • Lican Kang
  • Caixing Wang

This work focuses on addressing the challenges posed by covariate shift in nonparametric quantile regression using deep neural networks. We propose a two-stage pre-training reweighted method that leverages importance weighting to mitigate the effects of distribution shift. In the first stage, density ratios are estimated with a neural network by minimizing a least-squares objective. In the second stage, a deep neural network estimator is obtained using the pre-training weights. Theoretical analysis is provided, offering non-asymptotic error bounds for the unweighted, reweighted, and pre-training reweighted estimators. We consider scenarios with both bounded and unbounded density ratios. Notably, we employ a novel proof technique to bound the generalization error, characterized by the size and weight bounds of ReLU neural networks. This enables us to establish fast rates of convergence under the adaptive self-calibration condition, distinguishing our approach from those relying on local Rademacher complexity techniques. Additionally, we derive the approximation error with weight bounds for ReLU neural networks approximating the Hölder class. Our theoretical findings provide valuable insights for the pre-training process and highlight the efficacy of reweighting techniques. Numerical experiments are conducted to further validate the theoretical findings and demonstrate the effectiveness of our proposed method.
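For readers unfamiliar with the reweighting step described above, here is a minimal sketch of importance-weighted quantile regression. It is not the paper's deep estimator: the model is linear, the pinball loss is minimized by plain subgradient descent, and the density-ratio weights `w` (the output of the first stage) are assumed to be given; all names are illustrative.

```python
import numpy as np

def weighted_quantile_fit(X, y, w, tau=0.5, lr=0.1, epochs=3000):
    """Importance-weighted linear quantile regression via subgradient descent
    on the weighted pinball loss (a simplified stand-in for a deep estimator)."""
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        r = y - X @ beta  # residuals under the current fit
        # Subgradient of the pinball loss rho_tau(r) = r * (tau - 1{r < 0}),
        # with each sample scaled by its importance weight w_i.
        g = -(X * (w * (tau - (r < 0))).reshape(-1, 1)).mean(axis=0)
        beta -= lr * g
    return beta
```

With all weights equal to one and `tau=0.5` this reduces to ordinary median regression; under covariate shift, `w` would be the estimated target-to-source density ratio.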

NeurIPS Conference 2024 Conference Paper

On the Target-kernel Alignment: a Unified Analysis with Kernel Complexity

  • Chao Wang
  • Xin He
  • Yuwen Wang
  • Junhui Wang

This paper investigates the impact of alignment between the target function of interest and the kernel matrix on a variety of kernel-based methods based on a general loss belonging to a rich loss function family, which covers many commonly used methods in regression and classification problems. We consider the truncated kernel-based method (TKM), which is estimated within a reduced function space constructed by the spectral truncation of the kernel matrix, and compare its theoretical behavior to that of the standard kernel-based method (KM) under various settings. By using the kernel complexity function that quantifies the complexity of the induced function space, we derive upper bounds for both TKM and KM, and further reveal their dependence on the degree of target-kernel alignment. Specifically, for alignment with polynomial decay, the established results indicate that under the just-aligned and weakly-aligned regimes, TKM and KM share the same learning rate. Yet, under the strongly-aligned regime, KM suffers from the saturation effect, while TKM can be continuously improved as the alignment becomes stronger. This further implies that TKM has a strong ability to capture strong alignment and provides a theoretically guaranteed way to eliminate the saturation effect. The minimax lower bound is also established for the squared loss to confirm the optimality of TKM. Extensive numerical experiments further support our theoretical findings. The Python code for reproducing the numerical experiments is available at https://github.com/wywangen.
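The spectral truncation underlying TKM can be illustrated with a short sketch. This is an illustrative stand-in, not the authors' code: it keeps the top eigenpairs of a precomputed kernel matrix and then solves a standard kernel ridge system in the reduced space; the function name and the ridge formulation are our own choices.

```python
import numpy as np

def truncated_kernel_ridge(K, y, lam=1e-2, rank=10):
    """Spectral truncation sketch: keep the top-`rank` eigenpairs of the
    (symmetric PSD) kernel matrix K, then do ridge regression with the
    truncated kernel. Returns fitted values on the training points."""
    vals, vecs = np.linalg.eigh(K)            # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:rank]       # indices of the top-rank eigenpairs
    U, s = vecs[:, idx], vals[idx]
    K_trunc = (U * s) @ U.T                   # rank-`rank` approximation of K
    n = K.shape[0]
    alpha = np.linalg.solve(K_trunc + n * lam * np.eye(n), y)
    return K_trunc @ alpha
```

Setting `rank = n` recovers the standard kernel ridge fit, while a small `rank` restricts estimation to the leading spectral components, which is where the alignment analysis above operates.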

IJCAI Conference 2024 Conference Paper

SAEIR: Sequentially Accumulated Entropy Intrinsic Reward for Cooperative Multi-Agent Reinforcement Learning with Sparse Reward

  • Xin He
  • Hongwei Ge
  • Yaqing Hou
  • Jincheng Yu

Multi-agent reinforcement learning (MARL) performs well for solving complex cooperative tasks when the scenarios have well-defined dense rewards. However, many real-world multi-agent systems have sparse reward settings, which makes it difficult for MARL algorithms to successfully learn an effective strategy. To tackle this problem, we propose a novel sequentially accumulated entropy intrinsic reward named SAEIR, which utilizes the entropy of the multi-agent system as a bonus to accelerate learning. Specifically, a multi-scale hypergraph critic is proposed to obtain a high-order system state representation, which also enhances the ability to effectively evaluate the actions produced by the actor. Based on this comprehensive and compact system state representation, the orderliness of the multi-agent system can be measured to determine the highly valuable states at which to add entropy-based intrinsic rewards, which leads to a highly efficient learning process. Empirical results demonstrate that our proposed method achieves state-of-the-art performance in several complex cooperative multi-agent environments with sparse reward settings.

JBHI Journal 2024 Journal Article

SGFCCDA: Scale Graph Convolutional Networks and Feature Convolution for circRNA-Disease Association Prediction

  • Junliang Shang
  • Linqian Zhao
  • Xin He
  • Xianghan Meng
  • Limin Zhang
  • Daohui Ge
  • Feng Li
  • Jin-Xing Liu

Circular RNAs (circRNAs) have emerged as a novel class of non-coding RNAs with regulatory roles in disease pathogenesis. Computational models aimed at predicting circRNA-disease associations offer valuable insights into disease mechanisms, thereby enabling the development of innovative diagnostic and therapeutic approaches while reducing the reliance on costly wet experiments. In this study, SGFCCDA is proposed for predicting potential circRNA-disease associations based on scale graph convolutional networks and feature convolution. Specifically, SGFCCDA integrates multiple measures of circRNA and disease similarity and combines known association information to construct a heterogeneous network. This network is then explored by scale graph convolutional networks to capture both topological and attribute information. Additionally, convolutional neural networks are employed to further learn the features and obtain higher-order feature representations containing richer information about nodes. The Hadamard product is utilized to effectively combine circRNA features with disease features, and a multilayer perceptron is applied to predict the association between each pair of circRNA and disease. Five-fold cross validation experiments conducted on the CircR2Disease dataset demonstrate the accurate prediction capabilities of SGFCCDA in identifying potential circRNA-disease associations. Furthermore, case studies provide further confirmation of SGFCCDA's ability to identify disease-associated circRNAs.

ICML Conference 2024 Conference Paper

Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features

  • Chao Wang
  • Xin Bing
  • Xin He
  • Caixing Wang

Random feature (RF) mapping is an attractive and powerful technique for solving large-scale nonparametric regression. Yet, the existing theoretical analysis crucially relies on the i.i.d. assumption that individuals in the data are independent and identically distributed. It is still unclear whether learning accuracy would be compromised when the i.i.d. assumption is violated. This paper aims to provide theoretical understanding of the kernel ridge regression (KRR) with RFs for large-scale dependent data. Specifically, we consider two types of data dependence structure, namely, the $\tau$-mixing process with exponential decay coefficient, and that with polynomial decay coefficient. Theoretically, we prove that the kernel ridge estimator with RFs achieves the minimax optimality under the exponential decay scenario, but yields a sub-optimal result under the polynomial decay case. Our analysis further reveals how the decay rate of the $\tau$-mixing coefficient impacts the learning accuracy of the kernel ridge estimator with RFs. Extensive numerical experiments on both synthetic and real examples further validate our theoretical findings and support the effectiveness of the KRR with RFs in dealing with dependent data.
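The random-feature technique discussed here is standard, so a minimal sketch (independent of the paper's dependent-data analysis) may help. It approximates an RBF kernel exp(-gamma * ||x - x'||^2) with random Fourier features and solves the ridge problem in the induced feature space; the function name and parameter choices are illustrative.

```python
import numpy as np

def rff_krr(X_train, y_train, X_test, n_features=200, gamma=1.0, lam=1e-3, seed=0):
    """Kernel ridge regression with random Fourier features approximating an
    RBF kernel. Cost scales with the number of features, not the sample size."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    # For exp(-gamma * r^2), frequencies are drawn from N(0, 2*gamma * I).
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)

    def phi(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    Z = phi(X_train)
    # Ridge solve in feature space: (Z^T Z + lam I) beta = Z^T y
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(n_features), Z.T @ y_train)
    return phi(X_test) @ beta
```

The solve involves an `n_features x n_features` system rather than the full `n x n` kernel matrix, which is the source of the scalability the abstract refers to.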

NeurIPS Conference 2023 Conference Paper

Implicit Regularization in Over-Parameterized Support Vector Machine

  • Yang Sui
  • Xin He
  • Yang Bai

In this paper, we design a regularization-free algorithm for high-dimensional support vector machines (SVMs) by integrating over-parameterization with Nesterov's smoothing method, and provide theoretical guarantees for the induced implicit regularization phenomenon. In particular, we construct an over-parameterized hinge loss function and estimate the true parameters by leveraging regularization-free gradient descent on this loss function. The utilization of Nesterov's method enhances the computational efficiency of our algorithm, especially in terms of determining the stopping criterion and reducing computational complexity. With appropriate choices of initialization, step size, and smoothness parameter, we demonstrate that unregularized gradient descent achieves a near-oracle statistical convergence rate. Additionally, we verify our theoretical findings through a variety of numerical experiments and compare the proposed method with explicit regularization. Our results illustrate the advantages of employing implicit regularization via gradient descent in conjunction with over-parameterization in sparse SVMs.

JMLR Journal 2023 Journal Article

Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches

  • Shaogao Lv
  • Xin He
  • Junhui Wang

This paper considers the partially functional linear model (PFLM) where all predictive features consist of a functional covariate and a high dimensional scalar vector. Over an infinite dimensional reproducing kernel Hilbert space, the proposed estimation for PFLM is a least square approach with two mixed regularizations of a function-norm and an $\ell_1$-norm. Our main task in this paper is to establish the minimax rates for PFLM under high dimensional setting, and the optimal minimax rates of estimation are established by using various techniques in empirical process theory for analyzing kernel classes. In addition, we propose an efficient numerical algorithm based on randomized sketches of the kernel matrix. Several numerical experiments are implemented to support our method and optimization strategy.

AAAI Conference 2023 Conference Paper

NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

  • Xin He
  • Jiangchao Yao
  • Yuxin Wang
  • Zhenheng Tang
  • Ka Chun Cheung
  • Simon See
  • Bo Han
  • Xiaowen Chu

One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet). However, the inconsistency of characteristics among subnets incurs serious interference in the optimization, resulting in poor performance ranking correlation of subnets. Subsequent explorations decompose supernet weights via a particular criterion, e.g., gradient matching, to reduce the interference; yet they suffer from huge computational cost and low space separability. In this work, we propose a lightweight and effective local intrinsic dimension (LID)-based method NAS-LID. NAS-LID evaluates the geometrical properties of architectures by calculating the low-cost LID features layer-by-layer, and the similarity characterized by LID enjoys better separability compared with gradients, which thus effectively reduces the interference among subnets. Extensive experiments on NASBench-201 indicate that NAS-LID achieves superior performance with better efficiency. Specifically, compared to the gradient-driven method, NAS-LID can save up to 86% of GPU memory overhead when searching on NASBench-201. We also demonstrate the effectiveness of NAS-LID on ProxylessNAS and OFA spaces. Source code: https://github.com/marsggbo/NAS-LID.
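LID features of the kind this abstract mentions are commonly computed with the maximum-likelihood estimator of Levina and Bickel from nearest-neighbor distances. The sketch below shows that generic estimator only, not the authors' layer-wise NAS pipeline; the function name is our own.

```python
import numpy as np

def lid_mle(x, neighbors, k=10):
    """Maximum-likelihood local intrinsic dimension estimate at point x:
    LID(x) ~= -1 / mean_i log(r_i / r_k), where r_1 <= ... <= r_k are the
    distances from x to its k nearest neighbors."""
    dists = np.sort(np.linalg.norm(neighbors - x, axis=1))[:k]
    r_k = dists[-1]  # distance to the k-th nearest neighbor
    return -1.0 / np.mean(np.log(dists / r_k + 1e-12))
```

For points sampled from a low-dimensional manifold embedded in a higher-dimensional space, the estimate is close to the manifold's dimension, which is what makes LID a cheap geometric signature.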

NeurIPS Conference 2023 Conference Paper

Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift

  • Xingdong Feng
  • Xin He
  • Caixing Wang
  • Chao Wang
  • Jingnan Zhang

Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper, and sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in the literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.

NeurIPS Conference 2022 Conference Paper

Fine-Grained Semantically Aligned Vision-Language Pre-Training

  • Juncheng Li
  • Xin He
  • Longhui Wei
  • Long Qian
  • Linchao Zhu
  • Lingxi Xie
  • Yueting Zhuang
  • Qi Tian

Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks. Existing methods mainly model the cross-modal alignment by the similarity of the global representations of images and text, or advanced cross-modal attention upon image and text features. However, they fail to explicitly learn the fine-grained semantic alignment between visual regions and textual phrases, as only global image-text alignment information is available. In this paper, we introduce LOUPE, a fine-grained semantically aLigned visiOn-langUage PrE-training framework, which learns fine-grained semantic alignment from the novel perspective of game-theoretic interactions. To efficiently estimate the game-theoretic interactions, we further propose an uncertainty-aware neural Shapley interaction learning module. Experiments show that LOUPE achieves state-of-the-art performance on a variety of vision-language tasks. Without any object-level human annotations and fine-tuning, LOUPE achieves competitive performance on object detection and visual grounding. More importantly, LOUPE opens a new promising direction of learning fine-grained semantics from large-scale raw image-text pairs.

JMLR Journal 2022 Journal Article

Learning linear non-Gaussian directed acyclic graph with diverging number of nodes

  • Ruixuan Zhao
  • Xin He
  • Junhui Wang

An acyclic model, often depicted as a directed acyclic graph (DAG), has been widely employed to represent directional causal relations among collected nodes. In this article, we propose an efficient method to learn linear non-Gaussian DAG in high dimensional cases, where the noises can be of any continuous non-Gaussian distribution. The proposed method leverages the concept of topological layer to facilitate the DAG learning, and its theoretical justification in terms of exact DAG recovery is also established under mild conditions. Particularly, we show that the topological layers can be exactly reconstructed in a bottom-up fashion, and the parent-child relations among nodes can also be consistently established. The established asymptotic DAG recovery is in sharp contrast to that of many existing learning methods assuming parental faithfulness or ordered noise variances. The advantage of the proposed method is also supported by the numerical comparison against some popular competitors in various simulated examples as well as a real application on the global spread of COVID-19.

AAAI Conference 2021 Conference Paper

Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans

  • Xin He
  • Shihao Wang
  • Xiaowen Chu
  • Shaohuai Shi
  • Jiangping Tang
  • Xin Liu
  • Chenggang Yan
  • Jiyong Zhang

The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may result in low quality models as the real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with a relatively unfair comparison. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3-18) to establish the baseline performance on three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search the 3D DL models for 3D chest CT scans classification and use the Gumbel Softmax technique to improve the search efficiency. We further exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results. The experimental results show that our searched models (CovidNet3D) outperform the baseline human-designed models on three datasets with tens of times smaller model size and higher accuracy. Furthermore, the results also verify that CAM can be well applied in CovidNet3D for COVID-19 datasets to provide interpretability for medical diagnosis. Code: https://github.com/HKBU-HPML/CovidNet3D.