Arrow Research search

Author name cluster

Lei Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

57 papers
2 author rows

Possible papers (57)

EAAI Journal 2026 Journal Article

A semi-supervised multimodal fusion framework with adversarial contrastive learning for Alzheimer's disease diagnosis

  • Haowen Liu
  • Lei Shi
  • Yucheng Shi
  • Yameng Zhang
  • Guohua Zhao
  • Yufei Gao

Multimodal neuroimaging data provide complementary structural and functional information for Alzheimer's disease (AD) diagnosis. However, acquiring a large-scale multimodal dataset with precise annotations is resource-intensive and incurs substantial costs. Effectively fusing relevant information across modalities also remains a significant challenge. Existing methods are often constrained by modeling intra-modality specific features and inter-modal shared information in isolation, overlooking their critical interactions. Additionally, the heterogeneity among modalities introduces a major obstacle to achieving reliable cross-modality feature alignment. To tackle these problems, we propose an end-to-end semi-supervised multimodal fusion framework with adversarial contrastive learning. Specifically, a pseudo-labeling strategy is designed to make full use of unlabeled data by regulating the quality of pseudo-labels with a threshold. To adaptively capture inter-modal shared characteristics while preserving the unique properties of each modality, we design a Dual-phase Inter–Intra Attention Fusion Unit that effectively exploits the interactions between different modalities. Moreover, to achieve efficient alignment of multimodal data at both the feature and subject levels, we develop a hierarchical alignment strategy based on adversarial contrastive learning. This strategy maps features into a shared latent space and promotes the proximity of inter-modal paired samples within that space, thereby simultaneously mitigating distributional discrepancies and resolving semantic inconsistencies across modalities. Extensive experiments conducted on two independent public datasets demonstrate that the proposed framework performs strongly in AD diagnosis compared with existing approaches, notably achieving an accuracy of 92.00 (±1.00)% on ADNI1 with only 40% labels.

AAAI Conference 2026 Conference Paper

Adaptive Graph Attention Based Discrete Hashing for Incomplete Cross-modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Huilong Jin
  • Feifei Kou
  • Pengfei Zhang
  • Mingying Xu
  • Pengtao Lv

Cross-modal hashing has emerged as a pivotal solution for efficient retrieval across diverse modalities, such as images and texts, by mapping them into compact binary hash spaces. However, in real-world scenarios, modality data are often missing or misaligned. Existing methods mostly rely on fully paired training data and ignore missing or misaligned modality data, resulting in semantic inconsistencies. To address these challenges, we propose an Adaptive Graph Attention-Based Discrete Hashing (AGADH) method, which consists of three parts. First, to solve the problem of missing modalities, AGADH employs a masked completion strategy to reconstruct missing modalities. Second, to mitigate semantic misalignment, AGADH leverages a Graph Attention Network (GAT) encoder-decoder architecture with an alignment module to construct features from different modalities. Additionally, to enhance fusion performance, an adaptive fusion module that dynamically adjusts the contributions of image and text modalities with learnable weighting coefficients is proposed. Extensive experiments on three benchmark datasets, MS-COCO, NUS-WIDE, and MIRFlickr-25K, demonstrate that AGADH outperforms state-of-the-art methods in both fully paired and incompletely paired scenarios, showing its robustness and effectiveness in cross-modal retrieval tasks.

AAAI Conference 2026 Short Paper

Constraint-Augmented Mongolian-Chinese Neural Machine Translation Based on Dynamic Feedback Alignment (Student Abstract)

  • Shuting Dai
  • Yatu Ji
  • Yanli Wang
  • Lei Shi
  • Qing-Dao-Er-Ji Ren
  • Nier Wu
  • Na Liu

The scarcity of parallel corpora for Mongolian and Chinese constrains the performance of Mongolian-Chinese neural machine translation (NMT), particularly manifesting in inadequate accuracy in translating specialized terminology. To address this limitation, this study adopts a lexically constrained augmentation strategy that constructs pseudo-source sentences by appending Chinese constraint words to Mongolian source texts, while enforcing the inclusion of these constraints in the output to improve translation accuracy. However, this approach presents two inherent drawbacks: processing pseudo-sentences with a single encoder tends to induce semantic interference, and the introduced constraint words may exacerbate alignment errors during decoding. To overcome these limitations, this paper proposes a Constraint-Augmented Mongolian-Chinese NMT method (CANMT) based on dynamic feedback alignment. The method employs a dual-encoder architecture to isolate bilingual representations, coupled with a dynamic feedback alignment module that progressively reduces alignment errors through iterative refinement, thereby enhancing overall translation performance.

AAAI Conference 2026 Conference Paper

MoE-Guided Graph Diffusion for Oriented Molecule Design

  • Shuochen Li
  • Xiangqi Guo
  • Huobin Tan
  • Lei Shi

Designing molecules with desired properties, a.k.a. oRiented molEcule Design (RED), is a fundamental task in chemistry and materials science. While graph diffusion models (GDMs) and reinforcement learning (RL) techniques individually show promise in the molecule structure generation and property optimization stages, their integration in the unified RED task often suffers from poor compatibility. The large variance among candidate molecular structures generated by GDMs can be amplified in the iterative optimization process of RL, leading to slow and unstable convergence. In this work, motivated by the adaptive and divide-and-conquer characteristics of the Mixture of Experts (MoE) architecture, we propose a novel framework called the MoE-Guided Graph Diffusion Model (MEGD), which incorporates the MoE architecture to guide the orchestration of GDM and RL, promoting faster and more stable convergence in the design process. MEGD is evaluated on benchmark datasets by optimizing the physical and chemical properties of AI-generated molecular structures. On all three datasets, our method outperforms the best of 9 alternative models by 7.73% on the target structural properties, without penalizing other important application-level quality metrics of the generated molecules. A real-world case study on an emerging class of materials, i.e., metal-organic frameworks, is also conducted, which further demonstrates the effectiveness of our method in accomplishing the RED task.

EAAI Journal 2026 Journal Article

Municipal solid waste gasification predictions using hybrid-data physics-informed neural networks

  • Vincentius Surya Kurnia Adi
  • Lei Shi
  • Wei Wu

Municipal solid waste (MSW) gasification presents a viable thermochemical pathway for waste valorization and sustainable energy generation. Complicated mathematical modeling, nonlinear behavior, and heterogeneous feedstocks make reliable prediction of outlet compositions difficult. This study presents a hybrid-data physics-informed neural network (PINN), a physics-guided machine learning (ML) approach within artificial intelligence (AI), to predict syngas composition and lower heating value (LHV). The framework integrates experimental measurements, Aspen Plus simulation, and thermodynamic monotonicity constraints with respect to temperature, moisture content, and equivalence ratio. Aspen Plus models were constructed using equilibrium chemistry and the Peng-Robinson equation of state (EOS) with the Boston-Mathias modifications (PR-BM) property method, and monotonicity constraints concerning temperature, moisture content, and equivalence ratio are embedded into the PINN loss function to enforce physical consistency. To identify a feasible and effective PINN framework, three model scenarios are proposed: a pure data-driven PINN with a monotonicity loss term, a hybrid data-driven PINN with a monotonicity loss term, and a hybrid data-driven PINN with three physics-informed loss terms. In benchmark comparisons, the hybrid data-driven PINN with three physics-informed loss terms outperforms conventional ML models such as Support Vector Machines (SVM) and Extreme Gradient Boosting (XGB), as well as the other PINN scenarios, achieving the highest coefficient of determination (R2) values for carbon monoxide (CO) (0.9600), LHV (0.9535), and nitrogen (N2) (0.9650). This shows that the proposed approach offers enhanced accuracy, generalizability, and interpretability, supporting data-driven decision-making in MSW gasification and sustainable energy system design.

AAAI Conference 2026 Conference Paper

MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation

  • Yuqiu Zhao
  • Lei Shi
  • Yan Zhong
  • Feifei Kou
  • Pengfei Zhang
  • Jiwei Zhang
  • Mingying Xu
  • Yanchao Liu

Generative recommendation as a new paradigm is influencing the current development of recommender systems. It aims to assign identifiers that capture richer semantic and collaborative information to items, and subsequently predict item identifiers via autoregressive generation using Large Language Models (LLMs). Existing approaches primarily tokenize item text into codebooks with preserved semantic IDs through RQ-VAE, or separately tokenize different modality features of items. However, existing tokenization methods face two major challenges: (1) Learning decoupled multi-modal features limits the quality of the semantic representation. (2) Ignoring collaborative signals from interaction history limits the comprehensiveness of identifiers. To address these limitations, we propose a multi-modal semantic-enhanced identifier with collaborative signals for generative recommendation, named MusicRec. In MusicRec, we propose a tokenization approach based on shared-specific modal fusion, enabling the generated identifiers to preserve semantic information more comprehensively from all modalities. In addition, we incorporate collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared to existing baseline methods.

ICRA Conference 2025 Conference Paper

A Fairness-Oriented Control Framework for Safety-Critical Multi-Robot Systems: Alternative Authority Control

  • Lei Shi
  • Qichao Liu
  • Cheng Zhou
  • Xiong Li 0001

This paper proposes a fair control framework for multi-robot systems, which integrates the newly introduced Alternative Authority Control (AAC) and Flexible Control Barrier Function (F-CBF). Control authority refers to the single robot that plans its trajectory while treating the others as moving obstacles; robots without authority do not plan their own paths. The AAC method dynamically distributes control authority, enabling fair and coordinated movement across the system. This approach significantly improves computational efficiency, scalability, and robustness in complex environments. The proposed F-CBF extends traditional CBFs by incorporating obstacle shape, velocity, and orientation. F-CBF enhances safety through accurate dynamic obstacle avoidance. The framework is validated through simulations in multi-robot scenarios, demonstrating its safety, robustness, and computational efficiency.

NeurIPS Conference 2025 Conference Paper

Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Yingxue Zhang
  • Feifei Kou
  • Huilong Jin
  • Pengfei Zhang
  • Meiyu Liang

The demand for multimodal data processing drives the development of information technology. Cross-modal hash retrieval has attracted much attention because it can overcome modal differences and achieve efficient retrieval, and it has shown great application potential in many practical scenarios. Existing cross-modal hashing methods have difficulty fully capturing the semantic information of different modal data, which leads to a significant semantic gap between modalities. Moreover, these methods often ignore differences in the importance of channels, and due to the limitation of a single objective, the matching between hash codes is also affected to a certain extent. To address these issues, we propose a Dynamic Masking and Auxiliary Hash Learning (AHLR) method for enhanced cross-modal retrieval. By jointly leveraging the dynamic masking and auxiliary hash learning mechanisms, our approach effectively resolves the problems of channel information imbalance and insufficient key-information capture, thereby significantly improving retrieval accuracy. Specifically, we introduce a dynamic masking mechanism that automatically screens and weights the key information in images and texts during training, enhancing the accuracy of feature matching. We further construct an auxiliary hash layer to adaptively balance the weights of features across channels, compensating for the deficiencies of traditional methods in key-information capture and channel processing. In addition, we design a contrastive loss function to optimize the generation of hash codes and enhance their discriminative power, further improving cross-modal retrieval performance. Comprehensive experimental results on the NUS-WIDE, MIRFlickr-25K, and MS-COCO benchmark datasets show that the proposed AHLR algorithm outperforms several existing algorithms.

EAAI Journal 2025 Journal Article

Enable importance-aware model cacheability for inference serving

  • Hao Mo
  • Didier El Baz
  • Ligu Zhu
  • Suping Wang
  • Songfu Tan
  • Hongning Zhao
  • Lei Shi

Inference serving systems are leveraged to deploy deep learning (DL) models as services. Accelerators such as Graphics Processing Units (GPUs) have been extensively used in these systems to reduce model execution time. As accelerators become more powerful and expensive, GPU sharing among DL models across different inference requests is a common practice. However, GPU memory capacity becomes a bottleneck as the number of collocated models increases, making this approach unsustainable. At the same time, collocated models may vary in popularity, with some accessed frequently and others rarely, leading to low resource efficiency and system performance. While some existing inference serving systems offer the capability to dynamically load and cache models in memory, they are typically locality-aware and exhibit poor performance for DL inference serving. To this end, we present mCache, a novel inference-serving-oriented caching system that dynamically manages a set of collocated models with diverse popularity for efficient use of memory. mCache treats each DL model as a cacheable object, loading models on demand and unloading them when not in use. Rather than using recency or frequency, we manage models in GPU memory based on rank-based importance scores, which jointly consider cache access patterns and model-specific factors, serving as a unified metric to compare different models. During model serving, the importance scores of cached models are dynamically updated, and the least important model is evicted to make room for a newly targeted model when the cache is full. Evaluation with representative DL models shows that mCache reduces memory footprint by nearly half with a modest increase in inference latency. Compared to an existing serving system using the Least Frequently Used (LFU) caching algorithm, mCache improves throughput by up to 1.5× and 2.39× at 40% and 80% GPU memory capacity, respectively.

IJCAI Conference 2025 Conference Paper

EVICheck: Evidence-Driven Independent Reasoning and Combined Verification Method for Fact-Checking

  • Lingxiao Wang
  • Lei Shi
  • Feifei Kou
  • Ligu Zhu
  • Chen Ma
  • Pengfei Zhang
  • Mingying Xu
  • Zeyu Li

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have demonstrated significant potential in automated fact-checking. However, existing methods face limitations in insufficient evidence utilization and a lack of explicit verification criteria. Specifically, these approaches aggregate evidence for collective reasoning without independently analyzing each piece, hindering their ability to leverage the available information thoroughly. Additionally, they rely on simple prompts or few-shot learning for verification, which makes truthfulness judgments less reliable, especially for complex claims. To address these limitations, we propose a novel method, named EVICheck, that enhances evidence utilization and introduces explicit verification criteria. Our approach reasons over each piece of evidence independently and synthesizes the results to enable more thorough exploration and enhance interpretability. Additionally, by incorporating fine-grained truthfulness criteria, we make the model's verification process more structured and reliable, especially when handling complex claims. Experimental results on the public RAWFC dataset demonstrate that EVICheck achieves state-of-the-art performance across all evaluation metrics. Our method shows strong potential in fake news verification, significantly improving accuracy.

AAAI Conference 2025 Conference Paper

IWRN: A Robust Blind Watermarking Method for Artwork Image Copyright Protection Against Noise Attack

  • Feifei Kou
  • Yuhan Yao
  • Siyuan Yao
  • Jiahao Wang
  • Lei Shi
  • Yawen Li
  • Xuejing Kang

Adding imperceptible watermarks to artwork images, such as paintings and photographs, can effectively safeguard the copyright of these images without compromising their usability. However, existing blind watermarking techniques encounter two major challenges in addressing this task: imperceptibility and robustness, particularly when subjected to various noise attacks. In this paper, we propose a blind watermarking method for artwork image copyright protection, IWRN, which ensures both the Imperceptibility of the Watermark and Robustness against Noise attacks. For imperceptibility, we design a Learnable Wavelet Network (LWN) to adaptively embed the watermark into the high-frequency region, where the watermark has better invisibility. For robustness, we establish a Deform-Attention based Invertible Neural Network (DA-INN) with decoding optimization, which offers the advantage of computational reversibility and combines the deform-attention mechanism and decoding optimization to enhance the model's resistance to noise. Additionally, we design a Joint Contrast Learning (JCL) mechanism to improve imperceptibility and robustness simultaneously. Experiments show that our IWRN outperforms other state-of-the-art blind watermarking methods, achieving an average performance of 41.55 PSNR and 99.57% accuracy on the Coco2017, Wikiart, and Div2k datasets when facing 12 kinds of noise attacks.

NeurIPS Conference 2025 Conference Paper

Learning to Rank for In-Context Example Retrieval

  • Yuwen Ji
  • Luodan Zhang
  • Ambyer han
  • Haoran Que
  • Lei Shi
  • Wang Chao
  • Yue Zhang

Recent advances in retrieval-based in-context learning (ICL) train the retriever using a classification objective, which categorizes in-context examples (ICEs) into the most useful and the rest based on absolute scores. However, during inference, ICEs are retrieved by score ranking rather than classification; the classification training objective thus deviates from this test scenario. Hence, in this paper, we propose a novel algorithm that trains a retrieval model with a ranking formulation, where the preference rankings between ICEs are given by comparing the likelihood of the LLM generating the correct answer conditioned on each exemplar. By learning to rank, we motivate the retriever to automatically learn diverse rationales for why specific examples are more useful for ICL decisions. This addresses the issue that classification models poorly capture broader utility. Experimental results demonstrate the top-1 performance of our proposal across 9 NLP tasks, with ablation studies and case studies further validating the effectiveness of our design. The code can be found at: https://github.com/2022neo/SeDPO_NIPS25

NeurIPS Conference 2025 Conference Paper

Leveraging semantic similarity for experimentation with AI-generated treatments

  • Lei Shi
  • David Arbour
  • Raghavendra Addanki
  • Ritwik Sinha
  • Avi Feller

Large Language Models (LLMs) enable a new form of digital experimentation where treatments combine human and model-generated content in increasingly sophisticated ways. The main methodological challenge in this setting is representing these high-dimensional treatments without losing their semantic meaning or rendering analysis intractable. Here we address this problem by focusing on learning low-dimensional representations that capture the underlying structure of such treatments. These representations enable downstream applications such as guiding generative models to produce meaningful treatment variants and facilitating adaptive assignment in online experiments. We propose double kernel representation learning, which models the causal effect through the inner product of kernel-based representations of treatments and user covariates. We develop an alternating-minimization algorithm that learns these representations efficiently from data and provide convergence guarantees under a low-rank factor model. As an application of this framework, we introduce an adaptive design strategy for online experimentation and demonstrate the method's effectiveness through numerical experiments.

AAAI Conference 2025 Conference Paper

Leveraging the Dual Capabilities of LLM: LLM-Enhanced Text Mapping Model for Personality Detection

  • Weihong Bi
  • Feifei Kou
  • Lei Shi
  • Yawen Li
  • Haisheng Li
  • Jinpeng Chen
  • Mingying Xu

Personality detection aims to deduce a user's personality from their published posts, mapping posts to specific personality types. Existing methods encode post information to obtain user vectors, which are then mapped to personality labels. However, these methods face two main issues: first, using only small models makes it hard to accurately extract semantic features from multiple long documents; second, the relationship between user vectors and personality labels is not fully considered. To address the issue of poor user representation, we utilize the text embedding capabilities of LLMs; to address the insufficient consideration of the relationship between user vectors and personality labels, we leverage the text generation capabilities of LLMs. We therefore propose the LLM-Enhanced Text Mapping Model (ETM) for Personality Detection. The model applies an LLM's text embedding capability to enhance user vector representations. Additionally, it uses an LLM's text generation capability to create multi-perspective interpretations of the labels, which are then used within a contrastive learning framework to strengthen the mapping of user vectors to personality labels. Experimental results show that our model achieves state-of-the-art performance on benchmark datasets.

TIST Journal 2025 Journal Article

Open Spatio-Temporal Foundation Models for Traffic Prediction

  • Zhonghang Li
  • Long Xia
  • Lei Shi
  • Yong Xu
  • Dawei Yin
  • Chao Huang

Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in handling the spatial and temporal heterogeneity of traffic data, coupled with the significant distribution shift across time and space. In this work, we aim to unlock new possibilities for building versatile, resilient and adaptive spatio-temporal foundation models for traffic prediction. We introduce OpenCity, a foundation model that captures underlying spatio-temporal patterns from diverse data, facilitating zero-shot generalization across urban environments. OpenCity integrates Transformers with graph neural networks to capture complex spatio-temporal dependencies in traffic data. By pre-training OpenCity on large-scale, heterogeneous traffic data from web platforms, we enable the model to learn rich, generalizable representations that can be seamlessly applied to a wide range of traffic forecasting scenarios. Experiments show OpenCity excels in zero-shot prediction and exhibits scaling laws, highlighting its potential as a universal one-for-all traffic prediction solution adaptable to new urban contexts with minimal overhead. Source codes are available at: https://github.com/HKUDS/OpenCity

NeurIPS Conference 2025 Conference Paper

OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance

  • Yuhan Yao
  • Feifei Kou
  • Lei Shi
  • Xiao Yang
  • Zhongbao Zhang
  • Suguo Zhu
  • Jiwei Zhang
  • Lirong Qiu

The advancements in generative models and real-world attacks on machine-generated text (MGT) create a demand for more robust detection methods. Existing MGT detection methods for adversarial environments primarily consist of manually designed statistical-based methods and fine-tuned classifier-based approaches. Statistical-based methods extract intrinsic features but suffer from rigid decision boundaries vulnerable to adaptive attacks, while fine-tuned classifiers achieve outstanding performance at the cost of overfitting to superficial textual features. We argue that the key to detection in current adversarial environments lies in extracting intrinsic invariant features and ensuring that the classifier possesses dynamic adaptability. We therefore propose OSTAR, a novel MGT detection framework designed for adversarial environments, composed of a statistically enhanced classifier and Multi-Faceted Contrastive Learning (MFCL). On the classifier side, our Multi-Dimensional Statistical Profiling (MDSP) module extracts intrinsic differences between human and machine texts, complementing classifiers with useful, stable features. On the model optimization side, the MFCL strategy enhances robustness by contrasting feature variations before and after text attacks, jointly optimizing the statistical feature mapping and baseline pre-trained models. Experimental results on three public datasets under various adversarial scenarios demonstrate that our framework outperforms existing MGT detection methods, achieving state-of-the-art performance and robustness against attacks. The code is available at https://github.com/BUPT-SN/OSTAR.

AAAI Conference 2025 Conference Paper

Radiology Report Generation via Multi-objective Preference Optimization

  • Ting Xiao
  • Lei Shi
  • Peng Liu
  • Zhe Wang
  • Chenjia Bai

Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on supervised regression based on different architectures or additional knowledge injection, yet the generated report may not align optimally with radiologists' preferences. This is especially problematic because radiologists' preferences are inherently heterogeneous and multi-dimensional: some may prioritize report fluency, while others emphasize clinical accuracy. To address this problem, we propose a new RRG method via Multi-objective Preference Optimization (MPO) to align the pre-trained RRG model with multiple human preferences, which can be formulated by multi-dimensional reward functions and optimized by multi-objective reinforcement learning (RL). Specifically, we use a preference vector to represent the weight of preferences and use it as a condition for the RRG model. Then, a linearly weighted reward is obtained via a dot product between the preference vector and the multi-dimensional reward. Next, the RRG model is optimized to align with the preference vector by optimizing this reward via RL. In the training stage, we randomly sample diverse preference vectors from the preference space and align the model by optimizing the weighted multi-objective rewards, which leads to an optimal policy over the entire preference space. At inference time, our model can generate reports aligned with specific preferences without further fine-tuning. Extensive experiments on two public datasets show that the proposed method can generate reports catering to different preferences in a single model and achieves state-of-the-art performance.

AAAI Conference 2025 Conference Paper

THGNets: Constrained Temporal Hypergraphs and Graph Neural Networks in Hyperbolic Space for Information Diffusion Prediction

  • Yanchao Liu
  • Pengzhou Zhang
  • Wenchao Song
  • Yao Zheng
  • Deyu Li
  • Lei Shi
  • Junpeng Gong

Information diffusion prediction aims to predict the next infected user in an information diffusion process, a critical task for understanding how information spreads on social platforms. Existing methods mainly focus on sequences or topology structure in Euclidean space. However, they fail to sufficiently consider the hierarchical or power-law structure of the underlying topology of information cascade graphs and social networks, resulting in distortion of user features. To tackle this issue, we propose an innovative Constrained Temporal Hypergraphs and Graph Neural Networks (THGNets) framework tailored for information diffusion prediction. Specifically, we introduce a hyperbolic temporal hypergraph neural network to alleviate the distortion of user features via hyperbolic hierarchical learning in information cascades. It also captures high-order dynamic interaction patterns between users and further integrates a time-consistency constraint mechanism to mitigate the instability and non-smoothness of user features in the latent space. In parallel, we apply a hyperbolic graph neural network to investigate the hierarchical structure and user homogeneity of social networks, enhancing our understanding of social relationships. Moreover, hyperbolic gated recurrent units are employed to capture potential dependency relationships between contextual users. Experiments conducted on four public datasets demonstrate that the proposed THGNets significantly outperform existing methods, validating the superiority and rationality of our approach.

AAAI Conference 2025 Conference Paper

Unaligned Message-Passing and Contextualized-Pretraining for Robust Geo-Entity Resolution

  • Yuwen Ji
  • Wenbo Xie
  • Jiaqi Zhang
  • Chao Wang
  • Ning Guo
  • Lei Shi
  • Yue Zhang

Geo-entity resolution involves linking records that refer to the same entities across different spatial datasets, which underpins location-based services. Given the varying quality of geo-data, this task is known to be challenging, as directly comparing the semantic-centric representations of two entities is no longer reliable. To robustify geo-entity resolution in this context, the main research question is how to effectively extend current semantics-centric representations of a geo-entity with geographical context from its spatial neighbors. Existing methods consider names from neighbors, but they struggle to fully utilize unaligned neighbor attributes. In this paper, we study the representation of geo-context for robust geo-entity resolution and propose two adaptations that efficiently leverage unaligned geo-entity attributes across spatial neighbors: (1) a plugin module, namely Unaligned Message-Passing (UMP), that propagates unaligned neighbor features to integrate geo-context into the token embeddings output by the language model; and (2) a contextualized pretraining framework (CP) that allows the former to leverage unlabelled geo-entity data. Experiments show that our method surpasses the baselines, achieving higher F1 scores on 8 real-world geo-datasets in terms of robustness, with an improvement of up to 7.9%. The ablation study further justifies our proposal.

ICML Conference 2025 Conference Paper

Zero-Inflated Bandits

  • Haoyu Wei
  • Runzhe Wan
  • Lei Shi
  • Rui Song 0006

Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
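The zero-inflated reward structure described above is concrete enough to sketch. Below is a minimal, illustrative Thompson Sampling loop for zero-inflated rewards, assuming a Bernoulli "is the reward non-zero?" component and a Gaussian magnitude component; the function, priors, and arm parameterization are hypothetical simplifications, not the authors' implementation.

```python
import random

def zi_thompson_sampling(arms, horizon, seed=0):
    """Thompson Sampling sketch for zero-inflated rewards.

    Each arm's reward is B * Y with B ~ Bernoulli(p) (is the reward
    non-zero?) and Y ~ Normal(mu, 1) (magnitude when non-zero). We keep
    a Beta posterior on p and a running Normal posterior on mu, and
    rank arms by the sampled product p_hat * mu_hat.
    """
    rng = random.Random(seed)
    k = len(arms)
    alpha = [1.0] * k   # Beta prior pseudo-counts for "non-zero" events
    beta = [1.0] * k    # ... and for "zero" events
    mu0 = [0.0] * k     # posterior mean of the non-zero part Y
    n_nz = [0] * k      # number of non-zero observations per arm
    pulls = [0] * k
    for _ in range(horizon):
        # sample a plausible mean reward p * mu for every arm
        scores = []
        for i in range(k):
            p_hat = rng.betavariate(alpha[i], beta[i])
            mu_hat = rng.gauss(mu0[i], 1.0 / (n_nz[i] + 1) ** 0.5)
            scores.append(p_hat * mu_hat)
        i = max(range(k), key=scores.__getitem__)
        p, mu = arms[i]
        reward = rng.gauss(mu, 1.0) if rng.random() < p else 0.0
        pulls[i] += 1
        if reward != 0.0:
            alpha[i] += 1
            n_nz[i] += 1
            mu0[i] += (reward - mu0[i]) / n_nz[i]  # running mean of Y
        else:
            beta[i] += 1
    return pulls

# arm 1 has the larger mean reward p * mu (0.5 * 2.0 vs 0.2 * 1.0)
counts = zi_thompson_sampling([(0.2, 1.0), (0.5, 2.0)], horizon=500)
```

The point of the zero-inflated decomposition is visible here: the zero/non-zero indicator and the non-zero magnitude get separate, well-matched posteriors instead of one misspecified reward model.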

NeurIPS Conference 2024 Conference Paper

An End-To-End Graph Attention Network Hashing for Cross-Modal Retrieval

  • Huilong Jin
  • Yingxue Zhang
  • Lei Shi
  • Shuang Zhang
  • Feifei Kou
  • Jiapeng Yang
  • Chuangying Zhu
  • Jia Luo

Due to its low storage cost and fast search speed, cross-modal retrieval based on hashing has attracted widespread attention and is widely used in real-world social media search applications. However, most existing hashing methods are limited by incomplete feature representations and semantic associations, which greatly restricts their performance and applicability in practice. To deal with this challenge, in this paper we propose an end-to-end graph attention network hashing (EGATH) method for cross-modal retrieval, which can not only capture direct semantic associations between images and texts but also match semantic content between different modalities. We adopt contrastive language-image pretraining (CLIP) combined with a Transformer to improve understanding and generalization in semantic consistency across different data modalities. A classifier based on a graph attention network is applied to obtain predicted labels that enhance cross-modal feature representation. We construct hash codes using an optimization strategy and loss function that preserve the semantic information and compactness of the hash codes. Comprehensive experiments on the NUS-WIDE, MIRFlickr25K, and MS-COCO benchmark datasets show that our EGATH significantly outperforms several state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification

  • Yuankai Luo
  • Lei Shi
  • Xiao-ming Wu

Graph Transformers (GTs) have recently emerged as popular alternatives to traditional message-passing Graph Neural Networks (GNNs), due to their theoretically superior expressiveness and impressive performance reported on standard node classification benchmarks, often significantly outperforming GNNs. In this paper, we conduct a thorough empirical analysis to reevaluate the performance of three classic GNN models (GCN, GAT, and GraphSAGE) against GTs. Our findings suggest that the previously reported superiority of GTs may have been overstated due to suboptimal hyperparameter configurations in GNNs. Remarkably, with slight hyperparameter tuning, these classic GNN models achieve state-of-the-art performance, matching or even exceeding that of recent GTs across 17 out of the 18 diverse datasets examined. Additionally, we conduct detailed ablation studies to investigate the influence of various GNN configurations, such as normalization, dropout, residual connections, and network depth, on node classification performance. Our study aims to promote a higher standard of empirical rigor in the field of graph machine learning, encouraging more accurate comparisons and evaluations of model capabilities. Our implementation is available at https://github.com/LUOyk1999/tunedGNN.
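For context, the "classic GNN" update being tuned here is the standard GCN propagation rule. A dependency-free sketch of one layer (dropout omitted for brevity; the `residual` flag illustrates one of the configuration knobs the ablations vary) might look like:

```python
def gcn_layer(adj, x, w, residual=False):
    """One dense GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).

    `adj` is an n x n 0/1 adjacency matrix (list of lists), `x` an
    n x d feature matrix, `w` a d x d weight matrix. `residual`
    optionally adds the input features back after the update.
    """
    n = len(adj)
    # add self-loops: A + I
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # symmetric normalization: a_hat[i][j] = a[i][j] / sqrt(deg_i * deg_j)
    a_hat = [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
             for i in range(n)]
    d = len(x[0])
    # aggregate neighbor features, then apply the linear transform
    agg = [[sum(a_hat[i][k] * x[k][j] for k in range(n)) for j in range(d)]
           for i in range(n)]
    out = [[sum(agg[i][k] * w[k][j] for k in range(d)) for j in range(len(w[0]))]
           for i in range(n)]
    h = [[max(0.0, v) for v in row] for row in out]  # ReLU
    if residual:
        h = [[h[i][j] + x[i][j] for j in range(d)] for i in range(n)]
    return h

# two connected nodes with one-hot features and identity weights:
# each node's output becomes the normalized mix of both nodes
h = gcn_layer([[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]])
```

The paper's point is that choices around this rule (normalization, dropout, residuals, depth), rather than the rule itself, drive much of the benchmark gap.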

JMLR Journal 2024 Journal Article

Classification with Deep Neural Networks and Logistic Loss

  • Zihan Zhang
  • Lei Shi
  • Ding-Xuan Zhou

Deep neural networks (DNNs) trained with the logistic loss (also known as the cross entropy loss) have made impressive advancements in various binary classification tasks. Despite the considerable success in practice, generalization analysis for binary classification with deep neural networks and the logistic loss remains scarce. The unboundedness of the target function for the logistic loss in binary classification is the main obstacle to deriving satisfactory generalization bounds. In this paper, we aim to fill this gap by developing a novel theoretical analysis and using it to establish tight generalization bounds for training fully connected ReLU DNNs with logistic loss in binary classification. Our generalization analysis is based on an elegant oracle-type inequality which enables us to deal with the boundedness restriction of the target function. Using this oracle-type inequality, we establish generalization bounds for fully connected ReLU DNN classifiers $\hat{f}^{\text{FNN}}_n$ trained by empirical logistic risk minimization with respect to i.i.d. samples of size $n$, which lead to sharp rates of convergence as $n\to\infty$. In particular, we obtain optimal convergence rates for $\hat{f}^{\text{FNN}}_n$ (up to some logarithmic factor) only requiring the Hölder smoothness of the conditional class probability $\eta$ of data. Moreover, we consider a compositional assumption that requires $\eta$ to be the composition of several vector-valued multivariate functions of which each component function is either a maximum value function or a Hölder smooth function only depending on a small number of its input variables. Under this assumption, we can even derive optimal convergence rates for $\hat{f}^{\text{FNN}}_n$ (up to some logarithmic factor) which are independent of the input dimension of data. This result explains why in practice DNN classifiers can overcome the curse of dimensionality and perform well in high-dimensional classification problems. 
Furthermore, we establish dimension-free rates of convergence under other circumstances, such as when the decision boundary is piecewise smooth and the input data are bounded away from it. Besides the novel oracle-type inequality, the sharp convergence rates presented in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of the rates by proving corresponding minimax lower bounds. All these results are new in the literature and deepen our theoretical understanding of classification with deep neural networks.
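For orientation (standard definitions, not spelled out in the abstract): the logistic loss on the margin $yf(x)$, with labels $y \in \{-1, +1\}$, and its population risk minimizer are

$$\phi\bigl(y f(x)\bigr) = \log\bigl(1 + e^{-y f(x)}\bigr), \qquad f^{*}(x) = \log\frac{\eta(x)}{1 - \eta(x)}, \quad \eta(x) = \mathbb{P}(Y = 1 \mid X = x).$$

Since $f^{*}(x)$ diverges as $\eta(x) \to 0$ or $\eta(x) \to 1$, the target function is unbounded whenever the conditional class probability approaches $0$ or $1$; this is precisely the obstacle to standard generalization bounds that the paper's oracle-type inequality is built to handle.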

NeurIPS Conference 2024 Conference Paper

Enhancing Graph Transformers with Hierarchical Distance Structural Encoding

  • Yuankai Luo
  • Hongkang Li
  • Lei Shi
  • Xiao-ming Wu

Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in various graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method to model node distances in a graph, focusing on its multi-level, hierarchical nature. We introduce a novel framework to seamlessly integrate HDSE into the attention mechanism of existing graph transformers, allowing for simultaneous application with other positional encodings. To apply graph transformers with HDSE to large-scale graphs, we further propose a high-level HDSE that effectively biases the linear transformers towards graph hierarchies. We theoretically prove the superiority of HDSE in terms of expressivity and generalization. Empirically, we demonstrate that graph transformers with HDSE excel in graph classification, regression on 7 graph-level datasets, and node classification on 11 large-scale graphs.

JMLR Journal 2024 Journal Article

Low-Rank Matrix Estimation in the Presence of Change-Points

  • Lei Shi
  • Guanghui Wang
  • Changliang Zou

We consider a general trace regression model with multiple structural changes and propose a universal approach for simultaneous exact or near-low-rank matrix recovery and change-point detection. It incorporates nuclear norm penalized least-squares minimization into a grid search scheme that determines the potential structural break. Under a set of general conditions, we establish the non-asymptotic error bounds with a nearly-oracle rate for the matrix estimators as well as the super-consistency rate for the change-point localization. We use concrete random design instances to justify the appropriateness of the proposed conditions. Numerical results demonstrate the validity and effectiveness of the proposed scheme.

AAAI Conference 2024 Conference Paper

Neural Reasoning about Agents’ Goals, Preferences, and Actions

  • Matteo Bortoletto
  • Lei Shi
  • Andreas Bulling

We propose the Intuitive Reasoning Network (IRENE) - a novel neural model for intuitive psychological reasoning about agents' goals, preferences, and actions that can generalise previous experiences to new situations. IRENE combines a graph neural network for learning agent and world state representations with a transformer to encode the task context. When evaluated on the challenging Baby Intuitions Benchmark, IRENE achieves new state-of-the-art performance on three out of its five tasks - with up to 48.9% improvement. In contrast to existing methods, IRENE is able to bind preferences to specific agents, to better distinguish between rational and irrational agents, and to better understand the role of blocking obstacles. We also investigate, for the first time, the influence of the training tasks on test performance. Our analyses demonstrate the effectiveness of IRENE in combining prior knowledge gained during training for unseen evaluation tasks.

AAAI Conference 2024 Conference Paper

Self-Supervised Representation Learning with Meta Comprehensive Regularization

  • Huijie Guo
  • Ying Ba
  • Jie Hu
  • Lingyu Si
  • Wenwen Qiang
  • Lei Shi

Self-Supervised Learning (SSL) methods harness the concept of semantic invariance by utilizing data augmentation strategies to produce similar representations for different deformations of the same input. Essentially, the model captures the information shared among multiple augmented views of a sample, while disregarding the non-shared information that may be beneficial for downstream tasks. To address this issue, we introduce a module called CompMod with Meta Comprehensive Regularization (MCR), embedded into existing self-supervised frameworks, to make the learned representations more comprehensive. Specifically, we update the proposed model through a bi-level optimization mechanism, enabling it to capture comprehensive features. Additionally, guided by the constrained extraction of features using maximum entropy coding, the self-supervised learning model learns more comprehensive features on top of learning consistent features. We also provide theoretical support for the proposed method from information-theoretic and causal counterfactual perspectives. Experimental results show that our method achieves significant improvements in classification, object detection, and semantic segmentation tasks on multiple benchmark datasets.

JMLR Journal 2024 Journal Article

Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression

  • Jiading Liu
  • Lei Shi

Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory. We develop an integral operator approach to establish sharp finite sample upper bounds for prediction with divide-and-conquer estimators under various regularity conditions on the explanatory variables and target function. We also prove the asymptotic optimality of the derived rates by establishing minimax lower bounds. Finally, we consider the convergence of noiseless estimators and show that the rates can be arbitrarily fast under mild conditions.

AAAI Conference 2024 Conference Paper

Stratified GNN Explanations through Sufficient Expansion

  • Yuwen Ji
  • Lei Shi
  • Zhimeng Liu
  • Ge Wang

Explaining the decisions made by Graph Neural Networks (GNNs) is vital for establishing trust and ensuring fairness in critical applications such as medicine and science. The prevalence of hierarchical structure in real-world graphs/networks raises an important question on GNN interpretability: "On each level of the graph structure, which specific fraction imposes the highest influence over the prediction?" The two prevailing categories of methods are incapable of multi-level GNN explanation due to their flat or motif-centric nature. In this work, we formulate the problem of learning multi-level explanations from GNN models and introduce a stratified explainer module, STFExplainer, that utilizes the concept of sufficient expansion to generate explanations on each stratum. Specifically, we learn a higher-level subgraph generator by leveraging both hierarchical structure and GNN-encoded input features. Experiment results on both synthetic and real-world datasets demonstrate the superiority of our stratified explainer on standard interpretability tasks and metrics such as fidelity and explanation recall, with average improvements of 11% and 8% over the best alternative on each data type. A case study on material domains further confirms the value of our approach: the detected multi-level graph patterns accurately reconstruct the knowledge-based ground truth.

AAAI Conference 2024 Conference Paper

Structural Information Enhanced Graph Representation for Link Prediction

  • Lei Shi
  • Bin Hu
  • Deng Zhao
  • Jianshan He
  • Zhiqiang Zhang
  • Jun Zhou

Link prediction is a fundamental task of graph machine learning, and Graph Neural Network (GNN) based methods have become the mainstream approach due to their good performance. However, the typical practice learns node representations through neighborhood aggregation, lacking awareness of the structural relationships between target nodes. Recently, some methods have attempted to address this issue through node labeling tricks. However, they still rely on the node-centric neighborhood message passing of GNNs, which we believe involves two limitations in terms of information perception and transmission for link prediction. First, it cannot perceive long-range structural information due to restricted receptive fields. Second, information may be lost when a node-centric model is applied to a link-centric task. In addition, we empirically find that neighbor node features can introduce noise for link prediction. To address these issues, we propose a structural information enhanced link prediction framework, which removes neighbor node features so that the GNN can focus on fitting neighborhood graph structures. Furthermore, we introduce the Binary Structural Transformer (BST) to encode the structural relationships between target nodes, complementing the deficiency of the GNN. Our approach achieves remarkable results on multiple popular benchmarks, including ranking first on ogbl-ppa, ogbl-citation2, and Pubmed.

IJCAI Conference 2024 Conference Paper

Unsupervised Deep Graph Structure and Embedding Learning

  • Xiaobo Shen
  • Lei Shi
  • Xiuwen Gong
  • Shirui Pan

Graph Neural Networks (GNNs) are powerful for graph embedding learning, but their performance has been shown to degrade heavily under adversarial attacks. Deep graph structure learning (GSL) has been proposed to defend against attacks by jointly learning the graph structure and graph embedding, typically for node classification. Label supervision is expensive in real-world applications, so unsupervised GSL is more challenging and remains less studied. To fill this gap, this paper proposes a new unsupervised GSL method, the unsupervised property GNN (UPGNN). UPGNN first refines the graph structure by exploiting the properties of low rank, sparsity, and feature smoothness. It then employs a graph mutual information loss to learn graph embeddings by maximizing their correlation with the refined graph. The proposed UPGNN learns graph structure and embeddings without label supervision and can thus be applied to various downstream tasks. We further propose Accelerated UPGNN (AUPGNN) to reduce computational complexity, providing an efficient alternative to UPGNN. Our extensive experiments on node classification and clustering demonstrate the effectiveness of the proposed method over the state of the art, especially under heavy perturbation.

JBHI Journal 2024 Journal Article

Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images

  • Yameng Zhang
  • Yufei Gao
  • Jing Xu
  • Guohua Zhao
  • Lei Shi
  • Lingfei Kong

Recent advances in large models and neuroscience have enabled exploration of the mechanisms of brain activity using neuroimaging data. Brain decoding is one of the most promising research directions for further understanding human cognitive function. However, current methods depend excessively on high-quality labeled data, which entails the enormous expense of collection and annotation of neural images by experts. Besides, the performance of cross-individual decoding suffers from inconsistency in data distribution caused by individual variation and different collection equipment. To address the above issues, a Joint Domain Adaptive Decoding (JDAD) framework is proposed for unsupervised decoding of specific brain cognitive states related to behavioral tasks. Based on volumetric feature extraction from task-based functional Magnetic Resonance Imaging (tfMRI) data, a novel objective loss function is designed with a joint distribution regularizer, which aims to restrict the distance between both the conditional and marginal probability distributions of labeled and unlabeled samples. Experimental results on the public Human Connectome Project (HCP) S1200 dataset show that JDAD achieves superior performance over other prevalent methods, especially for fine-grained tasks, with 11.5%–21.6% improvements in decoding accuracy. The learned 3D features are visualized by Grad-CAM and linked to brain functional regions, which provides a novel path to studying the function of brain cortex regions related to specific cognitive tasks at the group level.

NeurIPS Conference 2024 Conference Paper

Using Surrogates in Covariate-adjusted Response-adaptive Randomization Experiments with Delayed Outcomes

  • Lei Shi
  • Waverly Wei
  • Jingshen Wang

Covariate-adjusted response-adaptive randomization (CARA) designs are gaining increasing attention. These designs combine the advantages of randomized experiments with the ability to adaptively revise treatment allocations based on data collected across multiple stages, enhancing estimation efficiency. Yet, CARA designs often assume that primary outcomes are immediately observable, which is not the case in many clinical scenarios where there is a delay in observing primary outcomes. This assumption can lead to significant missingness and inefficient estimation of treatment effects. To tackle this practical challenge, we propose a CARA experimental strategy integrating delayed primary outcomes with immediately observed surrogate outcomes. Surrogate outcomes are intermediate clinical outcomes that are predictive of or correlated with the primary outcome of interest. Our design goal is to improve the estimation efficiency of the average treatment effect (ATE) of the primary outcome by utilizing surrogate outcomes. From a methodological perspective, our approach offers two benefits: First, we accommodate arm- and covariate-dependent delay mechanisms without imposing any parametric modeling assumptions on the distribution of outcomes. Second, when primary outcomes are not fully observed, surrogate outcomes can guide the adaptive treatment allocation rule. From a theoretical standpoint, we prove the semiparametric efficiency bound of estimating the ATE under delayed primary outcomes while incorporating surrogate outcomes. We show that the ATE estimator under our proposed design strategy attains this semiparametric efficiency bound and achieves asymptotic normality. Through theoretical investigations and a synthetic HIV study, we show that our design is more efficient than the design without incorporating any surrogate information.

ICRA Conference 2023 Conference Paper

Differential Dynamic Programming based Hybrid Manipulation Strategy for Dynamic Grasping

  • Cheng Zhou
  • Yanbo Long
  • Lei Shi
  • Longfei Zhao
  • Yu Zheng 0001

To fully explore the potential of robots for dexterous manipulation, this paper presents a whole dynamic grasping process that achieves fluent grasping of a target object by the robot end-effector. The process starts with the phase of approaching the object, proceeds through the phases of colliding with the object and letting it roll about the collision point, and ends with the phase of catching it by the palm or grasping it by the fingers of the end-effector. We derive a unified model for this hybrid dynamic manipulation process, embodied as approaching-colliding-rolling-catching/grasping, from spatial-vector-based articulated body dynamics. The whole process is then formulated as a free-terminal constrained multi-phase optimal control problem (OCP). We extend traditional differential dynamic programming (DDP) to solve this free-terminal OCP, where the backward pass of DDP involves constrained quadratic programming (QP) problems that we solve with the primal-dual Augmented Lagrangian (PDAL) method. Simulations and real experiments show the effectiveness of the proposed method for robotic dynamic grasping.

NeurIPS Conference 2023 Conference Paper

Improving Self-supervised Molecular Representation Learning using Persistent Homology

  • Yuankai Luo
  • Lei Shi
  • Veronika Thost

Self-supervised learning (SSL) has great potential for molecular representation learning, given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and the consequently small training datasets often available. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently, most of which focus on designing views for contrastive learning. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. PH has several unique features that particularly suit SSL, naturally offering different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We (1) investigate an autoencoder, which shows the general representational power of PH, and (2) propose a contrastive loss that complements existing approaches. We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.

TIST Journal 2023 Journal Article

Mobility Inference on Long-Tailed Sparse Trajectory

  • Lei Shi
  • Yuankai Luo
  • Shuai Ma
  • Hanghang Tong
  • Zhetao Li
  • Xiatian Zhang
  • Zhiguang Shan

Analyzing urban trajectories in cities has become an important topic in data mining. How can we model human mobility, consisting of stay and travel states, from raw trajectory data? How can we infer these mobility states from a single user's trajectory information? How can we further generalize the mobility inference to real-world trajectory data that span multiple users and are sparsely sampled over time? In this article, based on formal and rigid definitions of stay/travel mobility, we propose a single-trajectory inference algorithm that utilizes a generic long-tailed sparsity pattern in large-scale trajectory data. The algorithm guarantees 100% precision in the stay/travel inference with a provable lower bound on the recall metric. Furthermore, we design a transformer-like deep learning architecture for the problem of mobility inference from multiple sparse trajectories. Several adaptations of the standard transformer network structure are introduced, including a singleton design to avoid the negative effect of sparse labels on the decoder side, a customized space-time embedding of location-record features, and a mask apparatus at the output side for loss-function correction. Evaluations on three trajectory datasets of 40 million urban users validate the performance guarantees of the proposed inference algorithm and demonstrate the superiority of our deep learning model in comparison to sequence learning methods in the literature. On extremely sparse trajectories, the deep learning model more than doubles the overall and F1 accuracy of the single-trajectory inference algorithm. The model also generalizes to large-scale trajectory data from different sources with good scalability.

JMLR Journal 2023 Journal Article

Robust High-Dimensional Low-Rank Matrix Estimation: Optimal Rate and Data-Adaptive Tuning

  • Xiaolong Cui
  • Lei Shi
  • Wei Zhong
  • Changliang Zou

The matrix lasso, which minimizes a least-squared loss function with the nuclear-norm regularization, offers a generally applicable paradigm for high-dimensional low-rank matrix estimation, but its efficiency is adversely affected by heavy-tailed distributions. This paper introduces a robust procedure by incorporating a Wilcoxon-type rank-based loss function with the nuclear-norm penalty for a unified high-dimensional low-rank matrix estimation framework. It includes matrix regression, multivariate regression and matrix completion as special examples. This procedure enjoys several appealing features. First, it relaxes the distributional conditions on random errors from sub-exponential or sub-Gaussian to more general distributions and thus it is robust with substantial efficiency gain for heavy-tailed random errors. Second, as the gradient function of the rank-based loss function is completely pivotal, it overcomes the challenge of tuning parameter selection and substantially saves the computation time by using an easily simulated tuning parameter. Third, we theoretically establish non-asymptotic error bounds with a nearly-oracle rate for the new estimator. Numerical results indicate that the new estimator can be highly competitive among existing methods, especially for heavy-tailed or skewed errors.

ICML Conference 2023 Conference Paper

Statistical Inference on Multi-armed Bandits with Delayed Feedback

  • Lei Shi
  • Jingshen Wang
  • Tianhao Wu 0002

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback. While the existing MAB literature often focuses on maximizing the expected cumulative reward (or, equivalently, regret minimization), few efforts have been devoted to establishing valid statistical inference approaches to quantify the uncertainty of learned policies. We attempt to fill this gap by providing a unified statistical inference framework for policy evaluation in which a target policy is allowed to differ from the data-collecting policy, and our framework allows the delay to be associated with the treatment arms. We present an adaptively weighted estimator that, on one hand, incorporates the arm-dependent delay mechanism to achieve consistency, and on the other hand mitigates the variance inflation across stages due to vanishing sampling probabilities. In particular, our estimator does not critically depend on the ability to estimate the unknown delay mechanism. Under appropriate conditions, we prove that our estimator converges to a normal distribution as the number of time points goes to infinity, which provides guarantees for large-sample statistical inference. We illustrate the finite-sample performance of our approach through Monte Carlo experiments.
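For orientation, the policy-evaluation problem studied here starts from the textbook inverse-propensity-weighted (IPW) value estimate. The sketch below shows only that baseline (all names are hypothetical), not the paper's adaptively weighted, delay-aware estimator:

```python
def ipw_policy_value(logged, target_policy):
    """Plain inverse-propensity-weighted off-policy value estimate:

        V_hat = (1/T) * sum_t  pi(A_t) / e_t(A_t) * R_t

    `logged` is a list of (arm, reward, propensity) tuples produced by
    the data-collecting policy, where `propensity` is the probability
    with which that policy chose the arm; `target_policy[arm]` is the
    probability the target policy plays that arm. The paper's estimator
    additionally reweights across stages and accounts for arm-dependent
    delays, which this baseline ignores.
    """
    total = 0.0
    for arm, reward, propensity in logged:
        total += target_policy[arm] / propensity * reward
    return total / len(logged)

# three logged rounds; the target policy always plays arm 1, so rounds
# where arm 0 was played contribute zero weight
logged = [(1, 1.0, 0.5), (0, 0.5, 0.5), (1, 0.0, 0.25)]
v = ipw_policy_value(logged, target_policy={0: 0.0, 1: 1.0})
```

The variance-inflation issue the abstract mentions is visible in the `1 / propensity` factor: as the sampling probability of an arm vanishes, individual terms blow up, which is what adaptive weighting is designed to mitigate.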

NeurIPS Conference 2023 Conference Paper

Transformers over Directed Acyclic Graphs

  • Yuankai Luo
  • Veronika Thost
  • Lei Shi

Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.

JBHI Journal 2022 Journal Article

A Graph Convolutional Multiple Instance Learning on a Hypersphere Manifold Approach for Diagnosing Chronic Obstructive Pulmonary Disease in CT Images

  • Ling Chen
  • Qixing Feng
  • Xi Yin
  • Xiangde Min
  • Lei Shi
  • Defu Yang
  • Yen-Wei Chen
  • Daoqiang Zhang

Chronic obstructive pulmonary disease (COPD) is a prevalent chronic disease with high morbidity and mortality. The early diagnosis of COPD is vital for clinical treatment and helps patients have a better quality of life. Because COPD can be ascribed to chronic bronchitis and emphysema, lesions in a computed tomography (CT) image can appear anywhere inside the lung with different types, shapes, and sizes. Multiple instance learning (MIL) is an effective tool for COPD discrimination. In this study, a novel graph convolutional MIL with adaptive additive margin loss (GCMIL-AAMS) approach is proposed to diagnose COPD from CT. Specifically, for early-stage patients, the selected instance-level features can be more discriminative when learned by our proposed graph convolution and pooling with a self-attention mechanism. The AAMS loss can utilize information on COPD severity on a hypersphere manifold by adaptively setting the angular margins to improve performance, as severity can be quantified into four grades by pulmonary function tests. The results show that the proposed GCMIL-AAMS method provides superior discrimination and generalization abilities, with areas under the receiver operating characteristic curve (AUCs) of 0.960 $\pm$ 0.014 and 0.862 $\pm$ 0.010 on the test set and external testing set, respectively, in 5-fold stratified cross-validation; moreover, it demonstrates that graph learning is applicable to MIL and suggests that MIL may be adaptable to graph learning.

NeurIPS Conference 2022 Conference Paper

Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

  • Zhuoqing Song
  • Weijian Li
  • Kexin Jin
  • Lei Shi
  • Ming Yan
  • Wotao Yin
  • Kun Yuan

Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without a central server. Since communication tends to be slower than computation, when each agent communicates with only a few neighboring agents per iteration, they can complete iterations faster than with more agents or a central server. However, the total number of iterations to reach a network-wide solution is affected by the speed at which the information of the agents is ``mixed'' by communication. We found that popular communication topologies either have large degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate, the quantity used to measure mixing efficiency. In the proposed family, EquiStatic has a degree of $\Theta(\ln(n))$, where $n$ is the network size, and a series of time-varying one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both achieve an $n$-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, both theoretically and empirically. Our code is implemented through BlueFog and available at https://github.com/kexinjinnn/EquiTopo.
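
The degree-1, time-varying idea can be sketched with a generic one-peer gossip round in which nodes are paired by a random cyclic shift and each pair averages its values. This is an illustrative stand-in, not the paper's exact EquiDyn sampling procedure, and all names are made up for the sketch:

```python
import numpy as np

def one_peer_gossip_round(x, rng):
    """One round of degree-1 gossip: each node i averages with the
    single peer (i + shift) % n for a random shift. The mixing matrix
    (0.5 on the diagonal, 0.5 at the shifted offset) is circulant and
    doubly stochastic, so the network average is preserved."""
    n = len(x)
    shift = rng.integers(1, n)           # peer of node i is (i + shift) % n
    peer = (np.arange(n) + shift) % n
    return 0.5 * (x + x[peer])

rng = np.random.default_rng(0)
x = np.arange(8, dtype=float)            # each of 8 agents holds one scalar
mean = x.mean()
for _ in range(50):
    x = one_peer_gossip_round(x, rng)
print(np.allclose(x, mean))              # all agents reach the average
```

Each round costs every agent exactly one message, which is the communication pattern the constant-degree EquiDyn topologies are designed around.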

JMLR Journal 2021 Journal Article

Generalization Properties of hyper-RKHS and its Applications

  • Fanghui Liu
  • Lei Shi
  • Xiaolin Huang
  • Jie Yang
  • Johan A.K. Suykens

This paper generalizes regularized regression problems in a hyper-reproducing kernel Hilbert space (hyper-RKHS), illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced regression models from an approximation theory view. Algorithmically, we consider two regularized regression models with bivariate forms in this space, including kernel ridge regression (KRR) and support vector regression (SVR) endowed with hyper-RKHS, and further combine divide-and-conquer with Nystr\"{o}m approximation for scalability in large-sample cases. This framework is general: the underlying kernel is learned from a broad class and can be positive definite or not, which adapts to various requirements in kernel learning. Theoretically, we study the convergence behavior of regularized regression algorithms in hyper-RKHS and derive the learning rates, which go beyond the classical analysis on RKHS due to the non-trivial independence of pairwise samples and the characterisation of hyper-RKHS. Experimentally, results on several benchmarks suggest that the employed framework is able to learn a general kernel function from an arbitrary similarity matrix, and thus achieves satisfactory performance on classification tasks.

JMLR Journal 2018 Journal Article

Convergence of Unregularized Online Learning Algorithms

  • Yunwen Lei
  • Lei Shi
  • Zheng-Chu Guo

In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess generalization errors in expectation. A sufficient condition for almost sure convergence is also given. With high probability, we provide explicit convergence rates of the excess generalization errors for both the averaged iterates and the last iterate, which in turn also imply convergence rates with probability one. To the best of our knowledge, this is the first high-probability convergence rate for the last iterate of online gradient descent algorithms in the general convex setting. Without any boundedness assumptions on the iterates, our results are derived by a novel use of two measures of the algorithm's one-step progress, respectively by generalization errors and by distances in RKHSs, where the variances of the involved martingales are cancelled out by the descent property of the algorithm.
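
For concreteness, unregularized online gradient descent in an RKHS for the least-squares loss takes the form $f_{t+1} = f_t - \eta_t (f_t(x_t) - y_t) K(x_t, \cdot)$. The sketch below implements this with a Gaussian kernel and an $\eta/\sqrt{t}$ step size; both are illustrative choices, and the paper's analysis covers general convex losses, which this toy does not:

```python
import numpy as np

def rbf(x, z, gamma=5.0):
    """Gaussian kernel K(x, z) on scalars (illustrative choice)."""
    return np.exp(-gamma * (x - z) ** 2)

def online_kernel_gd(X, y, eta=0.5):
    """Unregularized online gradient descent in an RKHS for the
    least-squares loss. Each step appends one kernel expansion term
    with coefficient -eta_t * (f_t(x_t) - y_t)."""
    centers, coefs = [], []
    for t, (x, yt) in enumerate(zip(X, y), start=1):
        pred = sum(c * rbf(z, x) for c, z in zip(coefs, centers))
        resid = pred - yt                # gradient of 0.5*(f(x)-y)^2 in f(x)
        centers.append(x)
        coefs.append(-eta / np.sqrt(t) * resid)
    def f(x):
        return sum(c * rbf(z, x) for c, z in zip(coefs, centers))
    return f

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 200)
y = np.sin(np.pi * X) + 0.1 * rng.normal(size=200)
f = online_kernel_gd(X, y)
err = np.mean([(f(x) - np.sin(np.pi * x)) ** 2 for x in np.linspace(-1, 1, 50)])
print(err < 0.5)                         # clearly beats the zero predictor
```

The growing kernel expansion is the textbook representation of the iterate $f_t$; the paper's contribution is the convergence analysis, not this implementation.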

IS Journal 2018 Journal Article

Two-Stage Road Terrain Identification Approach for Land Vehicles Using Feature-Based and Markov Random Field Algorithm

  • Shifeng Wang
  • Sarath Kodagoda
  • Lei Shi
  • Xiang Dai

Road terrain identification is one of the important tasks for driving-assistance systems and autonomous land vehicles. It plays a key role in improving driving strategy and enhancing fuel efficiency. In this paper, a two-stage approach using multiple sensors is presented. In the first stage, feature-based identification is performed using an accelerometer, a camera, and downward-looking and forward-looking laser range finders (LRFs). This produces four classification label sequences. In the second stage, a majority vote is implemented for each label sequence to match them onto synchronized road patches. Then a Markov Random Field (MRF) model is designed to generate the final optimized identification results, improving on the forward-looking LRF alone. This approach enables the vehicle to observe the upcoming road terrain before moving onto it by fusing all the classification results with the MRF algorithm. The experiments show that this approach improves terrain identification accuracy and robustness significantly on familiar road terrains.

IJCAI Conference 2018 Conference Paper

Uncertainty Sampling for Action Recognition via Maximizing Expected Average Precision

  • Hanmo Wang
  • Xiaojun Chang
  • Lei Shi
  • Yi Yang
  • Yi-Dong Shen

Recognizing human actions in video clips has been an important topic in computer vision. Sufficient labeled data is one of the prerequisites for the good performance of action recognition algorithms. However, while abundant videos can be collected from the Internet, categorizing each video clip is tedious and time-consuming. Active learning is one way to alleviate the labeling labor by allowing the classifier to choose the most informative unlabeled instances for manual annotation. Among various active learning algorithms, uncertainty sampling is arguably the most widely used strategy. Conventional uncertainty sampling strategies such as entropy-based methods are usually evaluated under accuracy. In action recognition, however, the acknowledged evaluation metric is Average Precision (AP), defined as the area under the precision-recall curve, which has largely been ignored in the active learning community. In this paper, we propose a novel uncertainty sampling algorithm for action recognition using expected AP. We conduct experiments on three real-world action recognition datasets and show that our algorithm outperforms other uncertainty-based active learning algorithms.
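
For reference, the AP metric the paper targets has a simple discrete form: the mean of precision@k taken over the positions k of the relevant items in the ranked list. A minimal sketch of the metric only, not of the paper's expected-AP estimator:

```python
def average_precision(labels_sorted):
    """AP for one ranked list: labels_sorted is the ground truth
    (1 = relevant, 0 = not) ordered by decreasing classifier score.
    AP = mean of precision@k over the ranks k of the relevant items."""
    hits, precisions = 0, []
    for k, rel in enumerate(labels_sorted, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 = 0.833...
```

Unlike accuracy, AP rewards putting relevant items early in the ranking, which is why an uncertainty criterion tuned for accuracy can be suboptimal under AP.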

JMLR Journal 2017 Journal Article

Learning Theory of Distributed Regression with Bias Corrected Regularization Kernel Network

  • Zheng-Chu Guo
  • Lei Shi
  • Qiang Wu

Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the output functions learnt from these blocks. Since averaging decreases the variance but not the bias, bias correction is expected to improve the learning performance if the base regression algorithm is a biased one. Regularization kernel network is an effective and widely used method for nonlinear regression analysis. In this paper we investigate a bias-corrected version of regularization kernel network. We derive the error bounds when it is applied to a single data set and when it is applied as a base algorithm in distributed regression. We show that, under certain appropriate conditions, the optimal learning rates can be reached in both situations.
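
The divide-and-average scheme described above can be sketched as follows, with kernel ridge regression standing in as the base algorithm. The bias-correction step that is the paper's contribution is omitted, and the kernel and regularization settings are illustrative:

```python
import numpy as np

def krr_fit(X, y, lam=1e-3, gamma=5.0):
    """Kernel ridge regression on one block: solve
    (K + lam * m * I) alpha = y for the expansion coefficients."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda x: np.exp(-gamma * (np.atleast_1d(x)[:, None] - X[None, :]) ** 2) @ alpha

def distributed_krr(X, y, n_blocks=4, lam=1e-3):
    """Divide the data into blocks, fit KRR on each block,
    and return the plain average of the block estimators."""
    blocks = np.array_split(np.arange(len(X)), n_blocks)
    fs = [krr_fit(X[b], y[b], lam) for b in blocks]
    return lambda x: np.mean([f(x) for f in fs], axis=0)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, 400)
y = X ** 2 + 0.05 * rng.normal(size=400)
fbar = distributed_krr(X, y)
grid = np.linspace(-0.9, 0.9, 25)
err = np.mean((fbar(grid) - grid ** 2) ** 2)
print(err < 0.01)                        # averaged estimator fits well
```

Each block solve is $O(m^3)$ for block size $m$ instead of $O(N^3)$ for the full data, which is the computational motivation for the divide-and-average approach.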

IJCAI Conference 2016 Conference Paper

Diversifying Convex Transductive Experimental Design for Active Learning

  • Lei Shi
  • Yi-Dong Shen

Convex Transductive Experimental Design (CTED) is one of the most representative active learning methods. It utilizes a data reconstruction framework to select informative samples for manual annotation. However, we observe that CTED cannot handle the diversity of selected samples well, and hence the set of selected samples may contain mutually similar samples which convey similar or overlapping information. This is undesirable: given a limited budget for data labeling, it is desired to select informative samples with complementary information, i.e., with similar samples excluded. To this end, we propose Diversified CTED by seamlessly incorporating a novel and effective diversity regularizer into CTED, ensuring the selected samples are diverse. The diversity regularizer makes the optimization problem hard to solve, so we derive an effective algorithm for an equivalent problem which is easier to optimize. Extensive experimental results on several benchmark data sets demonstrate that Diversified CTED significantly improves CTED and consistently outperforms the state-of-the-art methods, verifying the effectiveness and advantages of incorporating the proposed diversity regularizer into CTED.

AAAI Conference 2015 Conference Paper

Convex Batch Mode Active Sampling via α-Relative Pearson Divergence

  • Hanmo Wang
  • Liang Du
  • Peng Zhou
  • Lei Shi
  • Yi-Dong Shen

Active learning is a machine learning technique in which a classifier selects a subset of an unlabeled dataset for labeling and is then trained on the selected data. Recently, batch mode active learning, which selects a batch of samples to label in parallel, has attracted a lot of attention. Its challenge lies in the choice of criteria used for guiding the search for the optimal batch. In this paper, we propose a novel approach to selecting the optimal batch of queries by minimizing the α-relative Pearson divergence (RPE) between the labeled and the original datasets. This particular divergence is chosen since it can distinguish the optimal batch more easily than other measures, especially when the available candidates are similar. The proposed objective is a min-max optimization problem, which is difficult to solve due to the involvement of both minimization and maximization. We find that the objective has an equivalent convex form, and thus a global optimal solution can be obtained; the subgradient method can then be applied to solve the simplified convex problem. Our empirical studies on UCI datasets demonstrate the effectiveness of the proposed approach compared with the state-of-the-art batch mode active learning methods.

IJCAI Conference 2015 Conference Paper

Learning a Robust Consensus Matrix for Clustering Ensemble via Kullback-Leibler Divergence Minimization

  • Peng Zhou
  • Liang Du
  • Hanmo Wang
  • Lei Shi
  • Yi-Dong Shen

Clustering ensemble has emerged as an important extension of the classical clustering problem. It provides a framework for combining multiple base clusterings of a data set to generate a final consensus result. Most existing methods simply combine clustering results without taking noise into account, which may degrade the clustering performance. In this paper, we propose a novel robust clustering ensemble method. To improve the robustness, we capture the sparse and symmetric errors and integrate them into our robust consensus framework to learn a low-rank matrix. Since the objective function is difficult to optimize, we develop a block coordinate descent algorithm which is theoretically guaranteed to converge. Experimental results on real-world data sets demonstrate the effectiveness of our method.

JMLR Journal 2015 Journal Article

Learning with the Maximum Correntropy Criterion Induced Losses for Regression

  • Yunlong Feng
  • Xiaolin Huang
  • Lei Shi
  • Yuning Yang
  • Johan A.K. Suykens

Within the statistical learning framework, this paper studies the regression model associated with the correntropy-induced losses. The correntropy, as a similarity measure, has been frequently employed in signal processing and pattern recognition. Motivated by its empirical success, this paper aims at presenting some theoretical understanding of the maximum correntropy criterion in regression problems. Our focus is two-fold: first, we are concerned with the connections between the regression model associated with the correntropy-induced loss and the least squares regression model. Second, we study its convergence property. A learning theory analysis centered around these two aspects is conducted. From our analysis, we see that the scale parameter in the loss function balances the convergence rate of the regression model against its robustness. We then make some efforts to sketch a general view on robust loss functions when applied to learning for regression problems. Numerical experiments are also implemented to verify the effectiveness of the model.

IJCAI Conference 2015 Conference Paper

Recovery of Corrupted Multiple Kernels for Clustering

  • Peng Zhou
  • Liang Du
  • Lei Shi
  • Hanmo Wang
  • Yi-Dong Shen

Kernel-based methods, such as kernel k-means and kernel PCA, have been widely used in machine learning tasks. The performance of these methods critically depends on the selection of kernel functions; however, we usually do not know in advance what kind of kernel is suitable for the given data and task. This has led to research on multiple kernel learning, i.e., learning a consensus kernel from multiple candidate kernels. Existing multiple kernel learning methods have difficulty in dealing with noise. In this paper, we propose a novel method for learning a robust yet low-rank kernel for clustering tasks. We observe that the noise of each kernel has a specific structure, so we can make full use of it to clean the multiple input kernels and then aggregate them into a robust, low-rank consensus kernel. The underlying optimization problem is hard to solve, and we show that it can be solved via alternating minimization, whose convergence is theoretically guaranteed. Experimental results on several benchmark data sets further demonstrate the effectiveness of our method.

IJCAI Conference 2015 Conference Paper

Robust Multiple Kernel K-means Using L21-Norm

  • Liang Du
  • Peng Zhou
  • Lei Shi
  • Hanmo Wang
  • Mingyu Fan
  • Wenjian Wang
  • Yi-Dong Shen

The k-means algorithm is one of the most widely used methods for data clustering. However, the standard k-means can only be applied in the original feature space. The kernel k-means, which extends k-means into the kernel space, can be used to capture non-linear structure and identify arbitrarily shaped clusters. Since both the standard k-means and kernel k-means apply the squared error to measure the distances between data points and cluster centers, a few outliers will cause large errors and dominate the objective function. Besides, the performance of a kernel method is largely determined by the choice of kernel. Unfortunately, the most suitable kernel for a particular task is often unknown in advance. In this paper, we first present a robust k-means using the $\ell_{2,1}$-norm in the feature space and then extend it to the kernel space. To retain the power of kernel methods, we further propose a novel robust multiple kernel k-means (RMKKM) algorithm that simultaneously finds the best clustering label, the cluster membership and the optimal combination of multiple kernels. An alternating iterative scheme is developed to find the optimal value. Extensive experiments demonstrate the effectiveness of the proposed algorithms.
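
The robustness of the $\ell_{2,1}$-norm objective can be illustrated on a single cluster: minimizing $\sum_i \|x_i - c\|$ by iteratively reweighted least squares gives each point weight $1/\|x_i - c\|$, so far-away outliers are downweighted. This is a didactic single-cluster sketch, not the full RMKKM algorithm, and all settings are illustrative:

```python
import numpy as np

def l21_center(X, iters=20):
    """One cluster center under the l2,1 objective sum_i ||x_i - c||,
    solved by iteratively reweighted least squares: each point gets
    weight 1 / ||x_i - c||, shrinking the influence of outliers."""
    c = X.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(X - c, axis=1), 1e-8)
        c = (w[:, None] * X).sum(axis=0) / w.sum()
    return c

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), [[50.0, 50.0]]])  # inliers + one outlier
mean_c = X.mean(axis=0)    # squared-error center: dragged toward the outlier
robust_c = l21_center(X)   # stays within the inlier cloud near the origin
print(np.linalg.norm(mean_c) > 1.0, np.linalg.norm(robust_c) < 0.5)
```

Under the squared error a single outlier dominates the objective and pulls the mean; under the $\ell_{2,1}$ objective its influence is bounded, which is the property RMKKM exploits in the kernel space.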

JMLR Journal 2014 Journal Article

Ramp Loss Linear Programming Support Vector Machine

  • Xiaolin Huang
  • Lei Shi
  • Johan A.K. Suykens

The ramp loss is a robust but non-convex loss for classification. Compared with other non-convex losses, a local minimum of the ramp loss can be found effectively. The effectiveness of local search comes from the piecewise linearity of the ramp loss. Motivated by the fact that the $\ell_1$-penalty is piecewise linear as well, the $\ell_1$-penalty is applied to the ramp loss, resulting in a ramp loss linear programming support vector machine (ramp-LPSVM). The proposed ramp-LPSVM is a piecewise linear minimization problem to which the related optimization techniques are applicable. Moreover, the $\ell_1$-penalty can enhance sparsity. In this paper, the corresponding misclassification error and convergence behavior are discussed. Generally, the ramp loss is a truncated hinge loss, so ramp-LPSVM possesses properties similar to those of hinge loss SVMs. A local minimization algorithm and a global search strategy are discussed. The good optimization capability of the proposed algorithms makes ramp-LPSVM perform well in numerical experiments: the result of ramp-LPSVM is more robust than that of hinge SVMs and sparser than that of ramp-SVM, which consists of the $\|\cdot\|_{\mathcal{K}}$-penalty and the ramp loss.
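
As the abstract notes, the ramp loss is a truncated hinge loss; a minimal sketch follows (the truncation parameter `s` here is a generic modeling choice, not a value taken from the paper):

```python
import numpy as np

def hinge(z):
    """Hinge loss on the margin z = y * f(x): grows without bound."""
    return np.maximum(0.0, 1.0 - z)

def ramp(z, s=0.0):
    """Ramp loss: the hinge loss truncated at level 1 - s, so badly
    misclassified points (z << 0) contribute only a bounded amount.
    This bounded influence is the source of the robustness to outliers."""
    return np.minimum(1.0 - s, hinge(z))

z = np.array([2.0, 0.5, 0.0, -3.0])   # margins y * f(x)
print(hinge(z).tolist())               # [0.0, 0.5, 1.0, 4.0]
print(ramp(z).tolist())                # [0.0, 0.5, 1.0, 1.0]
```

The cap is what makes the loss non-convex, and its piecewise linearity is what makes both local search and the $\ell_1$-penalized LP formulation tractable.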

NeurIPS Conference 2013 Conference Paper

Sparse Additive Text Models with Low Rank Background

  • Lei Shi

The sparse additive model for text modeling involves sum-of-exp computations, which are costly at large scale. Moreover, the assumption of an equal background across all classes/topics may be too strong. This paper proposes the sparse additive model with low-rank background (SAM-LRB), together with simple yet efficient estimation. In particular, by employing a double majorization bound, we approximate the log-likelihood by a quadratic lower bound with the sum-of-exp terms absent. The constraints of low rank and sparsity are then simply embodied by nuclear-norm and $\ell_1$-norm regularizers. Interestingly, we find that the optimization task in this manner can be transformed into the same form as that in Robust PCA. Consequently, parameters of supervised SAM-LRB can be efficiently learned using an existing algorithm for Robust PCA based on accelerated proximal gradient. Beyond the supervised case, we extend SAM-LRB to unsupervised and multifaceted scenarios. Experiments on real-world data demonstrate the effectiveness and efficiency of SAM-LRB, showing state-of-the-art performance.

NeurIPS Conference 2012 Conference Paper

Bayesian Probabilistic Co-Subspace Addition

  • Lei Shi

For modeling data matrices, this paper introduces the Probabilistic Co-Subspace Addition (PCSA) model, which simultaneously captures the dependence structures among both rows and columns. Briefly, PCSA assumes that each entry of a matrix is generated by the additive combination of linear mappings of two features, which lie in row-wise and column-wise latent subspaces. Consequently, it captures the dependencies among entries intricately and is able to model non-Gaussian and heteroscedastic densities. Variational inference is proposed for PCSA for approximate Bayesian learning, where updating the posteriors is formulated as the problem of solving Sylvester equations. Furthermore, PCSA is extended to handle and fill missing values, to adapt its sparseness, and to model tensor data. In comparison with several state-of-the-art approaches, experiments demonstrate the effectiveness and efficiency of Bayesian (sparse) PCSA on modeling matrix (tensor) data and filling missing values.

NeurIPS Conference 2009 Conference Paper

Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

  • Lei Shi
  • Thomas Griffiths

The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference that involves averaging over a few feature-detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect.
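
The proposed mechanism corresponds to self-normalized importance sampling with the prior as proposal: samples drawn from the prior are weighted by their likelihood of the stimulus, the weights playing the role of the firing rates of the feature-detection neurons. A minimal sketch with a conjugate Gaussian model, so the answer can be checked in closed form (the model and all numbers are illustrative):

```python
import numpy as np

def importance_posterior_mean(x, prior_samples, likelihood):
    """Self-normalized importance sampling with the prior as proposal:
    E[theta | x] is approximated by a likelihood-weighted average of
    samples theta_i drawn from the prior."""
    w = likelihood(x, prior_samples)
    w = w / w.sum()
    return np.sum(w * prior_samples)

rng = np.random.default_rng(4)
prior_samples = rng.normal(0.0, 1.0, 100_000)                 # prior: N(0, 1)
lik = lambda x, th: np.exp(-0.5 * (x - th) ** 2 / 0.5 ** 2)   # x | theta ~ N(theta, 0.5^2)
est = importance_posterior_mean(1.0, prior_samples, lik)
exact = 1.0 * 1.0 / (1.0 + 0.5 ** 2)   # conjugate posterior mean = 0.8
print(abs(est - exact) < 0.05)
```

The same weighted-average form applied recursively, level by level, is the extension to hierarchical Bayesian inference described in the abstract.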