Arrow Research search

Author name cluster

Feilong Cao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

AAAI Conference 2026 Conference Paper

Heterophily-aware Contrastive Learning for Heterophilic Hypergraphs

  • Ming Li
  • Yongqi Li
  • Yuting Chen
  • Feilong Cao
  • Ke Lv

Hypergraph neural networks (HNNs) have emerged as powerful tools for modeling high-order relationships in complex systems. However, most existing HNNs are designed under the assumption of homophily, which does not hold in many real-world scenarios where connected nodes often exhibit diverse semantics, i.e., heterophily. This inconsistency leads to suboptimal aggregation and degraded performance, especially in low-label regimes. While a few recent methods have attempted to enhance heterophilic hypergraph learning, they often rely heavily on label supervision and overlook the potential of self-supervised techniques. In this paper, we propose HeroCL, a heterophily-aware contrastive learning framework that improves hypergraph representation under both structural heterogeneity and label scarcity. Specifically, HeroCL integrates a multi-hop neighbor encoding module to capture informative higher-order context and incorporates two complementary contrastive objectives, label-aware and structure-aware, to guide representation learning from both semantic and relational perspectives. A multi-granularity contrastive strategy is introduced to exploit latent signals across multiple neighborhood levels. Extensive experiments on several benchmark datasets against 11 existing baselines demonstrate that HeroCL achieves consistent and significant performance gains, particularly under strong heterophily and limited supervision, validating its robustness and effectiveness.

AAAI Conference 2026 Conference Paper

HyperAim: Hypergraph Contrastive Learning with Adaptive Multi-frequency Filters

  • Ming Li
  • Ruiting Zhao
  • Zihao Yan
  • Lu Bai
  • Lixin Cui
  • Feilong Cao

Unsupervised hypergraph representation learning has recently gained traction for its ability to model complex high-order interactions without requiring labeled data. However, existing contrastive learning methods typically overlook the frequency diversity inherent in hypergraph signals. To address this issue, we propose HyperAim, a contrastive learning framework that integrates adaptive multi-frequency filtering through both decoupled and coupled designs. Specifically, HyperAim employs two decoupled channels with polynomial low-pass and high-pass filters to separately capture distinct frequency components, and a third channel based on framelet decomposition that adaptively fuses multi-frequency signals in a coupled manner. A frequency-aware contrastive learning strategy is introduced to align representations across views using a combination of InfoNCE loss and pseudo-label-guided supervision. Extensive experiments across 12 benchmark datasets, covering both homophilic and heterophilic hypergraphs, demonstrate the consistent superiority of HyperAim over 17 baselines. Ablation studies further confirm the benefits of explicitly modeling and aligning frequency-specific representations.
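The abstract mentions an InfoNCE contrastive objective. As background, here is a minimal NumPy sketch of the standard InfoNCE loss in its generic form — not HyperAim's frequency-aware variant; the function name and the temperature default are illustrative:

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """Average InfoNCE loss, treating (z1[i], z2[i]) as the positive pairs."""
    # L2-normalize so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # pairwise similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # row-wise log-softmax; positives sit on the diagonal
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Each row's softmax contrasts the aligned pair against all other pairs in the batch; lowering `tau` sharpens that contrast.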

AAAI Conference 2026 Conference Paper

HyperNoRA: Hyperedge Prediction via Node-Level Relation-Aware Self-Supervised Hypergraph Learning

  • Ming Li
  • Zhanle Zhu
  • Xinyi Li
  • Lu Bai
  • Lixin Cui
  • Feilong Cao
  • Ke Lv

Hyperedge prediction plays a critical role in high-order relational modeling with hypergraphs, yet most existing methods primarily focus on sampling strategies or local aggregation within candidate hyperedges. These approaches often overlook global structural dependencies that are essential for learning expressive node and hyperedge representations. In this paper, we propose HyperNoRA, a novel self-supervised hypergraph learning framework that integrates global node-level relation awareness with contrastive learning. Specifically, we construct a global node relation graph that captures both direct and indirect structural correlations, which guides a structure-aware aggregator to enhance node representations with informative global context. To prevent over-smoothing and maintain discriminability, a contrastive learning module is introduced to align representations across graph augmentations while separating semantically dissimilar nodes. Extensive experiments on several benchmark datasets demonstrate that HyperNoRA consistently outperforms state-of-the-art baselines, and ablation studies verify the effectiveness of its key components.

AAAI Conference 2026 Conference Paper

Multi-Granular Graph Learning with Fine-Grained Behavioral Pattern Awareness for Session-Based Recommendation

  • Ming Li
  • Zihao Yan
  • Yuting Chen
  • Lixin Cui
  • Lu Bai
  • Feilong Cao
  • Ke Lv
  • Zhao Li

Session-based recommendation aims to predict users’ next actions by modeling their ongoing interaction sequences, particularly in scenarios where long-term user profiles are unavailable. While existing methods have achieved promising results by leveraging sequential and graph-based structures, they often rely on global aggregation strategies that emphasize dominant user interests while overlooking the transient and fine-grained behavior patterns embedded in sessions. In practice, user intent evolves across sessions and is reflected through diverse behavioral patterns, ranging from immediate preferences to segmented co-occurrence interests and long-range goals. To address these limitations, we propose GraphFine, a novel multi-granular graph learning framework that achieves fine-grained behavioral pattern awareness for session-based recommendation. Our approach models user behavior at different temporal and semantic granularities through a combination of graph and hypergraph neural networks. Specifically, we employ a position-aware graph to capture short-term item transitions, and construct segmented co-occurrence hypergraphs to uncover high-order semantic relations among co-occurring items. To preserve diverse user intents, we further introduce a multi-view intent readout mechanism that extracts and adaptively integrates intent signals from short-term actions, segmented co-occurrence patterns, and entire sessions. Extensive experiments on benchmark datasets demonstrate that GraphFine consistently outperforms existing state-of-the-art methods, confirming its effectiveness in capturing fine-grained and dynamic user preferences for more accurate recommendations.

AAAI Conference 2026 Conference Paper

Point Cloud Semantic Scene Completion with Prototype-Guided Transformer

  • Chenghao Fang
  • Jianqing Liang
  • Jiye Liang
  • Zijin Du
  • Feilong Cao

Semantic scene completion simultaneously reconstructs the shapes of missing regions and predicts semantic labels for the entire 3D scene. Although point cloud-based methods are more efficient than voxel-based methods, existing point cloud-based approaches largely fail to fully leverage semantic information. To address this challenge, we propose a Prototype-Guided Transformer (ProtoFormer) that encodes semantic information into a set of semantic prototypes to guide the underlying Transformer for semantic scene completion. Specifically, we leverage semantic prototypes to enhance information from both geometric and semantic perspectives, and integrate a top-K attention mechanism to guide scene completion and semantic awareness. Extensive qualitative and quantitative experimental results demonstrate that ProtoFormer outperforms state-of-the-art approaches with low complexity.

AAAI Conference 2026 Conference Paper

Self-Supervised Hypergraph Learning with Substructure Awareness for Hyperedge Prediction

  • Ming Li
  • Huiting Wang
  • Yuting Chen
  • Lu Bai
  • Lixin Cui
  • Feilong Cao
  • Ke Lv

Hyperedge prediction plays a central role in hypergraph learning, enabling the inference of high-order relations among multiple entities. However, existing methods often rely on a simplistic flat set assumption, treating candidate hyperedges as unstructured collections of nodes and neglecting their potential internal compositionality. Furthermore, the severe scarcity of observed hyperedges poses a challenge for effective supervision. In this work, we propose S3Hyper, a Substructure-contextualized Self-Supervised framework for Hyperedge prediction, which jointly addresses these two challenges. Specifically, we design a substructure-contextualized hyperedge aggregator that models the internal hierarchy of candidate hyperedges by leveraging sub-hyperedge information. In parallel, we introduce an adaptive tri-directional contrastive learning module that incorporates node-level, hyperedge-level, and cross-level alignment objectives, supported by temperature-adaptive mechanisms. Experimental results on four public datasets demonstrate that S3Hyper consistently outperforms strong baselines, with ablation studies verifying the effectiveness of each component.

ICML Conference 2025 Conference Paper

EduLLM: Leveraging Large Language Models and Framelet-Based Signed Hypergraph Neural Networks for Student Performance Prediction

  • Ming Li 0065
  • Yukang Cheng
  • Lu Bai 0001
  • Feilong Cao
  • Ke Lv 0002
  • Jiye Liang
  • Pietro Liò

The growing demand for personalized learning underscores the importance of accurately predicting students’ future performance to support tailored education and optimize instructional strategies. Traditional approaches predominantly focus on temporal modeling using historical response records and learning trajectories. While effective, these methods often fall short in capturing the intricate interactions between students and learning content, as well as the subtle semantics of these interactions. To address these gaps, we present EduLLM, the first framework to leverage large language models in combination with hypergraph learning for student performance prediction. The framework incorporates FraS-HNN (Framelet-based Signed Hypergraph Neural Networks), a novel spectral-based model for signed hypergraph learning, designed to model interactions between students and multiple-choice questions. In this setup, students and questions are represented as nodes, while response records are encoded as positive and negative signed hyperedges, effectively capturing both structural and semantic intricacies of personalized learning behaviors. FraS-HNN employs framelet-based low-pass and high-pass filters to extract multi-frequency features. EduLLM integrates fine-grained semantic features derived from LLMs, synergizing with signed hypergraph representations to enhance prediction accuracy. Extensive experiments conducted on multiple educational datasets demonstrate that EduLLM significantly outperforms state-of-the-art baselines, validating the novel integration of LLMs with FraS-HNN for signed hypergraph learning.

NeurIPS Conference 2025 Conference Paper

HyperMixup: Hypergraph-Augmented with Higher-order Information Mixup

  • Kaixuan Yao
  • Zhuo Li
  • Jianqing Liang
  • Jiye Liang
  • Ming Li
  • Feilong Cao

Hypergraphs offer a natural paradigm for modeling complex systems with multi-way interactions. Hypergraph neural networks (HGNNs) have demonstrated remarkable success in learning from such higher-order relational data. While such higher-order modeling enhances relational reasoning, the effectiveness of hypergraph learning remains bottlenecked by two persistent challenges: the scarcity of labeled data inherent to complex systems, and the vulnerability to structural noise in real-world interaction patterns. Traditional data augmentation methods, though successful in Euclidean and graph-structured domains, struggle to preserve the intricate balance between node features and hyperedge semantics, often disrupting the very group-wise interactions that define hypergraph value. To bridge this gap, we present HyperMixup, a hypergraph-aware augmentation framework that preserves higher-order interaction patterns through structure-guided feature mixing. Specifically, HyperMixup contains three critical components: 1) Structure-aware node pairing guided by joint feature-hyperedge similarity metrics, 2) Context-enhanced hierarchical mixing that preserves hyperedge semantics through dual-level feature fusion, and 3) Adaptive topology reconstruction mechanisms that maintain hypergraph consistency while enabling controlled diversity expansion. Theoretically, we establish that our method induces hypergraph-specific regularization effects through gradient alignment with hyperedge covariance structures, while providing robustness guarantees against combined node-hyperedge perturbations. Comprehensive experiments across diverse hypergraph learning tasks demonstrate consistent performance improvements over state-of-the-art baselines, with particular effectiveness in low-label regimes. The proposed framework advances hypergraph representation learning by unifying data augmentation with higher-order topological constraints, offering both practical utility and theoretical insights for relational machine learning.
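For readers unfamiliar with mixup-style augmentation, the core feature-mixing step can be sketched in a few lines of NumPy. This is the generic operation only — HyperMixup's structure-aware pairing and topology reconstruction are not reproduced, and the function name and `alpha` default are illustrative:

```python
import numpy as np

def mixup_features(x, pairs, alpha=0.2, rng=None):
    """Mix each node's features with a chosen partner via a Beta-sampled weight.

    x     : (n, d) node feature matrix
    pairs : (n,) index of each node's mixing partner
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(len(x), 1))  # one mixing ratio per node
    lam = np.maximum(lam, 1.0 - lam)                # keep the anchor node dominant
    return lam * x + (1.0 - lam) * x[pairs]         # convex combination of the pair
```

In graph/hypergraph settings the `pairs` array is where structure enters: partners are typically drawn from similar or co-incident nodes rather than uniformly at random.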

IJCAI Conference 2025 Conference Paper

MATCH: Modality-Calibrated Hypergraph Fusion Network for Conversational Emotion Recognition

  • Jiandong Shi
  • Ming Li
  • Lu Bai
  • Feilong Cao
  • Ke Lu
  • Jiye Liang

Multimodal emotion recognition aims to identify emotions by integrating multimodal features derived from spoken utterances. However, existing work often neglects the calibration of conversational entities, focusing mainly on extracting potential intra- or cross-modal information. This leads to the underutilization of utterance information that is essential for accurately characterizing emotion. Additionally, the lack of effective modeling of conversational patterns limits the ability to capture emotional pathways across contexts, modalities and speakers, impacting the overall emotional understanding. In this study, we propose the modality-calibrated hypergraph fusion network (MATCH), which leverages multimodal fusion and hypergraph learning techniques to address these challenges. In particular, we introduce an entity calibration strategy that refines the representations of conversational entities both at the modality and context levels, allowing for deeper insights into emotion-related cues. Furthermore, we present an emotion-aligned hypergraph fusion method that incorporates a line graph to explore conversational patterns, facilitating flexible knowledge transfer across modalities through hyperedge-level and graph-level alignments. Experiments demonstrate that MATCH outperforms state-of-the-art approaches on two benchmark datasets.

IJCAI Conference 2025 Conference Paper

Multi-Modal Point Cloud Completion with Interleaved Attention Enhanced Transformer

  • Chenghao Fang
  • Jianqing Liang
  • Jiye Liang
  • Hangkun Wang
  • Kaixuan Yao
  • Feilong Cao

Multi-modal point cloud completion, which utilizes a complete image and a partial point cloud as input, is a crucial task in 3D computer vision. Previous methods commonly employ a cross-attention mechanism to fuse point clouds and images. However, these approaches often fail to fully leverage image information and overlook the intrinsic geometric details of point clouds that could complement the image modality. To address these challenges, we propose an interleaved attention enhanced Transformer (IAET) with three main components, i.e., token embedding, bidirectional token supplement, and coarse-to-fine decoding. IAET incorporates a novel interleaved attention mechanism to enable bidirectional information supplementation between the point cloud and image modalities. Additionally, to maximize the use of the supplemented image information, we introduce a view-guided upsampling module that leverages image tokens as queries to guide the generation of detailed point cloud structures. Extensive experiments demonstrate the effectiveness of IAET, highlighting its state-of-the-art performance on multi-modal point cloud completion benchmarks in various scenarios. The source code is freely accessible at https://github.com/doldolOuO/IAET.

JBHI Journal 2024 Journal Article

Label-Decoupled Medical Image Segmentation With Spatial-Channel Graph Convolution and Dual Attention Enhancement

  • Qingting Jiang
  • Hailiang Ye
  • Bing Yang
  • Feilong Cao

Deep learning-based methods have been widely used in medical image segmentation recently. However, existing works are usually difficult to simultaneously capture global long-range information from images and topological correlations among feature maps. Further, medical images often suffer from blurred target edges. Accordingly, this paper proposes a novel medical image segmentation framework named a label-decoupled network with spatial-channel graph convolution and dual attention enhancement mechanism (LADENet for short). It constructs learnable adjacency matrices and utilizes graph convolutions to effectively capture global long-range information on spatial locations and topological dependencies between different channels in an image. Then a label-decoupled strategy based on distance transformation is introduced to decouple an original segmentation label into a body label and an edge label for supervising the body branch and edge branch. In addition, a dual attention enhancement mechanism, comprising a body attention block in the body branch and an edge attention block in the edge branch, is built to strengthen the learning of spatial-region and boundary features. Moreover, a feature interactor is devised to exploit the information exchange between the body and edge branches to improve segmentation performance. Experiments on benchmark datasets reveal the superiority of LADENet compared to state-of-the-art approaches.
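The label-decoupling idea — splitting one segmentation label into a body label and an edge label — can be illustrated with plain NumPy. The paper uses a distance transform; the sketch below substitutes a single binary erosion as a stand-in, so it is an approximation of the concept, not LADENet's exact procedure:

```python
import numpy as np

def erode(mask):
    """One step of 4-neighborhood binary erosion (background padding)."""
    m = mask.astype(bool)
    out = m.copy()
    # pixels on the image border touch implicit background, so they erode away
    out[0, :] = out[-1, :] = out[:, 0] = out[:, -1] = False
    out[1:, :] &= m[:-1, :]   # require the neighbor above
    out[:-1, :] &= m[1:, :]   # ... below
    out[:, 1:] &= m[:, :-1]   # ... to the left
    out[:, :-1] &= m[:, 1:]   # ... to the right
    return out

def decouple_label(mask):
    """Split a binary mask into (body, edge) with body + edge == mask."""
    body = erode(mask)                  # interior, away from the boundary
    edge = mask.astype(bool) & ~body    # thin band along the boundary
    return body.astype(np.uint8), edge.astype(np.uint8)
```

The body label then supervises the region branch and the edge label the boundary branch; a true distance transform simply generalizes the erosion to a tunable margin.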

ICML Conference 2023 Conference Paper

How Powerful are Shallow Neural Networks with Bandlimited Random Weights?

  • Ming Li 0065
  • Sho Sonoda
  • Feilong Cao
  • Yu Guang Wang 0001
  • Jiye Liang

We investigate the expressive power of depth-2 bandlimited random neural networks. A random net is a neural network where the hidden layer parameters are frozen with random assignment, and only the output layer parameters are trained by loss minimization. Using random weights for a hidden layer is an effective method to avoid non-convex optimization in standard gradient descent learning. It has also been adopted in recent deep learning theories. Despite the well-known fact that a neural network is a universal approximator, in this study, we mathematically show that when hidden parameters are distributed in a bounded domain, the network may not achieve zero approximation error. In particular, we derive a new nontrivial approximation error lower bound. The proof utilizes the technique of ridgelet analysis, a harmonic analysis method designed for neural networks. This method is inspired by fundamental principles in classical signal processing, specifically the idea that a bandlimited system cannot always perfectly reconstruct an arbitrary signal. We corroborate our theoretical results with various simulation studies, and generally, two main take-home messages are offered: (i) not every distribution for selecting random weights yields a universal approximator; (ii) a suitable assignment of random weights exists, but it depends to some degree on the complexity of the target function.
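The depth-2 random net studied here is simple to realize: freeze random hidden weights drawn from a bounded domain and fit only the output layer by least squares. A NumPy sketch under those assumptions follows — `scale`, which bounds the weight domain, is an illustrative knob and not the paper's notation:

```python
import numpy as np

def fit_random_net(X, y, width=200, scale=1.0, seed=0):
    """Depth-2 random net: frozen random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    # hidden parameters frozen at random within the bounded domain [-scale, scale]
    W = rng.uniform(-scale, scale, size=(X.shape[1], width))
    b = rng.uniform(-scale, scale, size=width)
    H = np.maximum(X @ W + b, 0.0)                  # ReLU random features
    # only the output layer is trained, by linear least squares
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda Xn: np.maximum(Xn @ W + b, 0.0) @ beta
```

Because the only trainable parameters enter linearly, "training" reduces to one convex least-squares solve — the setting in which the paper's approximation lower bound applies.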

JBHI Journal 2017 Journal Article

Segmentation of White Blood Cells Image Using Adaptive Location and Iteration

  • Yuehua Liu
  • Feilong Cao
  • Jianwei Zhao
  • Jianjun Chu

Segmentation of white blood cells (WBCs) image is meaningful but challenging due to the complex internal characteristics of the cells and external factors, such as illumination and different microscopic views. This paper addresses two problems of the segmentation: WBC location and subimage segmentation. To locate WBCs, a method that uses multiple windows obtained by scoring multiscale cues to extract a rectangular region is proposed. In this manner, the location window not only covers the whole WBC completely, but also achieves adaptive adjustment. In the subimage segmentation, the subimages preprocessed from the location window with a replace procedure are taken as initialization, and the GrabCut algorithm based on dilation is iteratively run to obtain more precise results. The proposed algorithm is extensively evaluated using a CellaVision dataset as well as a more challenging Jiashan dataset. Compared with the existing methods, the proposed algorithm is not only concise, but also can produce high-quality segmentations. The results demonstrate that the proposed algorithm consistently outperforms other location and segmentation methods, yielding higher recall and better precision rates.