Author name cluster

Cai Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

GUIDER: Uncertainty Guided Dynamic Re-ranking for Large Language Models Based Recommender Systems

  • Cai Xu
  • Xujing Wang
  • Ziyu Guan
  • Wei Zhao
  • Meng Yan

Large Language Models (LLMs) are increasingly integral to recommendation systems, offering sophisticated language understanding and generation capabilities. However, their practical application is often hindered by challenges such as data sparsity, the generation of unreliable or hallucinated recommendations, and a general lack of transparency in their decision-making processes. Existing mitigation strategies frequently introduce significant complexity or computational overhead. To address these limitations, particularly the critical gap in quantifying the confidence of LLM-generated recommendations, we propose GUIDER: Uncertainty Guided Dynamic Re-ranking for Large Language Models based Recommender Systems. This new framework innovatively leverages the logits produced by LLMs as evidence for recommended items. By employing a Dirichlet distribution, GUIDER decomposes the total predictive uncertainty into distinct Data Uncertainty (DU), reflecting inherent data ambiguity, and Model Uncertainty (MU), indicating the model's own conviction. This principled decomposition, achieved with a single inference pass, enhances transparency and trustworthiness. Based on the quantified DU and MU levels, our system dynamically adapts its recommendation strategy (adjusting output diversity) through a four-quadrant analysis that tailors responses to specific uncertainty profiles. Extensive experiments conducted in zero-shot recommendation settings validate the effectiveness of our approach. GUIDER consistently outperforms existing methods in reliability-aware scenarios, demonstrably improving recommendation quality. This framework not only advances the practical deployment of LLM-based recommenders by making them more dependable but also provides a robust foundation for future research into uncertainty-aware generative systems.
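The abstract does not give GUIDER's formulas, so the following is only a minimal sketch of one common way to split Dirichlet-based predictive uncertainty in a single pass. It assumes (hypothetically) that evidence is obtained as softplus(logit), that total uncertainty is the entropy of the expected class probabilities, that data uncertainty is the expected entropy under the Dirichlet, and that model uncertainty is their difference (the mutual information). The names `digamma`, `softplus`, and `decompose_uncertainty` are illustrative and not taken from the paper.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift x upward until the asymptotic series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); maps a logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decompose_uncertainty(logits):
    """Return (total, data, model) uncertainty for one set of item logits.

    Assumption: Dirichlet parameters are alpha_k = softplus(logit_k) + 1.
    """
    alpha = [softplus(z) + 1.0 for z in logits]
    strength = sum(alpha)
    mean = [a / strength for a in alpha]
    # Total uncertainty: entropy of the expected categorical distribution.
    total = -sum(p * math.log(p) for p in mean)
    # Data uncertainty: expected entropy of categoricals drawn from the Dirichlet.
    data = -sum(
        p * (digamma(a + 1.0) - digamma(strength + 1.0))
        for p, a in zip(mean, alpha)
    )
    # Model uncertainty: mutual information between label and parameters.
    return total, data, total - data


if __name__ == "__main__":
    total, data, model = decompose_uncertainty([2.0, 0.5, -1.0])
    print(f"total={total:.4f} data={data:.4f} model={model:.4f}")
```

Under these assumptions, items scored with little evidence yield higher model uncertainty than items scored with strong evidence, which is the kind of signal a quadrant-based re-ranking rule could threshold on.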

AAAI Conference 2026 Conference Paper

RSA-CR: Resisting Shilling Attacks in Citation Recommendation via Dumbbell Inductive Learning

  • Xiyue Gao
  • Yukai Liu
  • Zhuoqi Ma
  • Xiaotian Qiao
  • Hui Li
  • Cai Xu
  • Kunhua Zhang
  • Jiangtao Cui

Citation recommendation aims to provide researchers with the most relevant references for their manuscripts, helping them swiftly discover pertinent studies and bolster the reliability of their arguments. However, some individuals manipulate these recommendation systems by injecting false information, such as deliberately inflating the citation count of their own papers, to obtain favorable recommendations and ratings. This form of attack, commonly termed a “shilling attack”, is not only highly concealed but also has an unimaginable impact on all scientific research. To address this problem, we theoretically reveal the impact of shilling attacks on citation recommendation and propose three feasible resistance strategies: historical collaborations, significant citations and content constraints. Based on these insights, we introduce RSA-CR, a robust and hybrid citation recommendation algorithm resistant to shilling attacks. The algorithm constructs a two-layer academic graph and uses random and content generation strategies to initialize author and paper embeddings. Confidence-guided inductive aggregations based on collaboration and citation relationships are then performed at the author and paper sides, where the author aggregation results directly influence the paper aggregation strength. Finally, recommendations are made by measuring the distances between the fused paper embeddings. The entire learning process resembles a dumbbell and is hence termed “dumbbell inductive learning”. Experiments on four academic datasets demonstrate that our method outperforms baselines in both effectiveness and robustness.

AAAI Conference 2026 Conference Paper

Universal EEG Epilepsy Detection via Evidential Multi-View De-Biasing

  • Ziqi Wen
  • Cai Xu
  • Wanqing Zhao
  • Jie Zhao
  • Wei Zhao

Epilepsy is a widespread neurological disorder characterized by highly patient-specific EEG patterns. Existing EEG-based seizure detection methods either train individualized models for each patient or adapt models pre-trained on known patients to new ones. However, when encountering previously unseen patients, these methods typically require retraining or fine-tuning, which limits their practical utility in clinical settings. This limitation can be linked to biases caused by patient-specific variations, which obscure the underlying pathological patterns of seizures. To address this, we propose an evidential multi-view framework that reinforces the learning of core epileptic features by promoting consistency across multiple views and reducing reliance on high-uncertainty, patient-specific segments. Specifically, we introduce Bias-guided Fisher-Evidential Multi-View Learning (BF-EML) to guide the model toward discovering intrinsic seizure patterns. BF-EML employs a two-stage training architecture: In Stage 1, we use the Fisher Information Matrix to reorder EEG segments by uncertainty and deliberately train a biased feature generator on low-evidence segments. In Stage 2, we design a dual-branch network where the biased and unbiased branches are alternately trained, encouraging the unbiased branch to reduce its reliance on patient-specific biases. Finally, we introduce a shift-calibrated fusion strategy to enhance the consistency of pathogenic feature integration. Extensive experiments on public datasets and a clinical dataset demonstrate that our method achieves superior performance in both single- and multi-patient scenarios. Importantly, it generalizes well to unseen patients without the need for retraining.

AAAI Conference 2025 Conference Paper

Biased Incomplete Multi-View Learning

  • Haishun Chen
  • Cai Xu
  • Ziyu Guan
  • Wei Zhao
  • Jinlong Liu

Considering the ubiquitous phenomenon of missing views in multi-view data, incomplete multi-view learning is a crucial task in many applications. Existing methods usually follow an impute-then-predict strategy for handling this problem. However, they often assume that the view-missing patterns are uniformly random in multi-view data, which does not agree with real-world scenarios. In practice, view-missing patterns often vary across different classes. For example, in the medical field, patients with rare diseases would take more examinations than those with common diseases; in the financial field, high-risk customers tend to receive evaluations from more views than ordinary ones. Hence, we often observe that data-rich classes suffer from limited views while data-poor classes suffer from limited samples. Previous methods would typically fail due to such biased view-missing patterns. This motivates us to delve into a new biased incomplete multi-view learning problem. To this end, we develop a Reliable Incomplete Multi-view Learning (RIML) method. RIML is a simple yet effective learning-free imputation framework that goes beyond the conventional approaches by considering information from all classes, rather than just relying on individual views or within-class samples. Specifically, we utilize an inter-class association matrix that allows data-poor classes to draw on knowledge from data-rich classes. This enables the construction of more reliable view-specific distributions, from which we perform multiple samplings to recover missing views. Additionally, to obtain a reliable multi-view representation for downstream tasks, we develop an enhanced focal loss with a category-aware marginal term to learn a more distinguishable feature space. Experiments on five multi-view datasets demonstrate that RIML significantly outperforms existing methods in both accuracy and robustness.

AAAI Conference 2024 Conference Paper

CAMEL: Capturing Metaphorical Alignment with Context Disentangling for Multimodal Emotion Recognition

  • Linhao Zhang
  • Li Jin
  • Guangluan Xu
  • Xiaoyu Li
  • Cai Xu
  • Kaiwen Wei
  • Nayu Liu
  • Haonan Liu

Understanding the emotional polarity of multimodal content with metaphorical characteristics, such as memes, poses a significant challenge in Multimodal Emotion Recognition (MER). Previous MER research has overlooked the phenomenon of metaphorical alignment in multimedia content, which involves non-literal associations between concepts to convey implicit emotional tones. Metaphor-agnostic MER methods may be misinformed by the isolated unimodal emotions, which are distinct from the real emotions blended in multimodal metaphors. Moreover, contextual semantics can further affect the emotions associated with similar metaphors, leading to the challenge of maintaining contextual compatibility. To address the issue of metaphorical alignment in MER, we propose to leverage a conditional generative approach for capturing metaphorical analogies. Our approach formulates schematic prompts and corresponding references based on theoretical foundations, which allows the model to better grasp metaphorical nuances. In order to maintain contextual sensitivity, we incorporate a disentangled contrastive matching mechanism, which undergoes curricular adjustment to regulate its intensity during the learning process. The automatic and human evaluation experiments on two benchmarks show that our model provides considerable and stable improvements in recognizing multimodal emotion with metaphor attributes.

AAAI Conference 2024 Conference Paper

Entropy Induced Pruning Framework for Convolutional Neural Networks

  • Yiheng Lu
  • Ziyu Guan
  • Yaming Yang
  • Wei Zhao
  • Maoguo Gong
  • Cai Xu

Structured pruning techniques have achieved great compression performance on convolutional neural networks for image classification tasks. However, the majority of existing methods are sensitive to the model parameters, and their pruning results may be unsatisfactory when the original model is trained poorly. That is, they need the original model to be fully trained to obtain useful weight information. This is time-consuming, and makes the effectiveness of the pruning results dependent on the degree of model optimization. To address the above issue, we propose a novel metric named Average Filter Information Entropy (AFIE). It decomposes the weight matrix of each layer into a low-rank space, and quantifies the filter importance based on the distribution of the normalized eigenvalues. Intuitively, the eigenvalues capture the covariance among filters, and therefore could be a good guide for pruning. Since the distribution of eigenvalues is robust to the updating of parameters, AFIE can yield a stable evaluation of the importance of each filter regardless of whether the original model is fully trained. We implement our AFIE-based pruning method for three popular CNN models, AlexNet, VGG-16, and ResNet-50, and test them on three widely used image datasets, MNIST, CIFAR-10, and ImageNet, respectively. The experimental results are encouraging. We surprisingly observe that for our methods, even when the original model is trained for only one epoch, the AFIE score of each filter remains identical to the result obtained when the model is fully trained. This indicates the effectiveness of the proposed pruning method.
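The abstract describes AFIE only in words, so the sketch below is an assumption about its general shape rather than the paper's definition: it takes the eigenvalues of a layer's decomposed weight matrix as already computed, normalizes them into a probability distribution, takes the Shannon entropy, and spreads it evenly over the layer's filters. The function names `spectral_entropy` and `average_filter_entropy` are hypothetical.

```python
import math


def spectral_entropy(eigenvalues):
    """Shannon entropy of eigenvalues normalized to sum to one.

    Assumes non-negative eigenvalues with a positive sum; zeros are skipped
    because they contribute nothing to the entropy.
    """
    total = sum(eigenvalues)
    probs = [v / total for v in eigenvalues if v > 0]
    return -sum(p * math.log(p) for p in probs)


def average_filter_entropy(eigenvalues, num_filters):
    """Per-filter share of the layer's spectral entropy (illustrative only)."""
    return spectral_entropy(eigenvalues) / num_filters


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]       # filters carry evenly spread information
    peaked = [10.0, 0.1, 0.1, 0.1]    # one direction dominates the layer
    print(average_filter_entropy(flat, num_filters=4))
    print(average_filter_entropy(peaked, num_filters=4))
```

Because the eigenvalues are normalized before the entropy is taken, rescaling all of them by a constant leaves the score unchanged, which is one simple way such a metric can be insensitive to the overall magnitude of the weights.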

AAAI Conference 2024 Conference Paper

M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking

  • Jiaming Liu
  • Yue Wu
  • Maoguo Gong
  • Qiguang Miao
  • Wenping Ma
  • Cai Xu
  • Can Qin

3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel 3D SOT framework, which synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in ONE model. Remarkably, M3SOT pioneers in modeling temporality, contexts, and tasks directly from point clouds, revisiting a perspective on the key factors influencing SOT. To this end, we design a transformer-based network centered on point cloud targets in the search area, aggregating diverse contextual representations and propagating target cues by employing historical frames. As M3SOT spans varied processing perspectives, we've streamlined the network (trimming its depth and optimizing its structure) to ensure a lightweight and efficient deployment for SOT applications. We posit that, backed by practical construction, M3SOT sidesteps the need for complex frameworks and auxiliary components to deliver sterling results. Extensive experiments on benchmarks such as KITTI, nuScenes, and Waymo Open Dataset demonstrate that M3SOT achieves state-of-the-art performance at 38 FPS. Our code and models are available at https://github.com/ywu0912/TeamCode.git.

AAAI Conference 2024 Conference Paper

Reliable Conflictive Multi-View Learning

  • Cai Xu
  • Jiajun Si
  • Ziyu Guan
  • Wei Zhao
  • Yue Wu
  • Xiyue Gao

Multi-view learning aims to combine multiple features to achieve more comprehensive descriptions of data. Most previous works assume that multiple views are strictly aligned. However, real-world multi-view data may contain low-quality conflictive instances, which show conflictive information in different views. Previous methods for this problem mainly focus on eliminating the conflictive data instances by removing them or replacing conflictive views. Nevertheless, real-world applications usually require making decisions for conflictive instances rather than only eliminating them. To solve this, we point out a new Reliable Conflictive Multi-view Learning (RCML) problem, which requires the model to provide decision results and attached reliabilities for conflictive multi-view data. We develop an Evidential Conflictive Multi-view Learning (ECML) method for this problem. ECML first learns view-specific evidence, which could be termed as the amount of support to each category collected from data. Then, we can construct view-specific opinions consisting of decision results and reliability. In the multi-view fusion stage, we propose a conflictive opinion aggregation strategy and theoretically prove this strategy can exactly model the relation of multi-view common and view-specific reliabilities. Experiments performed on 6 datasets verify the effectiveness of ECML. The code is released at https://github.com/jiajunsi/RCML.
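As a rough illustration of the evidence-to-opinion step, the sketch below uses the standard subjective-logic mapping from evidential deep learning: belief b_k = e_k / S and uncertainty u = K / S, with S the sum of e_k + 1 over K classes. The abstract does not spell out ECML's conflictive aggregation rule, so the fusion shown here is the averaging belief fusion operator from subjective logic, used as a stand-in; the function names are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty)."""
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength S
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def average_fusion(opinion_a, opinion_b):
    """Averaging belief fusion of two opinions (stand-in aggregation rule)."""
    (belief_a, u_a), (belief_b, u_b) = opinion_a, opinion_b
    denom = u_a + u_b
    # Each view's belief is weighted by the other view's uncertainty.
    belief = [(x * u_b + y * u_a) / denom for x, y in zip(belief_a, belief_b)]
    uncertainty = 2.0 * u_a * u_b / denom
    return belief, uncertainty


if __name__ == "__main__":
    # Two views that strongly support different classes (a conflictive instance).
    view_1 = opinion_from_evidence([9.0, 0.0, 0.0])
    view_2 = opinion_from_evidence([0.0, 9.0, 0.0])
    print(average_fusion(view_1, view_2))
```

With this operator, two equally confident but conflicting views produce a fused opinion that splits belief between the competing classes without becoming more certain than either view, whereas a fusion rule that multiplies agreement would treat the second view as extra confirmation.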

IJCAI Conference 2024 Conference Paper

Trusted Multi-view Learning with Label Noise

  • Cai Xu
  • Yilin Zhang
  • Ziyu Guan
  • Wei Zhao

Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their use in safety-critical applications. To address this issue, researchers propose trusted multi-view methods that learn the class distribution for each instance, enabling the estimation of classification probabilities and uncertainty. However, these methods heavily rely on high-quality ground-truth labels. This motivates us to delve into a new generalized trusted multi-view learning problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? We propose a Trusted Multi-view Noise Refining (TMNR) method to solve this problem. We first construct view-opinions using evidential deep neural networks, which consist of belief mass vectors and uncertainty estimates. Subsequently, we design view-specific noise correlation matrices that transform the original opinions into noisy opinions aligned with the noisy labels. Considering label noises originating from low-quality data features and easily-confused classes, we ensure that the diagonal elements of these matrices are inversely proportional to the uncertainty, while incorporating class relations into the off-diagonal elements. Finally, we aggregate the noisy opinions and employ a generalized maximum likelihood loss on the aggregated opinion for model training, guided by the noisy labels. We empirically compare TMNR with state-of-the-art trusted multi-view learning and label noise learning baselines on 5 publicly available datasets. Experiment results show that TMNR outperforms baseline methods on accuracy, reliability and robustness. The code and appendix are released at https://github.com/YilinZhang107/TMNR.

AAAI Conference 2023 Conference Paper

Progressive Deep Multi-View Comprehensive Representation Learning

  • Cai Xu
  • Wei Zhao
  • Jinglong Zhao
  • Ziyu Guan
  • Yaming Yang
  • Long Chen
  • Xiangyu Song

Multi-view Comprehensive Representation Learning (MCRL) aims to synthesize information from multiple views to learn comprehensive representations of data items. Prevalent deep MCRL methods typically concatenate synergistic view-specific representations or average aligned view-specific representations in the fusion stage. However, the performance of synergistic fusion methods inevitably degenerates or even fails when partial views are missing in real-world applications; the alignment-based fusion methods usually cannot fully exploit the complementarity of multi-view data. To eliminate all these drawbacks, in this work we present a Progressive Deep Multi-view Fusion (PDMF) method. Considering that the multi-view comprehensive representation should contain complete information while the view-specific data contain partial information, we deem that it is unstable to directly learn the mapping from partial information to complete information. Hence, PDMF employs a progressive learning strategy, which contains the pre-training and fine-tuning stages. In the pre-training stage, PDMF decodes the auxiliary comprehensive representation to the view-specific data. It also captures the consistency and complementarity by learning the relations between the dimensions of the auxiliary comprehensive representation and all views. In the fine-tuning stage, PDMF learns the mapping from the original data to the comprehensive representation with the help of the auxiliary comprehensive representation and relations. Experiments conducted on a synthetic toy dataset and 4 real-world datasets show that PDMF outperforms state-of-the-art baseline methods. The code is released at https://github.com/winterant/PDMF.

IJCAI Conference 2022 Conference Paper

Charge Prediction by Constitutive Elements Matching of Crimes

  • Jie Zhao
  • Ziyu Guan
  • Cai Xu
  • Wei Zhao
  • Enze Chen

Charge prediction aims to automatically predict the judgemental charges for legal cases. To convict a person/unit of a charge, the case description must contain matching instances of the constitutive elements (CEs) of that charge. This knowledge of CEs is a valuable guide for the judge in making final decisions. However, it is far from fully exploited for charge prediction in the literature. In this paper we propose a novel method named Constitutive Elements-guided Charge Prediction (CECP). CECP mimics the human charge identification process to extract potential instances of CEs and generate predictions accordingly. It avoids laborious labeling of matching instances of CEs by using a novel reinforcement learning module which progressively selects potentially matching sentences for CEs and evaluates their relevance. The final prediction is generated based on the selected sentences and their relevant CEs. Experiments on two real-world datasets show the superiority of CECP over competitive baselines.

NeurIPS Conference 2022 Conference Paper

Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering

  • Yaming Yang
  • Ziyu Guan
  • Zhe Wang
  • Wei Zhao
  • Cai Xu
  • Weigang Lu
  • Jianbin Huang

Recent self-supervised pre-training methods on Heterogeneous Information Networks (HINs) have shown promising competitiveness over traditional semi-supervised Heterogeneous Graph Neural Networks (HGNNs). Unfortunately, their performance heavily depends on careful customization of various strategies for generating high-quality positive examples and negative examples, which notably limits their flexibility and generalization ability. In this work, we present SHGP, a novel Self-supervised Heterogeneous Graph Pre-training approach, which does not need to generate any positive examples or negative examples. It consists of two modules that share the same attention-aggregation scheme. In each iteration, the Att-LPA module produces pseudo-labels through structural clustering, which serve as the self-supervision signals to guide the Att-HGNN module to learn object embeddings and attention coefficients. The two modules can effectively utilize and enhance each other, promoting the model to learn discriminative embeddings. Extensive experiments on four real-world datasets demonstrate the superior effectiveness of SHGP against state-of-the-art unsupervised baselines and even semi-supervised baselines. We release our source code at: https://github.com/kepsail/SHGP.

IJCAI Conference 2019 Conference Paper

Adversarial Incomplete Multi-view Clustering

  • Cai Xu
  • Ziyu Guan
  • Wei Zhao
  • Hongchang Wu
  • Yunfei Niu
  • Beilei Ling

Multi-view clustering aims to leverage information from multiple views to improve clustering. Most previous works assumed that each view has complete data. However, in real-world datasets, it is often the case that a view may contain some missing data, resulting in the incomplete multi-view clustering problem. Previous methods for this problem have at least one of the following drawbacks: (1) employing shallow models, which cannot handle the dependence and discrepancy among different views well; (2) ignoring the hidden information of the missing data; (3) being dedicated to the two-view case. To eliminate all these drawbacks, in this work we present an Adversarial Incomplete Multi-view Clustering (AIMC) method. Unlike most existing methods, which only learn a new representation with existing views, AIMC seeks the common latent space of multi-view data and performs missing data inference simultaneously. In particular, the element-wise reconstruction and the generative adversarial network (GAN) are integrated to infer the missing data. They aim to capture overall structure and get a deeper semantic understanding, respectively. Moreover, an aligned clustering loss is designed to obtain a better clustering structure. Experiments conducted on three datasets show that AIMC performs well and outperforms baseline methods.

IJCAI Conference 2018 Conference Paper

Deep Multi-View Concept Learning

  • Cai Xu
  • Ziyu Guan
  • Wei Zhao
  • Yunfei Niu
  • Quan Wang
  • Zhiheng Wang

Multi-view data is common in real-world datasets, where different views describe distinct perspectives. To better summarize the consistent and complementary information in multi-view data, researchers have proposed various multi-view representation learning algorithms, typically based on factorization models. However, most previous methods were focused on shallow factorization models which cannot capture the complex hierarchical information. Although a deep multi-view factorization model has been proposed recently, it fails to explicitly discern consistent and complementary information in multi-view data and does not consider conceptual labels. In this work we present a semi-supervised deep multi-view factorization method, named Deep Multi-view Concept Learning (DMCL). DMCL performs nonnegative factorization of the data hierarchically, and tries to capture semantic structures and explicitly model consistent and complementary information in multi-view data at the highest abstraction level. We develop a block coordinate descent algorithm for DMCL. Experiments conducted on image and document datasets show that DMCL performs well and outperforms baseline methods.