Arrow Research search

Author name cluster

Xiaofeng Cao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

Exploiting Geometric Structures for Modeling Multi-Agent Behaviors: A New Thinking

  • Bohao Qu
  • Xiaofeng Cao
  • Bing Li
  • Menglin Zhang
  • Tuan-Anh Vu
  • Di Lin
  • Qing Guo

In this paper, we rethink modeling agent behaviors from a geometric-structure perspective in multi-agent reinforcement learning. Modeling agent behaviors is essential for understanding how agents interact and for facilitating effective decisions. The key lies in capturing the dependencies and sequential relationships among agent decisions. Since each decision influences subsequent choices, the interdependencies form a hierarchical, nested, tree-like structure. Modeling such tree-like data in Euclidean space can cause distortion, resulting in a loss of agent decision-structure information. Motivated by this, we reconsider modeling agent behaviors in hyperbolic space and propose the Hyperbolic Multi-Agent Representations (HMAR) method, which projects agent behaviors into a Poincaré ball and leverages hyperbolic neural networks to learn agent policy representations. Additionally, we design a contrastive loss function to train this network, minimizing the feature-space distance between different representations of the same agent while maximizing the distance between representations of distinct agents. Experimental results provide empirical evidence for the effectiveness of HMAR in cooperative and competitive environments, demonstrating the potential of hyperbolic agent representations for effective decision-making in multi-agent environments.
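The projection the abstract describes can be illustrated with the exponential map at the origin of the Poincaré ball and the ball's geodesic distance. This is a minimal numerical sketch assuming curvature -1; the function names and toy embeddings are illustrative, not HMAR's actual implementation (which additionally trains hyperbolic neural networks with the contrastive loss).

```python
import numpy as np

def expmap0(v):
    """Exponential map at the origin of the Poincare ball (curvature -1):
    projects a Euclidean tangent vector into the open unit ball."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(norm) * v / norm

def poincare_dist(x, y):
    """Geodesic distance between two points of the Poincare ball."""
    sq = lambda a: float(np.dot(a, a))
    arg = 1 + 2 * sq(x - y) / ((1 - sq(x)) * (1 - sq(y)))
    return float(np.arccosh(arg))

# Two Euclidean behavior embeddings mapped into the ball.
a = expmap0(np.array([0.3, 0.1]))
b = expmap0(np.array([-0.2, 0.4]))
print(poincare_dist(a, b))
```

Distances grow rapidly as points approach the unit boundary, which is what makes this geometry a natural fit for nested, tree-like decision structures.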

AAAI Conference 2026 Conference Paper

Hyper-Opinion Vagueness Quantification for Robust Multimodal Learning

  • Disen Hu
  • Xun Jiang
  • Xiaofeng Cao
  • Zheng Wang
  • Jingkuan Song
  • Heng Tao Shen
  • Xing Xu

Robust Multimodal Learning (RML) aims to address the issue of unreliable predictions by multimodal models. Nevertheless, previous RML works often struggle to distinguish between different categories that rely on identical intra-modal cues, producing ambiguous predictions. We define this degree of uncertainty in extracting discriminative features of a multimodal model as vagueness. Neglecting such vagueness, as previous RML works commonly do, undermines the model's ability to extract the unique semantics of each category, resulting in worse robustness under disturbances that affect semantic representations. Additionally, this vagueness steers parameter updates toward unreliable fusion, diverting the multimodal model from learning the unique features of each category. Based on these insights, we propose a novel robust multimodal learning approach, termed Hyper-Opinion Quantifying Vagueness (HOQV). Specifically, we first introduce hyper-opinions to capture and quantify the vagueness of multimodal learning in discriminating representations of different categories. Moreover, to mitigate the interference of unreliable, high-vagueness representations in parameter updating, we design Hyper-Opinion Gradient Modulation to guide the optimization process. We evaluate HOQV on six datasets under different disturbances, including noise and adversarial attacks, and demonstrate that our proposed method consistently achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

TR-DQ: Time-Rotation Diffusion Quantization

  • Yihua Shao
  • Deyang Lin
  • Minxi Yan
  • Siyu Chen
  • Fanhu Zeng
  • Minwen Liao
  • Ao Ma
  • Ziyang Yan

Diffusion models have been widely adopted in image and video generation. However, their complex network architectures lead to high inference overhead during generation. Existing diffusion quantization methods primarily focus on quantizing the model structure while ignoring the impact of time-step variation during sampling. At the same time, most current approaches fail to account for significant activations that cannot be eliminated, resulting in substantial performance degradation after quantization. To address these issues, we propose Time-Rotation Diffusion Quantization (TR-DQ), a novel quantization method incorporating time-step and rotation-based optimization. TR-DQ first divides the sampling process by time-steps and applies a rotation matrix to smooth activations and weights dynamically. A dedicated hyperparameter is introduced for each time-step for adaptive timing modeling, enabling dynamic quantization across time-steps. Additionally, we explore the compression potential of Classifier-Free Guidance (CFG-wise) quantization to establish a foundation for subsequent work. TR-DQ achieves state-of-the-art (SOTA) performance on image and video generation tasks, with a 1.38-1.89× speedup and 1.97-2.58× memory reduction in inference compared to existing quantization methods.
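The rotation-smoothing idea can be sketched with a simple identity: since W x = (W R)(Rᵀ x) for any orthogonal R, a rotation can be folded into the weights and activations before quantization to spread outlier channels. The sketch below uses a random orthogonal matrix as a stand-in; the per-time-step rotation TR-DQ actually applies is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits=4):
    """Uniform symmetric quantization to the given bit-width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

# An activation vector with one large outlier channel -- the case the
# abstract says naive quantization handles poorly.
x = np.array([0.1, -0.2, 0.15, 8.0])
W = rng.normal(size=(4, 4))

# Random orthogonal rotation R from the QR decomposition of a Gaussian matrix.
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Exact algebraic identity: W @ x == (W @ R) @ (R.T @ x), so the rotation is
# free at inference; quantizing the rotated pair spreads the outlier energy
# across channels before rounding, which often reduces quantization error.
err_plain = np.linalg.norm(quantize(W) @ quantize(x) - W @ x)
err_rot = np.linalg.norm(quantize(W @ R) @ quantize(R.T @ x) - W @ x)
print(err_plain, err_rot)
```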

AAAI Conference 2025 Conference Paper

Concept Matching with Agent for Out-of-Distribution Detection

  • Yuxiao Lee
  • Xiaofeng Cao
  • Jingcai Guo
  • Wei Ye
  • Qing Guo
  • Yi Chang

The remarkable achievements of Large Language Models (LLMs) have captivated both academia and industry, transcending their initial role in dialogue generation. To expand the usage scenarios of LLMs, some works enhance a model's effectiveness and capabilities by introducing external information, an approach called the agent paradigm. Based on this idea, we propose a new method that integrates the agent paradigm into the out-of-distribution (OOD) detection task, aiming to improve its robustness and adaptability. Our proposed method, Concept Matching with Agent (CMA), employs neutral prompts as agents to augment the CLIP-based OOD detection process. These agents function as dynamic observers and communication hubs, interacting with both in-distribution (ID) labels and data inputs to form vector triangle relationships. This triangular framework offers a more nuanced approach than the traditional binary relationship, allowing better separation and identification of ID and OOD inputs. Our extensive experimental results showcase the superior performance of CMA over both zero-shot and training-required methods in a diverse array of real-world scenarios.

NeurIPS Conference 2025 Conference Paper

Long-tailed Recognition with Model Rebalancing

  • JIAAN LUO
  • Feng Hong
  • Qiang Hu
  • Xiaofeng Cao
  • Feng Liu
  • Jiangchao Yao

Long-tailed recognition is ubiquitous and challenging in deep learning, and even in the downstream fine-tuning of foundation models, since the skewed class distribution generally prevents the model from generalizing to tail classes. Despite the promise of previous methods from the perspectives of data augmentation, loss rebalancing, decoupled training, etc., consistent improvement in broad scenarios such as multi-label long-tailed recognition remains difficult. In this study, we examine the impact of model capacity in the long-tailed context and propose a novel framework, Model Rebalancing (MORE), which mitigates imbalance by directly rebalancing the model's parameter space. Specifically, MORE introduces a low-rank parameter component to mediate the parameter-space allocation, guided by a tailored loss and a sinusoidal reweighting schedule, without increasing overall model complexity or inference cost. Extensive experiments on diverse long-tailed benchmarks, spanning multi-class and multi-label tasks, demonstrate that MORE significantly improves generalization, particularly for tail classes, and effectively complements existing imbalance-mitigation methods. These results highlight MORE's potential as a robust plug-and-play module in long-tailed settings.
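The low-rank rebalancing component can be pictured as a base weight matrix plus a trainable low-rank delta, with a per-class reweighting schedule. In this hedged sketch, `sinusoidal_reweight` is an assumed illustrative form of the schedule (not the paper's exact one), and the product B @ A can be merged into W at inference, which is why the cost does not grow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim, rank = 10, 32, 4

W = rng.normal(size=(n_classes, dim))           # base classifier weights
B = rng.normal(size=(n_classes, rank)) * 0.01   # trainable low-rank pair B @ A
A = rng.normal(size=(rank, dim)) * 0.01

def effective_weights():
    # The low-rank term rebalances the parameter space; B @ A can be merged
    # into W after training, so inference cost does not increase.
    return W + B @ A

def sinusoidal_reweight(class_freq):
    """Assumed sinusoidal schedule: classes are ranked by frequency, and the
    rarer a class is, the closer its weight sits to the half-sine peak."""
    rank_pos = np.argsort(np.argsort(class_freq))    # 0 = rarest class
    phase = (rank_pos + 0.5) / len(class_freq)       # in (0, 1)
    return np.sin(np.pi * (1 - phase) / 2) + 0.5     # rarer -> larger weight

freq = np.array([500, 300, 200, 120, 80, 50, 30, 15, 8, 4])
weights = sinusoidal_reweight(freq)
logits = effective_weights() @ rng.normal(size=dim)
```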

NeurIPS Conference 2025 Conference Paper

Preference-driven Knowledge Distillation for Few-shot Node Classification

  • Xing Wei
  • Chunchun Chen
  • Rui Fan
  • Xiaofeng Cao
  • Sourav Medya
  • Wei Ye

Graph neural networks (GNNs) can efficiently process text-attributed graphs (TAGs) thanks to their message-passing mechanisms, but their training relies heavily on human-annotated labels. Moreover, the complex and diverse local topologies of nodes in real-world TAGs are difficult for any single mechanism to handle. Large language models (LLMs) perform well in zero-/few-shot learning on TAGs but suffer from scalability challenges. We therefore propose a preference-driven knowledge distillation (PKD) framework to synergize the complementary strengths of LLMs and various GNNs for few-shot node classification. Specifically, we develop a GNN-preference-driven node selector that effectively promotes prediction distillation from LLMs to teacher GNNs. To further tackle nodes' intricate local topologies, we develop a node-preference-driven GNN selector that identifies the most suitable teacher GNN for each node, thereby facilitating tailored knowledge distillation from teacher GNNs to the student GNN. Extensive experiments validate the efficacy of our proposed framework for few-shot node classification on real-world TAGs. Our code is available at.

TMLR Journal 2025 Journal Article

Seeing Beyond Labels: Source-Free Domain Adaptation via Hypothesis Consolidation of Prediction Rationale

  • Yangyang Shu
  • Yuhang Liu
  • Xiaofeng Cao
  • Qi Chen
  • Bowen Zhang
  • Ziqin Zhou
  • Anton van den Hengel
  • Lingqiao Liu

Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data. The primary difficulty in this task is that the model's predictions may be inaccurate, and using these inaccurate predictions for model adaptation can lead to misleading results. To address this issue, this paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis. By consolidating these hypothesis rationales, we identify the most likely correct hypotheses, which we then use as a pseudo-labeled set to support a semi-supervised learning procedure for model adaptation. This approach distinguishes itself from conventional semi-supervised learning by relying solely on pseudo-labels rather than ground-truth annotations. To achieve optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance in the SFUDA task and can be easily integrated into existing approaches to improve their performance. The code is available at \url{https://github.com/GANPerf/HCPR}.

IJCAI Conference 2025 Conference Paper

TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

  • Miaoge Li
  • Jingcai Guo
  • Richard Yi Da Xu
  • Dongsheng Wang
  • Xiaofeng Cao
  • Zhijie Rao
  • Song Guo

Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In this paper, we revisit conditional transport (CT) theory and its homology to the visual-semantic interaction in CZSL, and further propose a novel Trisets Consistency Alignment framework (dubbed TsCA) that addresses these issues. Concretely, we utilize three distinct yet semantically homologous sets, i.e., patches, primitives, and compositions, to construct pairwise CT costs that minimize their semantic discrepancies. To further ensure consistent transfer within these sets, we implement a cycle-consistency constraint that refines the learning by guaranteeing feature consistency of the self-mapping during transport flow, regardless of modality. Moreover, we extend the CT plans to an open-world setting, which enables the model to effectively filter out unfeasible pairs, thereby speeding up inference as well as increasing accuracy. Extensive experiments verify the effectiveness of the proposed method. The code is available at https://github.com/keepgoingjkg/TsCA.

IJCAI Conference 2024 Conference Paper

Deep Hierarchical Graph Alignment Kernels

  • Shuhao Tang
  • Hao Tian
  • Xiaofeng Cao
  • Wei Ye

Typical R-convolution graph kernels invoke kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking the implicit similarities and topological position information between those substructures limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. Substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available at https://github.com/EWesternRa/DHGAK.
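The cluster-then-embed step can be sketched in a few lines: substructure embeddings assigned to the same cluster share a feature map (here a cluster indicator), and each graph's feature map is its kernel mean embedding, i.e. the average over its substructures. The toy embeddings, centers, and function name are illustrative assumptions; DHGAK's hierarchical alignment and deep embeddings are not shown.

```python
import numpy as np

def graph_feature_maps(sub_embeds, graph_ids, centers):
    """Assign each substructure embedding to its nearest cluster center
    (substructures in the same cluster share a feature map), then derive
    each graph's feature map by kernel mean embedding: the average of its
    substructures' cluster indicator vectors."""
    dists = np.linalg.norm(sub_embeds[:, None] - centers[None, :], axis=-1)
    assign = np.argmin(dists, axis=1)
    n_graphs, n_clusters = graph_ids.max() + 1, len(centers)
    phi = np.zeros((n_graphs, n_clusters))
    for g in range(n_graphs):
        phi[g] = np.eye(n_clusters)[assign[graph_ids == g]].mean(axis=0)
    return phi

# Toy data: six substructure embeddings from two graphs, two cluster centers.
sub = np.array([[0., 0.], [0.1, 0.], [5., 5.], [0., 0.1], [5., 4.9], [4.9, 5.]])
gid = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([[0., 0.], [5., 5.]])
phi = graph_feature_maps(sub, gid, centers)
K = phi @ phi.T          # graph kernel matrix: inner products of feature maps
```

Because the kernel is an inner product of explicit feature maps, the resulting kernel matrix is positive semi-definite by construction, matching the guarantee stated in the abstract.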

IJCAI Conference 2024 Conference Paper

Dual Expert Distillation Network for Generalized Zero-Shot Learning

  • Zhijie Rao
  • Jingcai Guo
  • Xiaocheng Lu
  • Jingming Liang
  • Jie Zhang
  • Haozhao Wang
  • Kang Wei
  • Xiaofeng Cao

Zero-shot learning has consistently yielded remarkable progress by modeling nuanced one-to-one visual-attribute correlations. Existing studies resort to refining a uniform mapping function to align and correlate sample regions and sub-attributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) unutilized channel information. This paper addresses these issues with a simple yet effective approach, dubbed Dual Expert Distillation Network (DEDN), in which two experts are dedicated to coarse- and fine-grained visual-attribute modeling, respectively. Concretely, the coarse expert, cExp, has a complete perceptual scope to coordinate visual-attribute similarity metrics across dimensions, while the fine expert, fExp, consists of multiple specialized subnetworks, each corresponding to an exclusive set of attributes. The two experts cooperatively distill from each other to reach a mutual agreement during training. Meanwhile, we further equip DEDN with a newly designed backbone network, the Dual Attention Network (DAN), which incorporates both region and channel attention information to fully exploit visual semantic knowledge. Extensive experiments on various benchmark datasets indicate a new state-of-the-art. The code is available at github.com/zjrao/DEDN.

NeurIPS Conference 2024 Conference Paper

Geometry Awakening: Cross-Geometry Learning Exhibits Superiority over Individual Structures

  • Yadong Sun
  • Xiaofeng Cao
  • Yu Wang
  • Wei Ye
  • Jingcai Guo
  • Qing Guo

Recent research has underscored the efficacy of Graph Neural Networks (GNNs) in modeling diverse geometric structures within graph data. However, real-world graphs typically exhibit geometrically heterogeneous characteristics, rendering the confinement to a single geometric paradigm insufficient for capturing their intricate structural complexities. To address this limitation, we examine the performance of GNNs across various geometries through the lens of knowledge distillation (KD) and introduce a novel cross-geometric framework. This framework encodes graphs by integrating both Euclidean and hyperbolic geometries in a space-mixing fashion. Our approach employs multiple teacher models, each generating hint embeddings that encapsulate distinct geometric properties. We then implement a structure-wise knowledge transfer module that optimally leverages these embeddings within their respective geometric contexts, thereby enhancing the training efficacy of the student model. Additionally, our framework incorporates a geometric optimization network designed to bridge the distributional disparities among these embeddings. Experimental results demonstrate that our model-agnostic framework more effectively captures topological graph knowledge, resulting in superior performance of the student models when compared to traditional KD methodologies.

JMLR Journal 2024 Journal Article

Mentored Learning: Improving Generalization and Convergence of Student Learner

  • Xiaofeng Cao
  • Yaming Guo
  • Heng Tao Shen
  • Ivor W. Tsang
  • James T. Kwok

Student learners typically engage in an iterative process of actively updating their hypotheses, as in active learning. While this behavior can be advantageous, incremental updates carry an inherent risk of introducing mistakes, such as weak initialization and inaccurate or insignificant historical states, resulting in expensive convergence costs. In this work, rather than solely monitoring the update of the learner's status, we propose monitoring the disagreement w.r.t. $\mathcal{F}^\mathcal{T}(\cdot)$ between the learner and teacher, and call this new paradigm “Mentored Learning”, which consists of `how to teach' and `how to learn'. By actively incorporating feedback that deviates from the learner's current hypotheses, convergence becomes much easier to analyze without strict assumptions on the learner's historical status, yielding tighter generalization bounds on error and label complexity. Formally, we introduce an approximately optimal teaching hypothesis, $h^\mathcal{T}$, incorporating a tighter slack term $\left(1+\mathcal{F}^{\mathcal{T}}(\widehat{h}_t)\right)\Delta_t$ to replace the typical $2\Delta_t$ used in hypothesis pruning. Theoretically, we demonstrate that, guided by this teaching hypothesis, the learner can converge to tighter generalization bounds on error and label complexity than non-educated learners who lack guidance from a teacher: 1) the generalization error upper bound can be reduced from $R(h^*)+4\Delta_{T-1}$ to approximately $R(h^{\mathcal{T}})+2\Delta_{T-1}$, and 2) the label complexity upper bound can be decreased from $4 \theta\left(TR(h^{*})+2O(\sqrt{T})\right)$ to approximately $2\theta\left(2TR(h^{\mathcal{T}})+3 O(\sqrt{T})\right)$. To adhere strictly to our assumption, self-improvement of teaching is proposed when $h^\mathcal{T}$ only loosely approximates $h^*$. In the context of learning, we further consider two teaching scenarios: instructing a white-box learner and a black-box learner.
Experiments validate this teaching concept and demonstrate superior generalization performance compared to fundamental active learning strategies such as IWAL and IWAL-D.

ICML Conference 2024 Conference Paper

Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss

  • Zhenlong Liu
  • Lei Feng 0006
  • Huiping Zhuang
  • Xiaofeng Cao
  • Hongxin Wei

Machine learning models are susceptible to membership inference attacks (MIAs), which aim to infer whether a sample was in the training set. Existing work utilizes gradient ascent to enlarge the loss variance of training data, alleviating the privacy risk. However, optimizing toward a reverse direction may cause the model parameters to oscillate near local minima, leading to instability and suboptimal performance. In this work, we propose a novel method, Convex-Concave Loss (CCL), which enables a high variance of the training-loss distribution through gradient descent. Our method is motivated by the theoretical analysis that convex losses tend to decrease the loss variance during training. The key idea behind CCL is thus to reduce the convexity of loss functions with a concave term. Trained with CCL, neural networks produce losses with high variance for training data, reinforcing the defense against MIAs. Extensive experiments demonstrate the superiority of CCL, achieving a state-of-the-art balance in the privacy-utility trade-off.
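The "reduce convexity with a concave term" idea can be made concrete on the binary cross-entropy: subtracting a convex penalty acts as a concave correction and lowers the loss curvature. The specific form `ce - alpha * ce**2` below is an illustrative assumption, not necessarily the paper's exact CCL.

```python
import numpy as np

def ce(z):
    """Binary cross-entropy as a function of the true-class margin z (convex)."""
    return np.log1p(np.exp(-z))

def ccl(z, alpha=0.3):
    """Illustrative convex-concave loss: convex CE plus a concave correction
    term -alpha * ce(z)**2 (an assumed form, not necessarily the paper's)."""
    return ce(z) - alpha * ce(z) ** 2

def curvature(f, z, h=1e-3):
    """Finite-difference estimate of the second derivative."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / h ** 2

# The concave term lowers the curvature, i.e. reduces the convexity that
# would otherwise shrink the training-loss variance.
print(curvature(ce, 0.5), curvature(ccl, 0.5))
```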

NeurIPS Conference 2024 Conference Paper

Sharpness-Aware Minimization Activates the Interactive Teaching's Understanding and Optimization

  • Mingwei Xu
  • Xiaofeng Cao
  • Ivor Tsang

Teaching is a potentially effective approach for understanding interactions among multiple intelligences. Previous explorations have convincingly shown that teaching presents additional opportunities for observation and demonstration within the learning model, such as data distillation and selection. However, the underlying optimization principles and convergence of interactive teaching lack theoretical analysis, and in this regard co-teaching serves as a notable prototype. In this paper, we discuss its role as a reduction of the larger loss landscape derived from Sharpness-Aware Minimization (SAM). We then cast it as an iterative parameter-estimation process using Expectation-Maximization. The convergence of this typical interactive teaching is achieved by continuously optimizing a variational lower bound on the log marginal likelihood; this lower bound is the expected value of the log posterior distribution of the latent variables under a scaled, factorized variational distribution. To further enhance interactive teaching's performance, we incorporate SAM's strong generalization information into interactive teaching, referred to as Sharpness Reduction Interactive Teaching (SRIT). This integration can be viewed as a novel sequential optimization process. Finally, we validate the performance of our approach through multiple experiments.
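The SAM step that SRIT builds on can be sketched on a toy objective: first perturb the weights along the gradient direction to an approximate worst-case point within a radius-rho ball, then descend using the gradient evaluated there. The quadratic loss here is only a stand-in for the co-teaching objective.

```python
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2)      # toy objective standing in for the co-teaching loss

def grad(w):
    return w                         # gradient of the toy objective

def sam_step(w, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step: move to the approximate
    worst-case point within an L2 ball of radius rho, then descend with
    the gradient evaluated at that perturbed point."""
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # first-order ascent direction
    return w - lr * grad(w + eps)

w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w)
print(loss(w))
```

Minimizing at the perturbed point penalizes sharp minima, which is the generalization signal the abstract says SRIT injects into interactive teaching.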

NeurIPS Conference 2023 Conference Paper

Nonparametric Teaching for Multiple Learners

  • Chen Zhang
  • Xiaofeng Cao
  • Weiyang Liu
  • Ivor Tsang
  • James Kwok

We study the problem of teaching multiple learners simultaneously in the nonparametric iterative teaching setting, where a teacher iteratively provides examples to learners to accelerate the acquisition of a target concept. This problem is motivated by the gap between the current single-learner teaching setting and the real-world scenario of human instruction, where a teacher typically imparts knowledge to multiple students. Under the new problem formulation, we introduce a novel framework -- Multi-learner Nonparametric Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model. To achieve this, we frame the problem as teaching a vector-valued target model and extend the target model space from the scalar-valued reproducing kernel Hilbert space used in single-learner scenarios to a vector-valued space. Furthermore, we demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching, particularly when the multiple learners can communicate with each other. Lastly, we conduct extensive experiments to validate the practicality and efficiency of MINT.

IJCAI Conference 2019 Conference Paper

Learning Image-Specific Attributes by Hyperbolic Neighborhood Graph Propagation

  • Xiaofeng Xu
  • Ivor W. Tsang
  • Xiaofeng Cao
  • Ruiheng Zhang
  • Chuancai Liu

As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. Most existing attribute-based research adopts class-specific attributes (CSA), which are class-level annotations, because annotating each class is far cheaper than annotating each individual image. However, class-specific attributes are often noisy because of annotation errors and the diversity of individual images. It is therefore desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Exploiting the intrinsic property of hyperbolic geometry that distances expand exponentially, a hyperbolic neighborhood graph (HNG) is constructed to characterize the relationships between samples. Based on the HNG, we define a neighborhood consistency for each sample to identify inconsistent samples, which are then refined based on their neighbors in the HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.
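The neighborhood-consistency refinement can be sketched as a majority vote over graph neighbors: a sample whose attribute disagrees with most of its neighbors is flagged as inconsistent and reset. For brevity this sketch builds a Euclidean kNN graph; the paper constructs the graph with hyperbolic distances, and the function name and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine_attributes(X, attrs, k=3):
    """Neighborhood-consistency refinement (sketch): a sample whose attribute
    disagrees with the majority of its k nearest neighbors is flagged as
    inconsistent and reset to that majority vote. Euclidean kNN is used here
    for brevity; the paper builds the graph with hyperbolic distances."""
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    refined = attrs.copy()
    for i in range(len(X)):
        nbrs = np.argsort(dists[i])[:k]
        majority = int(attrs[nbrs].mean() + 0.5)   # 0/1 majority, ties -> 1
        if majority != attrs[i]:                   # inconsistent sample
            refined[i] = majority
    return refined

# Two tight clusters; sample 4 carries a noisy class-level attribute.
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
attrs = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # sample 4 should be 1
print(refine_attributes(X, attrs))
```

Voting against the original (unrefined) labels, rather than updating in place, keeps a single noisy sample from cascading through its neighborhood.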