Arrow Research search

Author name cluster

Yibing Zhan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers (26)

AAAI Conference 2026 Conference Paper

Cross-Sample Augmented Test-Time Adaptation for Personalized Intraoperative Hypotension Prediction

  • Kanxue Li
  • Yibing Zhan
  • Hua Jin
  • Chongchong Qi
  • Xu Lin
  • Baosheng Yu

Intraoperative hypotension (IOH) poses significant surgical risks, but accurate prediction remains challenging due to patient-specific variability. While test-time adaptation (TTA) offers a promising approach for personalized prediction, the rarity of IOH events often leads to unreliable test-time training. To address this, we propose CSA-TTA, a novel cross-sample augmented test-time adaptation framework that enhances training by incorporating hypotension events from other individuals. Specifically, we first construct a cross-sample bank by segmenting historical data into hypotensive and non-hypotensive samples. Then, we introduce a coarse-to-fine retrieval strategy for building test-time training data: we initially apply K-Shape clustering to identify representative cluster centers and subsequently retrieve the top-K semantically similar samples based on the current patient signal. Additionally, we integrate both self-supervised masked reconstruction and retrospective sequence forecasting signals during training to enhance model adaptability to rapid and subtle intraoperative dynamics. We evaluate the proposed CSA-TTA on both the VitalDB dataset and a real-world in-hospital dataset by integrating it with state-of-the-art time series forecasting models, including TimesFM and UniTS. CSA-TTA consistently enhances performance across settings—for instance, on VitalDB, it improves Recall and F1 scores by +1.33% and +1.13%, respectively, under fine-tuning, and by +7.46% and +5.07% in zero-shot scenarios—demonstrating strong robustness and generalization.
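The coarse-to-fine retrieval step described above can be illustrated with a toy sketch: pick the nearest cluster centre first, then rank samples within that cluster. The function name and the Euclidean scoring are illustrative assumptions; the paper uses K-Shape cluster centres and semantic similarity over waveform segments.

```python
import numpy as np

def coarse_to_fine_retrieve(query, centers, bank, bank_cluster, k):
    """Coarse step: pick the nearest cluster centre; fine step: return the
    indices of the top-k bank samples in that cluster, ranked by Euclidean
    distance to the query. (Illustrative stand-in for K-Shape + semantic
    similarity on physiological signal segments.)"""
    c = np.argmin(np.linalg.norm(centers - query, axis=1))  # coarse: nearest centre
    idx = np.flatnonzero(bank_cluster == c)                 # samples in cluster c
    d = np.linalg.norm(bank[idx] - query, axis=1)
    return idx[np.argsort(d)[:k]]                           # fine: top-k nearest

# Toy 1-D "signal features": two well-separated clusters of stored segments.
bank = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]])
bank_cluster = np.array([0, 0, 0, 0, 1, 1, 1, 1])
centers = np.array([[0.15], [5.15]])
hits = coarse_to_fine_retrieve(np.array([0.05]), centers, bank, bank_cluster, k=2)
```

A query near the first cluster retrieves only from that cluster, so the fine ranking never has to scan the whole bank.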

AAAI Conference 2026 Conference Paper

ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs

  • Weigang Lu
  • Ziyu Guan
  • Wei Zhao
  • Yaming Yang
  • Yujie Sun
  • Zheng Liang
  • Yibing Zhan
  • Dapeng Tao

GNN-to-MLP (G2M) methods have emerged as a promising approach to accelerate Graph Neural Networks (GNNs) by distilling their knowledge into simpler Multi-Layer Perceptrons (MLPs). These methods bridge the gap between the expressive power of GNNs and the computational efficiency of MLPs, making them well-suited for resource-constrained environments. However, existing G2M methods are limited by their inability to flexibly adjust inference cost and accuracy dynamically, a critical requirement for real-world applications where computational resources and time constraints can vary significantly. To address this, we introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation (ProGMLP). ProGMLP employs a Progressive Training Structure (PTS), where multiple MLP students are trained in sequence, each building on the previous one. Furthermore, ProGMLP incorporates Progressive Knowledge Distillation (PKD) to iteratively refine the distillation process from GNNs to MLPs, and Progressive Mixup Augmentation (PMA) to enhance generalization by progressively generating harder mixed samples. Our approach is validated through comprehensive experiments on eight real-world graph datasets, demonstrating that ProGMLP maintains high accuracy while dynamically adapting to varying runtime scenarios, making it highly effective for deployment in diverse application settings.

AAAI Conference 2025 Conference Paper

AGMixup: Adaptive Graph Mixup for Semi-supervised Node Classification

  • Weigang Lu
  • Ziyu Guan
  • Wei Zhao
  • Yaming Yang
  • Yibing Zhan
  • Yiheng Lu
  • Dapeng Tao

Mixup is a data augmentation technique that enhances model generalization by interpolating between data points using a mixing ratio lambda in the image domain. Recently, the concept of mixup has been adapted to the graph domain through node-centric interpolations. However, these approaches often fail to address the complexity of interconnected relationships, potentially damaging the graph's natural topology and undermining node interactions. Furthermore, current graph mixup methods employ a one-size-fits-all strategy with a randomly sampled lambda for all mixup pairs, ignoring the diverse needs of different pairs. This paper proposes an Adaptive Graph Mixup (AGMixup) framework for semi-supervised node classification. AGMixup introduces a subgraph-centric approach, which treats each subgraph similarly to how images are handled in Euclidean domains, thus facilitating a more natural integration of mixup into graph-based learning. We also propose an adaptive mechanism to tune the mixing ratio lambda for diverse mixup pairs, guided by the contextual similarity and uncertainty of the involved subgraphs. Extensive experiments across seven datasets on semi-supervised node classification benchmarks demonstrate AGMixup's superiority over state-of-the-art graph mixup methods.
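As context for the adaptive variant above, vanilla mixup interpolates two inputs and their labels with a single ratio lambda; AGMixup's contribution is tuning that ratio per pair. A minimal NumPy sketch of the base operation (names are illustrative, not the authors' code):

```python
import numpy as np

def mixup(x_i, x_j, y_i, y_j, lam):
    """Interpolate two samples and their one-hot labels with ratio lam."""
    x_mix = lam * x_i + (1.0 - lam) * x_j
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix

# Two toy feature vectors with one-hot labels.
x_a, x_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_m, y_m = mixup(x_a, x_b, y_a, y_b, lam=0.7)
```

AGMixup replaces the fixed, randomly sampled `lam` with a value adapted to each pair's contextual similarity and uncertainty, and mixes subgraphs rather than isolated node features.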

AAAI Conference 2025 Conference Paper

Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution

  • Wentao Tan
  • Qiong Cao
  • Yibing Zhan
  • Chao Xue
  • Changxing Ding

Human preference alignment can significantly enhance the capabilities of Multimodal Large Language Models (MLLMs). However, collecting high-quality preference data remains costly. One promising solution is the self-evolution strategy, where models are iteratively trained on data they generate. Current multimodal self-evolution techniques, nevertheless, still need human- or GPT-annotated data. Some methods even require extra models or ground truth answers to construct preference data. To overcome these limitations, we propose a novel multimodal self-evolution framework that empowers the model to autonomously generate high-quality questions and answers using only unannotated images. First, in the question generation phase, we implement an image-driven self-questioning mechanism. This approach allows the model to create questions and evaluate their relevance and answerability based on the image content. If a question is deemed irrelevant or unanswerable, the model regenerates it to ensure alignment with the image. This process establishes a solid foundation for subsequent answer generation and optimization. Second, while generating answers, we design an answer self-enhancement technique to boost the discriminative power of answers. We begin by captioning the images and then use the descriptions to enhance the generated answers. Additionally, we utilize corrupted images to generate rejected answers, thereby forming distinct preference pairs for effective optimization. Finally, in the optimization step, we incorporate an image content alignment loss function alongside the Direct Preference Optimization (DPO) loss to mitigate hallucinations. This function maximizes the likelihood of the generated descriptions to constrain the model's attention to the image content. As a result, the model can generate more accurate and reliable outputs. Experiments demonstrate that our framework is competitive with previous methods that utilize external information, paving the way for more efficient and scalable MLLMs.

JBHI Journal 2025 Journal Article

CPGNet: Multimodal Graph Learning with Hierarchical Category Guidance for Multi-Label Whole Slide Image Classification

  • Haoyun Zhao
  • Dapeng Tao
  • Yibing Zhan
  • Jun Ni
  • Yang Chen

The analysis of whole slide image (WSI) categories in digital pathology is critical for clinician decision making regarding the diagnosis, treatment, and prognosis of cancer patients. However, current automated methods for cancer type identification are predominantly formulated as single-label classification problems. These methods typically rely on datasets with relatively balanced and abundant samples, where each WSI belongs to a single category. This approach does not fully align with real-world clinical scenarios, where cancer subtypes often exhibit multi-label characteristics and class imbalance, posing significant challenges. To address this issue, this paper proposes CPGNet, a category-prompted graph network designed as a multi-label WSI classifier better suited for clinical applications. CPGNet employs the MaskSLIC algorithm for superpixel segmentation of WSIs, effectively capturing the nonlinear spatial distribution of cellular and tissue structures. The segmented superpixels are then encoded as graph nodes with their corresponding features, while edges and edge features are constructed to abstractly model WSIs as graphs. Furthermore, the method introduces a GLGFI module, which aggregates features from neighboring nodes and edges via a GNN to capture local information, while simultaneously leveraging a multi-head self-attention mechanism to model global dependencies, mimicking the diagnostic behavior of pathologists. Additionally, a VCI module exploits semantic relationships between categories to guide visual feature classification, providing supplementary cues for accurate predictions. To enhance the model's focus on hard-to-classify positive samples, we also implement a reweighting strategy. The proposed approach is evaluated on a private dataset (YNLUAD) and two public challenge datasets (BCNB and AGGC22). The experimental results demonstrate the superiority, universality, and robustness of CPGNet. The code is available at https://github.com/zhy1312/CPGNet.

AAAI Conference 2025 Conference Paper

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

  • Tianle Xia
  • Liang Ding
  • Guojia Wan
  • Yibing Zhan
  • Bo Du
  • Dacheng Tao

Answering complex queries over incomplete knowledge graphs (KGs) is a challenging task. Most previous works have focused on learning entity/relation embeddings and simulating first-order logic operators with various neural networks. However, they are bottlenecked by the inability to share world knowledge to improve logical reasoning, thus resulting in suboptimal performance. In this paper, we propose a complex reasoning schema over KGs built upon large language models (LLMs), containing a curriculum-based logic-aware instruction tuning framework, named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs. To address the difficulty gap among different types of complex queries, we design a simple and flexible logic-aware curriculum learning framework. Experiments across widely used datasets demonstrate that LACT yields substantial improvements (an average +5.5% MRR gain) over advanced methods, achieving a new state-of-the-art.

AAAI Conference 2025 Conference Paper

Modeling All Response Surfaces in One for Conditional Search Spaces

  • Jiaxing Li
  • Wei Liu
  • Chao Xue
  • Yibing Zhan
  • Xiaoxing Wang
  • Weifeng Liu
  • Dacheng Tao

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there will be dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.

ICLR Conference 2025 Conference Paper

NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

  • Zheng Yi Ho
  • Siyuan Liang 0004
  • Sen Zhang 0006
  • Yibing Zhan
  • Dacheng Tao

Hallucinations in Large Language Models (LLMs) remain a major obstacle, particularly in high-stakes applications where factual accuracy is critical. While representation editing and reading methods have made strides in reducing hallucinations, their heavy reliance on specialised tools and on training with in-domain samples makes them difficult to scale and prone to overfitting. This limits their accuracy gains and generalizability to diverse datasets. This paper presents a lightweight method, Norm Voting (NoVo), which harnesses the untapped potential of attention head norms to dramatically enhance factual accuracy in zero-shot multiple-choice questions (MCQs). NoVo begins by automatically selecting truth-correlated head norms with an efficient, inference-only algorithm using only 30 random samples, allowing NoVo to effortlessly scale to diverse datasets. Afterwards, selected head norms are employed in a simple voting algorithm, which yields significant gains in prediction accuracy. On TruthfulQA MC1, NoVo surpasses the current state-of-the-art and all previous methods by a margin of at least 19 accuracy points. NoVo demonstrates exceptional generalization to 20 diverse datasets, with significant gains in over 90% of them, far exceeding all current representation editing and reading methods. NoVo also shows promising gains when combined with finetuning strategies and in building textual adversarial defences. NoVo's effectiveness with head norms opens new frontiers in LLM interpretability, robustness and reliability. Our code is available at: https://github.com/hozhengyi/novo
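The voting step can be pictured with a toy sketch: suppose each selected head yields one norm score per answer option, each head votes for its highest-scoring option, and the option with the most votes wins. The scoring rule and names here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def norm_vote(head_norms):
    """head_norms: (num_heads, num_options) array of attention-head norm
    scores for each MCQ option. Each head votes for its argmax option;
    the option with the most votes is returned."""
    votes = np.argmax(head_norms, axis=1)                   # one vote per head
    counts = np.bincount(votes, minlength=head_norms.shape[1])
    return int(np.argmax(counts))

# Five truth-correlated heads scoring three answer options;
# four of the five heads prefer option 1.
scores = np.array([[0.2, 0.9, 0.1],
                   [0.3, 0.8, 0.2],
                   [0.7, 0.1, 0.2],
                   [0.1, 0.6, 0.3],
                   [0.2, 0.5, 0.4]])
```

Because the decision is a majority over many weakly correlated heads, a few misleading heads (like the third row above) do not flip the prediction.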

AIJ Journal 2025 Journal Article

NT-FAN: A simple yet effective noise-tolerant few-shot adaptation network

  • Wenjing Yang
  • Haoang Chi
  • Yibing Zhan
  • Bowen Hu
  • Xiaoguang Ren
  • Dapeng Tao
  • Long Lan

Few-shot domain adaptation (FDA) aims to train a target model with clean labeled data from the source domain and few labeled data from the target domain. Given a limited annotation budget, source data may contain many noisy labels, which can detrimentally impact the performance of models in real-world applications. This problem setting is denoted as wildly few-shot domain adaptation (WFDA), simultaneously taking care of label noise and data shortage. While previous studies have achieved some success, they typically rely on multiple adaptation models to collaboratively filter noisy labels, resulting in substantial computational overhead. To address WFDA more simply and elegantly, we offer a theoretical analysis of this problem and propose a comprehensive upper bound for the excess risk on the target domain. Our theoretical result reveals that correct domain-invariant representations can be obtained even in the presence of source noise and limited target data without incurring additional costs. In response, we propose a simple yet effective WFDA method, referred to as noise-tolerant few-shot adaptation network (NT-FAN). Experiments demonstrate that our method significantly outperforms all the state-of-the-art competitors while maintaining a more lightweight architecture. Notably, NT-FAN consistently exhibits robust performance when dealing with more realistic and intractable source noise (e.g., instance-dependent label noise) and severe source noise (e.g., a 40% noise rate) in the source domain.

ICML Conference 2024 Conference Paper

A Dual-module Framework for Counterfactual Estimation over Time

  • Xin Wang 0179
  • Shengfei Lyu
  • Lishan Yang 0004
  • Yibing Zhan
  • Huanhuan Chen 0001

Efficiently and effectively estimating counterfactuals over time is crucial for optimizing treatment strategies. We present the Adversarial Counterfactual Temporal Inference Network (ACTIN), a novel framework with dual modules to enhance counterfactual estimation. The balancing module employs a distribution-based adversarial method to learn balanced representations, extending beyond the limitations of current classification-based methods to mitigate confounding bias across various treatment types. The integrating module adopts a novel Temporal Integration Predicting (TIP) strategy, which has a wider receptive field of treatments and balanced representations from the beginning to the current time for a more profound level of analysis. TIP goes beyond the established Direct Predicting (DP) strategy, which only relies on current treatments and representations, by empowering the integrating module to effectively capture long-range dependencies and temporal treatment interactions. ACTIN exceeds the confines of specific base models, and when implemented with simple base models, consistently delivers state-of-the-art performance and efficiency across both synthetic and real-world datasets.

ICML Conference 2024 Conference Paper

Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

  • Ziyi Zhang 0001
  • Sen Zhang 0006
  • Yibing Zhan
  • Yong Luo 0002
  • Yonggang Wen 0001
  • Dacheng Tao

Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative workflows. While optimizing downstream reward models has emerged as a promising alignment strategy, concerns arise regarding the risk of excessive optimization with learned reward models, which potentially compromises ground-truth performance. In this work, we confront the reward overoptimization problem in diffusion model alignment through the lenses of both inductive and primacy biases. We first identify a mismatch between current methods and the temporal inductive bias inherent in the multi-step denoising process of diffusion models, as a potential source of reward overoptimization. Then, we surprisingly discover that dormant neurons in our critic model act as a regularization against reward overoptimization while active neurons reflect primacy bias. Motivated by these observations, we propose Temporal Diffusion Policy Optimization with critic active neuron Reset (TDPO-R), a policy gradient algorithm that exploits the temporal inductive bias of diffusion models and mitigates the primacy bias stemming from active neurons. Empirical results demonstrate the superior efficacy of our methods in mitigating reward overoptimization. Code is available at https://github.com/ZiyiZhang27/tdpo.

IJCAI Conference 2024 Conference Paper

Gradformer: Graph Transformer with Exponential Decay

  • Chuang Liu
  • Zelin Yao
  • Yibing Zhan
  • Xueqi Ma
  • Shirui Pan
  • Wenbin Hu

Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness remains suboptimal. Therefore, this paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy compared to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Codes are available at https://github.com/LiuChuang0059/Gradformer.
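The exponential decay mask itself is simple to sketch: entry (i, j) is gamma raised to the hop distance between nodes i and j, so nearby pairs keep nearly full attention weight and distant pairs are damped but not cut off. A minimal NumPy sketch under that reading (the learnable per-head constraint is omitted):

```python
import numpy as np
from collections import deque

def shortest_path_dists(adj):
    """All-pairs shortest-path hop counts via BFS on an adjacency matrix."""
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if adj[u][v] and dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    q.append(v)
    return dist

def decay_mask(adj, gamma=0.5):
    """Exponential decay mask: entry (i, j) = gamma ** hop_distance(i, j),
    to be multiplied element-wise into the attention matrix."""
    return gamma ** shortest_path_dists(adj)

# Path graph 0-1-2: more distant pairs get exponentially smaller weight.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
mask = decay_mask(adj, gamma=0.5)
```

Multiplying this mask into the raw attention scores keeps full self-attention connectivity while biasing each node toward its structural neighbourhood.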

IJCAI Conference 2024 Conference Paper

Joint Input and Output Coordination for Class-Incremental Learning

  • Shuai Wang
  • Yibing Zhan
  • Yong Luo
  • Han Hu
  • Wei Yu
  • Yonggang Wen
  • Dacheng Tao

Incremental learning is nontrivial due to severe catastrophic forgetting. Although storing a small amount of data on old tasks during incremental learning is a feasible solution, current strategies still fail to 1) adequately address the class bias problem, 2) alleviate the mutual interference between new and old tasks, and 3) consider the problem of class bias within tasks. In light of the above issues, we analyze the cause of class bias in incremental learning, as well as the drawbacks of existing approaches, and propose a joint input and output coordination (JIOC) mechanism to address these issues. This mechanism assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation (KD) to reduce the mutual interference between the outputs of old and new tasks. The proposed mechanism is general and flexible, and can be incorporated into different incremental learning approaches that use memory storage. Extensive experiments show that our mechanism can significantly improve their performance.

IJCAI Conference 2024 Conference Paper

MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models

  • Kanxue Li
  • Baosheng Yu
  • Qi Zheng
  • Yibing Zhan
  • Yuhui Zhang
  • Tianle Zhang
  • Yijun Yang
  • Yue Chen

Foundation models have demonstrated significant emergent abilities, holding great promise for enhancing embodied agents' reasoning and planning capacities. However, the absence of a comprehensive benchmark for evaluating embodied agents with multimodal observations in complex environments remains a notable gap. In this paper, we present MuEP, a comprehensive Multimodal benchmark for Embodied Planning. MuEP facilitates the evaluation of multimodal and multi-turn interactions of embodied agents in complex scenes, incorporating fine-grained evaluation metrics that provide insights into the performance of embodied agents throughout each task. Furthermore, we evaluate embodied agents with recent state-of-the-art foundation models, including large language models (LLMs) and large multimodal models (LMMs), on the proposed benchmark. Experimental results show that foundation models based on textual representations of environments usually outperform their visual counterparts, suggesting a gap in embodied planning abilities with multimodal observations. We also find that control language generation is an indispensable ability beyond common-sense knowledge for accurate embodied task completion. We hope the proposed MuEP benchmark can contribute to the advancement of embodied AI with foundation models.

AAAI Conference 2024 Conference Paper

Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models

  • Zhiyao Ren
  • Yibing Zhan
  • Liang Ding
  • Gaoang Wang
  • Chaoyue Wang
  • Zhongyi Fan
  • Dacheng Tao

Denoising Diffusion Probabilistic Models (DDPMs) have achieved significant success in generation tasks. Nevertheless, the exposure bias issue, i.e., the natural discrepancy between training (where each step's output is computed individually from a given input) and inference (where each step's input is obtained iteratively from the model's own previous output), harms the performance of DDPMs. To our knowledge, few works have tried to tackle this issue by modifying the training process for DDPMs, but they still perform unsatisfactorily due to 1) partially modeling the discrepancy and 2) ignoring the prediction error accumulation. To address the above issues, in this paper, we propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate the exposure bias for DDPMs. Analyzing the formulations of the training and inference of DDPMs, MDSS 1) comprehensively considers the influence of prediction errors on both the output of the model (the Gaussian noise) and the output of the step (the calculated input signal of the next step), and 2) efficiently models the prediction error accumulation by using multiple iterations of a mathematical formulation initialized from the one-step prediction error obtained from the model. The experimental results, compared with previous works, demonstrate that our approach is more effective in mitigating exposure bias in DDPM, DDIM, and DPM-solver. In particular, MDSS achieves an FID score of 3.86 in 100 sample steps of DDIM on the CIFAR-10 dataset, whereas the second best obtains 4.78. The code will be available on GitHub.

ICLR Conference 2024 Conference Paper

Parameter-Efficient Multi-Task Model Fusion with Partial Linearization

  • Anke Tang
  • Li Shen 0008
  • Yong Luo 0002
  • Yibing Zhan
  • Han Hu 0003
  • Bo Du 0001
  • Yixin Chen 0001
  • Dacheng Tao

Large pre-trained models have enabled significant advances in machine learning and served as foundation components. Model fusion methods, such as task arithmetic, have been proven to be powerful and scalable to incorporate fine-tuned weights from different tasks into a multi-task model. However, efficiently fine-tuning large pre-trained models on multiple downstream tasks remains challenging, leading to inefficient multi-task model fusion. In this work, we propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques like LoRA fine-tuning. Specifically, our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters. This allows us to leverage the advantages of model fusion over linearized fine-tuning, while still performing fine-tuning and inference efficiently. We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model, outperforming standard adapter tuning and task arithmetic alone. Experimental results demonstrate the capabilities of our proposed partial linearization technique to effectively construct unified multi-task models via the fusion of fine-tuned task vectors. We evaluate performance over an increasing number of tasks and find that our approach outperforms standard parameter-efficient fine-tuning techniques. The results highlight the benefits of partial linearization for scalable and efficient multi-task model fusion.
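Task arithmetic, the fusion operation this paper builds on, can be sketched in a few lines: each fine-tuned model contributes a task vector (its weights minus the pre-trained weights), and the vectors are scaled and summed into the base. This is a toy sketch over flat weight vectors, not the paper's linearized-adapter variant.

```python
import numpy as np

def task_arithmetic(base, finetuned_list, scale=1.0):
    """Fuse fine-tuned models into one multi-task model:
    merged = base + scale * sum_i (finetuned_i - base)."""
    merged = base.copy()
    for ft in finetuned_list:
        merged += scale * (ft - base)   # add each task vector
    return merged

base = np.array([1.0, 1.0, 1.0])
task_a = np.array([2.0, 1.0, 1.0])      # fine-tuning moved dim 0
task_b = np.array([1.0, 1.0, 3.0])      # fine-tuning moved dim 2
merged = task_arithmetic(base, [task_a, task_b], scale=1.0)
```

When task vectors touch mostly disjoint directions, as here, their sum preserves both edits; the paper's partial linearization aims to make the adapters behave more like this additive ideal.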

AAAI Conference 2024 Conference Paper

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation

  • Xin Lin
  • Chong Shi
  • Yibing Zhan
  • Zuopeng Yang
  • Yaqi Wu
  • Dacheng Tao

Dynamic scene graph generation (SGG) focuses on detecting objects in a video and determining their pairwise relationships. Existing dynamic SGG methods usually suffer from several issues, including 1) Contextual noise, as some frames might contain occluded and blurred objects. 2) Label bias, primarily due to the high imbalance between a few positive relationship samples and numerous negative ones. Additionally, the distribution of relationships exhibits a long-tailed pattern. To address the above problems, in this paper, we introduce a network named TD2-Net that aims at denoising and debiasing for dynamic SGG. Specifically, we first propose a denoising spatio-temporal transformer module that enhances object representation with robust contextual information. This is achieved by designing a differentiable Top-K object selector that utilizes the gumbel-softmax sampling strategy to select the relevant neighborhood for each object. Second, we introduce an asymmetrical reweighting loss to relieve the issue of label bias. This loss function integrates asymmetry focusing factors and the volume of samples to adjust the weights assigned to individual samples. Systematic experimental results demonstrate the superiority of our proposed TD2-Net over existing state-of-the-art approaches on Action Genome databases. In more detail, TD2-Net outperforms the second-best competitors by 12.7% on mean-Recall@10 for predicate classification.
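The Top-K neighbourhood selector can be pictured with the Gumbel top-K trick: perturb log-scores with Gumbel noise, then keep the k largest, which samples k items without replacement in proportion to their scores. This hard version is a sketch of the idea only; the paper's selector is a differentiable Gumbel-Softmax relaxation.

```python
import numpy as np

def gumbel_topk(scores, k, rng):
    """Stochastic Top-K selection via the Gumbel-max trick: perturb
    log-scores with Gumbel noise and take the k largest. (Hard,
    non-differentiable sketch of the paper's relaxed selector.)"""
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(np.log(scores) + gumbel)[::-1][:k]

rng = np.random.default_rng(0)
neighbours = np.array([0.05, 0.9, 0.02, 0.6, 0.1])  # relevance of 5 candidate objects
picked = gumbel_topk(neighbours, k=2, rng=rng)
```

High-scoring neighbours are picked most often, while the noise keeps selection stochastic during training.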

IJCAI Conference 2024 Conference Paper

Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders

  • Chuang Liu
  • Yuyao Wang
  • Yibing Zhan
  • Xueqi Ma
  • Dapeng Tao
  • Jia Wu
  • Wenbin Hu

Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the potential of leveraging the graph's structural composition as a fundamental and unique prior in the masked pre-training process. To this end, we introduce a novel structure-guided masking strategy (i.e., StructMAE), designed to refine the existing GMAE models. StructMAE involves two steps: 1) Structure-based Scoring: Each node is evaluated and assigned a score reflecting its structural significance. Two distinct types of scoring manners are proposed: predefined and learnable scoring. 2) Structure-guided Masking: With the obtained assessment scores, we develop an easy-to-hard masking strategy that gradually increases the structural awareness of the self-supervised reconstruction task. Specifically, the strategy begins with random masking and progresses to masking structure-informative nodes based on the assessment scores. This design gradually and effectively guides the model in learning graph structural information. Furthermore, extensive experiments consistently demonstrate that our StructMAE method outperforms existing state-of-the-art GMAE models in both unsupervised and transfer learning tasks. Codes are available at https://github.com/LiuChuang0059/StructMAE.

IJCAI Conference 2023 Conference Paper

Gapformer: Graph Transformer with Graph Pooling for Node Classification

  • Chuang Liu
  • Yibing Zhan
  • Xueqi Ma
  • Liang Ding
  • Dapeng Tao
  • Jia Wu
  • Wenbin Hu

Graph Transformers (GTs) have proved their advantage in graph-level tasks. However, existing GTs still perform unsatisfactorily on the node classification task due to 1) the overwhelming unrelated information obtained from a vast number of irrelevant distant nodes and 2) the quadratic complexity regarding the number of nodes via the fully connected attention mechanism. In this paper, we present Gapformer, a method for node classification that deeply incorporates Graph Transformer with Graph Pooling. More specifically, Gapformer coarsens the large-scale nodes of a graph into a smaller number of pooling nodes via local or global graph pooling methods, and then computes the attention solely with the pooling nodes rather than all other nodes. In such a manner, the negative influence of the overwhelming unrelated nodes is mitigated while maintaining the long-range information, and the quadratic complexity is reduced to linear complexity with respect to the fixed number of pooling nodes. Extensive experiments on 13 node classification datasets, including homophilic and heterophilic graph datasets, demonstrate the competitive performance of Gapformer over existing Graph Neural Networks and GTs.
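The pooling-then-attention idea reduces cost because every node attends to k pooled nodes instead of all N nodes, giving O(N·k) rather than O(N²) score computations. A minimal NumPy sketch using cluster-mean pooling (an illustrative stand-in for the paper's local/global pooling choices):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pooled_attention(X, clusters, k):
    """Every node attends only to k pooled (cluster-mean) nodes:
    the attention score matrix is (N, k) instead of (N, N)."""
    N, d = X.shape
    P = np.stack([X[clusters == c].mean(axis=0) for c in range(k)])  # (k, d) pooled nodes
    scores = X @ P.T / np.sqrt(d)   # (N, k) attention logits
    return softmax(scores) @ P      # (N, d) updated node features

X = np.random.default_rng(1).normal(size=(6, 4))   # 6 nodes, 4-dim features
clusters = np.array([0, 0, 1, 1, 2, 2])            # 3 pooling nodes
out = pooled_attention(X, clusters, k=3)
```

Because k is fixed, the cost grows linearly in the number of nodes while each node still receives a summary of every region of the graph.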

IJCAI Conference 2023 Conference Paper

Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities

  • Chuang Liu
  • Yibing Zhan
  • Jia Wu
  • Chang Li
  • Bo Du
  • Wenbin Hu
  • Tongliang Liu
  • Dacheng Tao

Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation. As an essential component of the architecture, graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these works. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods for graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods with a mathematical summary for each category; 2) then, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) next, we further outline the applications that incorporate the idea of graph pooling in a variety of domains; 4) finally, we discuss certain critical challenges facing current studies and share our insights on future potential directions for research on the improvement of graph pooling.

NeurIPS Conference 2022 Conference Paper

Estimating Noise Transition Matrix with Label Correlations for Noisy Multi-Label Learning

  • Shikun Li
  • Xiaobo Xia
  • Hansong Zhang
  • Yibing Zhan
  • Shiming Ge
  • Tongliang Liu

In label-noise learning, the noise transition matrix, bridging the class posterior for noisy and clean data, has been widely exploited to learn statistically consistent classifiers. The effectiveness of these algorithms relies heavily on estimating the transition matrix. Recently, the problem of label-noise learning in multi-label classification has received increasing attention, and these consistent algorithms can be applied in multi-label cases. However, the estimation of transition matrices in noisy multi-label learning has not been studied and remains challenging, since most existing estimators in noisy multi-class learning depend on the existence of anchor points and accurate fitting of the noisy class posterior. To address this problem, in this paper, we first study the identifiability of the class-dependent transition matrix in noisy multi-label learning; inspired by the identifiability results, we then propose a new estimator that exploits label correlations and requires neither anchor points nor accurate fitting of the noisy class posterior. Specifically, we estimate the occurrence probability of two noisy labels to obtain noisy label correlations. Then, we perform sample selection to further extract information that implies clean label correlations, which is used to estimate the occurrence probability of one noisy label when a certain clean label appears. By exploiting the mismatch between the label correlations implied in these occurrence probabilities, the transition matrix becomes identifiable and can be acquired by solving a simple bilinear decomposition problem. Empirical results demonstrate the effectiveness of our estimator in recovering the transition matrix from label correlations, leading to better classification performance. Source code is available at https://github.com/tmllab/Multi-Label-T.
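The first estimation step the abstract mentions, the occurrence probability of two noisy labels, is a simple empirical statistic; the sketch below covers only that step, while the sample selection and bilinear decomposition that recover the transition matrix are the paper's own machinery.

```python
import numpy as np

def noisy_cooccurrence(Y):
    """Empirical occurrence probability of noisy label pairs:
    C[i, j] ~= P(noisy label i = 1, noisy label j = 1),
    estimated by counting over a multi-label binary matrix Y of
    shape (num_samples, num_labels)."""
    n = Y.shape[0]
    return (Y.T @ Y) / n

# Toy noisy multi-label matrix: 4 samples, 3 labels (values are made up).
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [1, 1, 0]])
C = noisy_cooccurrence(Y)
```

The diagonal gives per-label marginals and the off-diagonal entries give pairwise noisy correlations, which downstream steps contrast with clean-label correlations extracted from selected confident samples.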

ICML Conference 2022 Conference Paper

Improving Adversarial Robustness via Mutual Information Estimation

  • Dawei Zhou 0004
  • Nannan Wang 0001
  • Xinbo Gao 0001
  • Bo Han 0003
  • Xiaoyu Wang 0002
  • Yibing Zhan
  • Tongliang Liu

Deep neural networks (DNNs) are found to be vulnerable to adversarial noise. They are typically misled by adversarial samples to make wrong predictions. To alleviate this negative effect, in this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure the dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively. We find that adversarial samples usually have larger adversarial MI and smaller natural MI compared with those w.r.t. natural samples. Motivated by this observation, we propose to enhance the adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process. In this way, the target model is expected to pay more attention to the natural pattern that contains objective semantics. Empirical evaluations demonstrate that our method could effectively improve the adversarial accuracy against multiple attacks.
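The training objective described above can be sketched as cross-entropy minus a natural-MI term plus an adversarial-MI term. The Donsker-Varadhan bound below is one generic way to estimate MI from critic scores; the paper's actual estimator, weights `alpha`/`beta`, and the critic values here are all illustrative.

```python
import numpy as np

def dv_mi_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on mutual information:
    MI >= E_joint[T] - log E_marginal[exp(T)],
    where t_joint are critic scores on samples from the joint distribution
    and t_marginal are scores on samples with one variable shuffled."""
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

def robust_objective(ce, nat_mi, adv_mi, alpha=1.0, beta=1.0):
    """Sketch of the paper's idea: minimize cross-entropy while
    maximizing natural MI (subtracted) and minimizing adversarial MI (added)."""
    return ce - alpha * nat_mi + beta * adv_mi

# Toy critic scores: joint samples score higher than shuffled ones.
mi_est = dv_mi_lower_bound(np.array([1.0, 0.8]), np.array([0.1, -0.2]))
total = robust_objective(ce=1.0, nat_mi=0.5, adv_mi=0.2)
```

In a real defense, the two MI terms would be estimated with neural critics on the natural and adversarial patterns of each input, and the combined objective backpropagated through the target model.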

NeurIPS Conference 2022 Conference Paper

Pluralistic Image Completion with Gaussian Mixture Models

  • Xiaobo Xia
  • Wenhao Yang
  • Jie Ren
  • Yewen Li
  • Yibing Zhan
  • Bo Han
  • Tongliang Liu

Pluralistic image completion focuses on generating both visually realistic and diverse results for image completion. Prior methods have achieved empirical success on this task. However, the constraints they use for pluralistic image completion are not well interpretable and are unsatisfactory in two respects. First, the constraints for visual realism can be weakly correlated with the objective of image completion, or even redundant. Second, the constraints for diversity are designed to be task-agnostic, which causes them to work poorly. In this paper, to address these issues, we propose an end-to-end probabilistic method. Specifically, we introduce a unified probabilistic graph model that represents the complex interactions in image completion. The entire procedure of image completion is then mathematically divided into several sub-procedures, which helps enforce the constraints efficiently. The sub-procedure directly related to pluralistic results is identified, where the interaction is established by a Gaussian mixture model (GMM). The inherent parameters of the GMM are task-related and optimized adaptively during training, while the number of its primitives conveniently controls the diversity of the results. We formally establish the effectiveness of our method and demonstrate it with comprehensive experiments. The implementation is available at https://github.com/tmllab/PICMM.
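The diversity-controlling role of the GMM can be illustrated by sampling latent codes from a mixture whose number of primitives is the diversity knob. All means, covariances, and weights below are made-up toy values; a real model would decode each sampled latent into a completed image.

```python
import numpy as np

def sample_gmm_latents(means, covs, weights, n, rng=None):
    """Draw n latent codes from a GMM. The number of primitives
    (len(weights)) controls how many distinct completion modes the
    sampler can produce."""
    if rng is None:
        rng = np.random.default_rng(0)
    comps = rng.choice(len(weights), size=n, p=weights)  # pick a primitive
    d = means.shape[1]
    out = np.empty((n, d))
    for i, c in enumerate(comps):
        out[i] = rng.multivariate_normal(means[c], covs[c])
    return out, comps

# 3 primitives in a 2-D latent space (toy values).
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
covs = np.stack([np.eye(2) * 0.1] * 3)
z, comps = sample_gmm_latents(means, covs, np.array([0.5, 0.3, 0.2]), n=8)
```

Shrinking the mixture to one primitive collapses sampling to a single mode (less diverse completions), while adding primitives spreads the latents over more modes.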

AAAI Conference 2022 Conference Paper

Resistance Training Using Prior Bias: Toward Unbiased Scene Graph Generation

  • Chao Chen
  • Yibing Zhan
  • Baosheng Yu
  • Liu Liu
  • Yong Luo
  • Bo Du

Scene Graph Generation (SGG) aims to build a structured representation of a scene using objects and pairwise relationships, which benefits downstream tasks. However, current SGG methods usually suffer from sub-optimal scene graph generation because of the long-tailed distribution of the training data. To address this problem, we propose Resistance Training using Prior Bias (RTPB) for scene graph generation. Specifically, RTPB uses a distribution-based prior bias to improve the model's ability to detect less frequent relationships during training, thus improving generalizability on tail categories. In addition, to further explore the contextual information of objects and relationships, we design a contextual encoding backbone network, termed Dual Transformer (DTrans). We perform extensive experiments on a very popular benchmark, VG150, to demonstrate the effectiveness of our method for unbiased scene graph generation. Specifically, our RTPB achieves an improvement of over 10% in mean recall when applied to current SGG methods. Furthermore, DTrans with RTPB outperforms nearly all state-of-the-art methods by a large margin. Code is available at https://github.com/ChCh1999/RTPB.
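One generic way to realize a frequency-derived prior bias is logit adjustment: add the log prior of each relationship class to its logit during training, so head classes must produce stronger evidence and tail classes are learned better. This is a sketch in that spirit, not RTPB's exact bias formulation.

```python
import numpy as np

def prior_bias_logits(logits, class_counts, tau=1.0):
    """Add a log-prior 'resistance' bias to relationship logits during
    training (generic logit-adjustment sketch; tau scales the bias).

    Frequent (head) relationships receive a larger bias, which the model
    must overcome, steering learning capacity toward tail relationships."""
    prior = class_counts / class_counts.sum()
    return logits + tau * np.log(prior + 1e-12)

# Long-tailed toy relationship frequencies: head, mid, tail.
counts = np.array([1000.0, 100.0, 10.0])
logits = np.zeros(3)                 # model initially indifferent
biased = prior_bias_logits(logits, counts)
```

At inference the bias is dropped, leaving a model whose raw logits are less dominated by head relationships.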

NeurIPS Conference 2021 Conference Paper

Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels

  • Sheng Wan
  • Yibing Zhan
  • Liu Liu
  • Baosheng Yu
  • Shirui Pan
  • Chen Gong

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing GNN models require sufficient labeled data for effective network training, and their performance can degrade seriously when labels are extremely limited. To address this issue, we propose a new framework termed Contrastive Graph Poisson Networks (CGPN) for node classification under extremely limited labeled data. Specifically, CGPN derives from variational inference; it integrates a newly designed Graph Poisson Network (GPN), which effectively propagates the limited labels to the entire graph, with a standard GNN, such as a Graph Attention Network, that flexibly guides the propagation of the GPN, and applies a contrastive objective to further exploit the supervision information from the learning processes of the GPN and GNN models. Essentially, CGPN enhances the learning performance of GNNs under extremely limited labels by contrastively propagating the limited labels to the entire graph. We conducted extensive experiments on different types of datasets to demonstrate the superiority of CGPN.
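A contrastive objective between two branches can be sketched as a generic InfoNCE loss over node embeddings: the same node's embeddings from the GPN and GNN branches form a positive pair, other nodes act as negatives. The pairing rule and temperature are illustrative, not CGPN's exact loss.

```python
import numpy as np

def cross_model_infonce(Z1, Z2, tau=0.5):
    """InfoNCE-style loss pulling together the two branches' embeddings of
    the same node and pushing apart embeddings of different nodes.

    Z1, Z2 : (N, d) node embeddings from the two models/branches
    tau    : temperature scaling the cosine similarities
    """
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    sim = Z1 @ Z2.T / tau                        # (N, N) similarity logits
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))              # maximize diagonal agreement

# Toy embeddings: the second branch is a perturbed copy of the first.
rng = np.random.default_rng(0)
Z_gpn = rng.standard_normal((6, 4))
Z_gnn = Z_gpn + 0.01 * rng.standard_normal((6, 4))
loss = cross_model_infonce(Z_gpn, Z_gnn)
```

Minimizing this loss encourages the two branches to agree on each node's representation, which is one way supervision can flow between them when labels are scarce.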

AAAI Conference 2021 Conference Paper

Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing

  • Jun Yu
  • Hao Zhou
  • Yibing Zhan
  • Dacheng Tao

Unsupervised cross-modal hashing (UCMH) has become a hot topic recently. Current UCMH focuses on exploring data similarities. However, current UCMH methods calculate the similarity between two data points mainly from their cross-modal features. These methods suffer from inaccurate similarity estimates that result in a suboptimal retrieval Hamming space, because cross-modal features alone are insufficient to describe complex data relationships, such as cases where two data points have different feature representations but share the same underlying concepts. In this paper, we devise a deep graph-neighbor coherence preserving network (DGCPN). Specifically, DGCPN stems from graph models and explores graph-neighbor coherence by consolidating the information between data and their neighbors. DGCPN regulates comprehensive similarity-preserving losses by exploiting three types of data similarities (i.e., graph-neighbor coherence, coexistent similarity, and intra- and inter-modality consistency) and designs a half-real and half-binary optimization strategy to reduce quantization errors during hashing. Essentially, DGCPN addresses the inaccurate similarity problem by exploring and exploiting the data's intrinsic relationships in a graph. We conduct extensive experiments on three public UCMH datasets. The experimental results demonstrate the superiority of DGCPN, e.g., improving the mean average precision from 0.722 to 0.751 on MIRFlickr-25K when using 64-bit hashing codes to retrieve texts from images. We will release the source code package and the trained model at https://github.com/Atmegal/DGCPN.
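The idea of consolidating information between data and their neighbors can be sketched as blending direct feature similarity with similarity seen through neighborhoods, so two items with dissimilar features but coherent neighbors still score as related. The mixing weight `alpha` and the row-normalized affinity matrix are illustrative choices, not DGCPN's exact formulation.

```python
import numpy as np

def neighbor_coherent_similarity(F, A, alpha=0.6):
    """Blend direct and neighborhood-filtered similarity.

    F     : (N, d) L2-normalized features, so F @ F.T is cosine similarity
    A     : (N, N) row-normalized neighbor/affinity matrix
    alpha : hypothetical weight between direct and neighbor-based similarity
    """
    direct = F @ F.T              # pairwise cosine similarity
    neighbor = A @ direct @ A.T   # similarity aggregated over neighborhoods
    return alpha * direct + (1 - alpha) * neighbor

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 8))
F /= np.linalg.norm(F, axis=1, keepdims=True)
A = rng.random((5, 5))
A /= A.sum(axis=1, keepdims=True)
S = neighbor_coherent_similarity(F, A)
```

A refined similarity matrix like `S` could then supervise hash-code learning in place of the raw feature similarities the abstract criticizes.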