Arrow Research search

Author name cluster

Bo Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

98 papers
2 author rows

Possible papers

98

TIST Journal 2026 Journal Article

Atom-Motif Contrastive Transformer for Molecular Property Prediction

  • Wentao Yu
  • Shuo Chen
  • Chen Gong
  • Bo Han
  • Gang Niu
  • Masashi Sugiyama

Recently, Graph Transformer (GT) models have been widely used in the task of Molecular Property Prediction (MPP) due to their high reliability in characterizing the latent relationship among graph nodes (i.e., the atoms in a molecule). However, most existing GT-based methods explore only the basic interactions between pairwise atoms, and thus fail to consider the important interactions among critical motifs (e.g., functional groups consisting of several atoms) of molecules. As motifs in a molecule are significant patterns of great importance for determining molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP. To address this issue, we propose a novel Atom-Motif Contrastive Transformer (AMCT), which not only explores the atom-level interactions but also considers the motif-level interactions. Since the representations of atoms and motifs for a given molecule are actually two different views of the same instance, they are naturally aligned to generate the self-supervisory signals for model training. Meanwhile, the same motif can exist in different molecules, and hence we also employ a contrastive loss to maximize the representation agreement of identical motifs across different molecules. Finally, to clearly identify the motifs that are critical in deciding the properties of each molecule, we further incorporate a property-aware attention mechanism into our learning framework. Our proposed AMCT is extensively evaluated on 10 popular benchmark datasets, and both quantitative and qualitative results firmly demonstrate its effectiveness compared with state-of-the-art methods.
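The atom-motif alignment described in this abstract reduces to a standard contrastive (InfoNCE) objective between two views of the same molecule. A minimal NumPy sketch, with shapes and the temperature chosen for illustration (not taken from the paper):

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE loss aligning two views (e.g., atom-level and motif-level
    readouts) of the same molecules; row i of each view is one molecule."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # diagonal entries are the positive (matched) pairs
    return -np.mean(np.diag(log_prob))
```

Matched views push the diagonal similarities up and the loss down; the other molecules in the batch act as negatives.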

AAAI Conference 2026 Conference Paper

DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning

  • Bo Han
  • Zhuoming Li
  • Xiaoyu Wang
  • Yaxin Hou
  • Hui Liu
  • Junhui Hou
  • Yuheng Jia

Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model’s performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.
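The dual-thresholding step described above can be sketched as follows; the thresholds and the correctness-likelihood proxy are illustrative stand-ins, not the paper's calibrated estimates:

```python
import numpy as np

def dual_threshold_pseudo_labels(probs, hi=0.9, lo=0.1):
    """Split per-class predictions into confident positives/negatives and an
    ambiguous region. probs: (n_samples, n_classes) predicted marginals for
    multi-label data. Returns labels in {1, 0, -1} (-1 = ambiguous, left to
    contrastive learning) and weights standing in for correctness likelihood."""
    labels = np.full(probs.shape, -1, dtype=int)
    labels[probs >= hi] = 1
    labels[probs <= lo] = 0
    # proxy for correctness likelihood: distance from the decision boundary
    weights = np.where(labels == -1, 0.0, np.abs(probs - 0.5) * 2)
    return labels, weights
```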

TMLR Journal 2026 Journal Article

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning

  • Ali Taheri
  • Alireza Taban
  • Qizhou Wang
  • Shanshan Ye
  • Abdolreza Mirzaei
  • Tongliang Liu
  • Bo Han

Supervised fine-tuning (SFT) plays a critical role for pretrained large language models (LLMs), notably enhancing their capacity to acquire domain-specific knowledge while preserving or potentially augmenting their general-purpose capabilities. However, the efficacy of SFT hinges on both data quality and data volume; otherwise, it may yield limited performance gains or even degradation relative to the associated baselines. To mitigate such reliance, we suggest categorizing the tokens within each corpus into positive and negative tokens, based on whether they are useful for improving model performance. Positive tokens can be trained in common ways, whereas negative tokens, which may lack essential semantics or be misleading, should be explicitly forgotten. Overall, the token categorization keeps the model from absorbing less informative content, and the forgetting mechanism guides the model more precisely on what information to learn. We conduct experiments across diverse and well-established benchmarks using various model architectures, demonstrating that this forgetting mechanism enhances model performance.

AAAI Conference 2026 Conference Paper

Transferability of Adversarial Attacks in Video-based MLLMs: A Cross-modal Image-to-Video Approach

  • Linhao Huang
  • Xue Jiang
  • Zhiqiang Wang
  • Wentao Mo
  • Xi Xiao
  • Yong-Jie Yin
  • Bo Han
  • Feng Zheng

Video-based multimodal large language models (V-MLLMs) have shown vulnerability to adversarial examples in video-text multimodal tasks. However, the transferability of adversarial videos to unseen models—a common and practical real-world scenario—remains unexplored. In this paper, we pioneer an investigation into the transferability of adversarial video samples across V-MLLMs. We find that existing adversarial attack methods face significant limitations when applied in black-box settings for V-MLLMs, which we attribute to the following shortcomings: (1) lacking generalization in perturbing video features, (2) focusing only on sparse key frames, and (3) failing to integrate multimodal information. To address these limitations and deepen the understanding of V-MLLM vulnerabilities in black-box scenarios, we introduce the Image-to-Video MLLM (I2V-MLLM) attack. In I2V-MLLM, we utilize an image-based multimodal large language model (I-MLLM) as a surrogate model to craft adversarial video samples. Multimodal interactions and spatiotemporal information are integrated to disrupt video representations within the latent space, improving adversarial transferability. Additionally, a perturbation propagation technique is introduced to handle different unknown frame sampling strategies. Experimental results demonstrate that our method can generate adversarial examples that exhibit strong transferability across different V-MLLMs on multiple video-text multimodal tasks. Compared to white-box attacks on these models, our black-box attacks (using BLIP-2 as a surrogate model) achieve competitive performance, with average attack success rates (AASR) of 57.98% on MSVD-QA and 58.26% on MSRVTT-QA for zero-shot VideoQA tasks.

NeurIPS Conference 2025 Conference Paper

Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective

  • Chenwang Wu
  • Yiu-ming Cheung
  • Bo Han
  • Defu Lian

Existing machine-generated text (MGT) detection methods implicitly assume labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, limitations of human cognition and the superintelligence of detectors make inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor targeting relatively simple longer-text detection tasks (despite weaker capabilities), to enhance the more challenging target detector. Firstly, longer texts targeted by supervisors theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Secondly, by structurally incorporating the detector into the supervisor, we theoretically model the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "golden" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed text, and paraphrase attacks, demonstrate the framework's significant detection effectiveness. The code is available at: https://github.com/tmlr-group/Easy2Hard.

EAAI Journal 2025 Journal Article

Causality-inspired surface defect detection by transferring knowledge from natural images

  • Fangfang An
  • Shaolei Cao
  • Shuai Ma
  • Dawu Shu
  • Bo Han
  • Wanxin Li
  • Ruigang Liu

Deep neural networks have become a mainstream method for dyed fabric surface defect detection, but they require a substantial number of training images to reach their full performance. However, building large-scale datasets of dyed fabric surface defects is difficult due to the scarcity of defects, which leads to deep networks that are often under-trained and overfit to the training set. In this paper, we attempt to mitigate model overfitting caused by insufficient defect samples by transferring knowledge learned from natural images. Specifically, we explore the correlation between natural-image salient object detection (SOD) and defect image detection, utilizing multi-domain learning to jointly train on natural and defect images. To mitigate the biased behavior in model learning caused by the differences between natural and defect images, we design a causality-based removal framework to find the true causal relationships between defects and contexts. To facilitate integrity for multi-domain learning, we propose a diverse synergy module (DSM) based on foreground, background, and global feature fusion. The DSM is used to enhance the ability of the network to comprehensively perceive defect regions. In addition, we propose a multi-convolutional aggregation module (MCAM) to enhance feature diversity by aggregating features with different receptive fields. Extensive experiments on our proposed fabric dyeing defects dataset (FDD) and three of the most widely used surface defect datasets show that our proposed method achieves state-of-the-art results and has good generalization ability. Our method shows remarkable detection accuracy even with extremely limited training samples. Our dataset and code will be available at https://github.com/DEF-21/UMDNet.

ICML Conference 2025 Conference Paper

COSDA: Counterfactual-based Susceptibility Risk Framework for Open-Set Domain Adaptation

  • Wenxu Wang
  • Rui Zhou
  • Jing Wang
  • Yun Zhou
  • Cheng Zhu
  • Ruichun Tang
  • Bo Han
  • Nevin L. Zhang

Open-Set Domain Adaptation (OSDA) aims to transfer knowledge from the labeled source domain to the unlabeled target domain that contains unknown categories, thus facing the challenges of domain shift and unknown category recognition. While recent works have demonstrated the potential of causality for domain alignment, little exploration has been conducted on causal-inspired theoretical frameworks for OSDA. To fill this gap, we introduce the concept of Susceptibility and propose a novel Counterfactual-based susceptibility risk framework for OSDA, termed COSDA. Specifically, COSDA consists of three novel components: (i) a Susceptibility Risk Estimator (SRE) for capturing causal information, along with comprehensive derivations of the computable theoretical upper bound, forming a risk minimization framework under the OSDA paradigm; (ii) a Contrastive Feature Alignment (CFA) module, which is theoretically proven based on mutual information to satisfy the Exogeneity assumption and facilitate cross-domain feature alignment; (iii) a Virtual Multi-unknown-categories Prototype (VMP) pseudo-labeling strategy, providing label information by measuring how similar samples are to known and multiple virtual unknown category prototypes, thereby assisting in open-set recognition and intra-class discriminative feature learning. Extensive experiments demonstrate that our approach achieves state-of-the-art performance.

NeurIPS Conference 2025 Conference Paper

Detecting Generated Images by Fitting Natural Image Distributions

  • Yonggang Zhang
  • Jun Nie
  • Xinmei Tian
  • Mingming Gong
  • Kun Zhang
  • Bo Han

The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of natural and generated images. To exploit this difference, we employ a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a simple yet effective detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. Furthermore, to address diminishing manifold disparities in advanced generative models, we leverage normalizing flows to amplify detectable differences by extruding generated images away from the natural image manifold. Extensive experiments demonstrate the efficacy of this method.

AAAI Conference 2025 Conference Paper

Eliciting Causal Abilities in Large Language Models for Reasoning Tasks

  • Yajing Wang
  • Zongwei Luo
  • Jingzhe Wang
  • Zhanke Zhou
  • Yongqiang Chen
  • Bo Han

Prompt optimization automatically refines prompting expressions, unlocking the full potential of LLMs in downstream tasks. However, current prompt optimization methods are costly to train and lack sufficient interpretability. This paper proposes enhancing LLMs' reasoning performance by eliciting their causal inference ability from prompting instructions to correct answers. Specifically, we introduce the Self-Causal Instruction Enhancement (SCIE) method, which enables LLMs to generate high-quality, low-quantity observational data, then estimates the causal effect based on these data, and ultimately generates instructions with the optimized causal effect. In SCIE, the instructions are treated as the treatment, and textual features are used to process natural language, establishing causal relationships through treatments between instructions and downstream tasks. Additionally, we propose applying Object-Relational (OR) principles, where the uncovered causal relationships are treated as the inheritable class across task objects, ensuring low-cost reusability. Extensive experiments demonstrate that our method effectively generates instructions that enhance reasoning performance with reduced training cost of prompts, leveraging interpretable textual features to provide actionable insights.

NeurIPS Conference 2025 Conference Paper

Enhancing Sample Selection Against Label Noise by Cutting Mislabeled Easy Examples

  • Suqin Yuan
  • Lei Feng
  • Bo Han
  • Tongliang Liu

Sample selection is a prevalent approach in learning with noisy labels, aiming to identify confident samples for training. Although existing sample selection methods have achieved decent results by reducing the noise rate of the selected subset, they often overlook that not all mislabeled examples harm the model's performance equally. In this paper, we demonstrate that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We refer to these examples as Mislabeled Easy Examples (MEEs). To address this, we propose Early Cutting, which introduces a recalibration step that employs the model's later training state to re-select the confident subset identified early in training, thereby avoiding misleading confidence from early learning and effectively filtering out MEEs. Experiments on the CIFAR, WebVision, and full ImageNet-1k datasets demonstrate that our method effectively improves sample selection and model performance by reducing MEEs.
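The two-stage idea above (select a confident subset early, then recalibrate it with a later model state) can be sketched as follows; the selection ratios and the agreement rule are illustrative stand-ins, not the paper's exact recalibration step:

```python
import numpy as np

def early_cutting(early_probs, late_probs, keep=0.6, cut=0.2):
    """early_probs/late_probs: (n,) probability that the early/late model
    assigns to each sample's given (possibly noisy) label.
    Step 1: keep the samples the early model is most confident about.
    Step 2: drop the fraction of them the later model agrees with least,
    which is where mislabeled easy examples tend to concentrate."""
    n = len(early_probs)
    confident = np.argsort(-early_probs)[: int(keep * n)]
    agree = late_probs[confident]
    kept = confident[np.argsort(-agree)[: int(len(confident) * (1 - cut))]]
    return np.sort(kept)
```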

NeurIPS Conference 2025 Conference Paper

Epistemic Uncertainty for Generated Image Detection

  • Jun Nie
  • Yonggang Zhang
  • Tongliang Liu
  • Yiu-ming Cheung
  • Bo Han
  • Xinmei Tian

We introduce a novel framework for AI-generated image detection through epistemic uncertainty, aiming to address critical security concerns in the era of generative models. Our key insight stems from the observation that distributional discrepancies between training and testing data manifest distinctively in the epistemic uncertainty space of machine learning models. In this context, the distribution shift between natural and generated images leads to elevated epistemic uncertainty in models trained on natural images when evaluating generated ones. Hence, we exploit this phenomenon by using epistemic uncertainty as a proxy for detecting generated images. This converts the challenge of generated image detection into the problem of uncertainty estimation, underscoring the generalization performance of the model used for uncertainty estimation. Fortunately, advanced large-scale vision models pre-trained on extensive natural images have shown excellent generalization performance for various scenarios. Thus, we utilize these pre-trained models to estimate the epistemic uncertainty of images and flag those with high uncertainty as generated. Extensive experiments demonstrate the efficacy of our method.
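As a toy illustration of the uncertainty-as-detector idea, the mutual information of a small ensemble is a common epistemic-uncertainty proxy (a stand-in for the paper's estimator built on large pre-trained vision models):

```python
import numpy as np

def epistemic_uncertainty(ensemble_probs):
    """Mutual information between prediction and model: predictive entropy
    minus mean per-model entropy. ensemble_probs: (n_models, n, n_classes).
    High values mean the models disagree, i.e., high epistemic uncertainty."""
    mean = ensemble_probs.mean(axis=0)
    total = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    expected = -(ensemble_probs * np.log(ensemble_probs + 1e-12)).sum(axis=-1).mean(axis=0)
    return total - expected

def flag_generated(ensemble_probs, threshold):
    # images with uncertainty above the threshold are flagged as generated
    return epistemic_uncertainty(ensemble_probs) > threshold
```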

TMLR Journal 2025 Journal Article

Federated Generalized Novel Category Discovery with Prompts Tuning

  • Lei Shen
  • Nan Pu
  • Zhun Zhong
  • Mingming Gong
  • Dianhai Yu
  • Chengqi Zhang
  • Bo Han

Generalized category discovery (GCD) is proposed to handle categories from unseen labels during the inference stage by clustering them. Most works in GCD provide solutions for unseen classes in data-centralized settings. However, unlabeled categories possessed by clients, which are common in real-world federated learning (FL), have been largely ignored and degrade the performance of classic FL algorithms. To demonstrate and mitigate the harmful effect of unseen classes, we dive into a GCD problem setting applicable to FL, named FedGCD; analyze the overfitting problem in FedGCD in detail; establish a strong baseline built on the state-of-the-art GCD algorithm SimGCD; and design a learning framework with prompt tuning to tackle both the overfitting and communication-burden problems in FedGCD. In our method, clients first carry out prompt learning separately on local data. Then, we aggregate the prompts from all clients into a global prompt to capture global knowledge, and send the global prompt back to local clients to allow access to broader knowledge from other clients. In this way, we significantly reduce the parameters that need to be uploaded in FedGCD, a common obstacle in the real-world application of most FL algorithms. We conduct experiments on both generic and fine-grained datasets such as CIFAR-100 and CUB-200, and show that our method is comparable to the FL version of SimGCD and surpasses other baselines with significantly fewer parameters to transmit.

NeurIPS Conference 2025 Conference Paper

FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning

  • Zhiqin Yang
  • Yonggang Zhang
  • Chenxin Li
  • Yiu-ming Cheung
  • Bo Han
  • Yixuan Yuan

Federated Learning (FL) confronts a significant challenge known as data heterogeneity, which impairs model performance and convergence. Existing methods have made notable progress in addressing this issue. However, improving performance in certain heterogeneity scenarios remains an overlooked question: How robust are these methods to deploy under diverse heterogeneity scenarios? To answer this, we conduct comprehensive evaluations across varied heterogeneity scenarios, showing that most existing methods exhibit limited robustness. Meanwhile, insights from these experiments highlight that sharing statistical information can mitigate heterogeneity by enabling clients to update with a global perspective. Motivated by this, we propose FedGPS (Federated Goal-Path Synergy), a novel framework that seamlessly integrates statistical distribution and gradient information from others. Specifically, FedGPS statically modifies each client's learning objective to implicitly model the global data distribution using surrogate information, while dynamically adjusting local update directions with gradient information from other clients at each round. Extensive experiments show that FedGPS outperforms state-of-the-art methods across diverse heterogeneity scenarios, validating its effectiveness and robustness. The code is available.

NeurIPS Conference 2025 Conference Paper

Generative Model Inversion Through the Lens of the Manifold Hypothesis

  • Xiong Peng
  • Bo Han
  • Fengfei Yu
  • Tongliang Liu
  • Feng Liu
  • Mingyuan Zhou

Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process, yielding reconstructions with high visual quality and strong fidelity to the private data. To explore the reason behind their effectiveness, we begin by examining the gradients of inversion loss w.r.t. synthetic inputs, and find that these gradients are surprisingly noisy. Further analysis shows that generative model inversion approaches implicitly denoise the gradients by projecting them onto the tangent space of the generator manifold—filtering out directions that deviate from the manifold structure while preserving informative components aligned with it. Our empirical measurements show that, in models trained with standard supervision, loss gradients exhibit large angular deviations from the data manifold, indicating poor alignment with class-relevant directions. This observation motivates our central hypothesis: models become more vulnerable to MIAs when their loss gradients align more closely with the generator manifold. We validate this hypothesis by designing a novel training objective that explicitly promotes such alignment. Building on this insight, we further introduce a training-free approach to enhance gradient–manifold alignment during inversion, leading to consistent improvements over state-of-the-art generative MIAs.

NeurIPS Conference 2025 Conference Paper

Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

  • Yaxin Hou
  • Bo Han
  • Yuheng Jia
  • Hui Liu
  • Junhui Hou

Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.
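Step (ii) above rests on the classic logit-adjustment rule: subtracting a multiple of the log class prior from the logits makes the argmax approximate the Bayes-optimal classifier under the current labeled-data distribution. A minimal sketch (tau = 1 is illustrative):

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: penalize head classes by their log prior
    so predictions are not dominated by class frequency."""
    return np.argmax(logits - tau * np.log(np.asarray(class_priors)), axis=-1)
```

With a skewed prior, a sample the plain argmax assigns to the head class can flip to the tail class once the prior is discounted.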

NeurIPS Conference 2025 Conference Paper

Learning to Instruct for Visual Instruction Tuning

  • Zhihan Zhou
  • Feng Hong
  • JIAAN LUO
  • Yushi Ye
  • Jiangchao Yao
  • Dongsheng Li
  • Bo Han
  • Ya Zhang

We propose L2T, an advancement of visual instruction tuning (VIT). While VIT equips Multimodal LLMs (MLLMs) with promising multimodal capabilities, the current design choices for VIT often result in overfitting and shortcut learning, potentially degrading performance. This gap arises from an overemphasis on instruction-following abilities, while neglecting the proactive understanding of visual information. Inspired by this, L2T adopts a simple yet effective approach by incorporating the loss function into both the instruction and response sequences. It seamlessly expands the training data, and regularizes the MLLMs from overly relying on language priors. Based on this merit, L2T achieves a significant relative improvement of up to 9% on comprehensive multimodal benchmarks, requiring no additional training data and incurring negligible computational overhead. Surprisingly, L2T attains exceptional fundamental visual capabilities, yielding up to an 18% improvement in captioning performance, while simultaneously alleviating hallucination in MLLMs. GitHub code: https://github.com/Feng-Hong/L2T.

IJCAI Conference 2025 Conference Paper

One-shot Federated Learning Methods: A Practical Guide

  • Xiang Liu
  • Zhenheng Tang
  • Xia Li
  • Yijun Song
  • Sijie Ji
  • Zemin Liu
  • Bo Han
  • Linshan Jiang

One-shot Federated Learning (OFL) is a distributed machine learning paradigm that constrains client-server communication to a single round, addressing privacy and communication overhead issues associated with multiple rounds of data exchange in traditional Federated Learning (FL). OFL demonstrates the practical potential for integration with future approaches that require collaborative training models, such as large language models (LLMs). However, current OFL methods face two major challenges: data heterogeneity and model heterogeneity, which result in subpar performance compared to conventional FL methods. Moreover, despite numerous studies addressing these limitations, a comprehensive summary is still lacking. To address these gaps, this paper presents a systematic analysis of the challenges faced by OFL and thoroughly reviews the current methods. We also offer an innovative categorization method and analyze the trade-offs of various techniques. Additionally, we discuss the most promising future directions and the technologies that should be integrated into the OFL field. This work aims to provide guidance and insights for future research.

NeurIPS Conference 2025 Conference Paper

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

  • Shuhai Zhang
  • ZiHao Lian
  • Jiahao Yang
  • Daiyuan Li
  • Guoxuan Pang
  • Feng Liu
  • Bo Han
  • Shutao Li

AI-generated videos have achieved near-perfect visual realism (e.g., Sora), urgently necessitating reliable detection mechanisms. However, detecting such videos faces significant challenges in modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose a physics-driven AI-generated video detection paradigm based on probability flow conservation principles. Specifically, we propose a statistic called Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes, explicitly capturing deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator through spatial gradients approximation and motion-aware temporal modeling without complex motion decomposition while preserving physical constraints. Building on this, we propose an NSG-based video detection method (NSG-VD) that computes the Maximum Mean Discrepancy (MMD) between NSG features of the test and real videos as a detection metric. Last, we derive an upper bound of NSG feature distances between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments confirm that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score, validating the superior performance of NSG-VD. The source code is available at https://github.com/ZSHsh98/NSG-VD.
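The detection metric here is an MMD between two feature sets. A minimal NumPy sketch of a biased squared MMD with an RBF kernel, with the bandwidth chosen for illustration:

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased squared Maximum Mean Discrepancy between samples x: (n, d)
    and y: (m, d) under an RBF kernel. Near zero when the two sets come
    from the same distribution; large under distribution shift."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In the NSG-VD setting, x and y would be NSG features of real and test videos, and a large MMD flags the test video as generated.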

NeurIPS Conference 2025 Conference Paper

Practical Kernel Selection for Kernel-based Conditional Independence Test

  • Wenjie Wang
  • Mingming Gong
  • Biwei Huang
  • James Bailey
  • Bo Han
  • Kun Zhang
  • Feng Liu

Conditional independence (CI) testing is a fundamental yet challenging task in modern statistics and machine learning. One pivotal class of methods for assessing conditional independence encompasses kernel-based approaches, known for assessing CI by detecting general conditional dependence without imposing strict assumptions on relationships or data distributions. As with any method utilizing kernels, selecting appropriate kernels is crucial for precise identification. However, it remains underexplored in kernel-based CI methods, where the kernels are often determined manually or heuristically. In this paper, we analyze and propose a kernel parameter selection approach for the kernel-based conditional independence test (KCI). The kernel parameters are selected based on the ratio of the statistic to the asymptotic variance, which approximates the test power for the given parameters at large sample sizes. The search procedure is grid-based, allowing for parallelization with manageable additional computation time. We theoretically demonstrate the consistency of the proposed criterion and conduct extensive experiments on both synthetic and real data to show the effectiveness of our method.
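The selection criterion above, maximizing the ratio of the statistic to its estimated variability over a grid, can be sketched as follows; `stat_fn` is a hypothetical callable standing in for the KCI statistic and its asymptotic-variance estimate:

```python
import numpy as np

def select_bandwidth(stat_fn, bandwidths):
    """Grid search over kernel bandwidths, keeping the one that maximizes
    statistic / std -- a large-sample proxy for test power. stat_fn(bw)
    must return (statistic, std_estimate); each grid point is independent,
    so the loop parallelizes trivially."""
    best_bw, best_ratio = None, -np.inf
    for bw in bandwidths:
        stat, std = stat_fn(bw)
        ratio = stat / (std + 1e-12)
        if ratio > best_ratio:
            best_bw, best_ratio = bw, ratio
    return best_bw
```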

AAAI Conference 2025 Conference Paper

Provable Discriminative Hyperspherical Embedding for Out-of-Distribution Detection

  • Zhipeng Zou
  • Sheng Wan
  • Guangyu Li
  • Bo Han
  • Tongliang Liu
  • Lin Zhao
  • Chen Gong

Out-of-distribution (OOD) detection aims to identify the test examples that do not belong to the distribution of training data. The distance-based methods, which identify OOD examples based on their distances from the centroids of in-distribution (ID) examples, have demonstrated promising OOD detection performance. However, the objectives utilized in prior approaches are typically designed for classification and thus might not yield sufficient discriminative power to distinguish between ID and OOD examples. Therefore, this paper proposes a prototype-based contrastive learning framework for OOD detection, which is termed provable Discriminative Hyperspherical Embedding (DHE). The proposed framework provides a theoretical analysis of inter-class dispersion, which is proved to be fundamental in reducing the false positive rate (FPR) on OOD examples. Based on this, we devise an angular spread loss to achieve the maximal dispersion of the prototypes of different classes prior to training. Subsequently, a prototype-enhanced contrastive loss is introduced to align embeddings of ID examples closely with their corresponding prototypes. In our proposed DHE, the maximal prototype dispersion is theoretically proved, thereby avoiding the pitfalls of local optima commonly encountered by most existing methods. Experimental results demonstrate the effectiveness of our proposed DHE, which showcases a remarkable reduction in FPR95 (i.e., 5.37% on CIFAR-100) and more than doubling the computational efficiency when compared with the state-of-the-art methods.

TIST Journal 2025 Journal Article

Robust Learning under Hybrid Noise

  • Yang Wei
  • Shuo Chen
  • Shanshan Ye
  • Bo Han
  • Chen Gong

Feature noise and label noise are ubiquitous in practical scenarios, posing great challenges for training a robust machine learning model. Most previous approaches deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to unreliable data collection and annotation processes. Although some results have been achieved by a few representation-learning-based attempts, this issue is still far from being addressed with promising performance and guaranteed theoretical analyses. To address the challenge, we propose a novel unified learning framework called Feature and Label Recovery (FLR) to combat hybrid noise from the perspective of data recovery, where we concurrently reconstruct both the feature matrix and the label matrix of the input data. Specifically, the clean feature matrix is discovered by low-rank approximation, and the ground-truth label matrix is embedded based on the recovered features with a nuclear norm regularization. Meanwhile, the feature noise and label noise are characterized by their respective adaptive matrix norms to match the corresponding maximum-likelihood formulations. As this framework leads to a non-convex optimization problem, we develop a non-convex Alternating Direction Method of Multipliers (ADMM) with a convergence guarantee to solve our learning objective. We also provide a theoretical analysis showing that the generalization error of FLR can be upper-bounded in the presence of hybrid noise. Experimental results on several typical benchmark datasets clearly demonstrate the superiority of our proposed method over state-of-the-art robust learning approaches under various types of noise.
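The low-rank approximation step at the heart of this recovery perspective can be illustrated with a plain truncated SVD; the paper itself solves a joint non-convex ADMM objective over features and labels, so this sketch covers only the feature-side intuition.

```python
import numpy as np

def low_rank_recover(x_noisy, rank):
    """Truncated-SVD recovery of a clean feature matrix: keep the top
    singular directions and drop the rest as noise."""
    u, s, vt = np.linalg.svd(x_noisy, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]
```

On a matrix that is truly low-rank plus small noise, the rank-truncated reconstruction lands closer to the clean matrix than the noisy observation does.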

IJCAI Conference 2025 Conference Paper

Towards Regularized Mixture of Predictions for Class-Imbalanced Semi-Supervised Facial Expression Recognition

  • Hangyu Li
  • Yixin Zhang
  • Jiangchao Yao
  • Nannan Wang
  • Bo Han

Semi-supervised facial expression recognition (SSFER) effectively assigns pseudo-labels to confident unlabeled samples when only limited emotional annotations are available. Existing SSFER methods are typically built upon an assumption of the class-balanced distribution. However, they are far from real-world applications due to biased pseudo-labels caused by class imbalance. To alleviate this issue, we propose Regularized Mixture of Predictions (ReMoP), a simple yet effective method to generate high-quality pseudo-labels for imbalanced samples. Specifically, we first integrate feature similarity into the linear prediction to learn a mixture of predictions. Furthermore, we introduce a class regularization term that constrains the feature geometry to mitigate imbalance bias. Being practically simple, our method can be integrated with existing semi-supervised learning and SSFER methods to tackle the challenge associated with class-imbalanced SSFER effectively. Extensive experiments on four facial expression datasets demonstrate the effectiveness of the proposed method across various imbalanced conditions. The source code is made publicly available at https://github.com/hangyu94/ReMoP.

IS Journal 2025 Journal Article

Trustworthy Machine Learning in the Era of Foundation Models

  • Bo Han

This position article examines trustworthy machine learning with foundation models by investigating four essential aspects. In learning, we describe how pretraining, fine-tuning, and reinforcement learning enable models to acquire generalizable knowledge and emphasize the importance of high-quality, unbiased training data, as well as robust training methods. In reasoning, we summarize the fundamental methodology of training-free, post-training, and test-time scaling methods that enhance logical deduction, reasoning transparency, and systematic safety. In planning, we incorporate neurosymbolic methods that combine adaptable neural capabilities with formally verifiable symbolic reasoning, ensuring safe and accountable decision making. In multimodality, we investigate the need for multimodal integration, where aligning information from different sensory input sources is important to mitigate biases and errors. The article presents an interdisciplinary vision for incorporating capability, robustness, safety, and explainability to establish trustworthy foundation models, paving the way for their reliable deployment in real-world applications, such as financial and clinical decision making.

NeurIPS Conference 2025 Conference Paper

Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data

  • Shu Chen
  • HongJun Xu
  • Ruichi Zhang
  • Mengke Li
  • Yonggang Zhang
  • Yang Lu
  • Bo Han
  • Yiu-ming Cheung

In the real world, the observed label distribution of a dataset often mismatches its true distribution due to noisy labels. In this situation, noisy label learning (NLL) methods directly integrated with long-tail learning (LTL) methods tend to fail due to a dilemma: NLL methods normally rely on unbiased model predictions to recover the true distribution by selecting and correcting noisy labels, while LTL methods like logit adjustment depend on the true distribution to adjust biased predictions, leading to a deadlock of mutual dependency as defined in this paper. To address this, we propose Unlocker, a bilevel optimization framework that integrates NLL and LTL methods to iteratively disentangle this deadlock. The inner optimization leverages NLL to train the model, incorporating LTL methods to fairly select and correct noisy labels. The outer optimization adaptively determines an adjustment strength, mitigating model bias from over- or under-adjustment. We also theoretically prove that this bilevel optimization problem converges by transforming the outer optimization objective into an equivalent problem with a closed-form solution. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our method in alleviating model bias and handling long-tailed noisy label data. Code is available at https://anonymous.4open.science/r/neurips-2025-anonymous-1015/.
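The LTL side of the deadlock, logit adjustment, has a simple post-hoc form: shift each logit by the log of the estimated class prior, scaled by an adjustment strength tau, the quantity that the outer optimization adapts. A minimal sketch, assuming the standard post-hoc variant rather than the paper's full framework:

```python
import numpy as np

def logit_adjust(logits, class_prior, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior), so head classes
    are penalized and tail classes boosted; tau plays the role of the
    adjustment strength that the outer optimization would tune."""
    return np.asarray(logits) - tau * np.log(np.asarray(class_prior))
```

With equal raw logits and a skewed prior, the adjusted prediction flips to the tail class, which is exactly the bias correction the abstract describes.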

NeurIPS Conference 2024 Conference Paper

A Sober Look at the Robustness of CLIPs to Spurious Features

  • Qizhou Wang
  • Yong Lin
  • Yongqiang Chen
  • Ludwig Schmidt
  • Bo Han
  • Tong Zhang

Large vision language models, such as CLIP, demonstrate more impressive robustness to spurious features than single-modal models trained on ImageNet. However, existing test datasets are typically curated based on ImageNet-trained models, and thus aim to capture the spurious features inherent in ImageNet. Benchmarking CLIP models on ImageNet-oriented spurious features may not be sufficient to reflect the extent to which CLIP models are robust to spurious correlations within CLIP training data, e.g., LAION. To this end, we craft a new challenging dataset named CounterAnimal, designed to reveal the reliance of CLIP models on realistic spurious features. Specifically, we split animal photos into groups according to the backgrounds, and then identify a pair of groups for each class where a CLIP model shows large performance drops across the two groups. Our evaluations show that the spurious features captured by CounterAnimal are generically learned by CLIP models with different backbones and pre-training data, yet have limited influence on ImageNet models. We provide theoretical insights that the CLIP objective cannot offer additional robustness. Furthermore, we also re-evaluate strategies such as scaling up parameters and using high-quality pre-training data. We find that they still help mitigate the spurious features, providing a promising path for future developments.

AAAI Conference 2024 Conference Paper

AMD: Autoregressive Motion Diffusion

  • Bo Han
  • Hao Peng
  • Minjing Dong
  • Yi Ren
  • Yixuan Shen
  • Chang Xu

Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Despite the feasibility of existing methods in generating motion based on short prompts and simple motion patterns, they encounter difficulties when dealing with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of human motion-captured data for long prompts and complex motions; and 2) the high diversity of human motions in the temporal domain and the substantial divergence of distributions from conditional modalities, leading to a many-to-many mapping problem when generating motion with complex and long texts. In this work, we address these gaps by 1) building the first dataset pairing long textual descriptions and 3D complex motions (HumanLong3D), and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences at the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization for X-to-Motion with "No Modality Left Behind", enabling for the first time the generation of high-definition and high-fidelity human motions based on user-defined modality input.

NeurIPS Conference 2024 Conference Paper

Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?

  • Zhanke Zhou
  • Rong Tao
  • Jianing Zhu
  • Yiwen Luo
  • Zengmao Wang
  • Bo Han

This paper investigates an under-explored challenge in large language models (LLMs): chain-of-thought prompting with noisy rationales, which include irrelevant or inaccurate reasoning thoughts within examples used for in-context learning. We construct the NoRa dataset, tailored to evaluate the robustness of reasoning in the presence of noisy rationales. Our findings on the NoRa dataset reveal a prevalent vulnerability to such noise among current LLMs, with existing robust methods like self-correction and self-consistency showing limited efficacy. Notably, compared to prompting with clean rationales, the base LLM drops by 1.4%-19.8% in accuracy with irrelevant thoughts and more drastically by 2.2%-40.4% with inaccurate thoughts. Addressing this challenge necessitates external supervision that is accessible in practice. Here, we propose the method of contrastive denoising with noisy chain-of-thought (CD-CoT). It enhances LLMs' denoising-reasoning capabilities by contrasting noisy rationales with only one clean rationale, which can be the minimal requirement for denoising-purpose prompting. This method follows a principle of exploration and exploitation: (1) rephrasing and selecting rationales in the input space to achieve explicit denoising, and (2) exploring diverse reasoning paths and voting on answers in the output space. Empirically, CD-CoT demonstrates an average improvement of 17.8% in accuracy over the base model and shows significantly stronger denoising capabilities than baseline methods. The source code is publicly available at: https://github.com/tmlr-group/NoisyRationales.
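The "voting on answers in the output space" step corresponds to plain majority voting over answers extracted from sampled reasoning paths, as in self-consistency. A minimal sketch of that final aggregation step:

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths,
    the aggregation used by self-consistency-style voting."""
    return Counter(answers).most_common(1)[0][0]
```

For instance, if three sampled paths end in the answers "12", "12", and "13", the vote returns "12".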

NeurIPS Conference 2024 Conference Paper

Discovery of the Hidden World with Large Language Models

  • Chenxi Liu
  • Yongqiang Chen
  • Tongliang Liu
  • Mingming Gong
  • James Cheng
  • Bo Han
  • Kun Zhang

Revealing the underlying causal mechanisms in the real world is the key to the development of science. Despite the progress in the past decades, traditional causal discovery approaches (CDs) mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. The lack of well-defined high-level variables in many real-world applications has already been a longstanding roadblock to a broader application of CDs. To this end, this paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap. LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data. Therefore, it is natural to employ LLMs to assist with proposing useful high-level factors and crafting their measurements. Meanwhile, COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors. We show that LLMs and CDs are mutually beneficial and the constructed feedback provably also helps with the factor proposal. We construct and curate several synthetic and real-world benchmarks including analysis of human reviews and diagnosis of neuropathic and brain tumors, to comprehensively evaluate COAT. Extensive empirical results confirm the effectiveness and reliability of COAT with significant improvements.

AAAI Conference 2024 Conference Paper

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

  • Binghui Xie
  • Yongqiang Chen
  • Jiaqi Wang
  • Kaiwen Zhou
  • Bo Han
  • Wei Meng
  • James Cheng

Domain generalization is a critical challenge for machine learning systems. Prior domain generalization methods focus on extracting domain-invariant features across several stationary domains to enable generalization to new domains. However, in non-stationary tasks where new domains evolve in an underlying continuous structure, such as time, merely extracting the invariant features is insufficient for generalization to the evolving new domains. Nevertheless, it is non-trivial to learn both evolving and invariant features within a single model due to their conflicts. To bridge this gap, we build causal models to characterize the distribution shifts concerning the two patterns, and propose to learn both dynamic and invariant features via a new framework called Mutual Information-Based Sequential Autoencoders (MISTS). MISTS imposes information-theoretic constraints on sequential autoencoders to disentangle the dynamic and invariant features, and leverages an adaptive classifier to make predictions based on both evolving and invariant information. Our experimental results on both synthetic and real-world datasets demonstrate that MISTS succeeds in capturing both evolving and invariant information, and presents promising results in evolving domain generalization tasks.

TMLR Journal 2024 Journal Article

Exploit CAM by itself: Complementary Learning System for Weakly Supervised Semantic Segmentation

  • Wankou Yang
  • Jiren Mai
  • Fei Zhang
  • Tongliang Liu
  • Bo Han

Weakly Supervised Semantic Segmentation (WSSS) with image-level labels has long suffered from the fragmentary object regions produced by the Class Activation Map (CAM), which is incapable of generating fine-grained masks for semantic segmentation. To guide CAM to find more non-discriminating object patterns, this paper turns to an interesting working mechanism in agent learning named the Complementary Learning System (CLS). CLS holds that the neocortex builds a sensation of general knowledge, while the hippocampus specially learns specific details, completing the learned patterns. Motivated by this simple but effective learning pattern, we propose a General-Specific Learning Mechanism (GSLM) to explicitly drive a coarse-grained CAM to a fine-grained pseudo mask. Specifically, GSLM develops a General Learning Module (GLM) and a Specific Learning Module (SLM). The GLM is trained with image-level supervision to extract coarse and general localization representations from CAM. Based on the general knowledge in the GLM, the SLM progressively exploits the specific spatial knowledge from the localization representations, expanding the CAM in an explicit way. To this end, we propose the Seed Reactivation to help SLM reactivate non-discriminating regions by setting a boundary for activation values, which successively identifies more regions of CAM. Without extra refinement processes, our method is able to achieve improvements for CAM of over 20.0% mIoU on PASCAL VOC 2012 and 10.0% mIoU on MS COCO 2014 datasets, representing a new state-of-the-art among existing WSSS methods. The code is publicly available at: https://github.com/tmlr-group/GSLM.

AAAI Conference 2024 Conference Paper

Federated Learning with Extremely Noisy Clients via Negative Distillation

  • Yang Lu
  • Lin Chen
  • Yonggang Zhang
  • Yiliang Zhang
  • Bo Han
  • Yiu-ming Cheung
  • Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and employs rather than discards the noisy clients in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a ‘bad teacher’ in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance.

NeurIPS Conference 2024 Conference Paper

Few-Shot Adversarial Prompt Learning on Vision-Language Models

  • Yiwei Zhou
  • Xiaobo Xia
  • Zhiwei Lin
  • Bo Han
  • Tongliang Liu

The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Inspired by the success of vision-language foundation models, previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. However, in practice, they are still unsatisfactory due to several issues, including heavy adaptation cost, suboptimal text supervision, and uncontrolled natural generalization capacity. In this paper, to address these issues, we propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant adversarial robustness improvements. Specifically, we achieve this by providing adversarially correlated text supervision that is end-to-end learned from adversarial examples. We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples. The proposed framework enables learning adversarial text supervision, which provides superior cross-modal adversarial alignment and matches state-of-the-art zero-shot adversarial robustness with only 1% training data. Code is available at: https://github.com/lionel-w2/FAP.

NeurIPS Conference 2024 Conference Paper

FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

  • Zhenheng Tang
  • Yonggang Zhang
  • Peijie Dong
  • Yiu-ming Cheung
  • Amelie C. Zhou
  • Bo Han
  • Xiaowen Chu

One-shot Federated Learning (OFL) significantly reduces communication costs in FL by aggregating trained models only once. However, the performance of advanced OFL methods is far behind that of normal FL. In this work, we provide a causal view to find that this performance drop of OFL methods comes from the isolation problem: models trained in isolation on local data may easily fit spurious correlations due to data heterogeneity. From the causal perspective, we observe that the spurious fitting can be alleviated by augmenting intermediate features from other clients. Built upon our observation, we propose a novel learning approach, termed FuseFL, to endow OFL with strong performance and low communication and storage costs. Specifically, FuseFL decomposes neural networks into several blocks, and progressively trains and fuses each block in a bottom-up manner for feature augmentation, introducing no additional communication costs. Comprehensive experiments demonstrate that FuseFL outperforms existing OFL and ensemble FL methods by a significant margin, and show that FuseFL supports high scalability of clients, heterogeneous model training, and low memory costs. Our work is the first attempt to use causality to analyze and alleviate the data heterogeneity of OFL.

TMLR Journal 2024 Journal Article

HiFE: Hierarchical Feature Ensemble Framework for Few-shot Hypotheses Adaptation

  • Yongfeng Zhong
  • Haoang Chi
  • Feng Liu
  • Xiao-ming Wu
  • Bo Han

The process of transferring knowledge from a source domain to a target domain in the absence of source data constitutes a formidable obstacle within the field of source-free domain adaptation, often termed hypothesis adaptation. Conventional methodologies have been dependent on a robustly trained (strong) source hypothesis to encapsulate the knowledge pertinent to the source domain. However, this strong hypothesis is prone to overfitting the source domain, resulting in diminished generalization performance when applied to the target domain. To mitigate this issue, we advocate for the augmentation of transferable source knowledge via the integration of multiple (weak) source models that are underfitting. Furthermore, we propose a novel architectural framework, designated as the Hierarchical Feature Ensemble (HiFE) framework for Few-Shot Hypotheses Adaptation, which amalgamates features from both the strong and intentionally underfit source models. Empirical evidence from our experiments indicates that these weaker models, while not optimal within the source domain context, contribute to an enhanced generalization capacity of the resultant model for the target domain. Moreover, the HiFE framework we introduce demonstrates superior performance, surpassing other leading baselines across a spectrum of few-shot hypothesis adaptation scenarios.

IJCAI Conference 2024 Conference Paper

MCM: Multi-condition Motion Synthesis Framework

  • Zeyu Ling
  • Bo Han
  • Yongkang Wong
  • Han Lin
  • Mohan Kankanhalli
  • Weidong Geng

Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch. This framework effectively extends the applicability of the diffusion model, which is initially predicated solely on textual conditions, to auditory conditions. This extension encompasses both music-to-dance and co-speech HMS while preserving the intrinsic quality of motion and the capabilities for semantic association inherent in the original model. Furthermore, we propose the implementation of a Transformer-based diffusion model, designated as MWNet, as the main branch. This model adeptly apprehends the spatial intricacies and inter-joint correlations inherent in motion sequences, facilitated by the integration of multi-wise self-attention modules. Extensive experiments show that our method achieves competitive results in single-condition and multi-condition HMS tasks.

NeurIPS Conference 2024 Conference Paper

Mind the Gap Between Prototypes and Images in Cross-domain Finetuning

  • Hongduan Tian
  • Feng Liu
  • Zhanke Zhou
  • Tongliang Liu
  • Chengqi Zhang
  • Bo Han

In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space, where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representation distributions and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve the minimum validation loss at the enlarged gap.

JMLR Journal 2024 Journal Article

On the Learnability of Out-of-distribution Detection

  • Zhen Fang
  • Yixuan Li
  • Feng Liu
  • Bo Han
  • Jie Lu

Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and corresponding learning theory is still an open problem. To study the generalization of OOD detection, this paper investigates the probably approximately correct (PAC) learning theory of OOD detection that fits the commonly used evaluation metrics in the literature. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we offer theoretical support for representative OOD detection works based on our OOD theory.

IJCAI Conference 2024 Conference Paper

ParsNets: A Parsimonious Composition of Orthogonal and Low-Rank Linear Networks for Zero-Shot Learning

  • Jingcai Guo
  • Qihua Zhou
  • Xiaocheng Lu
  • Ruibin Li
  • Ziming Liu
  • Jie Zhang
  • Bo Han
  • Junyang Chen

This paper provides a novel, parsimonious yet efficient design for zero-shot learning (ZSL), dubbed ParsNets, in which we are interested in learning a composition of on-device-friendly linear networks, each with orthogonality and low-rankness properties, to achieve equivalent or better performance than deep models. Concretely, we first refactor the core module of ZSL, i.e., the visual-semantics mapping function, into several base linear networks that correspond to diverse components of the semantic space, wherein the complex nonlinearity can be collapsed into simple local linearities. Then, to facilitate the generalization of local linearities, we construct a maximal-margin geometry on the learned features by enforcing low-rank constraints on intra-class samples and high-rank constraints on inter-class samples, resulting in orthogonal subspaces for different classes. To enhance the model's adaptability and counterbalance over- and under-fitting, a set of sample-wise indicators is employed to select a sparse subset of these base linear networks to form a composite semantic predictor for each sample. Notably, the maximal-margin geometry guarantees the diversity of features while the local linearities guarantee efficiency. Thus, our ParsNets can generalize better to unseen classes and can be deployed flexibly on resource-constrained devices.

NeurIPS Conference 2024 Conference Paper

Pseudo-Private Data Guided Model Inversion Attacks

  • Xiong Peng
  • Bo Han
  • Feng Liu
  • Tongliang Liu
  • Mingyuan Zhou

In model inversion attacks (MIAs), adversaries attempt to recover private training data by exploiting access to a well-trained target model. Recent advancements have improved MIA performance using a two-stage generative framework. This approach first employs a generative adversarial network to learn a fixed distributional prior, which is then used to guide the inversion process during the attack. However, in this paper, we observe that such a fixed prior leads to a low probability of sampling actual private data during the inversion process, due to the inherent distribution gap between the prior distribution and the private data distribution, thereby constraining attack performance. To address this limitation, we propose increasing the density around high-quality pseudo-private data (recovered samples through model inversion that exhibit characteristics of the private training data) by slightly tuning the generator. This strategy effectively increases the probability of sampling actual private data that is close to these pseudo-private data during the inversion process. After integrating our method, the generative model inversion pipeline is strengthened, leading to improvements over state-of-the-art MIAs. This paves the way for new research directions in generative MIAs.

NeurIPS Conference 2024 Conference Paper

Revive Re-weighting in Imbalanced Learning by Density Ratio Estimation

  • Jiaan Luo
  • Feng Hong
  • Jiangchao Yao
  • Bo Han
  • Ya Zhang
  • Yanfeng Wang

In deep learning, model performance often deteriorates when trained on highly imbalanced datasets, especially when evaluation metrics require robust generalization across underrepresented classes. To address the challenges posed by imbalanced data distributions, this study introduces a novel method utilizing density ratio estimation for dynamic class weight adjustment, termed Re-weighting with Density Ratio (RDR). Our method adaptively adjusts the importance of each class during training, mitigates overfitting on dominant classes, and enhances model adaptability across diverse datasets. Extensive experiments conducted on various large-scale benchmark datasets validate the effectiveness of our method. Results demonstrate substantial improvements in generalization capabilities, particularly under severely imbalanced conditions.
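The core re-weighting idea, weighting each class by a density ratio between a target distribution and the observed empirical distribution, reduces for a uniform target to classic inverse-frequency weighting. A sketch under that simplification (RDR itself estimates and adjusts the ratio dynamically during training):

```python
import numpy as np

def density_ratio_weights(labels, num_classes, target=None):
    """Per-class weights as the density ratio target(c) / empirical(c).
    With a uniform target this reduces to inverse-frequency re-weighting."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    empirical = counts / counts.sum()
    if target is None:
        target = np.full(num_classes, 1.0 / num_classes)  # uniform target
    return np.asarray(target) / np.maximum(empirical, 1e-12)
```

For labels [0, 0, 0, 1] the empirical distribution is (0.75, 0.25), so the minority class receives weight 0.5 / 0.25 = 2.0, four times the majority class's 0.5 / 0.75.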

NeurIPS Conference 2024 Conference Paper

Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection

  • Geng Yu
  • Jianing Zhu
  • Jiangchao Yao
  • Bo Han

Out-of-distribution (OOD) detection is crucial for deploying reliable machine learning models in open-world applications. Recent advances in CLIP-based OOD detection have shown promising results via regularizing prompt tuning with OOD features extracted from ID data. However, the irrelevant context mined from ID data can be spurious due to inaccurate foreground-background decomposition, thus limiting OOD detection performance. In this work, we propose a novel framework, namely Self-Calibrated Tuning (SCT), to mitigate this problem for effective OOD detection with only the given few-shot ID data. Specifically, SCT introduces modulating factors on the two components of the original learning objective. It adaptively directs the optimization process between the two tasks during training on data with different prediction uncertainty to calibrate the influence of OOD regularization, and it is compatible with many prompt-tuning-based OOD detection methods. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed SCT. The code is publicly available at: https://github.com/tmlr-group/SCT.

IJCAI Conference 2024 Conference Paper

Trustworthy Machine Learning under Imperfect Data

  • Bo Han

Trustworthy machine learning (TML) under imperfect data has recently attracted much attention in the data-centric fields of machine learning (ML) and artificial intelligence (AI). Specifically, there are mainly three types of imperfect data, each posing its own challenges for ML: i) label-level imperfection: noisy labels; ii) feature-level imperfection: adversarial examples; and iii) distribution-level imperfection: out-of-distribution data. Therefore, in this paper, we systematically share our insights and solutions in TML for handling these three types of imperfect data. More importantly, we discuss some new challenges in TML, which also open more opportunities for future studies, such as trustworthy foundation models, trustworthy federated learning, and trustworthy causal learning.

TMLR Journal 2024 Journal Article

Understanding Fairness Surrogate Functions in Algorithmic Fairness

  • Wei Yao
  • Zhanke Zhou
  • Zhicong Li
  • Bo Han
  • Yong Liu

It has been observed that machine learning algorithms exhibit biased predictions against certain population groups. To mitigate such bias while achieving comparable accuracy, a promising approach is to introduce surrogate functions of the concerned fairness definition and solve a constrained optimization problem. However, intriguingly, previous work has found that such fairness surrogate functions may yield unfair results and high instability. In this work, to understand them deeply, taking a widely used fairness definition, demographic parity, as an example, we show that there is a surrogate-fairness gap between the fairness definition and the fairness surrogate function. Our theoretical analysis and experimental results about the “gap” further suggest that fairness and stability are affected by points far from the decision boundary, which is the large margin points issue investigated in this paper. To address it, we propose the general sigmoid surrogate to simultaneously reduce both the surrogate-fairness gap and the variance, and offer a rigorous fairness and stability upper bound. Interestingly, the theory also provides two further insights: dealing with the large margin points and obtaining a more balanced dataset are both beneficial to fairness and stability. Furthermore, we develop a novel and general algorithm called Balanced Surrogate, which iteratively reduces the “gap” to mitigate unfairness. Finally, we provide empirical evidence showing that our methods consistently improve fairness and stability while maintaining accuracy comparable to the baselines on three real-world datasets.

NeurIPS Conference 2024 Conference Paper

Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

  • Haoang Chi
  • He Li
  • Wenjing Yang
  • Feng Liu
  • Long Lan
  • Xiaoguang Ren
  • Tongliang Liu
  • Bo Han

Causal reasoning capability is critical in advancing large language models (LLMs) towards artificial general intelligence (AGI). While versatile LLMs appear to have demonstrated capabilities in understanding contextual causality and providing responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to humans. However, current evidence indicates the contrary. Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we delve into the autoregression mechanism of transformer-based LLMs, revealing that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark named CausalProbe 2024, whose corpus is fresh and nearly unseen for the studied LLMs. Empirical results show a significant performance drop on CausalProbe 2024 compared to earlier benchmarks, indicating that LLMs primarily engage in level-1 causal reasoning. To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. Inspired by this, we propose G$^2$-Reasoner, an LLM causal reasoning method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes. Experiments demonstrate that G$^2$-Reasoner significantly enhances LLMs' causal reasoning capability, particularly in fresh and fictitious contexts. This work sheds light on a new path for LLMs to advance towards genuine causal reasoning, going beyond level-1 and making strides towards level-2.

JAIR Journal 2024 Journal Article

USN: A Robust Imitation Learning Method against Diverse Action Noise

  • Xingrui Yu
  • Bo Han
  • Ivor W. Tsang

Learning from imperfect demonstrations is a crucial challenge in imitation learning (IL). Unlike existing works that still rely on the enormous effort of expert demonstrators, we consider a more cost-effective option for obtaining a large number of demonstrations: hiring annotators to label actions for existing image records in realistic scenarios. However, action noise can occur when annotators are not domain experts or encounter confusing states. In this work, we introduce two particular forms of action noise, i.e., state-independent and state-dependent action noise. Previous IL methods fail to achieve expert-level performance when the demonstrations contain action noise, especially state-dependent action noise. To mitigate the harmful effects of action noise, we propose a robust learning paradigm called USN (Uncertainty-aware Sample-selection with Negative learning). The model first estimates the predictive uncertainty for all demonstration data and then selects samples with high loss based on the uncertainty measures. Finally, it updates the model parameters with additional negative learning on the selected samples. Empirical results on Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various forms of action noise. The ratio of significant improvements is up to 94.44%. Moreover, our method scales to conditional imitation learning with real-world noisy commands in urban driving.
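A minimal sketch of the selection-plus-negative-learning step, using predictive entropy as a stand-in for the paper's uncertainty measure (the function name and the entropy choice are assumptions, not taken from USN itself):

```python
import numpy as np

def usn_step(probs, labels, k):
    """Sketch of USN's core loop: rank demonstrations by predictive
    uncertainty (entropy here, as a stand-in), select the k most
    uncertain ones, and compute a negative-learning loss -log(1 - p_y)
    that pushes the policy away from their (possibly noisy) actions."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    selected = np.argsort(entropy)[-k:]            # k most uncertain
    p_y = probs[selected, labels[selected]]        # prob of noisy action
    neg_loss = -np.log(np.clip(1.0 - p_y, eps, 1.0)).mean()
    return selected, neg_loss

# Three demonstrations; the second has a near-uniform (uncertain) policy.
probs = np.array([[0.98, 0.01, 0.01],
                  [0.34, 0.33, 0.33],
                  [0.80, 0.10, 0.10]])
labels = np.array([0, 0, 0])
selected, neg_loss = usn_step(probs, labels, k=1)
```

In the full method, the negative-learning term would be added to the usual imitation loss and the model parameters updated accordingly.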

NeurIPS Conference 2024 Conference Paper

What If the Input is Expanded in OOD Detection?

  • Boxuan Zhang
  • Jianing Zhu
  • Zengmao Wang
  • Tongliang Liu
  • Bo Du
  • Bo Han

Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes, which is important for the reliable deployment of machine learning models in the open world. Various scoring functions have been proposed to distinguish OOD inputs from in-distribution (ID) data. However, existing methods generally focus on excavating the discriminative information from a single input, which implicitly limits the representation dimension. In this work, we introduce a novel perspective, i.e., employing different common corruptions on the input space, to expand it. We reveal an interesting phenomenon termed confidence mutation, where the confidence of OOD data can decrease significantly under the corruptions, while ID data shows a higher confidence expectation owing to the resistance of its semantic features. Based on that, we formalize a new scoring method, namely, Confidence aVerage (CoVer), which can capture the dynamic differences by simply averaging the scores obtained from different corrupted inputs and the original ones, making the OOD and ID distributions more separable in detection tasks. Extensive experiments and analyses have been conducted to understand and verify the effectiveness of CoVer.
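The averaging step described above is simple enough to sketch directly; here the corruptions are simulated by hypothetical per-view logits rather than actual image transforms:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cover_score(logits_per_view):
    """CoVer-style score: average the confidence (max softmax
    probability) over the original input and its corrupted variants."""
    confidences = [softmax(l).max(axis=-1) for l in logits_per_view]
    return np.mean(confidences, axis=0)

# Toy logits for one ID-like and one OOD-like input under three views
# (original plus two hypothetical corruptions). The ID input's
# confidence resists corruption; the OOD input's collapses, which is
# the "confidence mutation" the abstract describes.
views = [
    np.array([[4.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),  # original
    np.array([[3.5, 0.0, 0.0], [0.5, 0.3, 0.2]]),  # corruption A
    np.array([[3.8, 0.0, 0.0], [0.1, 0.1, 0.0]]),  # corruption B
]
scores = cover_score(views)  # one averaged score per input
```

A detector would then threshold the averaged score: inputs below the threshold are flagged as OOD.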

NeurIPS Conference 2023 Conference Paper

Combating Bilateral Edge Noise for Robust Link Prediction

  • Zhanke Zhou
  • Jiangchao Yao
  • Jiaxu Liu
  • Xiawei Guo
  • Quanming Yao
  • Li He
  • Liang Wang
  • Bo Zheng

Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), the potential robustness under the edge noise is still less investigated. To close this gap, we first conduct an empirical study to disclose that the edge noise bilaterally perturbs both input topology and target label, yielding severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.

NeurIPS Conference 2023 Conference Paper

Combating Representation Learning Disparity with Geometric Harmonization

  • Zhihan Zhou
  • Jiangchao Yao
  • Feng Hong
  • Ya Zhang
  • Bo Han
  • Yanfeng Wang

Self-supervised learning (SSL), as an effective paradigm of representation learning, has achieved tremendous success on various curated datasets in diverse scenarios. Nevertheless, when facing the long-tailed distributions of real-world applications, it is still hard for existing methods to capture transferable and robust representations. The attribution is that vanilla SSL methods pursuing sample-level uniformity easily lead to representation learning disparity, where head classes with huge sample numbers dominate the feature regime while tail classes with small sample numbers passively collapse. To address this problem, we propose a novel Geometric Harmonization (GH) method to encourage category-level uniformity in representation learning, which is more benign to the minority and almost does not hurt the majority under long-tailed distributions. Specifically, GH measures the population statistics of the embedding space on top of self-supervised learning, and then infers a fine-grained instance-wise calibration to constrain the space expansion of head classes and avoid the passive collapse of tail classes. Our proposal does not alter the setting of SSL and can be easily integrated into existing methods in a low-cost manner. Extensive results on a range of benchmark datasets show the effectiveness of GH, with high tolerance to distribution skewness.

NeurIPS Conference 2023 Conference Paper

Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation

  • Jianing Zhu
  • Yu Geng
  • Jiangchao Yao
  • Tongliang Liu
  • Gang Niu
  • Masashi Sugiyama
  • Bo Han

Out-of-distribution (OOD) detection is important for deploying reliable machine learning models on real-world applications. Recent advances in outlier exposure have shown promising results on OOD detection via fine-tuning models with informatively sampled auxiliary outliers. However, previous methods assume that the collected outliers can be sufficiently large and representative to cover the boundary between ID and OOD data, which might be impractical and challenging. In this work, we propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers. Specifically, DivOE introduces a new learning objective, which diversifies the auxiliary distribution by explicitly synthesizing more informative outliers for extrapolation during training. It leverages a multi-step optimization method to generate novel outliers beyond the original ones, which is compatible with many variants of outlier exposure. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed DivOE. The code is publicly available at: https://github.com/tmlr-group/DivOE.

NeurIPS Conference 2023 Conference Paper

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

  • Yongqiang Chen
  • Yatao Bian
  • Kaiwen Zhou
  • Binghui Xie
  • Bo Han
  • James Cheng

Invariant graph representation learning aims to learn the invariance among data from different environments for out-of-distribution generalization on graphs. As the graph environment partitions are usually expensive to obtain, augmenting the environment information has become the de facto approach. However, the usefulness of the augmented environment information has never been verified. In this work, we find that it is fundamentally impossible to learn invariant graph representations via environment augmentation without additional assumptions. Therefore, we develop a set of minimal assumptions, including variation sufficiency and variation consistency, for feasible invariant graph learning. We then propose a new framework, Graph invAriant Learning Assistant (GALA). GALA incorporates an assistant model that needs to be sensitive to graph environment changes or distribution shifts. The correctness of the assistant model's proxy predictions can hence differentiate the variations in spurious subgraphs. We show that extracting the maximally invariant subgraph to the proxy predictions provably identifies the underlying invariant subgraph for successful OOD generalization under the established minimal assumptions. Extensive experiments on datasets including DrugOOD with various graph distribution shifts confirm the effectiveness of GALA.

NeurIPS Conference 2023 Conference Paper

Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

  • Ziqing Fan
  • Ruipeng Zhang
  • Jiangchao Yao
  • Bo Han
  • Ya Zhang
  • Yanfeng Wang

Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes a part of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms. Without full classes, the local objective will contradict the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem for locally existing classes. As far as we know, none of the existing methods can intrinsically mitigate PCDD challenges to achieve holistic improvement in the bilateral views (both global view and local view) of federated learning. To address this dilemma, we are inspired by the strong generalization of the simplex Equiangular Tight Frame (ETF) on imbalanced data, and propose a novel approach called FedGELA, where the classifier is globally fixed as a simplex ETF while locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that our FedGELA achieves promising performance (averaged improvement of 3.9% over FedAvg and 1.5% over the best baselines) and provide both local and global convergence guarantees.
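The globally fixed classifier relies on the standard simplex ETF construction; a small sketch (the function name is illustrative) that builds one and exhibits its defining property, unit-norm prototypes with pairwise inner products of -1/(K-1):

```python
import numpy as np

def simplex_etf(num_classes, dim, seed=0):
    """A simplex equiangular tight frame: num_classes unit-norm class
    prototypes in R^dim whose pairwise inner products all equal
    -1/(num_classes - 1). Requires dim >= num_classes."""
    K = num_classes
    rng = np.random.default_rng(seed)
    # Orthonormal columns via QR, then center and rescale.
    U, _ = np.linalg.qr(rng.standard_normal((dim, K)))
    return np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(5, 16)   # columns are the fixed class prototypes
G = M.T @ M              # Gram matrix: 1 on the diagonal, -1/4 off it
```

Fixing the classifier to these maximally separated prototypes is what gives every class, present locally or not, an equal angular share of the feature space.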

NeurIPS Conference 2023 Conference Paper

FedFed: Feature Distillation against Data Heterogeneity in Federated Learning

  • Zhiqin Yang
  • Yonggang Zhang
  • Yu Zheng
  • Xinmei Tian
  • Hao Peng
  • Tongliang Liu
  • Bo Han

Federated learning (FL) typically faces data heterogeneity, i.e., distribution shifts among clients. Sharing clients' information has shown great potential in mitigating data heterogeneity, yet incurs a dilemma between preserving privacy and promoting model performance. To alleviate the dilemma, we raise a fundamental question: Is it possible to share partial features in the data to tackle data heterogeneity? In this work, we give an affirmative answer to this question by proposing a novel approach called Federated Feature distillation (FedFed). Specifically, FedFed partitions data into performance-sensitive features (i.e., greatly contributing to model performance) and performance-robust features (i.e., limitedly contributing to model performance). The performance-sensitive features are globally shared to mitigate data heterogeneity, while the performance-robust features are kept locally. FedFed enables clients to train models over local and shared data. Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance.

NeurIPS Conference 2023 Conference Paper

FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning

  • Zhuo Huang
  • Li Shen
  • Jun Yu
  • Bo Han
  • Tongliang Liu

Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data. However, most SSL methods are commonly based on instance-wise consistency between different data transformations. Therefore, the label guidance on labeled data is hard to propagate to unlabeled data. Consequently, the learning process on labeled data is much faster than on unlabeled data, and is likely to fall into a local minimum that does not favor unlabeled data, leading to sub-optimal generalization performance. In this paper, we propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets. Specifically, we increase the empirical risk on labeled data to obtain a worst-case model which is a failure case needing to be enhanced. Then, by leveraging the richness of unlabeled data, we penalize the prediction difference (i.e., cross-sharpness) between the worst-case model and the original model so that the learning direction is beneficial to generalization on unlabeled data. Therefore, we can calibrate the learning process without being limited to insufficient label information. As a result, the mismatched learning performance can be mitigated, further enabling the effective exploitation of unlabeled data and improving SSL performance. Through comprehensive validation, we show FlatMatch achieves state-of-the-art results in many SSL settings.

NeurIPS Conference 2023 Conference Paper

InstanT: Semi-supervised Learning with Instance-dependent Thresholds

  • Muyang Li
  • Runze Wu
  • Haoyu Liu
  • Jun Yu
  • Xun Yang
  • Bo Han
  • Tongliang Liu

Semi-supervised learning (SSL) has been a fundamental challenge in machine learning for decades. The primary family of SSL algorithms, known as pseudo-labeling, involves assigning pseudo-labels to confident unlabeled instances and incorporating them into the training set. Therefore, the selection criteria of confident instances are crucial to the success of SSL. Recently, there has been growing interest in the development of SSL methods that use dynamic or adaptive thresholds. Yet, these methods typically apply the same threshold to all samples, or use class-dependent thresholds for instances belonging to a certain class, while neglecting instance-level information. In this paper, we propose the study of instance-dependent thresholds, which has the highest degree of freedom compared with existing methods. Specifically, we devise a novel instance-dependent threshold function for all unlabeled instances by utilizing their instance-level ambiguity and the instance-dependent error rates of pseudo-labels, so instances that are more likely to have incorrect pseudo-labels will have higher thresholds. Furthermore, we demonstrate that our instance-dependent threshold function provides a bounded probabilistic guarantee for the correctness of the pseudo-labels it assigns.
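A hypothetical sketch of an instance-dependent threshold in this spirit, raising a base threshold with the instance's top-1/top-2 ambiguity and an estimated pseudo-label error rate per instance (the exact functional form in InstanT differs; everything below is illustrative):

```python
import numpy as np

def instance_thresholds(probs, est_error, base=0.7):
    """Hypothetical instance-dependent thresholds: start from a base
    threshold and raise it with (i) the instance's ambiguity, one minus
    the top-1/top-2 probability margin, and (ii) an externally
    estimated pseudo-label error rate for that instance."""
    sorted_p = np.sort(probs, axis=1)[:, ::-1]          # descending
    ambiguity = 1.0 - (sorted_p[:, 0] - sorted_p[:, 1])
    tau = base + (1.0 - base) * 0.5 * (ambiguity + est_error)
    return np.clip(tau, base, 1.0)

probs = np.array([[0.90, 0.05, 0.05],   # confident instance
                  [0.50, 0.45, 0.05]])  # ambiguous instance
tau = instance_thresholds(probs, est_error=np.array([0.05, 0.30]))
# The ambiguous, error-prone instance receives the higher threshold.
```

An unlabeled instance would then be pseudo-labeled only when its confidence exceeds its own threshold, so error-prone instances must clear a higher bar.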

TMLR Journal 2023 Journal Article

KRADA: Known-region-aware Domain Alignment for Open-set Domain Adaptation in Semantic Segmentation

  • Chenhong Zhou
  • Feng Liu
  • Chen Gong
  • Rongfei Zeng
  • Tongliang Liu
  • William Cheung
  • Bo Han

In semantic segmentation, we aim to train a pixel-level classifier to assign category labels to all pixels in an image, where labeled training images and unlabeled test images are from the same distribution and share the same label set. However, in an open world, the unlabeled test images probably contain unknown categories and have different distributions from the labeled images. Hence, in this paper, we consider a new, more realistic, and more challenging problem setting where the pixel-level classifier has to be trained with labeled images and unlabeled open-world images—we name it open world semantic segmentation (OSS). In OSS, the trained classifier is expected to identify unknown-class pixels and classify known-class pixels well. To solve OSS, we first investigate which distribution unknown-class pixels obey. Then, motivated by the goodness-of-fit test, we use statistical measurements to show how a pixel fits the distribution of an unknown class and select highly-fitted pixels to form the unknown region in each test image. Eventually, we propose an end-to-end learning framework, known-region-aware domain alignment (KRADA), to distinguish unknown classes while aligning the distributions of known classes in labeled and unlabeled open-world images. The effectiveness of KRADA has been verified on two synthetic tasks and one COVID-19 segmentation task.

NeurIPS Conference 2023 Conference Paper

Learning to Augment Distributions for Out-of-distribution Detection

  • Qizhou Wang
  • Zhen Fang
  • Yonggang Zhang
  • Feng Liu
  • Yixuan Li
  • Bo Han

Open-world classification systems should discern out-of-distribution (OOD) data whose labels deviate from those of in-distribution (ID) cases, motivating recent studies in OOD detection. Advanced works, despite their promising progress, may still fail in the open world, owing to the lack of knowledge about unseen OOD data in advance. Although one can access auxiliary OOD data (distinct from unseen ones) for model training, it remains necessary to analyze how such auxiliary data will work in the open world. To this end, we delve into such a problem from a learning theory perspective, finding that the distribution discrepancy between the auxiliary and the unseen real OOD data is the key factor affecting open-world detection performance. Accordingly, we propose Distributional-Augmented OOD Learning (DAOL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. We justify that the predictor trained over the worst OOD data in the ball can shrink the OOD distribution discrepancy, thus improving the open-world detection performance given only the auxiliary OOD data. We conduct extensive evaluations across representative OOD detection setups, demonstrating the superiority of our DAOL over its advanced counterparts.

AAAI Conference 2023 Conference Paper

NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

  • Xin He
  • Jiangchao Yao
  • Yuxin Wang
  • Zhenheng Tang
  • Ka Chun Cheung
  • Simon See
  • Bo Han
  • Xiaowen Chu

One-shot neural architecture search (NAS) substantially improves the search efficiency by training one supernet to estimate the performance of every possible child architecture (i.e., subnet). However, the inconsistency of characteristics among subnets incurs serious interference in the optimization, resulting in poor performance ranking correlation of subnets. Subsequent explorations decompose supernet weights via a particular criterion, e.g., gradient matching, to reduce the interference; yet they suffer from huge computational cost and low space separability. In this work, we propose a lightweight and effective local intrinsic dimension (LID)-based method NAS-LID. NAS-LID evaluates the geometrical properties of architectures by calculating the low-cost LID features layer-by-layer, and the similarity characterized by LID enjoys better separability compared with gradients, which thus effectively reduces the interference among subnets. Extensive experiments on NASBench-201 indicate that NAS-LID achieves superior performance with better efficiency. Specifically, compared to the gradient-driven method, NAS-LID can save up to 86% of GPU memory overhead when searching on NASBench-201. We also demonstrate the effectiveness of NAS-LID on ProxylessNAS and OFA spaces. Source code: https://github.com/marsggbo/NAS-LID.
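LID is commonly estimated with the Levina-Bickel maximum-likelihood estimator from nearest-neighbor distances; a small sketch of that estimator (the toy 2-D sampling setup is illustrative; the paper applies such estimates layer-by-layer to architecture features):

```python
import numpy as np

def lid_mle(x, points, k):
    """Maximum-likelihood LID estimate (Levina-Bickel):
    LID(x) = -( mean_i log(r_i / r_k) )^{-1}, over the k smallest
    distances r_1 <= ... <= r_k from x to the reference points."""
    d = np.sort(np.linalg.norm(points - x, axis=1))[:k]
    return -1.0 / np.mean(np.log(d[:-1] / d[-1]))

rng = np.random.default_rng(0)
x = np.zeros(2)
flat = rng.standard_normal((500, 2))                   # fills the plane
line = np.c_[rng.standard_normal(500), np.zeros(500)]  # 1-D subspace
lid_flat = lid_mle(x, flat, k=50)  # expected near the ambient dimension 2
lid_line = lid_mle(x, line, k=50)  # expected near the intrinsic dimension 1
```

The estimator is cheap (sorting distances, no gradients), which is the property NAS-LID exploits to characterize subnets at low cost.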

TMLR Journal 2023 Journal Article

Noise-robust Graph Learning by Estimating and Leveraging Pairwise Interactions

  • Xuefeng Du
  • Tian Bian
  • Yu Rong
  • Bo Han
  • Tongliang Liu
  • Tingyang Xu
  • Wenbing Huang
  • Yixuan Li

Teaching Graph Neural Networks (GNNs) to accurately classify nodes under severely noisy labels is an important problem in real-world graph learning applications, but is currently underexplored. Although pairwise training methods have demonstrated promise in supervised metric learning and unsupervised contrastive learning, they remain less studied on noisy graphs, where the structural pairwise interactions (PI) between nodes are abundant and thus might benefit label noise learning more than pointwise methods do. This paper bridges the gap by proposing a pairwise framework for noisy node classification on graphs, which relies on the PI as a primary learning proxy in addition to the pointwise learning from the noisy node class labels. Our proposed framework PI-GNN contributes two novel components: (1) a confidence-aware PI estimation model that adaptively estimates the PI labels, which are defined as whether the two nodes share the same node labels, and (2) a decoupled training approach that leverages the estimated PI labels to regularize a node classification model for robust node classification. Extensive experiments on different datasets and GNN architectures demonstrate the effectiveness of PI-GNN, yielding a promising improvement over the state-of-the-art methods. Code is publicly available at https://github.com/TianBian95/pi-gnn.

NeurIPS Conference 2023 Conference Paper

Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources

  • Haotian Zheng
  • Qizhou Wang
  • Zhen Fang
  • Xiaobo Xia
  • Feng Liu
  • Tongliang Liu
  • Bo Han

Out-of-distribution (OOD) detection discerns OOD data, on which the predictor cannot make valid predictions as it does for in-distribution (ID) data, thereby increasing the reliability of open-world classification. However, it is typically hard to collect real OOD data for training a predictor capable of discerning ID and OOD patterns. This obstacle gives rise to data generation-based learning methods, synthesizing OOD data via data generators for predictor training without requiring any real OOD data. Related methods typically pre-train a generator on ID data and adopt various selection procedures to find those data likely to be OOD cases. However, generated data may still coincide with ID semantics, i.e., mistaken OOD generation remains, confusing the predictor between ID and OOD data. To this end, we suggest that generated data (with mistaken OOD generation) can be used to devise an auxiliary OOD detection task to facilitate real OOD detection. Specifically, we can ensure that learning from such an auxiliary task is beneficial if the ID and the OOD parts have disjoint supports, with the help of a well-designed training procedure for the predictor. Accordingly, we propose a powerful data generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that can relieve the mistaken OOD generation. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.

NeurIPS Conference 2023 Conference Paper

SODA: Robust Training of Test-Time Data Adaptors

  • Zige Wang
  • Yonggang Zhang
  • Zhen Fang
  • Long Lan
  • Wenjing Yang
  • Bo Han

Adapting deployed models to test distributions can mitigate the performance degradation caused by distribution shifts. However, privacy concerns may render model parameters inaccessible. One promising approach involves utilizing zeroth-order optimization (ZOO) to train a data adaptor to adapt the test data to fit the deployed models. Nevertheless, the data adaptor trained with ZOO typically brings restricted improvements due to the potential corruption of data features caused by the data adaptor. To address this issue, we revisit ZOO in the context of test-time data adaptation. We find that the issue directly stems from the unreliable estimation of the gradients used to optimize the data adaptor, which is inherently due to the unreliable nature of the pseudo-labels assigned to the test data. Based on this observation, we propose pseudo-label-robust data adaptation (SODA) to improve the performance of data adaptation. Specifically, SODA leverages high-confidence predicted labels as reliable labels to optimize the data adaptor with ZOO for label prediction. For data with low-confidence predictions, SODA encourages the adaptor to preserve data information to mitigate data corruption. Empirical results indicate that SODA can significantly enhance the performance of deployed models in the presence of distribution shifts without requiring access to model parameters.
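The ZOO ingredient can be illustrated with a standard two-point gradient estimator over random directions (not necessarily the exact estimator used in SODA); it queries only function values, mirroring the setting where model internals are inaccessible:

```python
import numpy as np

def zoo_gradient(f, theta, num_samples=2000, mu=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate: the average of
    (f(theta + mu*u) - f(theta - mu*u)) / (2*mu) * u over random
    Gaussian directions u. Only evaluations of f are required."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)
        grad += (f(theta + mu * u) - f(theta - mu * u)) / (2 * mu) * u
    return grad / num_samples

# Sanity check on f(theta) = ||theta||^2, whose true gradient is
# 2*theta = [2, -4]; the estimate should land close to it.
theta = np.array([1.0, -2.0])
g = zoo_gradient(lambda t: float(t @ t), theta)
```

Because the estimate is only as reliable as the loss it differentiates, noisy pseudo-labels make it noisy too, which is the failure mode SODA's confidence-based split is designed to contain.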

NeurIPS Conference 2023 Conference Paper

Subclass-Dominant Label Noise: A Counterexample for the Success of Early Stopping

  • Yingbin Bai
  • Zhongyi Han
  • Erkun Yang
  • Jun Yu
  • Bo Han
  • Dadong Wang
  • Tongliang Liu

In this paper, we empirically investigate a previously overlooked and widespread type of label noise, subclass-dominant label noise (SDN). Our findings reveal that, during the early stages of training, deep neural networks can rapidly memorize mislabeled examples in SDN. This phenomenon poses challenges in effectively selecting confident examples using conventional early stopping techniques. To address this issue, we delve into the properties of SDN and observe that long-trained representations are superior at capturing the high-level semantics of mislabeled examples, leading to a clustering effect where similar examples are grouped together. Based on this observation, we propose a novel method called NoiseCluster that leverages the geometric structures of long-trained representations to identify and correct SDN. Our experiments demonstrate that NoiseCluster outperforms state-of-the-art baselines on both synthetic and real-world datasets, highlighting the importance of addressing SDN in learning with noisy labels. The code is available at https://github.com/tmllab/2023_NeurIPS_SDN.

NeurIPS Conference 2023 Conference Paper

Understanding and Improving Feature Learning for Out-of-Distribution Generalization

  • Yongqiang Chen
  • Wei Huang
  • Kaiwen Zhou
  • Yatao Bian
  • Bo Han
  • James Cheng

A common explanation for the failure of out-of-distribution (OOD) generalization is that the model trained with empirical risk minimization (ERM) learns spurious features instead of invariant features. However, several recent studies challenged this explanation and found that deep networks may have already learned sufficiently good features for OOD generalization. Despite the contradictions at first glance, we theoretically show that ERM essentially learns both spurious and invariant features, and that ERM tends to learn spurious features faster when the spurious correlation is stronger. Moreover, when the ERM-learned features are fed to the OOD objectives, the invariant feature learning quality significantly affects the final OOD performance, as OOD objectives rarely learn new features. Therefore, ERM feature learning can be a bottleneck to OOD generalization. To alleviate this reliance, we propose Feature Augmented Training (FeAT), to enforce the model to learn richer features ready for OOD generalization. FeAT iteratively augments the model to learn new features while retaining the already learned features. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FeAT effectively learns richer features, thus boosting the performance of various OOD objectives.

IROS Conference 2022 Conference Paper

A Deep-Learning-based System for Indoor Active Cleaning

  • Yike Yun
  • Linjie Hou
  • Zijian Feng
  • Wei Jin
  • Yang Liu
  • Heng Wang
  • Ruonan He
  • Weitao Guo

Cleaning public areas like commercial complexes is challenging due to their sophisticated surroundings and the wide variety of real-life dirt. Robots are required to distinguish dirt types and apply corresponding cleaning strategies. In this work, we propose an active-cleaning framework that utilizes deep-learning methods for both solid-waste detection and liquid-stain segmentation. Our system consists of 4 components: a Perception module integrated with deep-learning models, a Post-processing module for projection, a Tracking module for map localization, and a Planning and Control module for cleaning strategies. Compared with classic approaches, our vision-based system significantly improves cleaning efficiency. Besides, we release the largest real-world indoor hybrid dirt cleaning dataset (HD10K), containing 10K labeled images, together with a track-level evaluation metric for better cleaning performance measurement. The proposed deep-learning-based system is verified with extensive experiments on our dataset and deployed to Gaussian Robotics' robots operating globally. The dataset is available at: https://gaussianopensource.github.io/projects/active_cleaning.

NeurIPS Conference 2022 Conference Paper

Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks

  • Jianan Zhou
  • Jianing Zhu
  • Jingfeng Zhang
  • Tongliang Liu
  • Gang Niu
  • Bo Han
  • Masashi Sugiyama

Adversarial training (AT) with imperfect supervision is significant but receives limited attention. To push AT towards more practical scenarios, we explore a brand new yet challenging setting, i.e., AT with complementary labels (CLs), which specify a class that a data sample does not belong to. However, the direct combination of AT with existing methods for CLs results in consistent failure, whereas a simple two-stage training baseline does not. In this paper, we further explore the phenomenon and identify the underlying challenges of AT with CLs as intractable adversarial optimization and low-quality adversarial examples. To address the above problems, we propose a new learning strategy using gradually informative attacks, which consists of two critical components: 1) Warm-up Attack (Warm-up) gently raises the adversarial perturbation budgets to ease the adversarial optimization with CLs; 2) Pseudo-Label Attack (PLA) incorporates the progressively informative model predictions into a corrected complementary loss. Extensive experiments are conducted to demonstrate the effectiveness of our method on a range of benchmarked datasets. The code is publicly available at: https://github.com/RoyalSkye/ATCL.
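The Warm-up Attack component gently raises the perturbation budget over training. A linear ramp is one way to sketch this (an assumption for illustration; the paper's exact schedule may differ):

```python
def warmup_epsilon(epoch, warmup_epochs, eps_max):
    """Linearly grow the adversarial perturbation budget from 0 to
    eps_max over the first `warmup_epochs` epochs, then hold it fixed."""
    if epoch >= warmup_epochs:
        return eps_max
    return eps_max * epoch / warmup_epochs

# e.g. an L-infinity budget of 8/255 ramped over the first 10 epochs:
budgets = [warmup_epsilon(e, 10, 8 / 255) for e in range(12)]
```

Starting with a small budget keeps the inner maximization tractable while the complementary-label loss is still uninformative, then full-strength attacks take over.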

NeurIPS Conference 2022 Conference Paper

Class-Dependent Label-Noise Learning with Cycle-Consistency Regularization

  • De Cheng
  • Yixiong Ning
  • Nannan Wang
  • Xinbo Gao
  • Heng Yang
  • Yuxuan Du
  • Bo Han
  • Tongliang Liu

In label-noise learning, estimating the transition matrix plays an important role in building a statistically consistent classifier. The current state-of-the-art consistent estimator for the transition matrix was developed under the newly proposed sufficiently scattered assumption, by incorporating a minimum-volume constraint on the transition matrix T into label-noise learning. Computing the volume of T relies heavily on the estimated noisy class posterior. However, the estimation error of the noisy class posterior is usually large, as deep learning methods tend to easily overfit noisy labels. Directly minimizing the volume of such an obtained T could then lead the transition matrix to be poorly estimated. Therefore, how to reduce the side effects of the inaccurate noisy class posterior has become the bottleneck of such methods. In this paper, we propose to estimate the transition matrix under a forward-backward cycle-consistency regularization, which greatly reduces the dependency of the estimated transition matrix T on the noisy class posterior. We show that the cycle-consistency regularization helps to minimize the volume of the transition matrix T indirectly, without exploiting the estimated noisy class posterior, which could further encourage the estimated transition matrix T to converge to its optimal solution. Extensive experimental results consistently justify the effectiveness of the proposed method in reducing the estimation error of the transition matrix and greatly boosting the classification performance.

NeurIPS Conference 2022 Conference Paper

Counterfactual Fairness with Partially Known Causal Graph

  • Aoqi Zuo
  • Susan Wei
  • Tongliang Liu
  • Bo Han
  • Kun Zhang
  • Mingming Gong

Fair machine learning aims to avoid treating individuals or sub-populations unfavourably based on sensitive attributes, such as gender and race. Those methods in fair machine learning that are built on causal inference ascertain discrimination and bias through causal effects. Though causality-based fair learning is attracting increasing attention, current methods assume the true causal graph is fully known. This paper proposes a general method to achieve the notion of counterfactual fairness when the true causal graph is unknown. To select features that lead to counterfactual fairness, we derive the conditions and algorithms to identify ancestral relations between variables on a Partially Directed Acyclic Graph (PDAG), specifically, a class of causal DAGs that can be learned from observational data combined with domain knowledge. Interestingly, we find that counterfactual fairness can be achieved as if the true causal graph were fully known, when specific background knowledge is provided: the sensitive attributes do not have ancestors in the causal graph. Results on both simulated and real-world datasets demonstrate the effectiveness of our method.

NeurIPS Conference 2022 Conference Paper

Exact Shape Correspondence via 2D graph convolution

  • Barakeel Fanseu Kamhoua
  • Lin Zhang
  • Yongqiang Chen
  • Han Yang
  • MA KAILI
  • Bo Han
  • Bo Li
  • James Cheng

For exact 3D shape correspondence (matching or alignment), i.e., the task of matching each point on a shape to its exact corresponding point on the other shape (or, more specifically, matching at geodesic error 0), most existing methods do not perform well due to two main problems. First, on nearly-isometric shapes (i.e., low noise levels), most existing methods use the eigen-vectors (eigen-functions) of the Laplace-Beltrami Operator (LBO) or other shape descriptors to update an initialized correspondence which is not exact, leading to an accumulation of update errors. Thus, though the final correspondence may generally be smooth, it is generally inexact. Second, on non-isometric shapes (noisy shapes), existing methods are generally not robust to noise as they usually assume near-isometry. In addition, existing methods that attempt to address the non-isometric shape problem (e.g., GRAMPA) are generally computationally expensive and do not generalise to nearly-isometric shapes. To address these two problems, we propose a 2D graph convolution-based framework called 2D-GEM. 2D-GEM is robust to noise on non-isometric shapes and, with a few additional constraints, it also addresses the errors in the update on nearly-isometric shapes. We demonstrate the effectiveness of 2D-GEM by achieving a high accuracy of 90.5% at geodesic error 0 on the non-isometric benchmark of SHREC16, i.e., TOPKIDS (while being much faster than GRAMPA), and on nearly-isometric benchmarks by achieving a high accuracy of 92.5% on TOSCA and 84.9% on SCAPE at geodesic error 0.

NeurIPS Conference 2022 Conference Paper

Is Out-of-Distribution Detection Learnable?

  • Zhen Fang
  • Yixuan Li
  • Jie Lu
  • Jiahua Dong
  • Bo Han
  • Feng Liu

Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms. To study the generalization of OOD detection, in this paper, we investigate the probably approximately correct (PAC) learning theory of OOD detection, which has been posed by researchers as an open problem. First, we find a necessary condition for the learnability of OOD detection. Then, using this condition, we prove several impossibility theorems for the learnability of OOD detection under some scenarios. Although the impossibility theorems are frustrating, we find that some conditions of these impossibility theorems may not hold in some practical scenarios. Based on this observation, we next give several necessary and sufficient conditions to characterize the learnability of OOD detection in some practical scenarios. Lastly, we also offer theoretical support for several representative OOD detection works based on our OOD theory.

NeurIPS Conference 2022 Conference Paper

Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs

  • Yongqiang Chen
  • Yonggang Zhang
  • Yatao Bian
  • Han Yang
  • MA KAILI
  • Binghui Xie
  • Tongliang Liu
  • Bo Han

Despite recent success in using the invariance principle for out-of-distribution (OOD) generalization on Euclidean data (e.g., images), studies on graph data are still limited. Different from images, the complex nature of graphs poses unique challenges to adopting the invariance principle. In particular, distribution shifts on graphs can appear in a variety of forms, such as attributes and structures, making it difficult to identify the invariance. Moreover, domain or environment partitions, which are often required by OOD methods on Euclidean data, can be highly expensive to obtain for graphs. To bridge this gap, we propose a new framework, called Causality Inspired Invariant Graph LeArning (CIGA), to capture the invariance of graphs for guaranteed OOD generalization under various distribution shifts. Specifically, we characterize potential distribution shifts on graphs with causal models, concluding that OOD generalization on graphs is achievable when models focus only on subgraphs containing the most information about the causes of labels. Accordingly, we propose an information-theoretic objective to extract the desired subgraphs that maximally preserve the invariant intra-class information. Learning with these subgraphs is immune to distribution shifts. Extensive experiments on 16 synthetic or real-world datasets, including DrugOOD, a challenging setting from AI-aided drug discovery, validate the superior OOD performance of CIGA.

JMLR Journal 2022 Journal Article

Learning from Noisy Pairwise Similarity and Unlabeled Data

  • Songhua Wu
  • Tongliang Liu
  • Bo Han
  • Jun Yu
  • Gang Niu
  • Masashi Sugiyama

SU classification employs similar (S) data pairs (two examples belonging to the same class) and unlabeled (U) data points to build a classifier, which can serve as an alternative to standard supervised classifiers requiring data points with class labels. SU classification is advantageous because, in the era of big data, more attention has been paid to data privacy. Datasets with specific class labels are often difficult to obtain in real-world classification applications regarding privacy-sensitive matters, such as politics and religion, which can be a bottleneck in supervised classification. Fortunately, similarity labels do not reveal the explicit information and inherently protect privacy, e.g., by collecting answers to “With whom do you share the same opinion on issue $\mathcal{I}$?” instead of “What is your opinion on issue $\mathcal{I}$?”. Nevertheless, SU classification still has an obvious limitation: respondents might answer these questions in a manner that is viewed favorably by others instead of answering truthfully. Therefore, there exist some dissimilar data pairs labeled as similar, which significantly degrades the performance of SU classification. In this paper, we study how to learn from noisy similar (nS) data pairs and unlabeled (U) data, which we call nSU classification. Specifically, we carefully model the similarity noise and estimate the noise rate using the mixture proportion estimation technique. Then, a clean classifier can be learned by minimizing a denoised and unbiased classification risk estimator, which only involves the noisy data. Moreover, we derive a theoretical generalization error bound for the proposed method. Experimental results demonstrate the effectiveness of the proposed algorithm on several benchmark datasets.

JMLR Journal 2022 Journal Article

Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization

  • Quanming Yao
  • Yaqing Wang
  • Bo Han
  • James T. Kwok

Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it to low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm avoids expensive tensor folding/unfolding operations. A special “sparse plus low-rank” structure is maintained throughout the iterations, allowing fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Lojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space, and more accurate than the existing state of the art.

TMLR Journal 2022 Journal Article

NoiLin: Improving adversarial training and correcting stereotype of noisy labels

  • Jingfeng Zhang
  • Xilie Xu
  • Bo Han
  • Tongliang Liu
  • Lizhen Cui
  • Gang Niu
  • Masashi Sugiyama

Adversarial training (AT), formulated as a minimax optimization problem, can effectively enhance a model's robustness against adversarial attacks. Existing AT methods mainly focus on manipulating the inner maximization for generating quality adversarial variants or manipulating the outer minimization for designing effective learning objectives. However, empirical results of AT always exhibit robustness at odds with accuracy, as well as the cross-over mixture problem, which motivates us to study whether label randomness can benefit AT. First, we thoroughly investigate noisy-label (NL) injection into AT's inner maximization and outer minimization, respectively, and obtain observations on when NL injection benefits AT. Second, based on these observations, we propose a simple but effective method, NoiLIn, that randomly injects NLs into the training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate AT's undesirable issue of robust overfitting and even further improve the generalization of state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective of learning with NLs: NLs should not always be deemed detrimental, and even in the absence of NLs in the training set, we may consider injecting them deliberately.
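As a rough sketch of the injection mechanism (not the authors' code; the names and the rate-adjustment rule are assumptions), labels can be flipped afresh each epoch, with the rate raised whenever robust accuracy degrades:

```python
import random

def inject_noisy_labels(labels, num_classes, rate, rng):
    # Flip a fresh random fraction `rate` of labels to a different class.
    noisy = list(labels)
    k = int(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), k):
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy

def update_rate(rate, robust_acc, best_robust_acc, step=0.05, cap=0.4):
    # Raise the injection rate once robust overfitting is detected,
    # i.e. robust accuracy falls below the best value seen so far.
    return min(rate + step, cap) if robust_acc < best_robust_acc else rate

rng = random.Random(0)
labels = [0] * 100
noisy = inject_noisy_labels(labels, num_classes=10, rate=0.2, rng=rng)
flipped = sum(a != b for a, b in zip(labels, noisy))  # exactly 20 labels flipped
```

Because the flips are redrawn each epoch, no single example carries a persistently wrong label, which is one plausible reason the injected noise acts as regularization rather than corruption.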

NeurIPS Conference 2022 Conference Paper

Pluralistic Image Completion with Gaussian Mixture Models

  • Xiaobo Xia
  • Wenhao Yang
  • Jie Ren
  • Yewen Li
  • Yibing Zhan
  • Bo Han
  • Tongliang Liu

Pluralistic image completion focuses on generating both visually realistic and diverse results for image completion. Prior methods have achieved empirical success on this task. However, the constraints they use for pluralistic image completion are arguably not well interpretable and are unsatisfactory in two respects. First, the constraints for visual realism can be weakly correlated with the objective of image completion, or even redundant. Second, the constraints for diversity are designed to be task-agnostic, which causes them to not work well. In this paper, to address these issues, we propose an end-to-end probabilistic method. Specifically, we introduce a unified probabilistic graph model that represents the complex interactions in image completion. The entire procedure of image completion is then mathematically divided into several sub-procedures, which helps enforce constraints efficiently. The sub-procedure directly related to pluralistic results is identified, where the interaction is established by a Gaussian mixture model (GMM). The inherent parameters of the GMM are task-related and are optimized adaptively during training, while the number of its primitives can conveniently control the diversity of results. We formally establish the effectiveness of our method and demonstrate it with comprehensive experiments. The implementation is available at https://github.com/tmllab/PICMM.

IJCAI Conference 2022 Conference Paper

Robust Weight Perturbation for Adversarial Training

  • Chaojian Yu
  • Bo Han
  • Mingming Gong
  • Li Shen
  • Shiming Ge
  • Du Bo
  • Tongliang Liu

Overfitting widely exists in adversarial robust training of deep networks. An effective remedy is adversarial weight perturbation, which injects the worst-case weight perturbation during network training by maximizing the classification loss on adversarial examples. Adversarial weight perturbation helps reduce the robust generalization gap; however, it also undermines the robustness improvement. A criterion that regulates the weight perturbation is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely Loss Stationary Condition (LSC) for constrained perturbation. With LSC, we find that it is essential to conduct weight perturbation on adversarial data with small classification loss to eliminate robust overfitting. Weight perturbation on adversarial data with large classification loss is not necessary and may even lead to poor robustness. Based on these observations, we propose a robust perturbation strategy to constrain the extent of weight perturbation. The perturbation strategy prevents deep networks from overfitting while avoiding the side effect of excessive weight perturbation, significantly improving the robustness of adversarial training. Extensive experiments demonstrate the superiority of the proposed method over the state-of-the-art adversarial training methods.

NeurIPS Conference 2022 Conference Paper

RSA: Reducing Semantic Shift from Aggressive Augmentations for Self-supervised Learning

  • Yingbin Bai
  • Erkun Yang
  • Zhaoqing Wang
  • Yuxuan Du
  • Bo Han
  • Cheng Deng
  • Dadong Wang
  • Tongliang Liu

Most recent self-supervised learning methods learn visual representations by contrasting different augmented views of images. Compared with supervised learning, more aggressive augmentations have been introduced to further improve the diversity of training pairs. However, aggressive augmentations may distort images' structures, leading to a severe semantic shift problem: augmented views of the same image may not share the same semantics, thus degrading transfer performance. To address this problem, we propose a new SSL paradigm, which counteracts the impact of semantic shift by balancing the roles of weakly and aggressively augmented pairs. Specifically, semantically inconsistent pairs are in the minority, and we treat them as noisy pairs. Note that deep neural networks (DNNs) have a crucial memorization effect: DNNs tend to first memorize clean (majority) examples before overfitting to noisy (minority) examples. Therefore, we set a relatively large weight for aggressively augmented data pairs at the early learning stage. As training goes on, the model begins to overfit noisy pairs, so we gradually reduce the weights of aggressively augmented pairs. In doing so, our method can better embrace aggressive augmentations and neutralize the semantic shift problem. Experiments show that our model achieves 73.1% top-1 accuracy on ImageNet-1K with ResNet-50 for 200 epochs, which is a 2.5% improvement over BYOL. Moreover, experiments also demonstrate that the learned representations transfer well to various downstream tasks. Code is released at: https://github.com/tmllab/RSA.
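The weighting idea above, large weights for aggressive pairs early and smaller weights later, can be sketched with a simple decaying schedule (illustrative only; the schedule shape and endpoints are assumptions, not the paper's):

```python
import math

def aggressive_pair_weight(epoch, total_epochs, w_start=1.0, w_end=0.1):
    # Cosine decay of the loss weight for aggressively augmented pairs:
    # large early, when DNNs memorize the clean majority, and small late,
    # when they begin to overfit noisy (semantically shifted) pairs.
    t = min(epoch / total_epochs, 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))
```

Any monotone decay would express the same intuition; cosine is just a common smooth choice.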

TMLR Journal 2022 Journal Article

SemiNLL: A Framework of Noisy-Label Learning by Semi-Supervised Learning

  • Zhuowei Wang
  • Jing Jiang
  • Bo Han
  • Lei Feng
  • Bo An
  • Gang Niu
  • Guodong Long

Deep learning with noisy labels is a challenging task, which has received much attention from the machine learning and computer vision communities. Recent prominent methods that build on a specific sample selection (SS) strategy and a specific semi-supervised learning (SSL) model achieved state-of-the-art performance. Intuitively, better performance could be achieved if stronger SS strategies and SSL models are employed. Following this intuition, one might easily derive various effective noisy-label learning methods using different combinations of SS strategies and SSL models, which is, however, simply reinventing the wheel in essence. To prevent this problem, we propose SemiNLL, a versatile framework that investigates how to naturally combine different SS and SSL components based on their effects and efficiencies. We conduct a systematic and detailed analysis of the combinations of possible components based on our framework. Our framework can absorb various SS strategies and SSL backbones, utilizing their power to achieve promising performance. The instantiations of our framework demonstrate substantial improvements over state-of-the-art methods on benchmark-simulated and real-world datasets with noisy labels.

NeurIPS Conference 2022 Conference Paper

Synergy-of-Experts: Collaborate to Improve Adversarial Robustness

  • Sen Cui
  • Jingfeng Zhang
  • Jian Liang
  • Bo Han
  • Masashi Sugiyama
  • Changshui Zhang

Learning adversarially robust models requires predictions that are invariant within a small neighborhood of the natural inputs, which often runs into insufficient model capacity. Research has shown that learning multiple sub-models in an ensemble can mitigate this insufficiency, further improving generalization and robustness. However, the ensemble's voting-based strategy excludes the possibility that the true predictions remain with the minority. Therefore, this paper further improves the ensemble through a collaboration scheme, Synergy-of-Experts (SoE). Compared with the voting-based strategy, SoE enables correct predictions even if there exists only a single correct sub-model. In SoE, every sub-model fits its specific vulnerability area and reserves the rest of the sub-models to fit other vulnerability areas, which effectively optimizes the utilization of model capacity. Empirical experiments verify that SoE outperforms various ensemble methods against white-box and transfer-based adversarial attacks.

NeurIPS Conference 2022 Conference Paper

Towards Lightweight Black-Box Attack Against Deep Neural Networks

  • Chenghao Sun
  • Yonggang Zhang
  • Wan Chaoqun
  • Qizhou Wang
  • Ya Li
  • Tongliang Liu
  • Bo Han
  • Xinmei Tian

Black-box attacks can generate adversarial examples without accessing the parameters of the target model, largely exacerbating the threats posed by deployed deep neural networks (DNNs). However, previous works state that black-box attacks fail to mislead target models when their training data and outputs are inaccessible. In this work, we argue that black-box attacks can pose practical threats in this extremely restrictive scenario, where only several test samples are available. Specifically, we find that attacking the shallow layers of DNNs trained on a few test samples can generate powerful adversarial examples. As only a few samples are required, we refer to these attacks as lightweight black-box attacks. The main challenge in promoting lightweight attacks is to mitigate the adverse impact caused by the approximation error of shallow layers. As it is hard to mitigate the approximation error with few available samples, we propose the Error TransFormer (ETF) for lightweight attacks. Namely, ETF transforms the approximation error in the parameter space into a perturbation in the feature space and alleviates the error by disturbing features. In experiments, lightweight black-box attacks with the proposed ETF achieve surprising results. For example, even with only 1 sample per category available, the attack success rate of lightweight black-box attacks is only about 3% lower than that of black-box attacks with complete training data.

NeurIPS Conference 2022 Conference Paper

Watermarking for Out-of-distribution Detection

  • Qizhou Wang
  • Feng Liu
  • Yonggang Zhang
  • Jing Zhang
  • Chen Gong
  • Tongliang Liu
  • Bo Han

Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models. However, existing methods largely ignore the reprogramming property of deep models and thus may not fully unleash their intrinsic strength: without modifying parameters of a well-trained deep model, we can reprogram this model for a new purpose via data-level manipulation (e.g., adding a specific feature perturbation). This property motivates us to reprogram a classification model to excel at OOD detection (a new task), and thus we propose a general methodology named watermarking in this paper. Specifically, we learn a unified pattern that is superimposed onto features of original data, and the model's detection capability is largely boosted after watermarking. Extensive experiments verify the effectiveness of watermarking, demonstrating the significance of the reprogramming property of deep models in OOD detection.

NeurIPS Conference 2021 Conference Paper

Instance-dependent Label-noise Learning under a Structural Causal Model

  • Yu Yao
  • Tongliang Liu
  • Mingming Gong
  • Bo Han
  • Gang Niu
  • Kun Zhang

Label noise generally degrades the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let $X$ and $Y$ denote the instance and clean label, respectively. When $Y$ is a cause of $X$, according to which many datasets have been constructed, e.g., SVHN and CIFAR, the distributions of $P(X)$ and $P(Y|X)$ are generally entangled. This means that the unsupervised instances are helpful for learning the classifier and can thus reduce the side effects of label noise. However, it remains elusive how to exploit the causal information to handle the label-noise problem. We propose to model and make use of the causal process in order to correct the label-noise effect. Empirically, the proposed method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.

AAAI Conference 2021 Conference Paper

Learning with Group Noise

  • Qizhou Wang
  • Jiangchao Yao
  • Chen Gong
  • Tongliang Liu
  • Mingming Gong
  • Hongxia Yang
  • Bo Han

Machine learning in the context of noise is a challenging but practical setting for plenty of real-world applications. Most previous approaches in this area focus on the pairwise relation (a causal or correlational relationship) with noise, such as learning with noisy labels. However, group noise, in which an accurate coarse-grained relation carries fine-grained uncertainty, is also universal and has not been well investigated. The challenge in this setting is how to discover the true pairwise connections concealed by the group relation and its fine-grained noise. To overcome this issue, we propose a novel Max-Matching method for learning with group noise. Specifically, it utilizes a matching mechanism to evaluate the relation confidence of each object (cf. Figure 1) w.r.t. the target, meanwhile considering the non-IID characteristics among objects in the group. Only the most confident object is used to learn the model, so that the fine-grained noise is mostly dropped. Performance on a range of real-world datasets across several learning paradigms demonstrates the effectiveness of Max-Matching.

NeurIPS Conference 2021 Conference Paper

Probabilistic Margins for Instance Reweighting in Adversarial Training

  • Qizhou Wang
  • Feng Liu
  • Bo Han
  • Tongliang Liu
  • Chen Gong
  • Gang Niu
  • Mingyuan Zhou
  • Masashi Sugiyama

Reweighting adversarial data during training has recently been shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights. However, existing methods for measuring this closeness are not very reliable: they are discrete and can take only a few values, and they are path-dependent, i.e., they may change given the same start and end points with different attack paths. In this paper, we propose three types of probabilistic margin (PM), which are continuous and path-independent, for measuring the aforementioned closeness and reweighting adversarial data. Specifically, a PM is defined as the difference between two estimated class-posterior probabilities, e.g., the probability of the true label minus the probability of the most confusing label given some natural data. Though different PMs capture different geometric properties, all three PMs share a negative correlation with the vulnerability of data: data with larger/smaller PMs are safer/riskier and should have smaller/larger weights. Experiments demonstrate that PMs are reliable and that PM-based reweighting methods outperform state-of-the-art counterparts.
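The true-label-minus-most-confusing-label margin stated in the abstract can be computed directly from softmax outputs; the exponential reweighting below is a hypothetical choice for illustration, not the paper's weighting function:

```python
import numpy as np

def probabilistic_margin(probs, labels):
    # PM variant: estimated posterior of the true label minus the posterior
    # of the most confusing (highest-scoring other) label.
    probs = np.asarray(probs, dtype=float)
    idx = np.arange(len(labels))
    true_p = probs[idx, labels]
    rest = probs.copy()
    rest[idx, labels] = -np.inf       # exclude the true label from the max
    return true_p - rest.max(axis=1)

def pm_weights(margins, beta=2.0):
    # Smaller (riskier) margins receive larger weights; normalize to mean 1.
    w = np.exp(-beta * np.asarray(margins))
    return w * len(w) / w.sum()

probs = np.array([[0.70, 0.20, 0.10],   # confidently correct: PM = 0.5
                  [0.40, 0.35, 0.25]])  # near the boundary:   PM = 0.05
labels = np.array([0, 0])
m = probabilistic_margin(probs, labels)
w = pm_weights(m)  # the borderline example gets the larger weight
```

Unlike step-count-based closeness measures, this margin is continuous and depends only on the model's posteriors at the natural data, not on the attack path.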

AAAI Conference 2021 Conference Paper

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

  • Qizhou Wang
  • Bo Han
  • Tongliang Liu
  • Gang Niu
  • Jian Yang
  • Chen Gong

The drastic increase of data quantity often brings the severe decrease of data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing learning methods with label noise either employ ad-hoc heuristics or restrict to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as scarce studies focus on their label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness over state-of-the-art counterparts.

NeurIPS Conference 2021 Conference Paper

TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation

  • Haoang Chi
  • Feng Liu
  • Wenjing Yang
  • Long Lan
  • Tongliang Liu
  • Bo Han
  • William Cheung
  • James Kwok

In few-shot domain adaptation (FDA), classifiers for the target domain are trained with accessible labeled data in the source domain (SD) and few labeled data in the target domain (TD). However, data usually contain private information in the current era, e.g., data distributed on personal phones. Thus, private data will be leaked if we directly access data in the SD to train a target-domain classifier (as required by FDA methods). In this paper, to prevent privacy leakage in the SD, we consider a very challenging problem setting, where the classifier for the TD has to be trained using few labeled target data and a well-trained SD classifier, named few-shot hypothesis adaptation (FHA). In FHA, we cannot access data in the SD; as a result, the private information in the SD will be well protected. To this end, we propose a target-oriented hypothesis adaptation network (TOHAN) to solve the FHA problem, where we generate highly compatible unlabeled data (i.e., an intermediate domain) to help train a target-domain classifier. TOHAN maintains two deep networks simultaneously, in which one focuses on learning an intermediate domain and the other takes care of intermediate-to-target distributional adaptation and target-risk minimization. Experimental results show that TOHAN outperforms competitive baselines significantly.

NeurIPS Conference 2021 Conference Paper

Understanding and Improving Early Stopping for Learning with Noisy Labels

  • Yingbin Bai
  • Erkun Yang
  • Bo Han
  • Yanhua Yang
  • Jiatong Li
  • Yinian Mao
  • Gang Niu
  • Tongliang Liu

The memorization effect of deep neural networks (DNNs) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at an early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering a DNN as a whole. However, a DNN can be considered as a composition of a series of layers, and we find that the latter layers in a DNN are much more sensitive to label noise, while the former layers are quite robust. Therefore, selecting a stopping point for the whole network may make different DNN layers antagonistically affect each other, thus degrading the final performance. In this paper, we propose to separate a DNN into different parts and progressively train them to address this problem. Instead of early stopping, which trains a whole DNN all at once, we initially train the former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers by using a smaller number of epochs with the preceding layers fixed to counteract the impact of noisy labels. We term the proposed method progressive early stopping (PES). Despite its simplicity, compared with traditional early stopping, PES can help to obtain more promising and stable results. Furthermore, by combining PES with existing approaches to training with noisy labels, we achieve state-of-the-art performance on image classification benchmarks. The code is made public at https://github.com/tmllab/PES.
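The progressive schedule described in the abstract can be summarized as: train each layer group in order, with all preceding groups frozen and a shrinking epoch budget. A minimal sketch, assuming hypothetical group names and illustrative epoch counts (none of which come from the paper):

```python
# Sketch of a PES-style progressive training schedule: earlier layer groups
# get a larger epoch budget; each later group is trained briefly with all
# preceding groups frozen. Group names and epoch counts are illustrative.

def pes_schedule(layer_groups, epochs_per_group):
    """Return a list of (trainable_group, frozen_groups, n_epochs) phases."""
    phases = []
    for i, (group, n_epochs) in enumerate(zip(layer_groups, epochs_per_group)):
        frozen = layer_groups[:i]          # preceding layers stay fixed
        phases.append((group, frozen, n_epochs))
    return phases

phases = pes_schedule(["stem", "mid", "head"], [30, 7, 5])
```

In a real implementation each phase would set `requires_grad` accordingly before optimizing; this sketch only captures the ordering and freezing logic.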

NeurIPS Conference 2021 Conference Paper

Universal Semi-Supervised Learning

  • Zhuo Huang
  • Chao Xue
  • Bo Han
  • Jian Yang
  • Chen Gong

Universal Semi-Supervised Learning (UniSSL) aims to solve the open-set problem where both the class distribution (i.e., class set) and the feature distribution (i.e., feature domain) differ between the labeled dataset and the unlabeled dataset. Such a problem seriously hinders the practical deployment of classical SSL. Different from existing SSL methods that target the open-set problem but study only one certain scenario of class-distribution mismatch and ignore the feature-distribution mismatch, we consider a more general case where a mismatch exists in both the class and feature distributions. In this case, we propose a ''Class-shAring data detection and Feature Adaptation'' (CAFA) framework which requires no prior knowledge of the class relationship between the labeled dataset and the unlabeled dataset. Particularly, CAFA utilizes a novel scoring strategy to detect the data in the shared class set. Then, it conducts domain adaptation to fully exploit the value of the detected class-sharing data for better semi-supervised consistency training. Exhaustive experiments on several benchmark datasets show the effectiveness of our method in tackling open-set problems.

IJCAI Conference 2020 Conference Paper

A Bi-level Formulation for Label Noise Learning with Spectral Cluster Discovery

  • Yijing Luo
  • Bo Han
  • Chen Gong

Practically, we often face the dilemma that some of the examples for training a classifier are incorrectly labeled due to various subjective and objective factors. Although intensive efforts have been devoted to designing classifiers that are robust to label noise, most previous methods have not fully utilized data-distribution information. To address this issue, this paper introduces a bi-level learning paradigm termed ''Spectral Cluster Discovery'' (SCD) for combating noisy labels. Namely, we simultaneously learn a robust classifier (Learning stage) by discovering the low-rank approximation to the ground-truth label matrix and learn an ideal affinity graph (Clustering stage). Specifically, we use the learned classifier to assign examples with similar labels to a mutual cluster. Based on the cluster membership, we then utilize the learned affinity graph to explore the noisy examples. Both stages reinforce each other iteratively. Experimental results on typical benchmark and real-world datasets verify the superiority of SCD over other label noise learning methods.

AAAI Conference 2020 Conference Paper

Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition Under Reshuffling

  • Chao Li
  • Mohammad Emtiyaz Khan
  • Zhun Sun
  • Gang Niu
  • Bo Han
  • Shengli Xie
  • Qibin Zhao

Exact recovery of tensor decomposition (TD) methods is a desirable property in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical applications on real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property has not been properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a practically widely used CTD model, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that such a property can also be achieved by a more general model than LCTD. In the new model, we generalize the classic tensor (un-)folding into a reshuffling operation, a more flexible mapping that relocates the entries of a matrix into a tensor. Armed with the reshuffling operations and the exact-recovery property, we explore a novel application for (generalized) LCTD, i.e., image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms the state-of-the-art methods.
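The reshuffling operation generalizes tensor (un-)folding by relocating matrix entries into a tensor through an arbitrary index permutation; classic folding corresponds to the identity permutation. A minimal numpy sketch of this idea (the permutation and shapes below are illustrative, not from the paper):

```python
import numpy as np

def reshuffle(matrix, perm, tensor_shape):
    """Relocate the entries of a matrix into a tensor via an index permutation.
    With the identity permutation this reduces to an ordinary folding."""
    flat = matrix.ravel()[perm]          # C-order flatten, then permute entries
    return flat.reshape(tensor_shape)

M = np.arange(1, 9).reshape(2, 4)                 # a 2x4 matrix with entries 1..8
T_fold = reshuffle(M, np.arange(8), (2, 2, 2))    # identity perm = classic folding
T_shuf = reshuffle(M, np.arange(8)[::-1], (2, 2, 2))  # a nontrivial reshuffling
```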

NeurIPS Conference 2020 Conference Paper

Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning

  • Yu Yao
  • Tongliang Liu
  • Bo Han
  • Mingming Gong
  • Jiankang Deng
  • Gang Niu
  • Masashi Sugiyama

The transition matrix, denoting the transition relationship from clean labels to noisy labels, is essential to building statistically consistent classifiers in label-noise learning. Existing methods for estimating the transition matrix rely heavily on estimating the noisy class posterior. However, the estimation error for the noisy class posterior could be large because of the randomness of label noise, which in turn causes the transition matrix to be poorly estimated. Therefore, in this paper, we aim to solve this problem by exploiting the divide-and-conquer paradigm. Specifically, we introduce an intermediate class to avoid directly estimating the noisy class posterior. Through this intermediate class, the original transition matrix can then be factorized into the product of two easy-to-estimate transition matrices. We term the proposed method the dual $T$-estimator. Both theoretical analyses and empirical results illustrate the effectiveness of the dual $T$-estimator for estimating transition matrices, leading to better classification performance.
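The factorization at the heart of the dual $T$-estimator composes two transition estimates through the intermediate class. A minimal numpy sketch, with illustrative (not learned) matrices; note that the product of two row-stochastic matrices is itself row-stochastic, so the composed estimate remains a valid transition matrix:

```python
import numpy as np

def compose_transitions(T_clean_to_mid, T_mid_to_noisy):
    """Estimate T(clean -> noisy) as the product of two easier-to-estimate
    transition matrices that route through an intermediate class."""
    return T_clean_to_mid @ T_mid_to_noisy

# Illustrative row-stochastic estimates for a 2-class problem
T1 = np.array([[0.9, 0.1], [0.2, 0.8]])    # clean -> intermediate
T2 = np.array([[0.95, 0.05], [0.1, 0.9]])  # intermediate -> noisy
T = compose_transitions(T1, T2)
```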

NeurIPS Conference 2020 Conference Paper

Part-dependent Label Noise: Towards Instance-dependent Label Noise

  • Xiaobo Xia
  • Tongliang Liu
  • Bo Han
  • Nannan Wang
  • Mingming Gong
  • Haifeng Liu
  • Gang Niu
  • Dacheng Tao

Learning with \textit{instance-dependent} label noise is challenging, because it is hard to model such real-world noise. Note that there is psychological and physiological evidence showing that we humans perceive instances by decomposing them into parts. Annotators are therefore more likely to annotate instances based on the parts rather than the whole instances, where a wrong mapping from parts to classes may cause the instance-dependent label noise. Motivated by this human cognition, in this paper, we approximate the instance-dependent label noise by exploiting \textit{part-dependent} label noise. Specifically, since instances can be approximately reconstructed by a combination of parts, we approximate the instance-dependent \textit{transition matrix} for an instance by a combination of the transition matrices for the parts of the instance. The transition matrices for parts can be learned by exploiting anchor points (i.e., data points that belong to a specific class almost surely). Empirical evaluations on synthetic and real-world datasets demonstrate that our method is superior to the state-of-the-art approaches for learning from instance-dependent label noise.
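The abstract's key approximation, an instance's transition matrix as a combination of part-level transition matrices, can be sketched as a convex combination weighted by how strongly each part is present in the instance. The matrices and weights below are illustrative stand-ins for the learned quantities:

```python
import numpy as np

def instance_transition(part_matrices, part_weights):
    """Approximate an instance-dependent transition matrix as a convex
    combination of part-level transition matrices (weights ~ part activations)."""
    w = np.asarray(part_weights, dtype=float)
    w = w / w.sum()                       # normalize to a convex combination
    return np.tensordot(w, np.asarray(part_matrices), axes=1)

identity_part = np.eye(2)                 # a part that never flips labels
flip_part = np.array([[0.0, 1.0],         # a part that always flips labels
                      [1.0, 0.0]])
T_inst = instance_transition([identity_part, flip_part], [1.0, 1.0])
```

A convex combination of row-stochastic matrices stays row-stochastic, so the approximation is always a valid transition matrix.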

NeurIPS Conference 2020 Conference Paper

Provably Consistent Partial-Label Learning

  • Lei Feng
  • Jiaqi Lv
  • Bo Han
  • Miao Xu
  • Gang Niu
  • Xin Geng
  • Bo An
  • Masashi Sugiyama

Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, a theoretical understanding of the consistency of those methods is still lacking: none of the PLL methods hitherto possesses a generation process of candidate label sets, so it is still unclear why such a method works on a specific dataset and when it may fail given a different dataset. In this paper, we propose the first generation model of candidate label sets, and develop two PLL methods that are guaranteed to be provably consistent, i.e., one is risk-consistent and the other is classifier-consistent. Our methods are advantageous, since they are compatible with any deep network or stochastic optimizer. Furthermore, thanks to the generation model, we are able to answer the two questions above by testing whether the generation model matches given candidate label sets. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed generation model and the two PLL methods.

NeurIPS Conference 2019 Conference Paper

Are Anchor Points Really Indispensable in Label-Noise Learning?

  • Xiaobo Xia
  • Tongliang Liu
  • Nannan Wang
  • Bo Han
  • Chen Gong
  • Gang Niu
  • Masashi Sugiyama

In label-noise learning, the \textit{noise transition matrix}, denoting the probabilities that clean labels flip into noisy labels, plays a central role in building \textit{statistically consistent classifiers}. Existing theories have shown that the transition matrix can be learned by exploiting \textit{anchor points} (i.e., data points that belong to a specific class almost surely). However, when there are no anchor points, the transition matrix will be poorly learned, and those previously consistent classifiers will significantly degenerate. In this paper, without employing anchor points, we propose a \textit{transition-revision} ($T$-Revision) method to effectively learn transition matrices, leading to better classifiers. Specifically, to learn a transition matrix, we first initialize it by exploiting data points that are similar to anchor points, having high \textit{noisy class posterior probabilities}. Then, we modify the initialized matrix by adding a \textit{slack variable}, which can be learned and validated together with the classifier by using noisy data. Empirical results on benchmark-simulated and real-world label-noise datasets demonstrate that without using exact anchor points, the proposed method is superior to state-of-the-art label-noise learning methods.
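The revision step described above, initial estimate plus a learned slack variable, can be sketched in a few lines. The clipping and row re-normalization are my additions to keep the revised matrix a valid transition matrix; the actual method learns the slack jointly with the classifier:

```python
import numpy as np

def revise_transition(T_init, delta):
    """T-Revision sketch: add a slack variable to the initialized transition
    matrix and re-normalize rows so the result stays row-stochastic."""
    T = np.clip(T_init + delta, 1e-6, None)   # keep probabilities positive
    return T / T.sum(axis=1, keepdims=True)

T_init = np.array([[0.9, 0.1], [0.1, 0.9]])   # estimate from near-anchor points
delta = np.array([[0.0, 0.1], [0.0, 0.0]])    # illustrative learned slack
T_rev = revise_transition(T_init, delta)
```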

IJCAI Conference 2019 Conference Paper

Towards Robust ResNet: A Small Step but a Giant Leap

  • Jingfeng Zhang
  • Bo Han
  • Laura Wynter
  • Bryan Kian Hsiang Low
  • Mohan Kankanhalli

This paper presents a simple yet principled approach to boosting the robustness of the residual network (ResNet) that is motivated by a dynamical systems perspective. Namely, a deep neural network can be interpreted using a partial differential equation, which naturally inspires us to characterize ResNet based on an explicit Euler method. This consequently allows us to exploit the step factor h in the Euler method to control the robustness of ResNet in both its training and generalization. In particular, we prove that a small step factor h can benefit its training and generalization robustness during backpropagation and forward propagation, respectively. Empirical evaluation on real-world datasets corroborates our analytical findings that a small h can indeed improve both its training and generalization robustness.
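The dynamical-systems view above treats a residual block as one explicit Euler step, x <- x + h * f(x), with the step factor h controlling how strongly each residual update perturbs the state. A minimal numpy sketch (the residual functions here are toy stand-ins, not trained blocks):

```python
import numpy as np

def euler_resnet_forward(x, residual_fns, h=0.1):
    """Forward pass of a ResNet viewed as explicit Euler integration:
    each residual block applies x <- x + h * f(x); a small step factor h
    damps the contribution of every residual branch."""
    for f in residual_fns:
        x = x + h * f(x)
    return x

# Toy contracting residual f(x) = -x: one step gives x * (1 - h)
out_big = euler_resnet_forward(np.array([1.0]), [lambda x: -x], h=0.1)
out_small = euler_resnet_forward(np.array([1.0]), [lambda x: -x], h=0.01)
```

With the smaller h the output stays closer to the input, illustrating how h moderates each block's perturbation.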

NeurIPS Conference 2018 Conference Paper

Co-teaching: Robust training of deep neural networks with extremely noisy labels

  • Bo Han
  • Quanming Yao
  • Xingrui Yu
  • Gang Niu
  • Miao Xu
  • Weihua Hu
  • Ivor Tsang
  • Masashi Sugiyama

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they first memorize training data with clean labels and then those with noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating noisy labels. Namely, we train two deep neural networks simultaneously and let them teach each other on every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, the two networks communicate with each other about what data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
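The three mini-batch steps above reduce to a simple exchange rule: each network picks its small-loss (likely clean) samples, and its peer trains on that selection. A minimal sketch of that selection-and-exchange logic, with illustrative loss values and keep ratio:

```python
import numpy as np

def small_loss_selection(losses, keep_ratio):
    """Indices of the keep_ratio fraction of samples with the smallest loss,
    treated as the samples most likely to carry clean labels."""
    n_keep = int(len(losses) * keep_ratio)
    return np.argsort(np.asarray(losses))[:n_keep]

def co_teaching_step(losses_a, losses_b, keep_ratio=0.5):
    """One mini-batch exchange: network A's selection trains network B,
    and vice versa."""
    idx_for_b = small_loss_selection(losses_a, keep_ratio)   # A teaches B
    idx_for_a = small_loss_selection(losses_b, keep_ratio)   # B teaches A
    return idx_for_a, idx_for_b

idx_for_a, idx_for_b = co_teaching_step([0.1, 0.9, 0.2, 0.8],
                                        [0.7, 0.1, 0.8, 0.2])
```

In practice the keep ratio is annealed during training as the memorization effect sets in; the fixed value here is only for illustration.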

NeurIPS Conference 2018 Conference Paper

Masking: A New Perspective of Noisy Supervision

  • Bo Han
  • Jiangchao Yao
  • Gang Niu
  • Mingyuan Zhou
  • Ivor Tsang
  • Ya Zhang
  • Masashi Sugiyama

It is important to learn various types of classifiers given training data with noisy labels. Noisy labels, in the most popular noise model hitherto, are corrupted from ground-truth labels by an unknown noise transition matrix. Thus, by estimating this matrix, classifiers can escape from overfitting those noisy labels. However, such estimation is practically difficult, due to either the indirect nature of two-step approaches, or insufficient data to afford end-to-end approaches. In this paper, we propose a human-assisted approach called ''Masking'' that conveys human cognition of invalid class transitions and naturally speculates the structure of the noise transition matrix. To this end, we derive a structure-aware probabilistic model incorporating a structure prior, and solve the challenges of structure extraction and structure alignment. Thanks to Masking, we only estimate the unmasked noise transition probabilities, and the burden of estimation is tremendously reduced. We conduct extensive experiments on CIFAR-10 and CIFAR-100 with three noise structures as well as the industrial-level Clothing1M with agnostic noise structure, and the results show that Masking can improve the robustness of classifiers significantly.
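The core mechanism, a human-provided mask that zeroes out invalid class transitions so only the unmasked probabilities need estimating, can be sketched directly. The mask pattern below is illustrative, not one of the paper's noise structures:

```python
import numpy as np

def apply_mask(T_free, mask):
    """Masking sketch: zero out transitions a human deems invalid and
    re-normalize each row, so only unmasked entries remain to estimate."""
    T = T_free * mask
    return T / T.sum(axis=1, keepdims=True)

T_free = np.full((3, 3), 1.0 / 3.0)         # unconstrained estimate
mask = np.array([[1, 1, 0],                 # illustrative band structure:
                 [1, 1, 1],                 # only neighboring classes may
                 [0, 1, 1]])                # be confused with each other
T_masked = apply_mask(T_free, mask)
```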

TIST Journal 2013 Journal Article

Lexical normalization for social media text

  • Bo Han
  • Paul Cook
  • Timothy Baldwin

Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this article, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalizing lexical variants. Our method uses a classifier to detect lexical variants, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for the word. The proposed method doesn't require any annotations, and achieves state-of-the-art performance over an SMS corpus and a novel dataset based on Twitter.

EAAI Journal 2006 Journal Article

A statistical complement to deterministic algorithms for the retrieval of aerosol optical thickness from radiance data

  • Bo Han
  • Slobodan Vucetic
  • Amy Braverman
  • Zoran Obradovic

As a complement to the conventional deterministic geophysical algorithms, we consider a faster, but less accurate approach: training regression models to predict aerosol optical thickness (AOT) from radiance data. In our study, neural networks trained on a global data set are employed as a global retrieval method. Inverse distance spatial interpolation and region-specific neural networks trained on restricted, localized areas provide local models. We then develop two integrated statistical methods: local error correction of global retrievals and an optimal weighted average of global and local components. The algorithms are evaluated on the problem of deriving AOT from raw radiances observed by the Multi-angle Imaging SpectroRadiometer (MISR) instrument onboard NASA's Terra satellite. Integrated statistical approaches were clearly superior to global and local models alone. The best compromise between speed and accuracy was obtained through the weighted averaging of global neural networks and spatial interpolation. The results show that, while much faster, statistical retrievals can be quite comparable in accuracy to the far more computationally demanding deterministic methods. Differences in quality vary with season and model complexity.
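The best-performing integration above is a weighted average of the global neural-network retrieval and the local spatial-interpolation estimate. A one-function sketch; the weight value is illustrative, whereas the paper learns or selects an optimal weighting:

```python
def blended_retrieval(global_pred, local_pred, w_global=0.6):
    """Weighted-average AOT retrieval sketch: convex combination of the
    global neural-network prediction and the local interpolation estimate.
    The weight w_global here is an illustrative placeholder."""
    return w_global * global_pred + (1.0 - w_global) * local_pred

aot = blended_retrieval(0.2, 0.4, w_global=0.5)   # equal-weight blend
```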

IROS Conference 2006 Conference Paper

Intelligent Rotor Speed Controller for a Mini Autonomous Helicopter

  • Yu Xu
  • Ping Li 0057
  • Bo Han
  • Qinyuan Ren

The rotor is the most important component of a mini autonomous helicopter. In order to simplify identification and control, it is desirable for a mini autonomous helicopter to hold its rotor speed constant, since a varying rotor speed causes varying aerodynamics. An intelligent rotor speed controller has been developed, which adopts the control algorithms of feedforward and fuzzy-tuned PI. The hardware design, especially the tachometer design, is described in this paper. To improve the performance of the PI controller, a fuzzy logic supervisor has been proposed for tuning the gain parameters of the PI controller online. The controller is then applied to a mini autonomous helicopter to verify its performance. The experimental results of real flights show that the intelligent rotor speed controller is superior in dynamic and steady-state performance due to its quick response and good robustness. Numerous flights have also proven its reliability.
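The control structure described, a PI loop whose gains are tuned online by a supervisor, can be sketched as follows. The crude error-magnitude rule below is a stand-in for the paper's fuzzy logic supervisor, and all gains and thresholds are illustrative:

```python
class TunedPI:
    """Rotor-speed PI controller sketch with online gain adjustment.
    A real fuzzy supervisor would replace the simple threshold rule here."""

    def __init__(self, kp=1.0, ki=0.1):
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def step(self, setpoint_rpm, measured_rpm, dt):
        error = setpoint_rpm - measured_rpm
        # stand-in supervisor: boost both gains when the error is large
        gain_scale = 2.0 if abs(error) > 100.0 else 1.0
        self.integral += error * dt
        return gain_scale * (self.kp * error + self.ki * self.integral)

pi = TunedPI()
u_zero = pi.step(1500.0, 1500.0, dt=0.01)   # on-target: no correction
u_low = pi.step(1500.0, 1400.0, dt=0.01)    # under-speed: positive command
```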