Arrow Research search

Author name cluster

Haobo Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

An Invariant Latent Space Perspective on Language Model Inversion

  • Wentao Ye
  • Jiaqi Hu
  • Haobo Wang
  • Xinpeng Ti
  • Zhiqing Xiao
  • Hao Chen
  • Liyao Li
  • Lei Feng

Language model inversion (LMI), i.e., recovering hidden prompts from outputs, emerges as a concrete threat to user privacy and system security. We recast LMI as reusing the LLM's own latent space and propose the Invariant Latent Space Hypothesis (ILSH): (1) diverse outputs from the same source prompt should preserve consistent semantics (source invariance), and (2) input-output cyclic mappings should be self-consistent within a shared latent space (cyclic invariance). Accordingly, we present Inv2A, which treats the LLM as an invariant decoder and learns only a lightweight inverse encoder that maps outputs to a denoised pseudo-representation. When multiple outputs are available, they are sparsely concatenated at the representation layer to increase information density. Training proceeds in two stages: contrastive alignment (source invariance) and supervised reinforcement (cyclic invariance). An optional training-free neighborhood search can refine local performance. Across 9 datasets covering user and system prompt scenarios, Inv2A outperforms baselines by an average of 4.77% BLEU score while reducing dependence on large inverse corpora. Our analysis further shows that prevalent defenses provide limited protection, underscoring the need for stronger strategies.

AAAI Conference 2026 Conference Paper

AquaSplatting: A Hybrid 3D Representation for Robust Underwater Scene Reconstruction via Dual-Branch Rendering

  • Jiangbei Hu
  • Haobo Wang
  • Baixin Xu
  • Nan Ding
  • Zhimao Lu
  • Na Lei
  • Ying He

While 3D Gaussian Splatting (3DGS) excels at real-time rendering of standard scenes, it struggles to reconstruct underwater environments due to severe challenges such as light scattering, color attenuation, and sparse coverage of Gaussian kernels in far-field aqueous regions. To address this, we introduce AquaSplatting, a hybrid framework that combines explicit and implicit modeling methods for robust underwater scene reconstruction. Our dual-branch architecture employs 3DGS in a geometry-guided branch to model solid surfaces like the seabed, while a medium-aware branch uses a compact, view-dependent MLP to represent volumetric water effects. Furthermore, a neural underwater hybrid rendering mechanism adaptively fuses these two representations based on accumulated opacity. Thanks to this dual-branch framework, our method can also synthesize restored images without the water medium. To enhance efficiency, our proposed engagement-based pruning (EBP) strategy quantifies each Gaussian's contribution by accumulating its image-space gradients over multiple frames, enabling the principled removal of primitives with negligible impact. The entire framework is optimized using a comprehensive loss function that integrates photometric, exposure, semantic, and depth priors to maximize visual fidelity. Experiments on challenging underwater datasets demonstrate that AquaSplatting achieves state-of-the-art reconstruction quality, surpassing prior methods while maintaining real-time performance.
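The opacity-based fusion described above can be pictured with a simple per-pixel blend. This is only a sketch of the idea: the paper's fusion mechanism is learned, whereas the fixed rule below (a hypothetical `fuse_hybrid` helper) just trusts the splatting branch where its accumulated opacity is high and falls back to the medium branch elsewhere.

```python
import numpy as np

def fuse_hybrid(color_splat, color_medium, acc_opacity):
    """Hypothetical per-pixel fusion: where the Gaussian branch is opaque
    (solid surfaces such as the seabed) take its colour; in low-opacity
    far-field water, take the medium branch's volumetric colour."""
    a = np.clip(acc_opacity, 0.0, 1.0)[..., None]  # broadcast over RGB channels
    return a * color_splat + (1.0 - a) * color_medium
```

Setting the opacity term to zero everywhere recovers the medium-only rendering, which mirrors how the dual-branch design makes water-free restored images available as a by-product.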

AAAI Conference 2026 Conference Paper

Dual Graph Disambiguation for Multi-Instance Partial-Label Learning

  • Zhen Zhu
  • Kai Tang
  • Songhe Feng
  • Yixuan Tang
  • Haobo Wang
  • Gengyu Lyu
  • Cheng Peng
  • Yining Sun

In multi-instance partial label learning (MIPL), each sample is a bag of multiple instances linked to a candidate label set containing one true and multiple false labels, yielding inexact supervision in both instance features and label space. However, existing works adopt decoupled approaches that focus exclusively on either instance-level feature fusion or label-level disambiguation, failing to fully exploit the intrinsic dependencies between these two spaces. Graph-based methods are widely recognized as a powerful paradigm for overcoming this limitation in weakly supervised learning, yet their success hinges on reliable features, precisely what MIPL lacks due to instance-level noise. To bridge this gap, we propose DualG, a novel framework that simultaneously addresses feature learning and label disambiguation through dual-level graph propagation. Specifically, we construct dual relevance graphs at both the bag and instance levels. At the bag level, we build a similarity graph based on fused feature representations; at the instance level, we employ attention scores to filter out irrelevant instances and construct a reliable instance-level relevance graph. These complementary graphs enable our joint label disambiguation framework to simultaneously address inexact supervision signals in both instance space and label space. Experimental results on five benchmark datasets demonstrate that DualG outperforms existing MIPL and partial label learning methods, validating its effectiveness and superiority.

AAAI Conference 2026 Conference Paper

Group-aware Multiscale Ensemble Learning for Test-Time Multimodal Sentiment Analysis

  • Kai Tang
  • Yixuan Tang
  • Tianyi Chen
  • Haokai Xu
  • Qiqi Luo
  • Jin Guang Zheng
  • Zhixin Zhang
  • Gang Chen

Multi-modal Sentiment Analysis (MSA) enables machines to perceive human sentiments by integrating multiple modalities such as text, video, and audio. Despite recent progress, most existing methods assume distribution consistency between training and test data—a condition rarely met in real-world scenarios. To address domain shifts without relying on source data or target labels, Test-Time Adaptation (TTA) has emerged as a promising paradigm. However, applying TTA methods to MSA faces two challenges: a representation bottleneck inherent to the regression formulation and the inconsistency in modality fusion caused by modality-specific data augmentation techniques. To overcome these issues, we propose Group-aware Multiscale Ensemble Learning (GMEL), which leverages a von Mises-Fisher (vMF) mixture distribution to model latent sentiment groups and integrates a multi-scale re-dropout strategy for modality-agnostic feature augmentation, preserving fusion consistency. Extensive experiments on three benchmark datasets using two backbone architectures show that GMEL significantly outperforms existing baselines, demonstrating strong robustness to test-time distribution shifts in multi-modal sentiment analysis.

IJCAI Conference 2025 Conference Paper

A Timestep-Adaptive Frequency-Enhancement Framework for Diffusion-based Image Super-Resolution

  • Yueying Li
  • Hanbin Zhao
  • Jiaqing Zhou
  • Guozhi Xu
  • Tianlei Hu
  • Gang Chen
  • Haobo Wang

Image super-resolution (ISR) is a classic and challenging problem in computer vision because of complex and unknown degradation patterns in the data collection process. Leveraging powerful generative priors, diffusion-based methods have recently established new state-of-the-art ISR performance, but their characteristics in the frequency domain are still underexplored. In this paper, we innovatively investigate their frequency-domain behaviors from a sampling timestep perspective. Experimentally, we find that current diffusion-based ISR algorithms exhibit insufficiency in different frequency components in distinct groups of timesteps during the sampling. To address this, we first propose a Timestep Division Controller that is able to adaptively divide the timesteps into groups based on the performance gradient across different components. Next, we design two dedicated modules --- the Amplitude and Phase Enhancement Module (APEM) and the High- and Low-Frequency Enhancement Module (HLEM), to regulate the information flow of distinct frequency-domain features. By adaptively enhancing specific frequency components at different stages of the sampling process, the two modules effectively compensate for the insufficient frequency-domain perception of diffusion-based ISR models. Extensive experiments on three benchmark datasets verify the superior ISR performance of our method, e.g., achieving an average 5.40% improvement on CLIP-IQA compared to the best diffusion-based ISR baseline.

NeurIPS Conference 2025 Conference Paper

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

  • Zhifang Zhang
  • Shuo He
  • Haobo Wang
  • Bingquan Shen
  • Lei Feng

Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, while they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. In this paper, we reveal that CLIP's vulnerabilities primarily stem from its tendency to encode features beyond in-dataset predictive patterns, compromising its visual feature resistivity to input perturbations. This makes its encoded features highly susceptible to being reshaped by backdoor triggers. To address this challenge, we propose Repulsive Visual Prompt Tuning (RVPT), a novel defense approach that employs deep visual prompt tuning with a specially designed feature-repelling loss. Specifically, RVPT adversarially repels the encoded features from deeper layers while optimizing the standard cross-entropy loss, ensuring that only predictive features in downstream tasks are encoded, thereby enhancing CLIP's visual feature resistivity against input perturbations and mitigating its susceptibility to backdoor attacks. Unlike existing multimodal backdoor defense methods that typically require the availability of poisoned data or involve fine-tuning the entire model, RVPT leverages few-shot downstream clean samples and only tunes a small number of parameters. Empirical results demonstrate that RVPT tunes only 0.27% of the parameters in CLIP, yet it significantly outperforms state-of-the-art defense methods, reducing the attack success rate from 89.70% to 2.76% against the most advanced multimodal attacks on ImageNet and effectively generalizes its defensive capabilities across multiple datasets. Our code is available at https://anonymous.4open.science/r/rvpt-anonymous.

NeurIPS Conference 2025 Conference Paper

Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection

  • Shenzhi Yang
  • Junbo Zhao
  • Sharon Li
  • Shouqing Yang
  • Dingyu Yang
  • Xiaofang Zhang
  • Haobo Wang

Out-of-distribution (OOD) node detection in graphs is a critical yet challenging task. Most existing approaches rely heavily on fine-grained labeled data to obtain a pre-trained supervised classifier, inherently assuming the existence of a well-defined pretext classification task. However, when such a task is ill-defined or absent, their applicability becomes severely limited. To overcome this limitation, there is an urgent need for a more scalable OOD detection method that is independent of both pretext tasks and label supervision. We harness a new phenomenon called Feature Resonance, focusing on the feature space rather than the label space. We observe that, ideally, during the optimization of known ID samples, unknown ID samples undergo more significant representation changes than OOD samples, even when the model is trained to align arbitrary targets. The rationale behind it is that even without gold labels, the local manifold may still exhibit smooth resonance. Based on this, we further develop a novel graph OOD framework, dubbed Resonance-based Separation and Learning (RSL), which comprises two core modules: (i) a more practical micro-level proxy of feature resonance that measures the movement of feature vectors in one training step, and (ii) a synthetic OOD node strategy for training an effective OOD classifier. Theoretically, we derive an error bound showing the superior separability of OOD nodes during the resonance period. Extensive experiments on a total of thirteen real-world graph datasets empirically demonstrate that RSL achieves state-of-the-art performance.
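The micro-level proxy in (i) can be sketched in a few lines. The version below is a simplified, hypothetical rendering of the idea, not the paper's implementation: take one gradient step of a linear feature map on the known ID nodes toward arbitrary targets, and score every node by how far its feature vector moves in that single step. Higher movement would indicate resonance with the ID manifold.

```python
import numpy as np

def resonance_scores(X_train, X_all, W, targets, lr=0.1):
    """Micro-level feature-resonance proxy (illustrative sketch).

    One MSE gradient step on the known ID nodes X_train toward arbitrary
    targets, then each node in X_all is scored by the displacement of its
    feature vector under that single parameter update.
    """
    H = X_train @ W                                   # current features of ID nodes
    G = X_train.T @ (H - targets) / len(X_train)      # MSE gradient w.r.t. W
    W_new = W - lr * G                                # one optimisation step
    # resonance score = per-node movement of the feature vector
    return np.linalg.norm(X_all @ W_new - X_all @ W, axis=1)
```

Under the hypothesis, thresholding these scores would separate unknown ID nodes (large movement) from OOD nodes (small movement) without any gold labels.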

AAAI Conference 2025 Conference Paper

Multi-Instance Multi-Label Classification from Crowdsourced Labels

  • Ziquan Wang
  • Mingxuan Xia
  • Xiangyu Ren
  • Jiaqing Zhou
  • Gengyu Lyu
  • Tianlei Hu
  • Haobo Wang

Multi-instance multi-label classification (MIML) is a fundamental task in machine learning, where each data sample comprises a bag containing several instances and multiple binary labels. Despite its wide applications, the data collection process involves matching multiple instances and labels, typically resulting in high annotation costs. In this paper, we study a novel yet practical crowdsourced multi-instance multi-label classification (CMIML) setup, where labels are collected from multiple crowd sources. To address this problem, we first propose a novel data generation process for CMIML, i.e., cross-label transition, where cross-label annotation errors are more likely to appear than under the previously assumed single-label transition, due to the inherent similarity of localized instances from different classes. Then, we formally define the cross-label transition by cross-label transition matrices which are dependent across classes. Subsequently, we establish the first unbiased risk estimator for CMIML and further improve it through aggregation techniques, along with a rigorous generalization error bound. We also provide a practical implementation of cross-label transition matrix estimation. Comprehensive experiments on six benchmark datasets under various scenarios demonstrate that our algorithm outperforms the baselines by a large margin, validating its effectiveness in handling the CMIML problem.

ICML Conference 2025 Conference Paper

P(all-atom) Is Unlocking New Path For Protein Design

  • Wei Qu
  • Jiawei Guan
  • Rui Ma
  • Ke Zhai 0008
  • Weikun Wu
  • Haobo Wang

We introduce Pallatom, an innovative protein generation model capable of producing protein structures with all-atom coordinates. Pallatom directly learns and models the joint distribution $P(\textit{structure}, \textit{seq})$ by focusing on $P(\textit{all-atom})$, effectively addressing the interdependence between sequence and structure in protein generation. To achieve this, we propose a novel network architecture specifically designed for all-atom protein generation. Our model employs a dual-track framework that tokenizes proteins into token-level and atomic-level representations, integrating them through a multi-layer decoding process with "traversing" representations and a recycling mechanism. We also introduce the $\texttt{atom14}$ representation method, which unifies the description of unknown side-chain coordinates, ensuring high fidelity between the generated all-atom conformation and its physical structure. Experimental results demonstrate that Pallatom excels in key metrics of protein design, including designability, diversity, and novelty, showing significant improvements across the board. Our model not only enhances the accuracy of protein generation but also exhibits excellent sampling efficiency, paving the way for future applications in larger and more complex systems.

NeurIPS Conference 2025 Conference Paper

Table as a Modality for Large Language Models

  • Liyao Li
  • Chao Ye
  • Wentao Ye
  • Yifei Sun
  • Zhe Jiang
  • Haobo Wang
  • Jiaming Tian
  • Yiming Zhang

To extend the remarkable successes of Large Language Models (LLMs) to the widely deployed tabular data, the community has made numerous efforts to generalize them to table reasoning tasks. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42.65%.

IJCAI Conference 2025 Conference Paper

Towards Robust Incremental Learning Under Ambiguous Supervision

  • Rui Wang
  • Mingxuan Xia
  • Haobo Wang
  • Lei Feng
  • Junbo Zhao
  • Gang Chen
  • Chang Yao

Traditional Incremental Learning (IL) targets to handle sequential fully-supervised learning problems where novel classes emerge from time to time. However, due to inherent annotation uncertainty and ambiguity, collecting high-quality annotated data in a dynamic learning system can be extremely expensive. To mitigate this problem, we propose a novel weakly-supervised learning paradigm called Incremental Partial Label Learning (IPLL), where the sequentially arrived data relate to a set of candidate labels rather than the ground truth. Technically, we develop the Prototype-Guided Disambiguation and Replay Algorithm (PGDR) which leverages the class prototypes as a proxy to mitigate two intertwined challenges in IPLL, i.e., label ambiguity and catastrophic forgetting. To handle the former, PGDR encapsulates a momentum-based pseudo-labeling algorithm along with prototype-guided initialization, resulting in a balanced perception of classes. To alleviate forgetting, we develop a memory replay technique that collects well-disambiguated samples while maintaining representativeness and diversity. By jointly distilling knowledge from curated memory data, our framework exhibits a great disambiguation ability for samples of new tasks and achieves less forgetting of knowledge. Extensive experiments demonstrate that PGDR achieves superior performance over the baselines in the IPLL task.
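Momentum-based pseudo-labeling over a candidate label set, as used in PGDR, typically looks like an exponential moving average restricted to the candidates. The sketch below is a generic, simplified version (the function name and the exact masking/renormalization are assumptions, not the paper's code): model probabilities are zeroed outside each sample's candidate set, renormalized, and blended into the running pseudo-label with momentum `m`.

```python
import numpy as np

def momentum_pseudo_label(pseudo, logits, candidate_mask, m=0.9):
    """Momentum pseudo-labeling restricted to the candidate set (sketch).

    pseudo:         (n, c) current pseudo-label distributions
    logits:         (n, c) model outputs for the same samples
    candidate_mask: (n, c) 1 for candidate labels, 0 otherwise
    """
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax numerator
    probs *= candidate_mask                       # non-candidates get zero mass
    probs /= probs.sum(axis=1, keepdims=True)     # renormalise over candidates
    return m * pseudo + (1 - m) * probs           # exponential moving average
```

Because both terms of the average are valid distributions over the candidates, the update keeps pseudo-labels normalized and never leaks mass onto labels outside the candidate set.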

AAAI Conference 2024 Conference Paper

A Separation and Alignment Framework for Black-Box Domain Adaptation

  • Mingxuan Xia
  • Junbo Zhao
  • Gengyu Lyu
  • Zenan Huang
  • Tianlei Hu
  • Gang Chen
  • Haobo Wang

Black-box domain adaptation (BDA) targets to learn a classifier on an unsupervised target domain while assuming only access to black-box predictors trained from unseen source data. Although a few BDA approaches have demonstrated promise by manipulating the transferred labels, they largely overlook the rich underlying structure in the target domain. To address this problem, we introduce a novel separation and alignment framework for BDA. Firstly, we locate those well-adapted samples via loss ranking and a flexible confidence-thresholding procedure. Then, we introduce a novel graph contrastive learning objective that aligns under-adapted samples to their local neighbors and well-adapted samples. Lastly, the adaptation is achieved by a nearest-centroid-augmented objective that exploits the clustering effect in the feature space. Extensive experiments demonstrate that our proposed method outperforms the best baselines on benchmark datasets, e.g., improving the average per-class accuracy by 4.1% on the VisDA dataset. The source code is available at: https://github.com/MingxuanXia/SEAL.

IJCAI Conference 2024 Conference Paper

Common-Individual Semantic Fusion for Multi-View Multi-Label Learning

  • Gengyu Lyu
  • Weiqi Kang
  • Haobo Wang
  • Zheng Li
  • Zhen Yang
  • Songhe Feng

In Multi-View Multi-Label Learning, each instance is described by several heterogeneous features and associated with multiple valid labels simultaneously. Existing methods mainly focus on leveraging feature-level view fusion to capture a common representation for multi-label classifier induction. In this paper, we take a new perspective and propose a new semantic-level fusion model named Common-Individual Semantic Fusion Multi-View Multi-Label Learning Method (CISF). Different from previous feature-level fusion models, our proposed method directly focuses on semantic-level view fusion and simultaneously takes both the common semantics across different views and the individual semantics of each specific view into consideration. Specifically, we first assume each view involves some common semantic labels while owning a few exclusive semantic labels. Then, the common and exclusive semantic labels are separately forced to be consensus and diverse to excavate the consistencies and complementarities among different views. Afterwards, we introduce the low-rank and sparse constraint to highlight the label co-occurrence relationship of common semantics and the view-specific expression of individual semantics. We provide a theoretical guarantee for the strict convexity of our method by properly setting parameters. Extensive experiments on various data sets have verified the superiority of our method.

NeurIPS Conference 2024 Conference Paper

Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

  • Jianan Yang
  • Chenchao Gao
  • Zhiqing Xiao
  • Junbo Zhao
  • Sai Wu
  • Gang Chen
  • Haobo Wang

The recent large-scale text-to-image generative models have attained unprecedented performance, and adaptor modules like LoRA and DreamBooth have been established to extend this performance to even more unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict out-of-distribution concepts. This failure is highly related to the low quality of training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm, which includes high-quality data accumulation and adaptor training, enabling a finer-grained enhancement of generative results. The aesthetics score and concept-matching score are two major factors that impact the quality of synthetic results. One key component of CATOD is the weighted scoring system that automatically balances between these two scores, and we also offer comprehensive theoretical analysis for this point. Then, it determines how to select data and schedule the adaptor training based on this scoring system. The extensive results show that CATOD significantly outperforms the prior approaches with an 11.10 boost on the CLIP score and a 33.08% decrease on the CMMD metric.
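The weighted scoring idea reduces to ranking candidate generations by a convex combination of the two scores. The snippet below is a deliberately minimal sketch: the dictionary keys, the fixed weight `w`, and the `weighted_select` helper are all assumptions for illustration, whereas CATOD balances the weight automatically.

```python
def weighted_select(samples, w, k):
    """Rank candidate generations by w * aesthetics + (1 - w) * concept match
    and keep the top-k for adaptor training (illustrative sketch only)."""
    scored = sorted(
        samples,
        key=lambda s: w * s["aesthetic"] + (1 - w) * s["match"],
        reverse=True,  # highest combined score first
    )
    return scored[:k]
```

Sweeping `w` between 0 and 1 trades off visually pleasing samples against samples that faithfully depict the target concept, which is the tension the automatic balancing is meant to resolve.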

TIST Journal 2024 Journal Article

Multiple-Instance Learning from Pairwise Comparison Bags

  • Shengjie Zhou
  • Senlin Shu
  • Haobo Wang
  • Hongxin Wei
  • Tao Xiang
  • Beibei Li

Multiple-instance learning (MIL) is a significant weakly supervised learning problem, where the training data consist of bags containing multiple instances and bag-level labels. Most previous MIL research required fully labeled bags. However, collecting such data is challenging due to the labeling costs or privacy concerns. Fortunately, we can easily collect pairwise comparison information, indicating one bag is more likely to be positive than the other. Therefore, we investigate a novel MIL problem about learning a bag-level binary classifier only from pairwise comparison bags. To solve this problem, we formalize the data generation process and provide a baseline method to train an instance-level classifier based on unlabeled-unlabeled learning. To achieve better performance, we propose a convex formulation to train a bag-level classifier and give a generalization error bound. Comprehensive experiments show that both the baseline method and the convex formulation achieve satisfactory performance, while the convex formulation performs better.

NeurIPS Conference 2024 Conference Paper

NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise

  • Zhonghao Wang
  • Danyu Sun
  • Sheng Zhou
  • Haobo Wang
  • Jiapei Fan
  • Longtao Huang
  • Jiajun Bu

Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, due to variations in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL in this paper, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy labeled graph data across various datasets, with unified experimental settings and interfaces. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark can be found at https://github.com/eaglelab-zju/NoisyGL.

AAAI Conference 2023 Conference Paper

A Generalized Unbiased Risk Estimator for Learning with Augmented Classes

  • Senlin Shu
  • Shuo He
  • Haobo Wang
  • Hongxin Wei
  • Tao Xiang
  • Lei Feng

In contrast to the standard learning paradigm where all classes can be observed in training data, learning with augmented classes (LAC) tackles the problem where augmented classes unobserved in the training data may emerge in the test phase. Previous research showed that given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for LAC with theoretical guarantees. However, this URE is only restricted to the specific type of one-versus-rest loss functions for multi-class classification, making it not flexible enough when the loss needs to be changed with the dataset in practice. In this paper, we propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees, given unlabeled data for LAC. To alleviate the issue of negative empirical risk commonly encountered by previous studies, we further propose a novel risk-penalty regularization term. Experiments demonstrate the effectiveness of our proposed method.

NeurIPS Conference 2023 Conference Paper

Debiased and Denoised Entity Recognition from Distant Supervision

  • Haobo Wang
  • Yiwen Dong
  • Ruixuan Xiao
  • Fei Huang
  • Gang Chen
  • Junbo Zhao

While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitable noisy distant labels tagged unsupervisedly. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we innovatively identify two types of biases that were omitted by prior work, and these biases lead to inferior performance of the distant-supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework would ubiquitously introduce an inherent bias that causes erroneous behavior in both sample selection and eventually prediction. To cope with these problems, we propose a novel self-training framework, dubbed DesERT. This framework augments the conventional NER predictive pathway to a dual form that effectively adapts the sample-selection process to conform to its innate distributional-bias structure. The other crucial component of DesERT is a debiased module aiming to enhance the token representations, and hence the quality of the pseudo-labels. Extensive experiments are conducted to validate DesERT. The results show that our framework establishes a new state-of-the-art performance, achieving a +2.22% average F1 score improvement on five standardized benchmarking datasets. Lastly, DesERT demonstrates its effectiveness under a new DSNER benchmark where additional distant supervision comes from the ChatGPT model.

IJCAI Conference 2023 Conference Paper

Deep Partial Multi-Label Learning with Graph Disambiguation

  • Haobo Wang
  • Shisong Yang
  • Gengyu Lyu
  • Weiwei Liu
  • Tianlei Hu
  • Ke Chen
  • Songhe Feng
  • Gang Chen

In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have become prevalent for dealing with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.
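The propagation step can be illustrated with a single-graph simplification (PLAIN itself propagates on both an instance graph and a label graph; the helper below, including its name and the clamping constant `alpha`, is an assumed simplification): candidate-label confidences are repeatedly smoothed over an instance-similarity graph while being clamped back toward the original candidate sets, then renormalized within each candidate set.

```python
import numpy as np

def propagate_confidences(S, Y, alpha=0.5, iters=10):
    """Single-graph label-propagation sketch for candidate disambiguation.

    S: (n, n) nonnegative instance-similarity matrix
    Y: (n, c) 0/1 candidate-label indicator matrix
    """
    P = S / S.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    F = Y.astype(float)
    for _ in range(iters):
        F = alpha * (P @ F) + (1 - alpha) * Y     # propagate, then clamp to candidates
    F *= (Y > 0)                                  # confidences live only on candidates
    return F / F.sum(axis=1, keepdims=True)       # per-example distribution
```

The resulting soft confidences play the role of the "relatively accurate pseudo-labels" that the deep classifier is then trained to fit.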

IJCAI Conference 2023 Conference Paper

Latent Processes Identification From Multi-View Time Series

  • Zenan Huang
  • Haobo Wang
  • Junbo Zhao
  • Nenggan Zheng

Understanding the dynamics of time series data typically requires identifying the unique latent factors for data generation, a.k.a. latent processes identification. Driven by the independence assumption, existing works have made great progress in handling single-view data. However, extending them to multi-view time series data is a non-trivial problem because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independence assumption; (ii) the factors from different views generally overlap and are hard to aggregate into a complete set. In this work, we propose a novel framework MuLTI that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by the establishment of an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series. The code is available at https://github.com/lccurious/MuLTI.

IJCAI Conference 2023 Conference Paper

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

  • Ruixuan Xiao
  • Yiwen Dong
  • Haobo Wang
  • Lei Feng
  • Runze Wu
  • Gang Chen
  • Junbo Zhao

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To this end, we propose a novel LNL framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and matched predictions with given labels to dynamically expand a base clean sample set. To overcome the potential side effects of the excessive clean-set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.
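The matched high confidence selection rule is simple enough to state directly: keep a sample when the model's predicted class agrees with its given (possibly noisy) label and the prediction confidence clears a threshold. The sketch below (the function name and threshold `tau` are illustrative choices, not the paper's code) shows the core check.

```python
import numpy as np

def matched_high_confidence(probs, given_labels, tau=0.9):
    """ProMix-style selection sketch: indices of samples whose predicted
    class matches the given label AND whose confidence is at least tau.
    Selected samples would dynamically expand the base clean set."""
    preds = probs.argmax(axis=1)       # model's predicted class per sample
    conf = probs.max(axis=1)           # confidence of that prediction
    return np.where((preds == given_labels) & (conf >= tau))[0]
```

Requiring both agreement and high confidence is what lets the clean set grow beyond the initial small-loss subset without admitting many mislabeled samples.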

NeurIPS Conference 2023 Conference Paper

Regression with Cost-based Rejection

  • Xin Cheng
  • Yuzhou Cao
  • Haobo Wang
  • Hongxin Wei
  • Bo An
  • Lei Feng

Learning with rejection is an important framework that can refrain from making predictions to avoid critical mispredictions by balancing between prediction and rejection. Previous studies on cost-based rejection only focused on the classification setting, which cannot handle the continuous and infinite target space of the regression setting. In this paper, we investigate a novel regression problem called regression with cost-based rejection, where the model can reject making predictions on some examples given certain rejection costs. To solve this problem, we first formulate the expected risk for this problem and then derive the Bayes optimal solution, which shows that the optimal model should reject making predictions on examples whose conditional variance is larger than the rejection cost when the mean squared error is used as the evaluation metric. Furthermore, we propose to train the model by a surrogate loss function that treats rejection as binary classification, and we provide conditions for model consistency, which imply that the Bayes optimal solution can be recovered by our proposed surrogate loss. Extensive experiments demonstrate the effectiveness of our proposed method.
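The Bayes-optimal rule stated in the abstract (predict the conditional mean, reject when the conditional variance exceeds the rejection cost) can be illustrated with a minimal sketch; the function name and the numbers are hypothetical, not from the paper:

```python
def predict_with_rejection(means, variances, cost):
    """Under squared error, predicting the conditional mean incurs an
    expected loss equal to the conditional variance, so the optimal
    policy rejects exactly when that variance exceeds the rejection
    cost. None marks a rejected example."""
    return [m if v <= cost else None for m, v in zip(means, variances)]

# toy example: three examples with estimated conditional mean/variance
out = predict_with_rejection(means=[1.0, 2.0, 3.0],
                             variances=[0.1, 0.8, 0.3],
                             cost=0.5)
print(out)  # [1.0, None, 3.0]
```

The middle example is rejected because its variance (0.8) exceeds the cost (0.5), so paying the rejection cost is cheaper in expectation than predicting.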

NeurIPS Conference 2023 Conference Paper

SPA: A Graph Spectral Alignment Perspective for Domain Adaptation

  • Zhiqing Xiao
  • Haobo Wang
  • Ying Jin
  • Lei Feng
  • Gang Chen
  • Fei Huang
  • Junbo Zhao

Unsupervised domain adaptation (UDA) is a pivotal problem in machine learning that extends an in-domain model to distinctive target domains where the data distributions differ. Most prior works focus on capturing the inter-domain transferability but largely overlook rich intra-domain structures, which empirically results in even worse discriminability. In this work, we introduce a novel graph SPectral Alignment (SPA) framework to tackle this tradeoff. The core of our method is briefly condensed as follows: (i) by casting the DA problem to graph primitives, SPA composes a coarse graph alignment mechanism with a novel spectral regularizer towards aligning the domain graphs in eigenspaces; (ii) we further develop a fine-grained message propagation module --- upon a novel neighbor-aware self-training mechanism --- for enhanced discriminability in the target domain. On standardized benchmarks, extensive experiments demonstrate that SPA surpasses existing cutting-edge DA methods. Coupled with dense model analysis, we conclude that our approach indeed possesses superior efficacy, robustness, discriminability, and transferability. Code and data are available at: https://github.com/CrownX/SPA.

NeurIPS Conference 2022 Conference Paper

Less-forgetting Multi-lingual Fine-tuning

  • Yuren Mao
  • Yaobo Liang
  • Nan Duan
  • Haobo Wang
  • Kai Wang
  • Lu Chen
  • Yunjun Gao

Multi-lingual fine-tuning (MLF), which fine-tunes a multi-lingual language model (MLLM) with multiple source languages, aims to achieve good zero-shot performance on target languages. In MLF, the fine-tuned model tends to fit the source languages while forgetting the cross-lingual knowledge obtained during the pre-training stage. This forgetting phenomenon degrades the zero-shot performance of MLF, yet it remains under-explored. To fill this gap, this paper proposes a multi-lingual fine-tuning method, dubbed Less-forgetting Multi-lingual Fine-tuning (LF-MLF). In LF-MLF, we cast multi-lingual fine-tuning as a constrained optimization problem, where the objective is to minimize forgetting and the constraints are to reduce the fine-tuning loss. The proposed method has superior zero-shot performance; furthermore, it can achieve Pareto stationarity. Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals.

NeurIPS Conference 2022 Conference Paper

SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning

  • Haobo Wang
  • Mingxuan Xia
  • Yixuan Li
  • Yuren Mao
  • Lei Feng
  • Gang Chen
  • Junbo Zhao

Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degenerated performance of the prior methods when facing the combinatorial challenge of a long-tailed distribution and partial labeling. In this work, we first identify the major reasons the prior work failed. We subsequently propose SoLar, a novel Optimal-Transport-based framework that refines the disambiguated labels to match the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods. Code and data are available at: https://github.com/hbzju/SoLar.
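A minimal sketch of the Sinkhorn-style alternating normalization underlying this kind of label refinement, assuming a known class prior; the function, the iteration count, and the toy candidate masks are illustrative, not SoLar's actual implementation:

```python
import numpy as np

def sinkhorn_refine(scores, class_prior, n_iters=50):
    """Alternately scale rows and columns of a nonnegative score matrix
    so that each row sums to 1 (one unit of label mass per sample) and
    each column sums to n * prior_c (matching the class prior)."""
    Q = scores.astype(float).copy()
    n = Q.shape[0]
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)        # row constraint
        Q *= (n * class_prior) / Q.sum(axis=0)   # column constraint
    return Q / Q.sum(axis=1, keepdims=True)

# toy example: 4 samples, 2 classes; zeros mask out non-candidate labels
rng = np.random.default_rng(0)
scores = rng.random((4, 2)) * np.array([[1, 1], [1, 0], [0, 1], [1, 1]])
Q = sinkhorn_refine(scores, class_prior=np.array([0.5, 0.5]))
print(Q.sum(axis=0))  # column sums approach n * class_prior = [2, 2]
```

The structural zeros keep refined mass inside each sample's candidate set, while the column scaling pulls the aggregate label assignment toward the (possibly long-tailed) class prior.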

IJCAI Conference 2020 Conference Paper

Collaboration Based Multi-Label Propagation for Fraud Detection

  • Haobo Wang
  • Zhao Li
  • Jiaming Huang
  • Pengrui Hui
  • Weiwei Liu
  • Tianlei Hu
  • Gang Chen

Detecting fraud users, who fraudulently promote certain target items, is a challenging issue faced by e-commerce platforms. Generally, many fraud users exhibit multiple spam behaviors simultaneously, e.g., spam transactions, clicks, reviews and so on. Existing solutions have two main limitations: 1) the correlations among multiple spam behaviors are neglected; 2) large-scale computations are intractable when dealing with an enormous user set. To remedy these problems, this work proposes a collaboration based multi-label propagation (CMLP) algorithm. We first introduce a general-purpose version that uses a collaboration technique to exploit label correlations. Specifically, it breaks the final prediction into two parts: 1) its own prediction part; 2) the prediction of others, i.e., the collaborative part. Then, to accelerate it on large-scale e-commerce data, we propose a heterogeneous-graph-based variant that detects communities on the user-item graph directly. Both theoretical analysis and empirical results clearly validate the effectiveness and scalability of our proposals.

AAAI Conference 2020 Conference Paper

Incorporating Label Embedding and Feature Augmentation for Multi-Dimensional Classification

  • Haobo Wang
  • Chen Chen
  • Weiwei Liu
  • Ke Chen
  • Tianlei Hu
  • Gang Chen

Feature augmentation, which manipulates the feature space by integrating the label information, is one of the most popular strategies for solving Multi-Dimensional Classification (MDC) problems. However, vanilla feature augmentation approaches fail to consider the intra-class exclusiveness and may achieve degenerated performance. To fill this gap, a novel neural-network-based model is proposed that seamlessly integrates the Label Embedding and Feature Augmentation (LEFA) techniques to learn label correlations. Specifically, based on an attentional factorization machine, a cross-correlation-aware network is introduced to learn a low-dimensional label representation that simultaneously depicts the inter-class correlations and the intra-class exclusiveness. The learned latent label vector can then be used to augment the original feature space. Extensive experiments on seven real-world datasets demonstrate the superiority of LEFA over state-of-the-art MDC approaches.

IJCAI Conference 2020 Conference Paper

Learning From Multi-Dimensional Partial Labels

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Tianlei Hu
  • Ke Chen
  • Gang Chen

Multi-dimensional classification (MDC) has attracted considerable attention from the community. Though most studies consider fully annotated data, in real practice obtaining fully labeled data for MDC tasks is usually intractable. In this paper, we propose a novel learning paradigm: Multi-Dimensional Partial Label Learning (MDPL), where the ground-truth labels of each instance are concealed in multiple candidate label sets. We first introduce the partial Hamming loss for MDPL, which incurs a large loss if the predicted labels are not in the candidate label sets, and provide an empirical risk minimization (ERM) framework. Theoretically, we rigorously prove the conditions for ERM learnability of MDPL in both the independent and dependent cases. Furthermore, we present two MDPL algorithms under our proposed ERM framework. Comprehensive experiments on both synthetic and real-world datasets validate the effectiveness of our proposals.
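The partial Hamming loss described in the abstract can be sketched as follows; the function name and the toy candidate sets are illustrative, not the paper's notation:

```python
def partial_hamming_loss(preds, candidate_sets):
    """Fraction of output dimensions whose predicted label falls
    outside that dimension's candidate label set. A prediction inside
    the candidate set incurs no loss, since the (hidden) ground truth
    is known to lie within it."""
    misses = sum(p not in s for p, s in zip(preds, candidate_sets))
    return misses / len(preds)

# a 3-dimensional prediction checked against per-dimension candidate sets
loss = partial_hamming_loss([1, 0, 2], [{1, 2}, {1}, {0, 2}])
print(loss)  # 1/3: only the second dimension misses its candidate set
```

Minimizing this loss over the training set is the empirical risk minimization objective the abstract refers to.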

IJCAI Conference 2019 Conference Paper

Discriminative and Correlative Partial Multi-Label Learning

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Chen Zhang
  • Tianlei Hu
  • Gang Chen

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels and other false-positive labels. The most challenging issue for PML is that the training procedure is prone to being affected by labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of sufficiently extracting the label correlations. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, indicating how likely a label is to be correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore the label correlations, we augment the feature space with the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.

AAAI Conference 2019 Conference Paper

Two-Stage Label Embedding via Neural Factorization Machine for Multi-Label Classification

  • Chen Chen
  • Haobo Wang
  • Weiwei Liu
  • Xingyuan Zhao
  • Tianlei Hu
  • Gang Chen

Label embedding has been widely used as a method to exploit label dependency with dimension reduction in multi-label classification tasks. However, existing embedding methods attempt to extract label correlations directly, and thus they might be easily trapped by complex label hierarchies. To tackle this issue, we propose a novel Two-Stage Label Embedding (TSLE) paradigm that involves a Neural Factorization Machine (NFM) to jointly project features and labels into a latent space. In the encoding phase, we introduce a Twin Encoding Network (TEN) that digs out pairwise feature and label interactions in the first stage and then efficiently learns higher-order correlations with deep neural networks (DNNs) in the second stage. After the codewords are obtained, a set of hidden layers is applied to recover the output labels in the decoding phase. Moreover, we develop a novel learning model by leveraging a max-margin encoding loss and a label-correlation-aware decoding loss, and we adopt mini-batch Adam to optimize our learning model. Lastly, we also provide a kernel insight to better understand our proposed TSLE. Extensive experiments on various real-world datasets demonstrate that our proposed model significantly outperforms other state-of-the-art approaches.