Author name cluster

Junbo Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers

2 author rows

AAAI Conference 2026 Conference Paper

An Invariant Latent Space Perspective on Language Model Inversion

Wentao Ye
Jiaqi Hu
Haobo Wang
Xinpeng Ti
Zhiqing Xiao
Hao Chen
Liyao Li
Lei Feng

Language model inversion (LMI), i.e., recovering hidden prompts from outputs, emerges as a concrete threat to user privacy and system security. We recast LMI as reusing the LLM's own latent space and propose the Invariant Latent Space Hypothesis (ILSH): (1) diverse outputs from the same source prompt should preserve consistent semantics (source invariance), and (2) input output cyclic mappings should be self-consistent within a shared latent space (cyclic invariance). Accordingly, we present Inv2A, which treats the LLM as an invariant decoder and learns only a lightweight inverse encoder that maps outputs to a denoised pseudo-representation. When multiple outputs are available, they are sparsely concatenated at the representation layer to increase information density. Training proceeds in two stages: contrastive alignment (source invariance) and supervised reinforcement (cyclic invariance). An optional training-free neighborhood search can refine local performance. Across 9 datasets covering user and system prompt scenarios, Inv2A outperforms baselines by an average of 4.77% BLEU score while reducing dependence on large inverse corpora. Our analysis further shows that prevalent defenses provide limited protection, underscoring the need for stronger strategies.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection

Shenzhi Yang
Junbo Zhao
Sharon Li
Shouqing Yang
Dingyu Yang
Xiaofang Zhang
Haobo Wang

Out-of-distribution (OOD) node detection in graphs is a critical yet challenging task. Most existing approaches rely heavily on fine-grained labeled data to obtain a pre-trained supervised classifier, inherently assuming the existence of a well-defined pretext classification task. However, when such a task is ill-defined or absent, their applicability becomes severely limited. To overcome this limitation, there is an urgent need to propose a more scalable OOD detection method that is independent of both pretext tasks and label supervision. We harness a new phenomenon called Feature Resonance, focusing on the feature space rather than the label space. We observe that, ideally, during the optimization of known ID samples, unknown ID samples undergo more significant representation changes than OOD samples, even when the model is trained to align arbitrary targets. The rationale behind it is that even without gold labels, the local manifold may still exhibit smooth resonance. Based on this, we further develop a novel graph OOD framework, dubbed R esonance-based S eparation and L earning ( RSL ), which comprises two core modules: (i)-a more practical micro-level proxy of feature resonance that measures the movement of feature vectors in one training step. (ii)-integrate with a synthetic OOD node strategy to train an effective OOD classifier. Theoretically, we derive an error bound showing the superior separability of OOD nodes during the resonance period. Extensive experiments on a total of thirteen real-world graph datasets empirically demonstrate that RSL achieves state-of-the-art performance.

PDF Details

IJCAI Conference 2025 Conference Paper

POLO: An LLM-Powered Project-Level Code Performance Optimization Framework

Jiameng Bai
Ruoyi Xu
Sai Wu
Dingyu Yang
Junbo Zhao
Gang Chen

Program performance optimization is essential for achieving high execution efficiency, yet it remains a challenging task that requires expertise in both software and hardware. Large Language Models (LLMs), trained on high-quality code from platforms like GitHub and other open-source sources, have shown promise in generating optimized code for simple snippets. However, current LLM-based solutions often fall short when tackling project-level programs due to the complexity of call graphs and the intricate interactions among functions. In this paper, we emulate the process a human expert might follow when optimizing project-level programs and introduce a three-phase framework POLO (PrOject-Level Optimizer) to address this limitation. First, we profile the program to identify performance bottlenecks using an iterative weighting algorithm. Next, we conduct structural analysis by scanning the project and generating a graph that represents the program's structure. Finally, two LLM agents collaborate in iterative cycles to rewrite and optimize the code at these hotspots, gradually improving performance. We conduct experiments on open-source and proprietary projects. The results demonstrate that POLO accurately identifies performance bottlenecks and successfully applies optimizations. Under the O3 compilation flag, the optimized programs achieved speedups ranging from 1. 34x to 21. 5x.

PDF Details DOI

ICLR Conference 2025 Conference Paper

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo
Weian Mao
Yanjun Shao
Weiqiang Bai
Peng Ye 0006
Xinzhu Ma
Junbo Zhao
Hao Chen 0041

In recent years, A variety of methods based on Transformer and state space model (SSM) architectures have been proposed, advancing foundational DNA language models. However, there is a lack of comparison between these recent approaches and the classical architecture—convolutional networks (CNNs)—on foundation model benchmarks. This raises the question: are CNNs truly being surpassed by these recent approaches based on transformer and SSM architectures? In this paper, we develop a simple but well-designed CNN-based method, termed ConvNova. ConvNova identifies and proposes three effective designs: 1) dilated convolutions, 2) gated convolutions, and 3) a dual-branch framework for gating mechanisms. Through extensive empirical experiments, we demonstrate that ConvNova significantly outperforms recent methods on more than half of the tasks across several foundation model benchmarks. For example, in histone-related tasks, ConvNova exceeds the second-best method by an average of 5.8\%, while generally utilizing fewer parameters and enabling faster computation. In addition, the experiments observed findings that may be related to biological characteristics. This indicates that CNNs are still a strong competitor compared to Transformers and SSMs. We anticipate that this work will spark renewed interest in CNN-based methods for DNA foundation models.

Details

NeurIPS Conference 2025 Conference Paper

Table as a Modality for Large Language Models

Liyao Li
Chao Ye
Wentao Ye
Yifei Sun
Zhe Jiang
Haobo Wang
Jiaming Tian
Yiming Zhang

To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular data. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42. 65%.

PDF Details

IJCAI Conference 2025 Conference Paper

Towards Robust Incremental Learning Under Ambiguous Supervision

Rui Wang
Mingxuan Xia
Haobo Wang
Lei Feng
Junbo Zhao
Gang Chen
Chang Yao

Traditional Incremental Learning (IL) targets to handle sequential fully-supervised learning problems where novel classes emerge from time to time. However, due to inherent annotation uncertainty and ambiguity, collecting high-quality annotated data in a dynamic learning system can be extremely expensive. To mitigate this problem, we propose a novel weakly-supervised learning paradigm called Incremental Partial Label Learning (IPLL), where the sequentially arrived data relate to a set of candidate labels rather than the ground truth. Technically, we develop the Prototype-Guided Disambiguation and Replay Algorithm (PGDR) which leverages the class prototypes as a proxy to mitigate two intertwined challenges in IPLL, i. e. , label ambiguity and catastrophic forgetting. To handle the former, PGDR encapsulates a momentum-based pseudo-labeling algorithm along with prototype-guided initialization, resulting in a balanced perception of classes. To alleviate forgetting, we develop a memory replay technique that collects well-disambiguated samples while maintaining representativeness and diversity. By jointly distilling knowledge from curated memory data, our framework exhibits a great disambiguation ability for samples of new tasks and achieves less forgetting of knowledge. Extensive experiments demonstrate that PGDR achieves superior performance over the baselines in the IPLL task.

PDF Details DOI

AAAI Conference 2024 Conference Paper

A Separation and Alignment Framework for Black-Box Domain Adaptation

Mingxuan Xia
Junbo Zhao
Gengyu Lyu
Zenan Huang
Tianlei Hu
Gang Chen
Haobo Wang

Black-box domain adaptation (BDA) targets to learn a classifier on an unsupervised target domain while assuming only access to black-box predictors trained from unseen source data. Although a few BDA approaches have demonstrated promise by manipulating the transferred labels, they largely overlook the rich underlying structure in the target domain. To address this problem, we introduce a novel separation and alignment framework for BDA. Firstly, we locate those well-adapted samples via loss ranking and a flexible confidence-thresholding procedure. Then, we introduce a novel graph contrastive learning objective that aligns under-adapted samples to their local neighbors and well-adapted samples. Lastly, the adaptation is finally achieved by a nearest-centroid-augmented objective that exploits the clustering effect in the feature space. Extensive experiments demonstrate that our proposed method outperforms best baselines on benchmark datasets, e.g. improving the averaged per-class accuracy by 4.1% on the VisDA dataset. The source code is available at: https://github.com/MingxuanXia/SEAL.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

Jianan Yang
Chenchao Gao
Zhiqing Xiao
Junbo Zhao
Sai Wu
Gang Chen
Haobo Wang

The recent large-scale text-to-image generative models have attained unprecedented performance, while people established adaptor modules like LoRA and DreamBooth to extend this performance to even more unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict the out-of-distribution concepts. This failure is highly related to the low quality of training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm which includes high-quality data accumulation and adaptor training, enabling a finer-grained enhancement of generative results. The aesthetics score and concept-matching score are two major factors that impact the quality of synthetic results. One key component of CATOD is the weighted scoring system that automatically balances between these two scores and we also offer comprehensive theoretical analysis for this point. Then, it determines how to select data and schedule the adaptor training based on this scoring system. The extensive results show that CATOD significantly outperforms the prior approaches with an 11. 10 boost on the CLIP score and a 33. 08% decrease on the CMMD metric.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Debiased and Denoised Entity Recognition from Distant Supervision

Haobo Wang
Yiwen Dong
Ruixuan Xiao
Fei Huang
Gang Chen
Junbo Zhao

While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitable noisy distant labels tagged unsupervisedly. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we innovatively identify two types of biases that were omitted by prior work, and these biases lead to inferior performance of the distant-supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework would ubiquitously introduce an inherent bias that causes erroneous behavior in both sample selection and eventually prediction. To cope with these problems, we propose a novel self-training framework, dubbed DesERT. This framework augments the conventional NER predicative pathway to a dual form that effectively adapts the sample-selection process to conform to its innate distributional-bias structure. The other crucial component of DesERT composes a debiased module aiming to enhance the token representations, hence the quality of the pseudo-labels. Extensive experiments are conducted to validate the DesERT. The results show that our framework establishes a new state-of-art performance, it achieves a +2. 22% average F1 score improvement on five standardized benchmarking datasets. Lastly, DesERT demonstrates its effectiveness under a new DSNER benchmark where additional distant supervision comes from the ChatGPT model.

PDF Details

AAAI Conference 2023 Conference Paper

Dynamic Ensemble of Low-Fidelity Experts: Mitigating NAS “Cold-Start”

Junbo Zhao
Xuefei Ning
Enshu Liu
Binxin Ru
Zixuan Zhou
Tianchen Zhao
Chen Chen
Jiajin Zhang

Predictor-based Neural Architecture Search (NAS) employs an architecture performance predictor to improve the sample efficiency. However, predictor-based NAS suffers from the severe ``cold-start'' problem, since a large amount of architecture-performance data is required to get a working predictor. In this paper, we focus on exploiting information in cheaper-to-obtain performance estimations (i.e., low-fidelity information) to mitigate the large data requirements of predictor training. Despite the intuitiveness of this idea, we observe that using inappropriate low-fidelity information even damages the prediction ability and different search spaces have different preferences for low-fidelity information types. To solve the problem and better fuse beneficial information provided by different types of low-fidelity information, we propose a novel dynamic ensemble predictor framework that comprises two steps. In the first step, we train different sub-predictors on different types of available low-fidelity information to extract beneficial knowledge as low-fidelity experts. In the second step, we learn a gating network to dynamically output a set of weighting coefficients conditioned on each input neural architecture, which will be used to combine the predictions of different low-fidelity experts in a weighted sum. The overall predictor is optimized on a small set of actual architecture-performance data to fuse the knowledge from different low-fidelity experts to make the final prediction. We conduct extensive experiments across five search spaces with different architecture encoders under various experimental settings. For example, our methods can improve the Kendall's Tau correlation coefficient between actual performance and predicted scores from 0.2549 to 0.7064 with only 25 actual architecture-performance data on NDS-ResNet. Our method can easily be incorporated into existing predictor-based NAS frameworks to discover better architectures. Our method will be implemented in Mindspore (Huawei 2020), and the example code is published at https://github.com/A-LinCui/DELE.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

Latent Processes Identification From Multi-View Time Series

Zenan Huang
Haobo Wang
Junbo Zhao
Nenggan Zheng

Understanding the dynamics of time series data typically requires identifying the unique latent factors for data generation, a. k. a. , latent processes identification. Driven by the independent assumption, existing works have made great progress in handling single-view data. However, it is a non-trivial problem that extends them to multi-view time series data because of two main challenges: (i) the complex data structure, such as temporal dependency, can result in violation of the independent assumption; (ii) the factors from different views are generally overlapped and are hard to be aggregated to a complete set. In this work, we propose a novel framework MuLTI that employs the contrastive learning technique to invert the data generative process for enhanced identifiability. Additionally, MuLTI integrates a permutation mechanism that merges corresponding overlapped variables by the establishment of an optimal transport formula. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of our method in recovering identifiable latent variables on multi-view time series. The code is available on https: //github. com/lccurious/MuLTI.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

Ruixuan Xiao
Yiwen Dong
Haobo Wang
Lei Feng
Runze Wu
Gang Chen
Junbo Zhao

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To fulfill this, we propose a novel LNL framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and matched predictions with given labels to dynamically expand a base clean sample set. To overcome the potential side effect of excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2. 48% on the CIFAR-N dataset.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

SPA: A Graph Spectral Alignment Perspective for Domain Adaptation

Zhiqing Xiao
Haobo Wang
Ying Jin
Lei Feng
Gang Chen
Fei Huang
Junbo Zhao

Unsupervised domain adaptation (UDA) is a pivotal form in machine learning to extend the in-domain model to the distinctive target domains where the data distributions differ. Most prior works focus on capturing the inter-domain transferability but largely overlook rich intra-domain structures, which empirically results in even worse discriminability. In this work, we introduce a novel graph SPectral Alignment (SPA) framework to tackle the tradeoff. The core of our method is briefly condensed as follows: (i)-by casting the DA problem to graph primitives, SPA composes a coarse graph alignment mechanism with a novel spectral regularizer towards aligning the domain graphs in eigenspaces; (ii)-we further develop a fine-grained message propagation module --- upon a novel neighbor-aware self-training mechanism --- in order for enhanced discriminability in the target domain. On standardized benchmarks, the extensive experiments of SPA demonstrate that its performance has surpassed the existing cutting-edge DA methods. Coupled with dense model analysis, we conclude that our approach indeed possesses superior efficacy, robustness, discriminability, and transferability. Code and data are available at: https: //github. com/CrownX/SPA.

PDF Details

NeurIPS Conference 2022 Conference Paper

SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning

Haobo Wang
Mingxuan Xia
Yixuan Li
Yuren Mao
Lei Feng
Gang Chen
Junbo Zhao

Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degenerated performance of the prior methods when facing the combinatorial challenge from the long-tailed distribution and partial-labeling. In this work, we first identify the major reasons that the prior work failed. We subsequently propose SoLar, a novel Optimal Transport-based framework that allows to refine the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods. Code and data are available at: https: //github. com/hbzju/SoLar.

PDF Details

NeurIPS Conference 2022 Conference Paper

TA-GATES: An Encoding Scheme for Neural Network Architectures

Xuefei Ning
Zixuan Zhou
Junbo Zhao
Tianchen Zhao
Yiping Deng
Changcheng Tang
Shuang Liang
Huazhong Yang

Neural architecture search tries to shift the manual design of neural network (NN) architectures to algorithmic design. In these cases, the NN architecture itself can be viewed as data and needs to be modeled. A better modeling could help explore novel architectures automatically and open the black box of automated architecture design. To this end, this work proposes a new encoding scheme for neural architectures, the Training-Analogous Graph-based ArchiTecture Encoding Scheme (TA-GATES). TA-GATES encodes an NN architecture in a way that is analogous to its training. Extensive experiments demonstrate that the flexibility and discriminative power of TA-GATES lead to better modeling of NN architectures. We expect our methodology of explicitly modeling the NN training process to benefit broader automated deep learning systems. The code is available at https: //github. com/walkerning/aw_nas.

PDF Details

IROS Conference 2021 Conference Paper

Semi-supervised Vein Segmentation of Ultrasound Images for Autonomous Venipuncture

Yu Chen
Yuxuan Wang
Bolin Lai
Zijie Chen
Xu Cao
Nanyang Ye 0001
Zhongyuan Ren
Junbo Zhao

Venipuncture is an indispensable procedure for both diagnosis and treatment. In this paper, unlike existing solutions that fully or partially rely on professional assistance, a compact robotic system integrating both novel hardware and software developments is introduced. The hardware consists of a set of units to facilitate the supporting, positioning, puncturing, and imaging functionalities. To achieve full automation, a novel deep learning framework — semi-ResNeXt-Unet for semi-supervised vein segmentation from ultrasound images is proposed. The depth information of vein is calculated and enables the automated navigation for the puncturing unit. The algorithm is validated on 40 volunteers, and the proposed semi-ResNeXt-Unet improves the dice similarity coefficient (DSC) by 5. 36%, decreases the centroid error by 1. 38 pixels and decreases the failure rate by 5. 60%, compared to fully-supervised ResNeXt-Unet.

Details

NeurIPS Conference 2019 Conference Paper

Levenshtein Transformer

Jiatao Gu
Changhan Wang
Junbo Zhao

Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the basic operations of our model are insertion and deletion. The combination of them facilitates not only generation but also sequence refinement allowing dynamic length changes. We also propose a set of new training techniques dedicated at them, effectively exploiting one as the other's learning signal thanks to their complementary nature. Experiments applying the proposed model achieve comparable or even better performance with much-improved efficiency on both generation (e. g. machine translation, text summarization) and refinement tasks (e. g. automatic post-editing). We further confirm the flexibility of our model by showing a Levenshtein Transformer trained by machine translation can straightforwardly be used for automatic post-editing.

PDF Details

NeurIPS Conference 2016 Conference Paper

Disentangling factors of variation in deep representation using adversarial training

Michael Mathieu
Junbo Jake Zhao
Junbo Zhao
Aditya Ramesh
Pablo Sprechmann
Yann LeCun

We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e. g. different objects observed in the exact same conditions. This assumption is often not encountered in realistic settings where data acquisition is not controlled and labels for the uninformative components are not available. In this work, we propose to overcome this limitation by augmenting deep convolutional autoencoders with a form of adversarial training. Both factors of variation are implicitly captured in the organization of the learned embedding space, and can be used for solving single-image analogies. Experimental results on synthetic and real datasets show that the proposed method is capable of disentangling the influences of style and content factors using a flexible representation, as well as generalizing to unseen styles or content classes.

PDF Details

NeurIPS Conference 2015 Conference Paper

Character-level Convolutional Networks for Text Classification

Xiang Zhang
Junbo Zhao
Yann LeCun

This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

PDF Details