Arrow Research search

Author name cluster

Junping Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification

  • Zijiang Yang
  • Hanqing Chao
  • Bokai Zhao
  • Yelin Yang
  • Yunshuo Zhang
  • Dongmei Fu
  • Junping Zhang
  • Le Lu

Nucleus detection and classification (NDC) in histopathology analysis is a fundamental task that underpins a wide range of high-level pathology applications. However, existing methods heavily rely on labor-intensive nucleus-level annotations and struggle to fully exploit large-scale unlabeled data for learning discriminative nucleus representations. In this work, we propose MUSE (MUlti-scale denSE self-distillation), a novel self-supervised learning method tailored for NDC. At its core is NuLo (Nucleus-based Local self-distillation), a coordinate-guided mechanism that enables flexible local self-distillation based on predicted nucleus positions. By removing the need for strict spatial alignment between augmented views, NuLo allows critical cross-scale alignment, thus unlocking the capacity of models for fine-grained nucleus-level representation. To support MUSE, we design a simple yet effective encoder-decoder architecture and a large field-of-view semi-supervised fine-tuning strategy that together maximize the value of unlabeled pathology images. Extensive experiments on three widely used benchmarks demonstrate that MUSE effectively addresses the core challenges of histopathological NDC. The resulting models not only surpass state-of-the-art supervised baselines but also outperform generic pathology foundation models.

AAAI Conference 2026 Conference Paper

UniAPO: Unified Multimodal Automated Prompt Optimization

  • Qipeng Zhu
  • Yanzhe Chen
  • Huasong Zhong
  • Jie Chen
  • Yan Li
  • Zhixin Zhang
  • Junping Zhang
  • Zhenheng Yang

Prompting is fundamental to unlocking the full potential of large language models. To automate and enhance this process, automatic prompt optimization (APO) has been developed, demonstrating effectiveness primarily in text-only input scenarios. However, extending existing APO methods to multimodal tasks—such as video-language generation—introduces two core challenges: (i) visual token inflation, where long visual-token sequences restrict context capacity and result in insufficient feedback signals; (ii) a lack of process-level supervision, as existing methods focus on outcome-level supervision and overlook intermediate supervision, limiting prompt optimization. We present UniAPO: Unified Multimodal Automated Prompt Optimization, the first framework tailored for multimodal APO. UniAPO adopts an EM-inspired optimization process that decouples feedback modeling and prompt refinement, making the optimization more stable and goal-driven. To further address the aforementioned challenges, we introduce a short-long term memory mechanism: historical feedback mitigates context limitations, while historical prompts provide directional guidance for effective prompt optimization. UniAPO achieves consistent gains across text, image, and video benchmarks, establishing a unified framework for efficient and transferable prompt optimization.

ICRA Conference 2025 Conference Paper

A2DO: Adaptive Anti-Degradation Odometry with Deep Multi-Sensor Fusion for Autonomous Navigation

  • Hui Lai
  • Qi Chen
  • Junping Zhang
  • Jian Pu

Accurate localization is essential for the safe and effective navigation of autonomous vehicles, and Simultaneous Localization and Mapping (SLAM) is a cornerstone technology in this context. However, the performance of the SLAM system can deteriorate under challenging conditions such as low light, adverse weather, or obstructions due to sensor degradation. We present A2DO, a novel end-to-end multi-sensor fusion odometry system that enhances robustness in these scenarios through deep neural networks. A2DO integrates LiDAR and visual data, employing a multilayer, multi-scale feature encoding module augmented by an attention mechanism to mitigate sensor degradation dynamically. The system is pretrained extensively on simulated datasets covering a broad range of degradation scenarios and fine-tuned on a curated set of real-world data, ensuring robust adaptation to complex scenarios. Our experiments demonstrate that A2DO maintains superior localization accuracy and robustness across various degradation conditions, showcasing its potential for practical implementation in autonomous vehicle systems.

NeurIPS Conference 2025 Conference Paper

GMV: A Unified and Efficient Graph Multi-View Learning Framework

  • Qipeng Zhu
  • Jie Chen
  • Jian Pu
  • Junping Zhang

Graph Neural Networks (GNNs) are pivotal in graph classification but often struggle with generalization and overfitting. We introduce a unified and efficient Graph Multi-View (GMV) learning framework that integrates multi-view learning into GNNs to enhance robustness and efficiency. Leveraging the lottery ticket hypothesis, GMV activates diverse sub-networks within a single GNN through a novel training pipeline, which includes mixed-view generation followed by multi-view decomposition and learning. This approach simultaneously broadens "views" from the data, model, and optimization perspectives during training to enhance the generalization capabilities of GNNs. During inference, GMV only incorporates additional prediction heads into standard GNNs, thereby achieving multi-view learning at minimal cost. Our experiments demonstrate that GMV surpasses other augmentation and ensemble techniques for GNNs and Graph Transformers across various graph classification scenarios.

NeurIPS Conference 2025 Conference Paper

InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention

  • Qiang Xiang
  • Shuang Sun
  • Binglei Li
  • Dejia Song
  • Huaxia Li
  • Yibo Chen
  • Xu Tang
  • Yao Hu

Diffusion models have demonstrated remarkable capabilities in generating high-quality images. Recent advancements in Layout-to-Image (L2I) generation have leveraged positional conditions and textual descriptions to facilitate precise and controllable image synthesis. Despite overall progress, current L2I methods still exhibit suboptimal performance. Therefore, we propose InstanceAssemble, a novel architecture that incorporates layout conditions via instance-assembling attention, enabling position control with bounding boxes (bbox) and multimodal content control including texts and additional visual content. Our method achieves flexible adaptation to existing DiT-based T2I models through lightweight LoRA modules. Additionally, we propose Denselayout, a comprehensive benchmark for layout-to-image generation, containing 5k images with 90k instances in total. We further introduce Layout Grounding Score (LGS), an interpretable evaluation metric to more precisely assess the accuracy of L2I generation. Experiments demonstrate that our InstanceAssemble method achieves state-of-the-art performance under complex layout conditions, while exhibiting strong compatibility with diverse style LoRA modules. The code and pretrained models are publicly available at https://github.com/FireRedTeam/InstanceAssemble.

ICML Conference 2025 Conference Paper

PROTOCOL: Partial Optimal Transport-enhanced Contrastive Learning for Imbalanced Multi-view Clustering

  • Xuqian Xue
  • Yiming Lei
  • Qi Cai
  • Hongming Shan
  • Junping Zhang

While contrastive multi-view clustering has achieved remarkable success, it implicitly assumes a balanced class distribution. However, real-world multi-view data primarily exhibits imbalanced class distributions. Consequently, existing methods suffer performance degradation due to their inability to perceive and model such imbalance. To address this challenge, we present the first systematic study of imbalanced multi-view clustering, focusing on two fundamental problems: (i) perceiving the imbalanced class distribution, and (ii) mitigating representation degradation of minority samples. We propose PROTOCOL, a novel PaRtial Optimal TranspOrt-enhanced COntrastive Learning framework for imbalanced multi-view clustering. First, for class imbalance perception, we map multi-view features into a consensus space and reformulate the imbalanced clustering as a partial optimal transport (POT) problem, augmented with progressive mass constraints and weighted KL divergence for class distributions. Second, we develop a POT-enhanced class-rebalanced contrastive learning at both feature and class levels, incorporating logit adjustment and class-sensitive learning to enhance minority sample representations. Extensive experiments demonstrate that PROTOCOL significantly improves clustering performance on imbalanced multi-view data, filling a critical research gap in this field.
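For readers unfamiliar with the optimal-transport machinery this abstract leans on, a minimal balanced entropic OT (Sinkhorn) sketch follows. This is not the paper's POT formulation — partial OT additionally relaxes the marginal constraints with mass bounds, which this toy omits — and all names and data are illustrative:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.2, iters=500):
    """Entropic-regularized OT between histograms a and b (balanced case)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):           # alternate marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(0)
C = rng.random((4, 5))               # toy cost matrix
a = np.full(4, 1 / 4)                # uniform source marginal
b = np.full(5, 1 / 5)                # uniform target marginal
P = sinkhorn(C, a, b)
# the plan's marginals recover the prescribed distributions
assert np.allclose(P.sum(axis=1), a, atol=1e-6)
assert np.allclose(P.sum(axis=0), b, atol=1e-5)
```

PROTOCOL's POT variant would additionally let only a fraction of the total mass be transported, which is what makes the formulation robust to minority classes.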

NeurIPS Conference 2024 Conference Paper

Denoising Diffusion Path: Attribution Noise Reduction with An Auxiliary Diffusion Model

  • Yiming Lei
  • Zilong Li
  • Junping Zhang
  • Hongming Shan

The explainability of deep neural networks (DNNs) is critical for trust and reliability in AI systems. Path-based attribution methods, such as integrated gradients (IG), aim to explain predictions by accumulating gradients along a path from a baseline to the target image. However, noise accumulated during this process can significantly distort the explanation. While existing methods primarily concentrate on finding alternative paths to circumvent noise, they overlook a critical issue: intermediate-step images frequently diverge from the distribution of training data, further intensifying the impact of noise. This work presents a novel Denoising Diffusion Path (DDPath) to tackle this challenge by harnessing the power of diffusion models for denoising. By exploiting the inherent ability of diffusion models to progressively remove noise from an image, DDPath constructs a piecewise linear path. Each segment of this path ensures that samples drawn from a Gaussian distribution are centered around the target image. This approach facilitates a gradual reduction of noise along the path. We further demonstrate that DDPath adheres to essential axiomatic properties for attribution methods and can be seamlessly integrated with existing methods such as IG. Extensive experimental results demonstrate that DDPath can significantly reduce noise in the attributions—resulting in clearer explanations—and achieves better quantitative results than traditional path-based methods.
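The integrated-gradients accumulation that DDPath builds on can be sketched in a few lines. This toy applies IG to an analytic function along the standard straight-line path — not the paper's DDPath — and verifies the completeness axiom (attributions sum to the output difference); all names are illustrative:

```python
import numpy as np

def f(x):
    # toy differentiable "model": f(x) = sum(x_i^2)
    return np.sum(x ** 2)

def grad_f(x):
    # analytic gradient of f, applied row-wise
    return 2.0 * x

def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Straight-line IG: (x - baseline) * average gradient along the path."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoint rule
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, d) points
    avg_grad = grad_fn(path).mean(axis=0)                # mean gradient per dim
    return (x - baseline) * avg_grad

x = np.array([1.0, -2.0, 3.0])
b = np.zeros(3)
attr = integrated_gradients(x, b, grad_f)
# completeness axiom: attributions sum to f(x) - f(baseline)
assert np.isclose(attr.sum(), f(x) - f(b), atol=1e-3)
```

DDPath's contribution, per the abstract, is to replace the straight-line `path` with diffusion-denoised segments so intermediate points stay on the data manifold.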

JBHI Journal 2024 Journal Article

HOPE: Hybrid-Granularity Ordinal Prototype Learning for Progression Prediction of Mild Cognitive Impairment

  • Chenhui Wang
  • Yiming Lei
  • Tao Chen
  • Junping Zhang
  • Yuxin Li
  • Hongming Shan

Mild cognitive impairment (MCI) is often at high risk of progression to Alzheimer's disease (AD). Existing works to identify the progressive MCI (pMCI) typically require MCI subtype labels, pMCI vs. stable MCI (sMCI), determined by whether or not an MCI patient will progress to AD after a long follow-up. However, prospectively acquiring MCI subtype data is time-consuming and resource-intensive; the resultant small datasets could lead to severe overfitting and difficulty in extracting discriminative information. Inspired by the observation that various longitudinal biomarkers and cognitive measurements present an ordinal pathway along AD progression, we propose a novel Hybrid-granularity Ordinal PrototypE learning (HOPE) method to characterize AD ordinal progression for MCI progression prediction. First, HOPE learns an ordinal metric space that enables progression prediction by prototype comparison. Second, HOPE leverages a novel hybrid-granularity ordinal loss to learn the ordinal nature of AD via effectively integrating instance-to-instance ordinality, instance-to-class compactness, and class-to-class separation. Third, to make the prototype learning more stable, HOPE employs an exponential moving average strategy to learn the global prototypes of NC and AD dynamically. Experimental results on the internal ADNI and the external NACC datasets demonstrate the superiority of the proposed HOPE over existing state-of-the-art methods as well as its interpretability.

TMLR Journal 2024 Journal Article

SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP

  • Jie Chen
  • Mingyuan Bai
  • Shouzhen Chen
  • Junbin Gao
  • Junping Zhang
  • Jian Pu

The recursive node fetching and aggregation in message passing cause inference latency when deploying Graph Neural Networks (GNNs) to large-scale graphs. One promising inference-acceleration direction is to distill GNNs into message-passing-free student Multi-Layer Perceptrons (MLPs). However, an MLP student without graph dependency cannot fully learn the structure knowledge from GNNs, which causes inferior performance in heterophilic and online scenarios. To address this problem, we first design a simple yet effective Structure-Aware MLP (SA-MLP) as a student model. It utilizes linear layers as encoders and decoders to capture features and graph structures without message passing among nodes. Furthermore, we introduce a novel structure-mixing knowledge distillation technique. It generates virtual samples imbued with a hybrid of structure knowledge from teacher GNNs, thereby enhancing the learning ability of MLPs for structure information. Extensive experiments on eight benchmark datasets under both transductive and online settings show that our SA-MLP can consistently achieve similar or even better results than teacher GNNs while maintaining as fast inference speed as MLPs. Our findings reveal that SA-MLP efficiently assimilates graph knowledge through distillation from GNNs in an end-to-end manner, eliminating the need for complex model architectures and preprocessing of features/structures. Our code is available at https://github.com/JC-202/SA-MLP.
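The teacher-to-student transfer underlying GNN-to-MLP distillation is typically driven by a temperature-scaled soft-label loss. A generic sketch of that objective in plain NumPy — not the paper's structure-mixing technique, and with illustrative names throughout:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation: KL(teacher || student) at temperature T,
    scaled by T^2 as is conventional so gradients stay comparable."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[5.0, 1.0, -2.0]])
uniform = np.array([[0.0, 0.0, 0.0]])
assert abs(kd_loss(teacher, teacher)) < 1e-12   # perfect match -> zero loss
assert kd_loss(uniform, teacher) > 0.0          # mismatch -> positive loss
```

SA-MLP's structure-mixing variant, per the abstract, distills on virtual samples that blend structure knowledge from the teacher rather than on raw node logits alone.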

NeurIPS Conference 2023 Conference Paper

LICO: Explainable Models with Language-Image COnsistency

  • Yiming Lei
  • Zilong Li
  • Yangyang Li
  • Junping Zhang
  • Hongming Shan

Interpreting the decisions of deep learning models has been actively studied since the explosion of deep neural networks. One of the most convincing interpretation approaches is saliency-based visual interpretation, such as Grad-CAM, where the generation of attention maps depends only on categorical labels. Although existing interpretation methods can provide explainable decision clues, they often yield partial correspondence between image and saliency maps due to the limited discriminative information from one-hot labels. This paper develops a Language-Image COnsistency model for explainable image classification, termed LICO, by correlating learnable linguistic prompts with corresponding visual features in a coarse-to-fine manner. Specifically, we first establish a coarse global manifold structure alignment by minimizing the distance between the distributions of image and language features. We then achieve fine-grained saliency maps by applying optimal transport (OT) theory to assign local feature maps with class-specific prompts. Extensive experimental results on eight benchmark datasets demonstrate that the proposed LICO achieves a significant improvement in generating more explainable attention maps in conjunction with existing interpretation methods such as Grad-CAM. Remarkably, LICO improves the classification performance of existing models without introducing any computational overhead during inference.

IS Journal 2022 Journal Article

Robust Precipitation Bias Correction Through an Ordinal Distribution Autoencoder

  • Youcheng Luo
  • Xiaoyang Xu
  • Yiqun Liu
  • Hanqing Chao
  • Hai Chu
  • Lei Chen
  • Junping Zhang
  • Leiming Ma

Numerical precipitation prediction plays a crucial role in weather forecasting and has broad applications in public services including aviation management and urban disaster early-warning systems. However, numerical weather prediction (NWP) models are often constrained by a systematic bias due to coarse spatial resolution, lack of parameterizations, and limitations of observation and conventional meteorological models, including constrained sample size and long-tailed distributions. To address these issues, we present a data-driven deep learning model, named the ordinal distribution autoencoder (ODA), which principally includes a precipitation confidence network and a combinatorial network that contains two blocks, i.e., a denoising autoencoder block and an ordinal distribution regression block. As an expert-free model for bias correction of precipitation, it can effectively correct numerical precipitation prediction based on meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and SMS-WARMS, an NWP model used in East China. Experiments on the two NWP models demonstrate that, compared with several classical machine-learning algorithms and deep learning models, our proposed ODA generally performs better in bias correction.

IJCAI Conference 2021 Conference Paper

AgeFlow: Conditional Age Progression and Regression with Normalizing Flows

  • Zhizhong Huang
  • Shouzhen Chen
  • Junping Zhang
  • Hongming Shan

Age progression and regression aim to synthesize photorealistic appearance of a given face image with aging and rejuvenation effects, respectively. Existing generative adversarial network (GAN)-based methods suffer from three major issues: 1) unstable training introducing strong ghost artifacts in the generated faces, 2) unpaired training leading to unexpected changes in facial attributes such as gender and race, and 3) non-bijective age mappings increasing the uncertainty in the face transformation. To overcome these issues, this paper proposes a novel framework, termed AgeFlow, to integrate the advantages of both flow-based models and GANs. The proposed AgeFlow contains three parts: an encoder that maps a given face to a latent space through an invertible neural network, a novel invertible conditional translation module (ICTM) that translates the source latent vector to the target one, and a decoder that reconstructs the generated face from the target latent vector using the same encoder network; all parts are invertible, achieving bijective age mappings. The novelties of ICTM are two-fold. First, we propose an attribute-aware knowledge distillation to learn the manipulation direction of age progression while keeping other unrelated attributes unchanged, alleviating unexpected changes in facial attributes. Second, we propose to use GANs in the latent space to ensure the learned latent vectors are indistinguishable from real ones, which is much easier than the traditional use of GANs in the image domain. Experimental results demonstrate superior performance over existing GAN-based methods on two benchmark datasets. The source code is available at https://github.com/Hzzone/AgeFlow.

ICLR Conference 2021 Conference Paper

Private Image Reconstruction from System Side Channels Using Generative Models

  • Yuanyuan Yuan 0001
  • Shuai Wang 0011
  • Junping Zhang

System side channels denote effects imposed on the underlying system and hardware when running a program, such as its accessed CPU cache lines. Side channel analysis (SCA) allows attackers to infer program secrets based on observed side channel signals. Given the ever-growing adoption of machine learning as a service (MLaaS), image analysis software on cloud platforms has been exploited by reconstructing private user images from system side channels. Nevertheless, to date, SCA is still highly challenging, requiring technical knowledge of victim software's internal operations. For existing SCA attacks, comprehending such internal operations requires heavyweight program analysis or manual efforts. This research proposes an attack framework to reconstruct private user images processed by media software via system side channels. The framework forms an effective workflow by incorporating convolutional networks, variational autoencoders, and generative adversarial networks. Our evaluation of two popular side channels shows that the reconstructed images consistently match user inputs, making privacy leakage attacks more practical. We also show surprising results that even one-bit data read/write pattern side channels, which are deemed minimally informative, can be used to reconstruct quality images using our framework.

IJCAI Conference 2021 Conference Paper

Self-boosting for Feature Distillation

  • Yulong Pei
  • Yanyun Qu
  • Junping Zhang

Knowledge distillation is a simple but effective method for model compression, which obtains a better-performing small network (Student) by learning from a well-trained large network (Teacher). However, when the difference in the model sizes of Student and Teacher is large, the gap in capacity leads to poor performance of Student. Existing methods focus on seeking simplified or more effective knowledge from Teacher to narrow the Teacher-Student gap, while we address this problem by Student's self-boosting. Specifically, we propose a novel distillation method named Self-boosting Feature Distillation (SFD), which eases the Teacher-Student gap by feature integration and self-distillation of Student. Three different modules are designed for feature integration to enhance the discriminability of Student's feature, which leads to improving the order of convergence in theory. Moreover, an easy-to-operate self-distillation strategy is put forward to stabilize the training process and promote the performance of Student, without additional forward propagation or memory consumption. Extensive experiments on multiple benchmarks and networks show that our method is significantly superior to existing methods.

AAAI Conference 2019 Conference Paper

GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition

  • Hanqing Chao
  • Yiwei He
  • Junping Zhang
  • Jianfeng Feng

As a unique biometric feature that can be recognized at a distance, gait has broad applications in crime prevention, forensic identification and social security. To portray a gait, existing gait recognition methods utilize either a gait template, where temporal information is hard to preserve, or a gait sequence, which must keep unnecessary sequential constraints and thus loses the flexibility of gait recognition. In this paper we present a novel perspective, where a gait is regarded as a set consisting of independent frames. We propose a new network named GaitSet to learn identity information from the set. Based on the set perspective, our method is immune to permutation of frames, and can naturally integrate frames from different videos which have been filmed under different scenarios, such as diverse viewing angles, different clothes/carrying conditions. Experiments show that under normal walking conditions, our single-model method achieves an average rank-1 accuracy of 95.0% on the CASIA-B gait dataset and an 87.1% accuracy on the OU-MVLP gait dataset. These results represent new state-of-the-art recognition accuracy. On various complex scenarios, our model exhibits a significant level of robustness. It achieves accuracies of 87.2% and 70.4% on CASIA-B under bag-carrying and coat-wearing walking conditions, respectively. These outperform the existing best methods by a large margin. The method presented can also achieve a satisfactory accuracy with a small number of frames in a test sample, e.g., 82.5% on CASIA-B with only 7 frames. The source code has been released at https://github.com/AbnerHqC/GaitSet.
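The set perspective described above hinges on a permutation-invariant aggregation over per-frame features. A minimal sketch of that idea in plain NumPy — element-wise max pooling over the frame axis, not GaitSet's actual network, with illustrative names and random toy features:

```python
import numpy as np

rng = np.random.default_rng(0)

def set_pool(frame_features):
    # order-free aggregation: element-wise max over the frame axis,
    # so the result depends on the set of frames, not their order
    return frame_features.max(axis=0)

frames = rng.normal(size=(30, 64))        # 30 per-frame feature vectors
shuffled = frames[rng.permutation(30)]    # same frames, different order

# identical set-level feature regardless of frame ordering
assert np.allclose(set_pool(frames), set_pool(shuffled))
```

This invariance is also why such a model can mix frames drawn from different videos of the same subject, as the abstract notes.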

ECAI Conference 2016 Conference Paper

Randomized Distribution Feature for Image Classification

  • Hongming Shan
  • Junping Zhang

Local image features can be assumed to be drawn from an unknown distribution. For image classification, such features are compared through the histogram-based model or the metric-based model. By quantizing local features into a set of histograms, the histogram-based model is convenient and offers a vectorial representation of the image, but information can be lost in vector quantization. Unlike the histogram-based model, the metric-based model estimates metrics over the underlying distribution of local features directly, achieving better predictive performance. However, it incurs a higher computational cost and loses the benefit of a vectorial image representation.
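The histogram-based model the abstract contrasts against quantizes each local descriptor to its nearest codeword and counts occurrences. A minimal sketch with random toy data (illustrative names, not this paper's randomized-distribution method):

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword,
    then return the L1-normalized codeword histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                          # nearest-codeword index
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))    # 8 codewords of dimension 16
local = rng.normal(size=(100, 16))     # 100 local descriptors from one image
h = bof_histogram(local, codebook)
assert h.shape == (8,) and np.isclose(h.sum(), 1.0)
```

The hard `argmin` assignment is exactly where the quantization loss the abstract mentions comes from: two nearby descriptors on either side of a codeword boundary land in different bins.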

ICML Conference 2013 Conference Paper

Canonical Correlation Analysis based on Hilbert-Schmidt Independence Criterion and Centered Kernel Target Alignment

  • Billy Chang
  • Uwe Krüger 0001
  • Rafal Kustra
  • Junping Zhang

Canonical correlation analysis (CCA) is a well-established technique for identifying linear relationships between two variable sets. Kernel CCA (KCCA) is the most notable nonlinear extension, but it lacks interpretability and robustness against irrelevant features. The aim of this article is to introduce two nonlinear CCA extensions that rely on the recently proposed Hilbert-Schmidt independence criterion and the centered kernel target alignment. These extensions determine linear projections that provide maximally dependent projected data pairs. The paper demonstrates that the use of linear projections allows removing irrelevant features, whilst extracting combinations of strongly associated features. This is exemplified through a simulation and the analysis of recorded data that are available in the literature.
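The Hilbert-Schmidt independence criterion these extensions build on has a compact biased empirical estimator, tr(KHLH)/(n-1)^2, where K and L are kernel Gram matrices and H is the centering matrix. A toy sketch with Gaussian kernels — illustrative only, not the paper's projection-learning algorithm:

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: tr(KHLH)/(n-1)^2."""
    n = len(X)
    def gram(Z):
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = gram(X), gram(Y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x + 0.1 * rng.normal(size=(200, 1))      # strongly dependent pair
y_ind = rng.normal(size=(200, 1))                # independent pair

# HSIC is markedly larger for the dependent pair
assert hsic(x, y_dep) > hsic(x, y_ind)
```

Maximizing this quantity over linear projections of the two variable sets is, in spirit, what the HSIC-based CCA extension does.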