Arrow Research

Author name cluster

Luping Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers (20)

AAAI Conference 2026 Conference Paper

EndoIR: Degradation-Agnostic All-in-One Endoscopic Image Restoration via Noise-Aware Routing Diffusion

  • Tong Chen
  • Xinyu Ma
  • Long Bai
  • Wenyang Wang
  • Yue Sun
  • Luping Zhou

Endoscopic images often suffer from diverse and co-occurring degradations such as low lighting, smoke, and bleeding, which obscure critical clinical details. Existing restoration methods are typically task-specific and often require prior knowledge of the degradation type, limiting their robustness in real-world clinical use. We propose EndoIR, an all-in-one, degradation-agnostic diffusion-based framework that restores multiple degradation types using a single model. EndoIR introduces a Dual-Domain Prompter that extracts joint spatial–frequency features, coupled with an adaptive embedding that encodes both shared and task-specific cues as conditioning for denoising. To mitigate feature confusion in conventional concatenation-based conditioning, we design a Dual-Stream Diffusion architecture that processes clean and degraded inputs separately, with a Rectified Fusion Block integrating them in a structured, degradation-aware manner. Furthermore, a Noise-Aware Routing Block improves efficiency by dynamically selecting only noise-relevant features during denoising. Experiments on SegSTRONG-C and CEC datasets demonstrate that EndoIR achieves state-of-the-art performance across multiple degradation scenarios while using fewer parameters than strong baselines, and downstream segmentation experiments confirm its clinical utility.
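The paper's code is not reproduced here; as a rough, hypothetical sketch of the gated fusion idea named in the abstract, the module below lets a learned gate decide how much degraded-stream evidence enters the clean stream instead of plain concatenation (all module names, shapes, and details are our assumptions, not the paper's).

```python
import torch
import torch.nn as nn

class RectifiedFusionBlock(nn.Module):
    """Hypothetical sketch: fuse clean-stream and degraded-stream features
    through a learned, degradation-aware gate rather than concatenation."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(dim, dim)

    def forward(self, clean_feat, degraded_feat):
        g = self.gate(torch.cat([clean_feat, degraded_feat], dim=-1))
        # The gate controls how much degraded evidence enters the clean stream.
        return clean_feat + self.proj(g * degraded_feat)

x_clean = torch.randn(4, 16, 128)   # (batch, tokens, dim)
x_deg = torch.randn(4, 16, 128)
fused = RectifiedFusionBlock(128)(x_clean, x_deg)
print(fused.shape)                  # torch.Size([4, 16, 128])
```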

JBHI Journal 2026 Journal Article

OnUVS: An Online Motion Transfer Framework with Content-Texture Decoupling for High-Fidelity Ultrasound Video Synthesis

  • Han Zhou
  • Rusi Chen
  • Xin Yang
  • Ao Chang
  • Junxuan Yu
  • Yuhao Huang
  • Ruobing Huang
  • Xinrui Zhou

Ultrasound (US) imaging plays a crucial role in diagnosing heart and pelvic diseases, where sonographers tend to evaluate dynamic motion and structure. However, the scarcity of US videos for rare cases limits training opportunities for novice sonographers and deep learning models, hindering detection rates and clinical diagnostic applications. US video synthesis is a promising solution to this issue. Nevertheless, accurately imitating the intricate motion of the anatomy while preserving image fidelity presents a significant challenge. In this work, we propose OnUVS, a novel online feature-decoupling framework for high-fidelity US video synthesis. First, to simulate realistic motion, we incorporate keypoints into anatomical learning through a weakly supervised training approach, which enhances motion representation and minimizes the need for fully annotated data. Second, we implement a dual-decoder generator that effectively balances content and textural features of generated frames, significantly enhancing the image fidelity of US videos. Third, a multi-scale discriminator further refines the sharpness and fine details, ensuring high-fidelity video synthesis. Fourth, an online learning strategy is designed to smooth coherence between frames by constraining the keypoint trajectories during inference. Validation on echocardiographic and pelvic floor US datasets demonstrates that OnUVS outperforms existing methods, achieving a 22.08% improvement in motion consistency (FVD) and 25.04% in image fidelity (FID). To facilitate reproducibility, we publicly release the code of OnUVS at: https://github.com/LucyChen159/OnUVS.
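As a hedged illustration of the fourth point, the snippet below sketches one plausible keypoint-trajectory constraint: a second-order smoothness penalty that discourages jittery motion between frames (the paper's actual constraint may differ; the function name and formulation are ours).

```python
import torch

def trajectory_smoothness_loss(keypoints: torch.Tensor) -> torch.Tensor:
    """Hypothetical smoothness constraint on keypoint trajectories.

    keypoints: (T, K, 2) tensor of K 2-D keypoints over T frames.
    Penalizes frame-to-frame acceleration so tracks stay coherent.
    """
    velocity = keypoints[1:] - keypoints[:-1]     # (T-1, K, 2)
    acceleration = velocity[1:] - velocity[:-1]   # (T-2, K, 2)
    return acceleration.pow(2).mean()

traj = torch.randn(30, 10, 2, requires_grad=True)  # 30 frames, 10 keypoints
loss = trajectory_smoothness_loss(traj)
loss.backward()
```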

AAAI Conference 2026 Conference Paper

ReFINE: A Reward-Based Framework for Interpretable and Nuanced Evaluation of Radiology Report Generation

  • Yunyi Liu
  • Yingshu Li
  • Zhanyu Wang
  • Xinyu Liang
  • Lingqiao Liu
  • Lei Wang
  • Luping Zhou

Automated radiology report generation (R2Gen) has advanced significantly, yet evaluation remains challenging due to the complexity of assessing report quality. Traditional metrics often misalign with human judgments, failing to identify specific deficiencies. To address this, we introduce ReFINE, a framework for training an Evaluation Model using a novel margin-based reward enforcement loss. This approach decomposes report quality into fine-grained sub-scores across user-defined criteria, improving interpretability. Leveraging GPT-4, we generate diverse training data with paired accepted and rejected reports to train our model under a reward-based system. The trained ReFINE Score provides both granular sub-scores and an aggregated quality assessment, enabling criterion-specific evaluation. Experimental results demonstrate ReFINE's superior alignment with human judgments, outperforming traditional metrics in model selection. Its robustness is validated across three expert-annotated datasets—including chest X-rays and multimodal reports covering 9 imaging modalities—and under two distinct scoring systems.
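The paper's exact loss is not shown here; a minimal sketch of a margin-based reward enforcement loss over paired accepted/rejected reports, assuming the evaluation model emits one scalar score per report, could look like this (names and the margin value are hypothetical).

```python
import torch
import torch.nn.functional as F

def margin_reward_loss(score_accepted: torch.Tensor,
                       score_rejected: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    """Hypothetical margin-based reward loss: the reward model should score
    an accepted report at least `margin` above its paired rejected report."""
    return F.relu(margin - (score_accepted - score_rejected)).mean()

acc = torch.tensor([2.1, 0.4, 1.7])   # scores for accepted reports
rej = torch.tensor([0.3, 0.9, 1.5])   # scores for paired rejected reports
print(margin_reward_loss(acc, rej))
```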

AAAI Conference 2025 Conference Paper

TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances

  • Wenting Xu
  • Viorela Ila
  • Luping Zhou
  • Craig T. Jin

The concept of function and affordance is a critical aspect of 3D scene understanding and supports task-oriented objectives. In this work, we develop a model that learns to structure and vary functional affordance across a 3D hierarchical scene graph representing the spatial organization of a scene. The varying functional affordance is designed to integrate with the varying spatial context of the graph. More specifically, we develop an algorithm that learns to construct a 3D hierarchical scene graph (3DHSG) that captures the spatial organization of the scene. Starting from segmented object point clouds and object semantic labels, we develop a 3DHSG with a top node that identifies the room label, child nodes that define local spatial regions inside the room with region-specific affordances, and grandchild nodes indicating object locations and object-specific affordances. To support this work, we create a custom 3DHSG dataset that provides ground-truth data for local spatial regions with region-specific affordances, as well as object-specific affordances for each object. We employ a Transformer-Based Hierarchical Scene Understanding (TB-HSU) model to learn the 3DHSG. We use a multi-task learning framework that jointly learns room classification and the definition of spatial regions within the room with region-specific affordances. Our work improves on the performance of state-of-the-art baseline models and shows one approach for applying transformer models to 3D scene understanding and the generation of 3DHSGs that capture the spatial organization of a room. The code and dataset are publicly available.
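As an illustration only, the three-level structure described above (room node, region nodes, object nodes, each carrying affordances) might be represented with a data structure like the following; the field names and example values are our assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """Grandchild node: an object with its location and affordances."""
    label: str
    position: tuple        # (x, y, z)
    affordances: list

@dataclass
class RegionNode:
    """Child node: a local spatial region with region-specific affordances."""
    name: str
    affordances: list
    objects: list = field(default_factory=list)

@dataclass
class RoomNode:
    """Top node of the 3DHSG: the room label with its regions."""
    room_label: str
    regions: list = field(default_factory=list)

desk_area = RegionNode("work corner", ["working", "reading"],
                       [ObjectNode("desk", (1.0, 0.5, 0.0), ["placing"]),
                        ObjectNode("chair", (1.2, 0.5, 0.0), ["sitting"])])
graph = RoomNode("office", [desk_area])
```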

TMLR Journal 2025 Journal Article

TempFlex: Advancing MLLMs with Temporal Perception and Natively Scalable Resolution Encoding

  • Zhanyu Wang
  • Chen Tang
  • Haoyu He
  • Kuan Feng
  • Chao Wang
  • Bingni Zhang
  • Xiaolei Xu
  • Shen Wang

Multimodal large language models (MLLMs) have made significant progress across vision-language tasks, yet many designs still suffer from two core limitations. (i) Excessive visual tokens and broken global context: Tiled Patch Encoding fragments high-resolution images, leading to token overload and disrupting global attention modeling. (ii) Lack of temporal reasoning: Most models process video as independent frames using static image encoders, failing to capture temporal dynamics. We present TempFlex-VL, a token-efficient and temporally aware MLLM that addresses both issues through lightweight architectural enhancements. First, we introduce a resolution-agnostic visual encoder that directly processes full images without tiling, preserving global context while substantially reducing visual tokens. Second, we propose Temporal Fiber Fusion (TFF), a plug-and-play module with three complementary pathways: (1) a dynamic local-convolution branch for fine-grained motion, (2) a gated memory accumulator for long-term dependencies, and (3) a periodic encoder for modeling cyclic patterns. These signals are softly fused, enabling the model to adapt to diverse temporal structures without overfitting. To support large-scale video-language pretraining, we curate TempFlex-2M, a high-quality synthetic video–text corpus generated in a single stage via GPT-4o with direct visual prompting. We instantiate TempFlex-VL using two different language backbones, Gemma3-4B and Qwen3-4B, demonstrating the generality of our design across architectures. Both variants achieve state-of-the-art or competitive results on a wide range of image and video benchmarks while markedly improving token efficiency. Code is publicly available at: https://github.com/wang-zhanyu/TempFlex.
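A hypothetical sketch of the three-pathway fusion described above (local convolution for fine-grained motion, a gated memory accumulator, a periodic encoder, softly fused); the real TFF module surely differs in detail, and every name and shape here is an assumption.

```python
import torch
import torch.nn as nn

class TemporalFiberFusion(nn.Module):
    """Sketch of three complementary temporal pathways with learned
    soft fusion weights (illustrative, not the paper's implementation)."""
    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.memory_gate = nn.Linear(dim, dim)
        self.periodic = nn.Linear(dim, dim)
        self.mix = nn.Parameter(torch.zeros(3))   # soft fusion weights

    def forward(self, x):                         # x: (batch, T, dim)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)
        gate = torch.sigmoid(self.memory_gate(x))
        # Running mean of gated inputs: a simple long-term accumulator.
        counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        memory = torch.cumsum(gate * x, dim=1) / counts
        periodic = torch.sin(self.periodic(x))    # cyclic-pattern pathway
        w = torch.softmax(self.mix, dim=0)
        return w[0] * local + w[1] * memory + w[2] * periodic

out = TemporalFiberFusion(64)(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```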

NeurIPS Conference 2025 Conference Paper

Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

  • Xiaoyu Yue
  • Zidong Wang
  • Yuqing Wang
  • Wenlong Zhang
  • Xihui Liu
  • Wanli Ouyang
  • Lei Bai
  • Luping Zhou

Recent studies have demonstrated the importance of high-quality visual representations in image generation and have highlighted the limitations of generative models in image understanding. As a generative paradigm originally designed for natural language, autoregressive models face similar challenges. In this work, we present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics: local and conditional dependence, inter-step semantic inconsistency, and spatial invariance deficiency. We show that these issues can be effectively addressed by introducing self-supervised objectives during training, leading to a novel training framework, Self-guided Training for AutoRegressive models (ST-AR). Without relying on pre-trained representation models, ST-AR significantly enhances the image understanding ability of autoregressive models and leads to improved generation quality. Specifically, ST-AR brings approximately 42% FID improvement for LlamaGen-L and 49% FID improvement for LlamaGen-XL, while maintaining the same sampling strategy.
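The paper's objectives are not reproduced here; as a hedged sketch of the general recipe (standard next-token cross-entropy plus a self-supervised consistency term over two augmented views), with all names and the weighting our assumptions:

```python
import torch
import torch.nn.functional as F

def st_ar_style_loss(logits, targets, feat_view1, feat_view2, lam=0.5):
    """Illustrative combination, not the paper's exact formulation:
    autoregressive cross-entropy plus a cosine-consistency term that
    pulls features of two augmented views of one image together."""
    ar_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    z1 = F.normalize(feat_view1, dim=-1)
    z2 = F.normalize(feat_view2, dim=-1)
    ssl_loss = 1.0 - (z1 * z2).sum(-1).mean()
    return ar_loss + lam * ssl_loss

logits = torch.randn(2, 16, 1024)                 # (batch, tokens, vocab)
targets = torch.randint(0, 1024, (2, 16))
f1, f2 = torch.randn(2, 256), torch.randn(2, 256)
print(st_ar_style_loss(logits, targets, f1, f2))
```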

AAAI Conference 2024 Conference Paper

Noise-Aware Image Captioning with Progressively Exploring Mismatched Words

  • Zhongtian Fu
  • Kefei Song
  • Luping Zhou
  • Yang Yang

Image captioning aims to automatically generate captions for images by learning a cross-modal generator from vision to language. The large number of image-text pairs required for training is usually sourced from the internet due to the cost of manual annotation, which introduces noise in the form of mismatched relevance that affects the learning process. Unlike traditional noisy-label learning, the key challenge in processing noisy image-text pairs is to finely identify the mismatched words so as to make the most of the trustworthy information in the text, rather than coarsely weighing entire examples. To tackle this challenge, we propose a Noise-aware Image Captioning method (NIC) to adaptively mitigate the erroneous guidance from noise by progressively exploring mismatched words. Specifically, NIC first identifies mismatched words by quantifying word-label reliability from two aspects: 1) inter-modal representativeness, which measures the significance of the current word by assessing cross-modal correlation via prediction certainty; and 2) intra-modal informativeness, which amplifies the effect of the current prediction by incorporating the quality of subsequent word generation. During optimization, NIC constructs pseudo-word-labels considering the reliability of the original word-labels and model convergence, to periodically coordinate mismatched words. As a result, NIC can effectively exploit both clean and noisy image-text pairs to learn a more robust mapping function. Extensive experiments conducted on the MS-COCO and Conceptual Captions datasets validate the effectiveness of our method in various noisy scenarios.
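As a rough sketch of the core idea (down-weighting individual words by reliability rather than weighting whole image-text pairs), assuming per-word reliability scores have already been computed by the two criteria above; the function and its inputs are hypothetical.

```python
import torch
import torch.nn.functional as F

def reliability_weighted_caption_loss(logits, tokens, reliability):
    """Hypothetical per-word weighted cross-entropy.

    logits:      (T, vocab) per-step predictions
    tokens:      (T,) word labels from the (possibly noisy) caption
    reliability: (T,) per-word weights in [0, 1]
    """
    per_word = F.cross_entropy(logits, tokens, reduction="none")
    return (reliability * per_word).sum() / reliability.sum().clamp_min(1e-8)

logits = torch.randn(12, 5000)
tokens = torch.randint(0, 5000, (12,))
rel = torch.rand(12)                  # stand-in reliability scores
print(reliability_weighted_caption_loss(logits, tokens, rel))
```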

AAAI Conference 2024 Conference Paper

Roll with the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

  • Yue Duan
  • Zhen Zhao
  • Lei Qi
  • Luping Zhou
  • Lei Wang
  • Yinghuan Shi

While semi-supervised learning (SSL) has yielded promising results, a more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC). The increased recognition difficulty on fine-grained unlabeled data spells disaster for pseudo-labeling accuracy, resulting in poor performance of the SSL model. To tackle this challenge, we propose Soft Label Selection with Confidence-Aware Clustering based on Class Transition Tracking (SoC), which reconstructs the pseudo-label selection process by jointly optimizing an Expansion Objective and a Shrinkage Objective over soft labels. The former objective encourages soft labels to absorb more candidate classes to ensure the ground-truth class is included, while the latter encourages soft labels to reject noisy classes and is theoretically proven to be equivalent to entropy minimization. In comparison with various state-of-the-art methods, our approach demonstrates superior performance in SS-FGVC. Checkpoints and source code are available at https://github.com/NJUyued/SoC4SS-FGVC.
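The shrinkage/entropy equivalence noted in the abstract can be made concrete with a tiny sketch: minimizing the entropy of a soft label concentrates its probability mass and thereby rejects noisy candidate classes (the formulation below is ours, not the paper's).

```python
import torch

def shrinkage_entropy(soft_labels: torch.Tensor) -> torch.Tensor:
    """Entropy of soft labels; minimizing it sharpens each label so that
    noisy candidate classes are progressively rejected."""
    p = soft_labels.clamp_min(1e-8)
    return -(p * p.log()).sum(dim=-1).mean()

soft = torch.softmax(torch.randn(4, 100), dim=-1)  # 4 samples, 100 classes
print(shrinkage_entropy(soft))                     # minimize during training
```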

AAAI Conference 2024 Conference Paper

UFDA: Universal Federated Domain Adaptation with Practical Assumptions

  • Xinhui Liu
  • Zhenghao Chen
  • Luping Zhou
  • Dong Xu
  • Wei Xi
  • Gairui Bai
  • Yihan Zhao
  • Jizhong Zhao

Conventional Federated Domain Adaptation (FDA) approaches usually demand an abundance of assumptions, which makes them significantly less feasible for real-world situations and introduces security hazards. This paper relaxes the assumptions from previous FDAs and studies a more practical scenario named Universal Federated Domain Adaptation (UFDA). It only requires the black-box model and the label-set information of each source domain, while the label sets of different source domains could be inconsistent, and the target-domain label set is totally blind. Towards a more effective solution for our newly proposed UFDA scenario, we propose a corresponding methodology called Hot-Learning with Contrastive Label Disambiguation (HCLD). It particularly tackles UFDA's domain-shift and category-gap problems by using one-hot outputs from the black-box models of the various source domains. Moreover, to better distinguish the shared and unknown classes, we further present a cluster-level strategy named Mutual-Voting Decision (MVD) to extract robust consensus knowledge across peer classes from both source and target domains. Extensive experiments on three benchmark datasets demonstrate that our method achieves comparable performance in our UFDA scenario with far fewer assumptions, compared to previous methodologies with comprehensive additional assumptions.
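A hedged sketch of the voting idea behind MVD, reduced to its simplest form: aggregate one-hot predictions from the black-box source models and fall back to "unknown" when agreement is low (the threshold, the per-sample granularity, and all names are assumptions; the paper operates at the cluster level).

```python
import numpy as np

def mutual_voting(one_hot_preds: np.ndarray, agree_ratio: float = 0.5) -> int:
    """Aggregate one-hot predictions from source models for one target
    sample; return -1 ('unknown') when too few sources agree."""
    votes = one_hot_preds.sum(axis=0)            # votes per class
    winner = int(votes.argmax())
    if votes[winner] / one_hot_preds.shape[0] >= agree_ratio:
        return winner
    return -1

preds = np.array([[0, 1, 0],                     # source 1 votes class 1
                  [0, 1, 0],                     # source 2 votes class 1
                  [1, 0, 0]])                    # source 3 votes class 0
print(mutual_voting(preds))                      # 1
```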

ECAI Conference 2023 Conference Paper

Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives

  • Bhanu Prakash Voutharoja
  • Lei Wang 0001
  • Luping Zhou

Automatic radiology report generation is challenging as medical images or reports are usually similar to each other due to the common content of anatomy. This makes it hard for a model to capture the uniqueness of individual images and prone to producing undesired generic or mismatched reports. This situation calls for learning more discriminative features that can capture even fine-grained mismatches between images and reports. To achieve this, this paper proposes a novel framework to learn discriminative image and report features by distinguishing them from their closest peers, i.e., hard negatives. In particular, to attain more discriminative features, we gradually raise the difficulty of this learning task by creating increasingly hard negative reports for each image in the feature space during training. By treating the increasingly hard negatives as auxiliary variables, we formulate this process as a min-max alternating optimisation problem. At each iteration, conditioned on a given set of hard negative reports, image and report features are learned as usual by minimising the loss functions related to report generation. After that, a new set of harder negative reports is created by maximising a loss reflecting image-report alignment. By solving this optimisation, we attain a model that can generate more specific and accurate reports. It is noteworthy that our framework enhances discriminative feature learning without introducing extra network weights. Also, in contrast to the existing way of generating hard negatives, our framework extends beyond the granularity of the dataset by generating harder samples out of the training set. An experimental study on benchmark datasets verifies the efficacy of our framework and shows that it can serve as a plug-in to readily improve existing medical report generation models. The code is publicly available at https://github.com/Bhanu068/ITHN.
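The "max" step of the min-max formulation might look like the following hypothetical sketch: ascend an image-report alignment score by gradient steps so that a negative report embedding moves toward the image and becomes a harder negative (step count, learning rate, and similarity choice are our assumptions).

```python
import torch

def harder_negative(image_feat, neg_report_feat, steps=3, lr=0.1):
    """Sketch of the max step: push a negative report embedding toward
    the image in feature space via gradient ascent on alignment."""
    neg = neg_report_feat.clone().detach().requires_grad_(True)
    for _ in range(steps):
        alignment = torch.cosine_similarity(image_feat, neg, dim=-1).mean()
        grad, = torch.autograd.grad(alignment, neg)
        neg = (neg + lr * grad).detach().requires_grad_(True)  # ascend
    return neg.detach()

img = torch.randn(8, 256)
neg0 = torch.randn(8, 256)
neg1 = harder_negative(img, neg0)   # harder negatives for the next min step
```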

NeurIPS Conference 2022 Conference Paper

Improving Barely Supervised Learning by Discriminating Unlabeled Samples with Super-Class

  • Guan Gui
  • Zhen Zhao
  • Lei Qi
  • Luping Zhou
  • Lei Wang
  • Yinghuan Shi

In semi-supervised learning (SSL), a common practice is to learn consistent information from unlabeled data and discriminative information from labeled data to ensure both the immutability and the separability of the classification model. Existing SSL methods suffer from failures in barely-supervised learning (BSL), where only one or two labels per class are available, as the insufficient labels make the discriminative information difficult or even infeasible to learn. To bridge this gap, we investigate a simple yet effective way to leverage unlabeled samples for discriminative learning, and propose a novel discriminative-information learning module to benefit model training. Specifically, we formulate the learning objective of discriminative information at the super-class level and dynamically assign different classes into different super-classes based on model performance improvement. On top of this on-the-fly process, we further propose a distribution-based loss to learn discriminative information by utilizing the similarity relationship between samples and super-classes. It encourages unlabeled samples to stay closer to the distribution of their corresponding super-class than to those of others. Such a constraint is softer than the direct assignment of pseudo-labels, which can be very noisy in BSL. We compare our method with state-of-the-art SSL and BSL methods through extensive experiments on standard SSL benchmarks. Our method achieves superior results, e.g., an average accuracy of 76.76% on CIFAR-10 with merely one label per class.
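As an illustrative sketch (not the paper's exact loss), a distribution-style objective over super-classes can be written as a soft cross-entropy on similarities to super-class prototypes; the prototype representation and temperature are our assumptions.

```python
import torch
import torch.nn.functional as F

def super_class_distribution_loss(feat, prototypes, super_idx, tau=0.1):
    """Encourage each unlabeled sample to sit closer to its assigned
    super-class prototype than to other super-classes - a softer
    constraint than a hard pseudo-label.

    feat:       (B, D) sample features
    prototypes: (S, D) one prototype per super-class
    super_idx:  (B,) assigned super-class per sample
    """
    sims = F.normalize(feat, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.cross_entropy(sims / tau, super_idx)

feat = torch.randn(16, 128)
protos = torch.randn(4, 128)            # 4 super-classes
assign = torch.randint(0, 4, (16,))
print(super_class_distribution_loss(feat, protos, assign))
```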

AAAI Conference 2022 Conference Paper

LaSSL: Label-Guided Self-Training for Semi-supervised Learning

  • Zhen Zhao
  • Luping Zhou
  • Lei Wang
  • Yinghuan Shi
  • Yang Gao

The key to semi-supervised learning (SSL) is to explore adequate information to leverage the unlabeled data. Current dominant approaches aim to generate pseudo-labels on weakly augmented instances and train models on their corresponding strongly augmented variants with high-confidence results. However, such methods are limited by the exclusion of samples with low-confidence pseudo-labels and by under-utilization of the label information. In this paper, we emphasize the crucial role of the label information and propose a Label-guided Self-training approach to Semi-supervised Learning (LaSSL), which improves pseudo-label generation through two mutually boosted strategies. First, with the ground-truth labels and iteratively polished pseudo-labels, we explore instance relations among all samples and then minimize a class-aware contrastive loss to learn discriminative feature representations that pull same-class samples together and scatter different-class samples. Second, on top of the improved feature representations, we propagate the label information to the unlabeled samples across the potential data manifold at the feature-embedding level, which can further improve the labelling of samples with reference to their neighbours. These two strategies are seamlessly integrated and mutually promoted across the whole training process. We evaluate LaSSL on several classification benchmarks under partially labeled settings and demonstrate its superiority over state-of-the-art approaches.
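A minimal sketch of a class-aware contrastive term of the kind described above, assuming each sample already carries a ground-truth or current pseudo label (a standard supervised-contrastive form; the paper's exact loss may differ).

```python
import torch
import torch.nn.functional as F

def class_aware_contrastive_loss(feat, labels, tau=0.2):
    """Pull same-class samples together and push different-class samples
    apart via a supervised contrastive objective."""
    z = F.normalize(feat, dim=-1)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))            # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    same.fill_diagonal_(0)
    pos_count = same.sum(1).clamp_min(1)
    return -((same * log_prob).sum(1) / pos_count).mean()

feat = torch.randn(8, 64)
labels = torch.randint(0, 3, (8,))               # ground-truth or pseudo
print(class_aware_contrastive_loss(feat, labels))
```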

JBHI Journal 2020 Journal Article

Coherent Pattern in Multi-Layer Brain Networks: Application to Epilepsy Identification

  • Jiashuang Huang
  • Qi Zhu
  • Mingliang Wang
  • Luping Zhou
  • Zhiqiang Zhang
  • Daoqiang Zhang

Currently, how to jointly fuse structural connectivity (SC) and functional connectivity (FC) for identifying brain diseases is a hot topic in the area of brain network analysis. Most existing works combine the two types of connectivity at the decision level, thus ignoring the underlying relationship between SC and FC. To solve this problem, in this paper we model the brain network as a multi-layer network formed by the SC and FC, and then propose a coherent pattern to represent the structural information of the multi-layer network for brain disease identification. The proposed coherent pattern consists of a paired subgraph extracted from the FC and SC within the same node set. Compared with previous methods, this coherent pattern not only describes the connectivity information of both SC and FC through subgraphs at each layer, but also reflects their intrinsic relationship through the co-occurrence pattern of the paired subgraph. Based on this coherent pattern, we further develop a framework for identifying brain diseases. Specifically, we first construct multi-layer networks using SC and FC for each subject and then mine coherent patterns that frequently appear in each group. Next, we select discriminative coherent patterns from these frequent coherent patterns according to their frequency of occurrence. Finally, we construct a feature matrix for each subject based on a binary indicator vector and use a support vector machine (SVM) as the classifier. Experimental results on real epilepsy datasets demonstrate that our method outperforms several state-of-the-art approaches in brain disease classification.
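The final classification stage can be illustrated with a toy sketch: each subject becomes a binary indicator vector over the selected discriminative coherent patterns (1 = the pattern occurs in that subject's multi-layer network) and is classified with an SVM. The data below is synthetic; only the pipeline shape follows the abstract.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
num_subjects, num_patterns = 40, 30
X = rng.integers(0, 2, size=(num_subjects, num_patterns))  # indicator matrix
y = rng.integers(0, 2, size=num_subjects)                  # patient vs. control

clf = SVC(kernel="linear").fit(X[:30], y[:30])             # train on 30 subjects
print(clf.score(X[30:], y[30:]))                           # test on the rest
```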

JBHI Journal 2020 Journal Article

Epileptic Seizure Classification With Symmetric and Hybrid Bilinear Models

  • Tennison Liu
  • Nhan Duy Truong
  • Armin Nikpour
  • Luping Zhou
  • Omid Kavehei

Epilepsy affects nearly 1% of the global population, of which two-thirds can be treated by anti-epileptic drugs and a much lower percentage by surgery. Diagnostic procedures for epilepsy and monitoring are highly specialized and labour-intensive. The accuracy of the diagnosis is also complicated by overlapping medical symptoms, varying levels of experience, and inter-observer variability among clinical professionals. This paper proposes a novel hybrid bilinear deep learning network with an application in the clinical procedures of epilepsy classification diagnosis, where the use of surface electroencephalogram (sEEG) and audiovisual monitoring is standard practice. Hybrid bilinear models based on two types of feature extractors, namely Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are trained using the Short-Time Fourier Transform (STFT) of one-second sEEG segments. In the proposed hybrid models, CNNs extract spatio-temporal patterns, while RNNs focus on the characteristics of temporal dynamics in relatively longer intervals given the same input data. Second-order features, based on interactions between these spatio-temporal features, are further explored by bilinear pooling and used for epilepsy classification. Our proposed methods obtain an F1-score of 97.4% on the Temple University Hospital Seizure Corpus and 97.2% on the EPILEPSIAE dataset, comparing favourably to existing benchmarks for sEEG-based seizure type classification. The open-source implementation of this study is available at https://github.com/NeuroSyd/Epileptic-SeizureClassification.
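Second-order pooling of the two feature streams can be sketched as an outer product followed by the usual signed square-root and L2 normalisation (a generic bilinear-pooling formulation under our assumptions; the paper's exact pooling may differ).

```python
import torch
import torch.nn.functional as F

def hybrid_bilinear_pool(cnn_feat, rnn_feat):
    """Outer product of CNN and RNN descriptors captures their pairwise
    interactions; signed sqrt and L2 normalisation are standard post-steps."""
    outer = torch.einsum("bi,bj->bij", cnn_feat, rnn_feat).flatten(1)
    outer = torch.sign(outer) * torch.sqrt(outer.abs() + 1e-8)
    return F.normalize(outer, dim=1)

cnn = torch.randn(4, 128)   # spatio-temporal CNN descriptor per 1-s clip
rnn = torch.randn(4, 64)    # RNN descriptor of longer temporal dynamics
print(hybrid_bilinear_pool(cnn, rnn).shape)  # torch.Size([4, 8192])
```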

NeurIPS Conference 2020 Conference Paper

Improving Auto-Augment via Augmentation-Wise Weight Sharing

  • Keyu Tian
  • Chen Lin
  • Ming Sun
  • Luping Zhou
  • Junjie Yan
  • Wanli Ouyang

The recent progress on automatically searching augmentation policies has boosted performance substantially for various tasks. A key component of automatic augmentation search is the evaluation process for a particular augmentation policy, which is utilized to return a reward and usually runs thousands of times. A plain evaluation process, which includes full model training and validation, would be time-consuming. To achieve efficiency, many choose to sacrifice evaluation reliability for speed. In this paper, we dive into the dynamics of augmented training of the model. This inspires us to design a powerful and efficient proxy task based on Augmentation-Wise Weight Sharing (AWS) to form a fast yet accurate evaluation process in an elegant way. Comprehensive analysis verifies the superiority of this approach in terms of effectiveness and efficiency. The augmentation policies found by our method achieve superior accuracies compared with existing auto-augmentation search methods. On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best-performing single model without extra training data. On ImageNet, we get a top-1 error rate of 20.36% for ResNet-50, which leads to a 3.34% absolute error rate reduction over the baseline augmentation.
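A hedged sketch of what a shared-weight proxy evaluation could look like: each candidate policy briefly fine-tunes a copy of one shared, pre-trained model rather than training from scratch, and the resulting accuracy serves as the reward. The model, policy, and step counts below are toy assumptions, not the paper's setup.

```python
import copy
import torch
import torch.nn.functional as F

def evaluate_policy(shared_model, policy, data, labels, steps=20, lr=0.01):
    """Score one augmentation policy via a short fine-tune from shared
    weights instead of a full training run."""
    model = copy.deepcopy(shared_model)          # reuse shared weights
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(model(policy(data)), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return (model(data).argmax(1) == labels).float().mean().item()

shared = torch.nn.Linear(32, 10)                 # stands in for the shared net
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
flip = lambda t: t * torch.where(torch.rand_like(t) < 0.5, -1.0, 1.0)
print(evaluate_policy(shared, flip, x, y))       # reward for this policy
```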

JBHI Journal 2017 Journal Article

HEp-2 Cell Image Classification With Deep Convolutional Neural Networks

  • Zhimin Gao
  • Lei Wang
  • Luping Zhou
  • Jianjia Zhang

Efficient Human Epithelial-2 cell image classification can facilitate the diagnosis of many autoimmune diseases. This paper proposes an automatic framework for this classification task, utilizing deep convolutional neural networks (CNNs), which have recently attracted intensive attention in visual recognition. In addition to describing the proposed classification framework, this paper elaborates on several interesting observations and findings obtained through our investigation. They include the important factors that impact network design and training, the role of rotation-based data augmentation for cell images, the effectiveness of cell image masks for classification, and the adaptability of the CNN-based classification system across different datasets. An extensive experimental study is conducted to verify these findings and to compare the proposed framework with well-established image classification models in the literature. The results on benchmark datasets demonstrate that 1) the proposed framework can effectively outperform existing models by properly applying data augmentation, and 2) our CNN-based framework has excellent adaptability across different datasets, which is highly desirable for cell image classification under varying laboratory settings. Our system ranked highly in the cell image classification competition hosted by ICPR 2014.
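Rotation-based augmentation, one of the factors the paper studies, is easy to sketch: cell patches have no canonical orientation, so rotated copies are all valid training samples. The patch size and rotation count below are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_augment(cell_image: np.ndarray, num_rotations: int = 8):
    """Return evenly spaced rotated copies of a single-cell patch."""
    angles = np.linspace(0, 360, num_rotations, endpoint=False)
    return [rotate(cell_image, angle, reshape=False, mode="nearest")
            for angle in angles]

img = np.random.rand(78, 78)        # a single-cell grayscale patch
augmented = rotation_augment(img)
print(len(augmented))               # 8
```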

JBHI Journal 2014 Journal Article

Multiple Kernel Learning in the Primal for Multimodal Alzheimer’s Disease Classification

  • Fayao Liu
  • Luping Zhou
  • Chunhua Shen
  • Jianping Yin

To achieve effective and efficient detection of Alzheimer's disease (AD), many machine learning methods have been introduced into this realm. However, the typical situation of limited training samples, together with heterogeneous feature representations, makes this problem challenging. In this paper, we propose a novel multiple kernel learning framework to combine multimodal features for AD classification, which is scalable and easy to implement. Contrary to the usual way of solving the problem in the dual, we look at the optimization from a new perspective. By conducting a Fourier transform on the Gaussian kernel, we explicitly compute the mapping function, which leads to a more straightforward solution of the problem in the primal. Furthermore, we impose a mixed L2,1-norm constraint on the kernel weights, known as group lasso regularization, to enforce group sparsity among different feature modalities. This effectively acts as a feature-modality selector, while at the same time exploiting complementary information among different kernels. Therefore, it is able to extract the most discriminative features for classification. Experiments on the ADNI dataset demonstrate the effectiveness of the proposed method.
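The primal trick can be sketched with random Fourier features: an explicit map phi approximates the Gaussian kernel, so a linear classifier can be trained directly on phi(X) rather than via the dual. The feature dimension and gamma below are arbitrary; this illustrates the standard construction, not the paper's exact pipeline.

```python
import numpy as np

def gaussian_rff(X, num_features=500, gamma=1.0, seed=0):
    """Random Fourier feature map: phi(x) . phi(y) approximates the
    Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.randn(10, 20)         # 10 subjects, 20 features of one modality
phi = gaussian_rff(X)               # explicit primal representation
print(phi.shape)                    # (10, 500)
```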