Arrow Research search

Author name cluster

Junwei Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
1 author row

Possible papers

34

AAAI Conference 2026 Conference Paper

AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation

  • Ziyang Luo
  • Nian Liu
  • Fahad Shahbaz Khan
  • Junwei Han

Reference Audio-Visual Segmentation (Ref-AVS) tasks challenge models to precisely locate sounding objects by integrating visual, auditory, and textual cues. Existing methods often lack genuine semantic understanding, tending to memorize fixed reasoning patterns. Furthermore, jointly training for reasoning and segmentation can compromise pixel-level precision. To address these issues, we introduce AURORA, a novel framework designed to enhance genuine reasoning and language comprehension in reference audio-visual segmentation. We employ a structured Chain-of-Thought (CoT) prompting mechanism to guide the model through a step-by-step reasoning process and introduce a novel segmentation feature distillation loss to effectively integrate these reasoning abilities without sacrificing segmentation performance. To further cultivate the model's genuine reasoning capabilities, we devise a two-stage training strategy: first, a "corrective reflective-style training" stage utilizes self-correction to enhance the quality of reasoning paths, followed by reinforcement learning via Group Reward Policy Optimization (GRPO) to bolster robustness in challenging scenarios. Experiments demonstrate that AURORA achieves state-of-the-art performance on Ref-AVS benchmarks and generalizes effectively to unreferenced segmentation.

AAAI Conference 2026 Conference Paper

UQ-ViT: Harmonizing Extreme Activations with Hardware-Friendly Uniform Quantization in Vision Transformers

  • Tao Jiang
  • Yucheng Jiang
  • Xiwen Yao
  • Gong Cheng
  • Junwei Han

Post-Training Quantization enables efficient Vision Transformer (ViT) deployment with a small amount of calibration data, and its prevalent use of uniform quantization harnesses AI accelerator matrix cores for high-speed inference. However, the application of uniform quantization is fundamentally challenged by the extreme non-uniformity of activation distributions. Specifically, the power-law nature of post-Softmax attention scores and the significant inter-channel variance in post-GELU activations create a dilemma for conventional quantization, as it struggles to preserve critical high-magnitude values without sacrificing overall precision. To resolve this core conflict, we introduce UQ-ViT (Uniform Quantization for Vision Transformers), a novel uniform quantization framework designed to reconcile high precision with hardware efficiency. Central to UQ-ViT are two operators: Dynamic Elimination of Maximum (DeMax) and Normalization Quantization (NormQuant). DeMax is a uniform-quantization operator for post-Softmax attention scores: it dynamically eliminates dominant values from the quantization range while preserving them, effectively mitigating quantization loss from the extreme values of the power-law distribution. NormQuant utilizes a per-channel quantization strategy during quantization and reverts to a per-tensor format for dequantization, achieving both high accuracy and computational efficiency. Crucially, it is applicable to any linear layer, enabling effective quantization of post-GELU activations in ViTs. Through extensive experiments on various ViTs and vision tasks, including image classification, object detection, and instance segmentation, we demonstrate that our proposed approach outperforms existing methods, achieving superior accuracy while ensuring hardware friendliness.
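As a rough illustration of the per-channel half of NormQuant's recipe, the sketch below uniformly quantizes a toy activation tensor with one (scale, zero-point) pair per channel. It shows only the generic per-channel uniform-quantization step; the scale folding that lets UQ-ViT revert to a per-tensor dequantization format, and the DeMax operator, are not shown, and all names here are illustrative rather than taken from the paper.

```python
import numpy as np

def quantize_per_channel(x, n_bits=8):
    """Uniform per-channel quantization: one (scale, zero-point) per channel.

    x: activations of shape (tokens, channels), as in a post-GELU tensor.
    Returns integer codes plus the per-channel parameters for dequantization.
    """
    qmax = 2 ** n_bits - 1
    lo = x.min(axis=0)                              # per-channel minimum
    hi = x.max(axis=0)                              # per-channel maximum
    scale = np.maximum(hi - lo, 1e-8) / qmax        # avoid zero-range channels
    zero = np.round(-lo / scale)                    # per-channel zero point
    q = np.clip(np.round(x / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
# Simulate post-GELU-like activations with large inter-channel variance.
x = rng.normal(size=(16, 4)) * np.array([0.1, 1.0, 5.0, 50.0])
q, s, z = quantize_per_channel(x)
err = np.abs(dequantize(q, s, z) - x).max()
```

A single per-tensor (scale, zero-point) pair on the same data would be dominated by the widest channel, so the low-variance channels would lose nearly all precision; that is the inter-channel-variance problem the abstract describes.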

JBHI Journal 2025 Journal Article

A Foundational fMRI Model for Representing Continuous Brain States

  • Li Yang
  • Lei Guo
  • Yixuan Yuan
  • Junwei Han
  • Xintao Hu
  • Tuo Zhang

Foundational models have significant potential to advance brain function research, particularly in understanding the dynamics of brain states. However, most existing models process brain signals within fixed time windows, restricting their ability to capture the full temporal complexity of brain activity. In this study, we propose BrainSN (Brain States Network), a novel fMRI foundational model designed to represent continuous brain state information and support diverse downstream tasks. First, leveraging a transformer-based architecture, BrainSN reconstructs input brain states across multiple time scales and predicts future brain activity, effectively capturing both short-term and long-term dependencies. Second, through multiple embeddings and a channel gating module, the model integrates brain state information and applies an attention mechanism to extract critical features. Additionally, we train BrainSN on 1,256 hours of resting-state and naturalistic stimulus fMRI data, enabling it to learn large-scale brain dynamics without relying on task-based paradigms. Without fine-tuning, BrainSN achieves 75.23% and 75.82% accuracy in autism and attention disorder diagnosis tasks, respectively, matching the performance of leading models pretrained on disease-specific data. After fine-tuning, it surpasses these models. In mental state decoding, BrainSN attains 95.31% accuracy without fine-tuning, outperforming the best models trained on large-scale task-based fMRI data. Furthermore, by analyzing BrainSN's embeddings in relation to movie stimuli, we demonstrate that the model effectively captures the semantic content of movie scenes embedded in fMRI signals and is highly sensitive to sequence order. These results highlight BrainSN's ability to model brain state dynamics and underscore its potential advantages for clinical diagnosis, treatment evaluation, and cognitive neuroscience research.

JBHI Journal 2025 Journal Article

Frequency-Aware B-Line and Pleural Line Analysis in Lung Ultrasound Videos

  • Kaihui Yang
  • Guangyu Guo
  • Ying Zhang
  • Linxuan Pang
  • Zhaohui Zheng
  • Ruyu Liu
  • Jin Ding
  • Dingwen Zhang

Accurately identifying B-lines and the pleural line (P-line) in lung ultrasound (LUS) videos is valuable for evaluating certain lung conditions. However, manual interpretation remains subjective and highly dependent on operator expertise. Existing deep learning methods often suffer from performance degradation due to speckle noise and motion artifacts. Moreover, the limited availability of LUS video data annotated for multiple diagnostic features such as B-lines and the P-line limits model development. Therefore, this paper introduces ILD-LUS, a new clinical LUS database built for interstitial lung disease (ILD) analysis with category-level labels, comprising 2,149 ultrasound videos (193,410 frames). Also, we construct an external test set based on the public Covid-BLUES dataset for the evaluation of B-line and P-line recognition in different pulmonary pathologies. Then, we propose a novel video analysis framework that integrates wavelet enhancement with temporal attention modeling. Specifically, we employ a dual-component frequency feature enhancement method using the Discrete Wavelet Transform (DWT), which effectively suppresses noise while preserving important landmarks. Subsequently, an adaptive attention module is introduced to model long-range temporal dependencies and improve dynamic feature representation across consecutive frames. Experimental results show that the proposed method achieves over 94% AUC and 82% ACC for both B-line and P-line classification on both the ILD-LUS and Covid-BLUES datasets, outperforming existing methods. These findings demonstrate the robustness and generalizability of our approach across different pathological conditions. Overall, the proposed framework shows strong potential for supporting clinical decision-making in LUS analysis. The code is available at https://github.com/KaIi-github/WaveLUS.
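The wavelet-enhancement idea can be illustrated with a minimal single-level Haar DWT denoiser. This is only the generic suppress-small-high-frequency-coefficients step, not the paper's dual-component method; the names are illustrative and a synthetic 1-D signal stands in for ultrasound data.

```python
import numpy as np

def haar_dwt(x):
    """One-level 1-D Haar DWT: returns (approximation, detail) coefficients."""
    x = x[: len(x) // 2 * 2]                 # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2)     # low-pass: coarse structure
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # high-pass: noise + sharp edges
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt (perfect reconstruction for even-length input)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, thresh):
    """Soft-threshold the detail band to suppress small high-frequency noise."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
    return haar_idwt(a, d)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 4 * t)            # smooth stand-in for a scan line
noisy = clean + 0.2 * rng.normal(size=t.size)
out = denoise(noisy, thresh=0.25)
```

Because the clean signal is smooth, its detail coefficients are tiny, while white noise spreads evenly across both bands; thresholding the detail band therefore removes mostly noise, which is the sense in which a DWT can "suppress noise while preserving important landmarks".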

JBHI Journal 2025 Journal Article

MHKD: Multi-Step Hybrid Knowledge Distillation for Low-Resolution Whole Slide Images Glomerulus Detection

  • Xiangsen Zhang
  • Longfei Han
  • Chenchu Xu
  • Zhaohui Zheng
  • Jin Ding
  • Xianghui Fu
  • Dingwen Zhang
  • Junwei Han

Glomerulus detection is a critical component of renal histopathology assessment, essential for diagnosing glomerulonephritis. To mitigate the increasing workload on pathologists, AI-assisted diagnostic methods based on high-resolution digital pathology whole slide images have been developed. However, current AI-assisted approaches are limited to high-resolution whole slide images, necessitating expensive digital scanner equipment, high image storage costs, and significant computational complexity. To address this limitation, this paper pioneers a method for glomerulus detection in low-resolution human kidney pathology images. Specifically, we propose a novel multi-step hybrid knowledge distillation method. Our method distills both global features and semantic information through a hybrid knowledge distillation strategy that integrates offline and online knowledge distillation, where information from high-resolution pathological images is successively transferred to the student model, from the global features in the shallow network layers to the semantic information of the back-end, through a multi-step training strategy. Experimental results on two datasets show that the proposed method achieves effective detection outcomes for low-resolution kidney pathology images. Compared to other state-of-the-art detection techniques, our method achieves an $AP_{0.5:0.95}$ improvement of 23.1% on the private LN dataset and 15.9% on the public HUBMAP dataset.

NeurIPS Conference 2025 Conference Paper

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

  • Diqi He
  • Xuehao Gao
  • Hao Li
  • Junwei Han
  • Dingwen Zhang

The Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires agents to navigate previously unseen 3D environments using natural language instructions, without any scene-specific training. A critical challenge in this setting lies in ensuring agents' actions align with both spatial structure and task intent over long-horizon execution. Existing methods often fail to achieve robust navigation due to a lack of structured decision-making and insufficient integration of feedback from previous actions. To address these challenges, we propose STRIDER (Instruction-Aligned Structural Decision Space Optimization), a novel framework that systematically optimizes the agent's decision space by integrating spatial layout priors and dynamic task feedback. Our approach introduces two key innovations: 1) a Structured Waypoint Generator that constrains the action space through spatial structure, and 2) a Task-Alignment Regulator that adjusts behavior based on task progress, ensuring semantic alignment throughout navigation. Extensive experiments on the R2R-CE and RxR-CE benchmarks demonstrate that STRIDER significantly outperforms strong SOTA methods across key metrics; in particular, it improves Success Rate (SR) from 29% to 35%, a relative gain of 20.7%. Such results highlight the importance of spatially constrained decision-making and feedback-guided execution in improving navigation fidelity for zero-shot VLN-CE.

JBHI Journal 2024 Journal Article

End-to-End Prediction of EGFR Mutation Status With Denseformer

  • Shijie Zhao
  • Wenyuan Li
  • Zhuoyan Liu
  • Tianji Pang
  • Yang Yang
  • Ning Qiang
  • Jingyi Zhao
  • Bangguo Li

Accurate genotyping of the epidermal growth factor receptor (EGFR) is critical for treatment planning in lung adenocarcinoma. Currently, clinical identification of EGFR genotype relies heavily on biopsy and sequencing tests, which are invasive and complicated. Recent advancements in the integration of computed tomography (CT) imagery with deep learning techniques have yielded a non-invasive and straightforward way of identifying EGFR profiles. However, there are still many limitations to address: 1) most of these methods still require physicians to annotate tumor boundaries, which is time-consuming and prone to subjective errors; 2) most of the existing methods are simply borrowed from the computer vision field and do not sufficiently exploit multi-level features for the final prediction. To solve these problems, we propose a Denseformer framework to identify EGFR mutation status in a truly end-to-end fashion directly from 3D lung CT images. Specifically, we take the 3D whole-lung CT images as the input of the neural network model without manually labeling the lung nodules. This is inspired by the medical finding that the mutational status of EGFR is associated not only with the local tumor nodules but also with the microenvironment of the whole lung. Besides, we design a novel Denseformer network to fully explore the distinctive information across different-level features. The Denseformer is a novel network architecture that combines the advantages of both convolutional neural networks (CNNs) and Transformers. Denseformer learns directly from the 3D whole-lung CT images, which preserves the spatial location information in the CT images. To further improve model performance, we design a combined Transformer module. This module employs the Transformer Encoder to globally integrate the information of different levels and layers and uses it as the basis for the final prediction. The proposed model has been tested on a lung adenocarcinoma dataset collected at the Affiliated Hospital of Zunyi Medical University. Extensive experiments demonstrate that the proposed method can effectively extract meaningful features from 3D CT images to make accurate predictions. Compared with other state-of-the-art methods, Denseformer achieves the best performance among current deep learning methods that predict EGFR mutation status from a single CT modality.

AAAI Conference 2024 Conference Paper

SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer

  • Wenqi Zhong
  • Linzhi Yu
  • Chen Xia
  • Junwei Han
  • Dingwen Zhang

Saccadic scanpath, a data representation of human visual behavior, has received broad interest in multiple domains. The scanpath is a complex eye-tracking data modality that includes sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually face the spatial misalignment problem of fixation features and the loss of critical temporal data (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract aligned spatial fixation features and tokenize the scanpaths. Then, following the visual working memory mechanism, we design a local meta attention to reduce the semantic redundancy of fixations and guide the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to resolve the location ambiguity that grows as Transformer blocks deepen. We conduct extensive experiments on four databases under three tasks. SpFormer establishes new state-of-the-art results in distinct settings, verifying its flexibility and versatility in practical applications. The code can be obtained from https://github.com/wenqizhong/SpFormer.

YNIMG Journal 2023 Journal Article

Arousal modulates the amygdala-insula reciprocal connectivity during naturalistic emotional movie watching

  • Liting Wang
  • Xintao Hu
  • Yudan Ren
  • Jinglei Lv
  • Shijie Zhao
  • Lei Guo
  • Tianming Liu
  • Junwei Han

Emotional arousal is a complex state recruiting distributed cortical and subcortical structures, in which the amygdala and insula play an important role. Although previous neuroimaging studies have shown that the amygdala and insula manifest reciprocal connectivity, the effective connectivities and modulatory patterns on the amygdala-insula interactions underpinning arousal are still largely unknown. One reason may be the static and discrete laboratory brain imaging paradigms used in most existing studies. In this study, by integrating naturalistic-paradigm (i.e., movie watching) functional magnetic resonance imaging (fMRI) with a computational affective model that predicts dynamic arousal for the movie stimuli, we investigated the effective amygdala-insula interactions and the modulatory effect of the input arousal on the effective connections. Specifically, the predicted dynamic arousal of the movie served as regressors in a general linear model (GLM) analysis, and brain activations were identified accordingly. The regions of interest (i.e., the bilateral amygdala and insula) were localized according to the GLM activation map. The effective connectivity and modulatory effect were then inferred using dynamic causal modeling (DCM). Our experimental results demonstrated that the amygdala was the site of the driving arousal input and that arousal had a modulatory effect on the reciprocal connections between the amygdala and insula. Our study provides novel evidence for the underlying neural mechanisms of arousal in a dynamic naturalistic setting.

YNIMG Journal 2023 Journal Article

Genetic Influence on Gyral Peaks

  • Ying Huang
  • Tuo Zhang
  • Songyao Zhang
  • Weihan Zhang
  • Li Yang
  • Dajiang Zhu
  • Tianming Liu
  • Xi Jiang

Genetic mechanisms have been hypothesized to be a major determinant in the formation of cortical folding. Although there is an increasing number of studies examining the heritability of cortical folding, most of them focus on sulcal pits rather than gyral peaks. Gyral peaks, which reflect the highest local foci on gyri and are consistent across individuals, remain unstudied in terms of heritability. To address this knowledge gap, we used high-resolution data from the Human Connectome Project (HCP) to perform classical twin analysis and estimate the heritability of gyral peaks across various brain regions. Our results showed that the heritability of gyral peaks was heterogeneous across different cortical regions, but relatively symmetric between hemispheres. We also found that pits and peaks differ in a variety of anatomic and functional measures. Further, we explored the relationship between levels of heritability and the formation of cortical folding by utilizing the evolutionary timeline of gyrification. Our findings indicate that the heritability estimates of both gyral peaks and sulcal pits decrease linearly along the evolutionary timeline of gyrification. This suggests that cortical folds which formed earlier during gyrification are subject to stronger genetic influences than later ones. Moreover, pits and peaks paired by their time of appearance are also positively correlated with respect to their heritability estimates. These results fill the knowledge gap regarding genetic influences on gyral peaks and significantly advance our understanding of how genetic factors shape the formation of cortical folding. The comparison between peaks and pits suggests that peaks are not a simple morphological mirror of pits but could help complete the understanding of folding patterns.

IJCAI Conference 2022 Conference Paper

Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation

  • Chunbo Lang
  • Binfei Tu
  • Gong Cheng
  • Junwei Han

Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community. Existing approaches typically follow the prototype learning paradigm to perform meta-inference, which fails to fully exploit the underlying information from support image-mask pairs, resulting in various segmentation failures, e.g., incomplete objects, ambiguous boundaries, and distractor activation. To this end, we propose a simple yet versatile framework in the spirit of divide-and-conquer. Specifically, a novel self-reasoning scheme is first implemented on the annotated support image, and then the coarse segmentation mask is divided into multiple regions with different properties. Leveraging effective masked average pooling operations, a series of support-induced proxies are thus derived, each playing a specific role in conquering the above challenges. Moreover, we devise a unique parallel decoder structure that integrates proxies with similar attributes to boost the discrimination power. Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information as a guide at the “episode” level, not just about the object cues themselves. Extensive experiments on PASCAL-5i and COCO-20i demonstrate the superiority of DCP over conventional prototype-based approaches (up to 5~10% on average), which also establishes a new state-of-the-art. Code is available at github.com/chunbolang/DCP.
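The masked average pooling primitive that DCP builds its proxies from can be sketched in a few lines of NumPy. The region masks below are toy stand-ins for the regions DCP derives from its self-reasoning scheme, and all names are illustrative, not the paper's implementation.

```python
import numpy as np

def masked_average_pooling(feat, mask):
    """Average a feature map over the masked region only.

    feat: (C, H, W) support feature map; mask: (H, W) binary region mask.
    Returns a (C,) prototype/proxy vector summarizing that region.
    """
    area = mask.sum()
    if area == 0:
        return np.zeros(feat.shape[0])
    # Zero out features outside the region, then average over its area.
    return (feat * mask).reshape(feat.shape[0], -1).sum(axis=1) / area

# Toy example: 3-channel features; the "object" occupies the left half.
feat = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
obj = np.zeros((4, 4)); obj[:, :2] = 1      # hypothetical object region
bg = 1 - obj                                # hypothetical background region
proxies = [masked_average_pooling(feat, m) for m in (obj, bg)]
```

Pooling each region separately is what yields one proxy per region property; a single prototype over the whole mask would blur those regions together, which is the failure mode the divide-and-conquer design addresses.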

NeurIPS Conference 2022 Conference Paper

Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation

  • Yuanwei Liu
  • Nian Liu
  • Xiwen Yao
  • Junwei Han

Few-shot semantic segmentation aims to segment the target objects in a query image under the condition of a few annotated support images. Most previous works strive to mine more effective category information from the support to match with the corresponding objects in the query. However, they all ignored the category information gap between query and support images. If the objects in them show large intra-class diversity, forcibly migrating the category information from the support to the query is ineffective. To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive category knowledge from the query. Specifically, we design an Intermediate Prototype Mining Transformer (IPMT) to learn the prototype in an iterative way. In each IPMT layer, we propagate the object information in both support and query features to the prototype and then use it to activate the query feature map. By conducting this process iteratively, both the intermediate prototype and the query feature can be progressively improved. At last, the final query feature is used to yield a precise segmentation prediction. Extensive experiments on both PASCAL-5i and COCO-20i datasets clearly verify the effectiveness of our IPMT and show that it outperforms previous state-of-the-art methods by a large margin. Code is available at https://github.com/LIUYUANWEI98/IPMT.

AAAI Conference 2020 Conference Paper

Deep Embedded Complementary and Interactive Information for Multi-View Classification

  • Jinglin Xu
  • Wenbin Li
  • Xinwang Liu
  • Dingwen Zhang
  • Ji Liu
  • Junwei Han

Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most of the existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information and introduces a novel multi-view fusion strategy to make a joint decision during the optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations, and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate multiple neural networks by flexibly tuning the power exponent of the weights, which not only avoids the trivial weight solution but also provides a new approach to fuse outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.

IJCAI Conference 2020 Conference Paper

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

  • Jinglin Xu
  • Xiangsen Zhang
  • Wenbin Li
  • Xinwang Liu
  • Junwei Han

Three-dimensional (3D) object classification is widely involved in various computer vision applications, e.g., autonomous driving and simultaneous localization and mapping, and has attracted much attention in the community. However, solving 3D object classification by directly employing 3D convolutional neural networks (CNNs) generally suffers from high computational cost. Besides, existing view-based methods cannot fully explore the content relationships between views. To this end, this work proposes a novel multi-view framework that jointly uses multiple 2D-CNNs to capture discriminative information with relationships, as well as a new multi-view loss fusion strategy, in an end-to-end manner. Specifically, we utilize multiple 2D views of a 3D object as input and integrate the intra-view and inter-view information of each view through the view-specific 2D-CNN and a series of modules (outer product, view pair pooling, 1D convolution, and fully connected transformation). Furthermore, we design a novel view ensemble mechanism that selects several discriminative and informative views to jointly infer the category of a 3D object. Extensive experiments demonstrate that the proposed method is able to outperform current state-of-the-art methods on 3D object classification. More importantly, this work provides a new way to improve 3D object classification from the perspective of fully utilizing well-established 2D-CNNs.

JBHI Journal 2019 Journal Article

Identifying Brain Networks at Multiple Time Scales via Deep Recurrent Neural Network

  • Yan Cui
  • Shijie Zhao
  • Han Wang
  • Li Xie
  • Yaowu Chen
  • Junwei Han
  • Lei Guo
  • Fan Zhou

For decades, task functional magnetic resonance imaging (fMRI) has been a powerful noninvasive tool to explore the organizational architecture of human brain function. Researchers have developed a variety of brain network analysis methods for task fMRI data, including the general linear model, independent component analysis, and sparse representation methods. However, these shallow models are limited in faithfully reconstructing and modeling the hierarchical and temporal structures of brain networks, as demonstrated in a growing number of studies. Recently, recurrent neural networks (RNNs) have exhibited a great ability to model hierarchical and temporal dependence features in the machine learning field, which might be suitable for task fMRI data modeling. To explore such possible advantages of RNNs for task fMRI data, we propose a novel framework of a deep recurrent neural network (DRNN) to model functional brain networks from task fMRI data. Experimental results on the motor task fMRI data of the Human Connectome Project 900-subject release demonstrate that the proposed DRNN can not only faithfully reconstruct functional brain networks, but also identify more meaningful brain networks at multiple time scales which are overlooked by traditional shallow models. In general, this work provides an effective and powerful approach to identifying functional brain networks at multiple time scales from task fMRI data.

TIST Journal 2018 Journal Article

A Review of Co-Saliency Detection Algorithms

  • Dingwen Zhang
  • Huazhu Fu
  • Junwei Han
  • Ali Borji
  • Xuelong Li

Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and it can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this article, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field and to give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.

AAAI Conference 2018 Conference Paper

Generative Adversarial Network Based Heterogeneous Bibliographic Network Representation for Personalized Citation Recommendation

  • Xiaoyan Cai
  • Junwei Han
  • Libin Yang

Network representation has been recently exploited for many applications, such as citation recommendation, multi-label classification and link prediction. It learns a low-dimensional vector representation for each vertex in a network. Existing network representation methods focus only on incomplete aspects of vertex information (i.e., vertex content, network structure or partial integration); moreover, they are commonly designed for homogeneous information networks where all the vertices of a network are of the same type. In this paper, we propose a deep network representation model that integrates network structure and vertex content information into a unified framework by exploiting generative adversarial networks, and represents different types of vertices in the heterogeneous network in a continuous and common vector space. Based on the proposed model, we can obtain heterogeneous bibliographic network representations for efficient citation recommendation. The proposed model also makes personalized citation recommendation possible, an issue that few papers have addressed in the past. When evaluated on the AAN and DBLP datasets, the performance of the proposed heterogeneous bibliographic network based citation recommendation approach is comparable with that of other network representation based citation recommendation approaches. The results also demonstrate that the personalized citation recommendation approach is more effective than the non-personalized one.

IJCAI Conference 2018 Conference Paper

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification

  • Gong Cheng
  • Decheng Gao
  • Yang Liu
  • Junwei Han

Convolutional neural networks (CNNs) have shown their promise for the image classification task. However, global CNN features still lack geometric invariance for addressing the problem of intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy when compared with the existing state-of-the-art multi-label classification methods.

AAAI Conference 2017 Conference Paper

Balanced Clustering with Least Square Regression

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie
  • Xuelong Li

Clustering is a fundamental research topic in data mining, and a balanced clustering result is required in a variety of applications. Many existing clustering algorithms have good clustering performance yet fail to produce balanced clusters. In this paper, we propose a novel and simple clustering method, referred to as Balanced Clustering with Least Square regression (BCLS), which minimizes a least-square linear regression with a balance constraint that regularizes the clustering model. In BCLS, linear regression is applied to estimate the class-specific hyperplanes that partition each class of data from the others, thus guiding the clustering of the data points into different clusters. A balance constraint is utilized to regularize the clustering; minimizing it helps produce balanced clusters. In addition, we apply the method of augmented Lagrange multipliers (ALM) to optimize the objective model. Experiments on seven real-world benchmarks demonstrate that our approach not only produces good clustering performance but also guarantees a balanced clustering result.
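What a balance constraint enforces can be illustrated with a minimal sketch (hypothetical code; it uses a simple greedy capacity-constrained assignment rather than the ALM optimization actually used in BCLS):

```python
import numpy as np

def balanced_assign(X, centers):
    """Greedy capacity-constrained assignment: every cluster receives the
    same number of points (assumes len(X) is divisible by len(centers)).
    A toy illustration of balanced clustering, not the BCLS algorithm."""
    n, k = len(X), len(centers)
    cap = n // k
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) costs
    labels = -np.ones(n, dtype=int)
    counts = np.zeros(k, dtype=int)
    # Visit (point, cluster) pairs from cheapest to most expensive.
    for idx in np.argsort(d, axis=None):
        i, j = divmod(idx, k)
        if labels[i] == -1 and counts[j] < cap:
            labels[i] = j
            counts[j] += 1
    return labels
```

Unlike plain k-means, the capacity check guarantees perfectly balanced cluster sizes even when the data are unevenly distributed.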

AAAI Conference 2017 Conference Paper

Bilateral k-Means Algorithm for Fast Co-Clustering

  • Junwei Han
  • Kun Song
  • Feiping Nie
  • Xuelong Li

With the development of information technology, the amount of data, e.g., text, images and video, has increased rapidly, and efficiently clustering such large-scale data sets is a challenge. To address this problem, this paper proposes a novel co-clustering method named the bilateral k-means algorithm (BKM) for fast co-clustering. Different from traditional k-means algorithms, the proposed method has two indicator matrices P and Q and a diagonal matrix S to be solved, which represent the cluster memberships of samples and features, and the co-cluster centres, respectively. Therefore, it can perform clustering on the samples and the features simultaneously. We also introduce an effective approach to solve the proposed method, which involves fewer multiplications, and analyze the computational complexity. Extensive experiments on various types of data sets are conducted. Compared with state-of-the-art clustering methods, the proposed BKM not only has faster computational speed but also achieves promising clustering results.
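The alternating row/column clustering idea can be sketched as follows (a toy illustration in the spirit of co-clustering, assuming hard cluster labels and block means; it is not the authors' BKM solver with indicator matrices P, Q and diagonal S):

```python
import numpy as np

def bilateral_kmeans(X, k_rows, k_cols, n_iter=20, seed=0):
    """Toy alternating co-clustering: rows and columns are alternately
    reassigned so that each (row-cluster, column-cluster) block of X
    is well approximated by its mean."""
    rng = np.random.default_rng(seed)
    r = rng.integers(0, k_rows, X.shape[0])  # row cluster labels
    c = rng.integers(0, k_cols, X.shape[1])  # column cluster labels
    for _ in range(n_iter):
        # Block means: S[i, j] = mean of entries in row-cluster i, col-cluster j.
        S = np.zeros((k_rows, k_cols))
        for i in range(k_rows):
            for j in range(k_cols):
                block = X[np.ix_(r == i, c == j)]
                S[i, j] = block.mean() if block.size else 0.0
        # Reassign each row to the row cluster whose block profile fits best.
        r = np.array([np.argmin(((X[a] - S[:, c]) ** 2).sum(axis=1))
                      for a in range(X.shape[0])])
        # Reassign each column likewise.
        c = np.array([np.argmin(((X[:, b, None] - S[r]) ** 2).sum(axis=0))
                      for b in range(X.shape[1])])
    return r, c
```

On a matrix with clear block structure, rows (and columns) with identical profiles end up in the same cluster.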

IJCAI Conference 2017 Conference Paper

Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines

  • Jinglin Xu
  • Feiping Nie
  • Junwei Han

In data mining, we often encounter high-dimensional and noisy features, which may not only increase the load on computational resources but also lead to model overfitting; feature selection is often adopted to address this issue. In this paper, we propose a novel feature selection method based on the multi-class SVM, which introduces a scaling factor with a flexible parameter to adaptively re-adjust the distribution of feature weights and select the most discriminative features. Concretely, the proposed method designs a scaling factor with a p/2 power to control the distribution of weights adaptively and to search for the optimal sparsity of the weighting matrix. In addition, to solve the proposed model, we provide an alternating, iterative optimization method. It not only solves for the weighting matrix and the scaling factor independently, but also provides a better way to address the problem of solving the L2,0-norm. Comprehensive experiments are conducted on six datasets to demonstrate that this work obtains better performance than a number of existing state-of-the-art multi-class feature selection methods.
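The final selection step that such weight-matrix methods share can be sketched as below (hypothetical code: it only ranks features by the l2-norms of the rows of a given multi-class weight matrix W; learning W with the paper's scaled regularizer is not shown):

```python
import numpy as np

def select_features(W, n_keep):
    """Rank features by the l2-norm of their row in a multi-class weight
    matrix W (d features x c classes) and keep the top n_keep indices.
    A sketch of the selection step only, not the full optimization."""
    scores = np.linalg.norm(W, axis=1)        # one relevance score per feature
    return np.argsort(scores)[::-1][:n_keep]  # indices of the top features
```

Features whose rows of W are driven to (near) zero by a sparsity-inducing regularizer are discarded, which is the effect an L2,0-style penalty aims for.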

IJCAI Conference 2017 Conference Paper

Flexible Orthogonal Neighborhood Preserving Embedding

  • Tianji Pang
  • Feiping Nie
  • Junwei Han

In this paper, we propose a novel linear subspace learning algorithm called Flexible Orthogonal Neighborhood Preserving Embedding (FONPE), which is a linear approximation of the Locally Linear Embedding (LLE) algorithm. Our objective function integrates a term related to manifold smoothness with a flexible penalty defined on the projection fitness. Different from Neighborhood Preserving Embedding (NPE), we relax the hard constraint by modeling the mismatch between the approximate linear embedding and the original nonlinear embedding instead of enforcing them to be equal, which makes the method better able to cope with data sampled from a nonlinear manifold. In addition, instead of enforcing orthogonality among the projected points, we enforce the mapping itself to be orthogonal. In this way, FONPE tends to preserve distances, and thus the overall geometry is preserved. Unlike LLE, FONPE has an explicit linear mapping between the input and the reduced spaces, so it can handle novel testing data straightforwardly. Moreover, when the projection matrix in our model becomes an identity matrix, the model reduces to a denoising LLE (DLLE); compared with standard LLE, we demonstrate that DLLE handles noisy data better. Comprehensive experiments on several benchmark databases demonstrate the effectiveness of our algorithm.

IJCAI Conference 2017 Conference Paper

Linear Manifold Regularization with Adaptive Graph for Semi-supervised Dimensionality Reduction

  • Kai Xiong
  • Feiping Nie
  • Junwei Han

Many previous graph-based methods perform dimensionality reduction on a pre-defined graph. However, due to the noise and redundant information in the original data, the pre-defined graph has no clear structure and may not be appropriate for the subsequent task. To overcome these drawbacks, in this paper we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates graph construction into the objective function, so the projection matrix and the optimal graph can be optimized simultaneously. Due to the structure constraint, the learned graph is sparse and has a clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.

IJCAI Conference 2017 Conference Paper

Multi-view Feature Learning with Discriminative Regularization

  • Jinglin Xu
  • Junwei Han
  • Feiping Nie

More and more multi-view data, which capture rich information from heterogeneous features, are widely used in real-world applications. How to integrate different types of features, and how to learn low-dimensional and discriminative information from high-dimensional data, are two main challenges. To address them, this paper proposes a novel multi-view feature learning framework regularized by discriminative information. It obtains a feature learning model that contains multiple discriminative feature weighting matrices, one for each view, and then yields multiple low-dimensional features used for subsequent multi-view clustering. To optimize the formulated objective function, we transform the proposed framework into a trace optimization problem that admits a global solution in closed form. Experimental evaluations on four widely used datasets and comparisons with a number of state-of-the-art multi-view clustering algorithms demonstrate the superiority of the proposed work.

IJCAI Conference 2017 Conference Paper

Orthogonal and Nonnegative Graph Reconstruction for Large Scale Clustering

  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Spectral clustering has been widely used in recent years due to its simplicity in solving the graph clustering problem. However, it suffers from high computational cost as data grow in scale, and it is limited by the performance of its post-processing step. To address these two problems simultaneously, in this paper we propose a novel approach, denoted orthogonal and nonnegative graph reconstruction (ONGR), that scales linearly with the data size. For the relaxation of Normalized Cut, we add a nonnegative constraint to the objective. Due to the nonnegativity, ONGR offers interpretability in that the final cluster labels can be obtained directly without post-processing. Extensive experiments on clustering tasks demonstrate the effectiveness of the proposed method.

AAAI Conference 2017 Conference Paper

Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning

  • Kun Song
  • Feiping Nie
  • Junwei Han
  • Xuelong Li

We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN), which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the cost term that shrinks the distances between an inquiry input and its k target neighbors (the k nearest neighbors with the same label as the inquiry input) in LMNN, and focuses only on pushing the imposters (the samples with labels different from the inquiry input) out of the neighborhood of the inquiry. As a result, our method has no parameter that needs to be tuned on a validation set, which makes it more convenient to use. Second, by leveraging the geometry of the imposters, we construct a novel cost function to penalize the small distances between each inquiry and its imposters. Different from LMNN, which considers every imposter located in the neighborhood of each inquiry, our method takes care of only the nearest imposter, because once the nearest imposter is pushed out of the neighborhood of its inquiry, all the other imposters are out as well. In this way, the constraints in our model are far fewer than those of LMNN, which makes it much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster. Extensive experiments on data sets of various sizes and difficulties show that PFLMNN achieves better classification results than LMNN.

IJCAI Conference 2017 Conference Paper

Self-paced Mixture of Regressions

  • Longfei Han
  • Dingwen Zhang
  • Dong Huang
  • Xiaojun Chang
  • Jun Ren
  • Senlin Luo
  • Junwei Han

Mixture of regressions (MoR) is a well-established and effective approach to modeling discontinuous and heterogeneous data in regression problems. Existing MoR approaches assume a smooth joint distribution for its good analytic properties. However, this assumption makes existing MoR very sensitive to intra-component outliers (noisy training data residing in certain components) and to inter-component imbalance (different amounts of training data in different components). In this paper, we make one of the earliest efforts to bring Self-paced Learning (SPL) into MoR, i.e., the Self-paced Mixture of Regressions (SPMoR) model. We propose a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of the training data. As a robust learning regime, SPL incorporates training samples in order of confidence, from easy to hard. To demonstrate the effectiveness of SPMoR, we conducted experiments on both synthetic examples and real-world applications to age estimation and glucose estimation. The results show that SPMoR outperforms state-of-the-art methods.
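The easy-to-hard regime of self-paced learning can be sketched with a minimal example (hypothetical code: hard binary self-paced weights on a single ordinary least-squares regressor; the exclusive-LASSO regularizer and the mixture components of SPMoR are not modeled):

```python
import numpy as np

def self_paced_linreg(X, y, lam0=0.5, growth=2.0, n_rounds=5):
    """Minimal self-paced regression sketch: repeatedly refit ordinary
    least squares on the samples whose current loss falls below a
    threshold lambda that grows each round (v_i = 1 iff loss_i < lambda)."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # initial fit on all samples
    lam = lam0
    for _ in range(n_rounds):
        losses = (X @ w - y) ** 2
        easy = losses < lam                   # current "easy" sample set
        if easy.sum() >= X.shape[1]:          # need enough rows to refit
            w = np.linalg.lstsq(X[easy], y[easy], rcond=None)[0]
        lam *= growth                         # admit harder samples next round
    return w
```

Because a gross outlier keeps a large loss, it stays outside the easy set and stops distorting the fit, which is the intra-component-outlier robustness the abstract refers to.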

IJCAI Conference 2017 Conference Paper

Semi-supervised Orthogonal Graph Embedding with Recursive Projections

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie

Many graph-based semi-supervised dimensionality reduction algorithms use a projection matrix to linearly map the data matrix from the original feature space to a lower-dimensional representation. But the dimensionality after reduction is inevitably restricted to the number of classes, and the learned non-orthogonal projection matrix usually fails to preserve distances well and to balance the weights on different projection directions. This paper proposes a novel dimensionality reduction method, called semi-supervised orthogonal graph embedding with recursive projections (SOGE). We integrate manifold smoothness and label fitness as well as a penalization of the linear mapping mismatch, and learn the orthogonal projection on the Stiefel manifold, which empirically demonstrates better performance. Moreover, we recursively update the projection matrix in its orthocomplemented space to continuously learn more projection vectors, so as to better control the dimension of the reduction. Comprehensive experiments on several benchmarks demonstrate significant improvement over existing methods.

IJCAI Conference 2017 Conference Paper

Two dimensional Large Margin Nearest Neighbor for Matrix Classification

  • Kun Song
  • Feiping Nie
  • Junwei Han

Matrices are a common form of data encountered in a wide range of real applications, and how to classify this kind of data is an important research topic. In this paper, we propose a novel distance metric learning method named two-dimensional large margin nearest neighbor (2DLMNN) for improving the performance of the k-nearest-neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define a matrix-based Mahalanobis distance, which is used to construct an objective aimed at separating points in different classes by a large margin. The number of parameters in these two projection matrices is far smaller than in the vector-based counterpart, so our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN, and analyze its convergence behavior, initialization, and parameter determination. Compared with vector-based methods, 2DLMNN performs better for matrix data classification, and promising experimental results on several data sets demonstrate the effectiveness of our method.
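The matrix-based distance the abstract describes has a simple form: project the difference of two matrices with a left matrix L and a right matrix R and take the Frobenius norm. A minimal sketch (with L and R passed in as illustrative inputs, not learned as in the paper):

```python
import numpy as np

def matrix_distance(A, B, L, R):
    """Matrix-based Mahalanobis-style distance: d(A, B) = ||L (A - B) R||_F.
    In 2DLMNN, L and R are the learned left/right projection matrices;
    here they are just given arguments for illustration."""
    return np.linalg.norm(L @ (A - B) @ R)
```

For m x n inputs, an L of size p x m and an R of size n x q contribute p*m + n*q parameters, versus (m*n)^2 for a full Mahalanobis matrix on vectorized inputs, which is the parameter saving behind the reduced overfitting risk.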

IJCAI Conference 2016 Conference Paper

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-Paced Curriculum Learning

  • Dingwen Zhang
  • Deyu Meng
  • Long Zhao
  • Junwei Han

Weakly-supervised object detection (WOD) is a challenging problem in computer vision. The key problem is to simultaneously infer the exact object locations in the training images and train the object detectors, given only training images with weak image-level labels. Intuitively, by simulating the selective attention mechanism of the human visual system, saliency detection can select attractive objects in scenes and is thus a potential way to provide useful priors for WOD. However, adopting saliency detection in WOD is not trivial, since the detected saliency region may be highly ambiguous in complex cases. To this end, this paper first comprehensively analyzes the challenges in applying saliency detection to WOD. Then, we make one of the earliest efforts to bridge saliency detection to WOD via self-paced curriculum learning, which can guide the learning procedure to gradually acquire faithful knowledge of multi-class objects from easy to hard. The experimental results demonstrate that the proposed approach successfully bridges the saliency detection and WOD tasks and achieves state-of-the-art object detection results under weak supervision.

YNICL Journal 2016 Journal Article

Connectome-scale group-wise consistent resting-state network analysis in autism spectrum disorder

  • Yu Zhao
  • Hanbo Chen
  • Yujie Li
  • Jinglei Lv
  • Xi Jiang
  • Fangfei Ge
  • Tuo Zhang
  • Shu Zhang

Understanding the organizational architecture of human brain function and its alteration patterns in diseased brains, such as those of Autism Spectrum Disorder (ASD) patients, is of great interest. In-vivo functional magnetic resonance imaging (fMRI) offers a unique window to investigate the mechanisms of brain function and to identify functional network components of the human brain. Previously, we have shown that multiple concurrent functional networks can be derived from fMRI signals using whole-brain sparse representation. Yet it remains an open question how to derive group-wise consistent networks featured in ASD patients and controls. Here we propose an effective volumetric network descriptor, named the connectivity map, to compactly describe the spatial patterns of brain network maps, and we implement a fast framework in the Apache Spark environment that can effectively identify group-wise consistent networks in a big fMRI dataset. Our experimental results identified 144 group-wise common intrinsic connectivity networks (ICNs) shared between ASD patients and healthy control subjects, some of which are substantially different between the two groups. Moreover, further analysis of the functional connectivity and spatial overlap between these 144 common ICNs reveals connectomics signatures characterizing ASD patients and controls. In particular, the computing time of our Spark-enabled functional connectomics framework is reduced significantly, from 240 hours (C++ code, single core) to 20 hours, exhibiting great potential for handling fMRI big data in the future.

IJCAI Conference 2016 Conference Paper

Robust and Sparse Fuzzy K-Means Clustering

  • Jinglin Xu
  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Partition-based clustering algorithms, like K-Means and fuzzy K-Means, have been among the most widely and successfully used in data mining over the past decades. In this paper, we present a robust and sparse fuzzy K-Means clustering algorithm, an extension of the standard fuzzy K-Means algorithm that incorporates a robust function, rather than the squared data-fitting term, to handle outliers. More importantly, combined with the concept of sparseness, the new algorithm further introduces a penalty term to make the object-to-cluster membership of each sample suitably sparse. Experimental results on benchmark datasets demonstrate that the proposed algorithm not only ensures the robustness of such soft clustering in real-world applications, but also avoids performance degradation by taking membership sparsity into account.
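For reference, the membership update of the standard fuzzy K-Means baseline that the abstract extends looks like this (a sketch of the classic update only; the robust loss and the sparsity penalty of the proposed algorithm are not included):

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """One standard fuzzy K-Means membership step: the membership of
    sample i in cluster j decreases with its squared distance to that
    centre, softened by the fuzzifier m > 1; each row sums to 1."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
    inv = d ** (-1.0 / (m - 1.0))             # closer centres get larger weight
    return inv / inv.sum(axis=1, keepdims=True)
```

In the standard update every membership is strictly positive; the paper's penalty term instead drives most of each row toward zero, which is what "suitable sparseness" refers to.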

YNIMG Journal 2014 Journal Article

Fusing DTI and fMRI data: A survey of methods and applications

  • Dajiang Zhu
  • Tuo Zhang
  • Xi Jiang
  • Xintao Hu
  • Hanbo Chen
  • Ning Yang
  • Jinglei Lv
  • Junwei Han

The relationship between brain structure and function has been one of the centers of research in neuroimaging for decades. In recent years, diffusion tensor imaging (DTI) and functional magnetic resonance imaging (fMRI) techniques have become widely available and popular in cognitive and clinical neuroscience for examining the brain's white matter (WM) micro-structure and gray matter (GM) function, respectively. Given the intrinsic integration of WM/GM and the complementary information embedded in DTI/fMRI data, it is natural and well-justified to combine these two neuroimaging modalities to investigate brain structure and function and their relationships simultaneously. In the past decade, there have been remarkable achievements in DTI/fMRI fusion methods and applications in the neuroimaging and human brain mapping community. This survey paper reviews recent advances in methodologies and applications for incorporating multimodal DTI and fMRI data, and offers our perspectives on future research directions. We envision that effective fusion of DTI/fMRI techniques will play an increasingly important role in neuroimaging and brain science in the years to come.

YNIMG Journal 2012 Journal Article

Inferring consistent functional interaction patterns from natural stimulus FMRI data

  • Jiehuan Sun
  • Xintao Hu
  • Xiu Huang
  • Yang Liu
  • Kaiming Li
  • Xiang Li
  • Junwei Han
  • Lei Guo

There has been increasing interest in how the human brain responds to natural stimuli, such as video watching, in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under the natural stimulus of video watching among known functional brain regions identified by task-based fMRI. We applied and compared four statistical approaches, including Bayesian network modeling with three search algorithms, namely greedy equivalence search (GES), Peter and Clark (PC) analysis, and independent multiple greedy equivalence search (IMaGES), as well as the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. Interestingly, a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages.