Arrow Research search

Author name cluster

Runze Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

AAAI Conference 2026 Conference Paper

DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment

  • Bohan Fu
  • Guanyi Qin
  • Fazhan Zhang
  • Zihao Huang
  • Mingxuan Li
  • Runze Hu

Blind Image Quality Assessment (BIQA), which aims to replicate human perception of visual quality without a reference, plays a key role in vision tasks, yet existing models often fail to capture subtle distortion cues, leading to misalignment with human subjective judgments. We identify the root cause of this limitation as the lack of reliable distortion priors: existing methods typically learn shallow relationships between unified image features and quality scores, leaving them insensitive to distortions and thus limiting their performance. To address this, we introduce DR.Experts, a novel prior-driven BIQA framework designed to explicitly incorporate distortion priors, enabling reliable quality assessment. DR.Experts first leverages a degradation-aware vision-language model to obtain distortion-specific priors, which are then refined and enhanced by the proposed Distortion-Saliency Differential Module, which separates them from semantic attention to ensure genuine distortion representations. The refined priors, along with the semantic and bridging representations, are fused by a mixture-of-experts-style module, the Dynamic Distortion Weighting Module, which weights each distortion-specific feature according to its perceptual impact so that the final quality prediction aligns with human perception. Extensive experiments on five challenging BIQA benchmarks demonstrate the superiority of DR.Experts over current methods and showcase its generalization and data efficiency.
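
The mixture-of-experts-style weighting step described above lends itself to a short illustration. The following is a hypothetical sketch of gating several distortion-specific features by learned perceptual-impact weights before regressing a quality score; the module name, gate design, and dimensions are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of a dynamic, MoE-style weighting of distortion experts.
import torch
import torch.nn as nn


class DynamicDistortionWeighting(nn.Module):
    def __init__(self, num_experts: int, dim: int):
        super().__init__()
        # The gate predicts a perceptual-impact weight for every distortion expert.
        self.gate = nn.Sequential(nn.Linear(dim, num_experts), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, 1)  # quality regression head

    def forward(self, expert_feats: torch.Tensor, semantic_feat: torch.Tensor):
        # expert_feats: (B, K, D) distortion-specific features
        # semantic_feat: (B, D) semantic / bridging representation
        weights = self.gate(semantic_feat)                          # (B, K)
        fused = (weights.unsqueeze(-1) * expert_feats).sum(dim=1)   # (B, D)
        return self.head(fused + semantic_feat).squeeze(-1)         # (B,) quality score


scores = DynamicDistortionWeighting(num_experts=6, dim=256)(
    torch.randn(2, 6, 256), torch.randn(2, 256))
```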

AAAI Conference 2026 Conference Paper

DSP-PCQA: Integrating Multiple Perception Preferences for Point Cloud Quality Assessment

  • Mingxuan Li
  • Fazhan Zhang
  • Zhenzhe Hou
  • Zihao Huang
  • Bohan Fu
  • Runze Hu
  • Xiaohui Chu

Point Cloud Quality Assessment (PCQA) faces a critical disconnect: existing methods operate on a flawed single-perception paradigm, while human observers evaluate quality through two cognitive streams, technical rationality and semantic sensibility. This fundamental mismatch routinely produces assessment failures in real-world scenarios where technical and semantic signals conflict. To address this, we introduce Dual-Stream Perception PCQA (DSP-PCQA), the first framework that explicitly models this perceptual duality through parallel networks mirroring the human cognitive pathway. DSP-PCQA introduces three key innovations: (1) a Decoupled Focus Enhancer (DFE) that surgically isolates technical and semantic information using two targeted transformations; (2) a Context & Attribute Correlation Awareness (CACA) module that captures the dynamic, non-linear relationships between different views and sub-models that characterize human visual processing; and (3) an Exchange-based Perceptual Injection (EPI) module that strategically transfers information between the perception streams, simulating how humans integrate multiple perceptual dimensions. Extensive evaluations show that DSP-PCQA outperforms state-of-the-art methods across multiple benchmarks. Most importantly, our method resolves the perceptual discord that plagues existing approaches, maintaining high accuracy even in the challenging boundary cases where technical quality and semantic significance diverge, precisely where conventional methods struggle.
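
The exchange-based transfer between the two perception streams can be illustrated with a minimal, hypothetical sketch: a fraction of channels is swapped between a technical and a semantic feature vector so each branch sees the other's cues. The split ratio, projections, and class name are assumptions, not the EPI module's actual design.

```python
# Minimal sketch of exchanging a slice of channels between two parallel streams.
import torch
import torch.nn as nn


class ExchangeInjection(nn.Module):
    def __init__(self, dim: int, exchange_ratio: float = 0.25):
        super().__init__()
        self.k = int(dim * exchange_ratio)     # number of channels to exchange
        self.proj_t = nn.Linear(dim, dim)      # technical-stream projection
        self.proj_s = nn.Linear(dim, dim)      # semantic-stream projection

    def forward(self, tech: torch.Tensor, sem: torch.Tensor):
        # tech, sem: (B, D) features from the two parallel streams
        tech_new = torch.cat([sem[:, : self.k], tech[:, self.k :]], dim=-1)
        sem_new = torch.cat([tech[:, : self.k], sem[:, self.k :]], dim=-1)
        return self.proj_t(tech_new), self.proj_s(sem_new)


t, s = ExchangeInjection(dim=128)(torch.randn(4, 128), torch.randn(4, 128))
```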

AAAI Conference 2026 Conference Paper

FVNet: Harnessing Liquid Neural Dynamics for Lightweight Visual Representation

  • Zhenzhe Hou
  • Xiaohui Chu
  • Runze Hu
  • Yang Li
  • Yutao Liu

Efficient visual backbone design remains crucial for resource-constrained computer vision applications. Inspired by the adaptive continuous-time dynamics observed in biological neurons, we propose FVNet, a novel lightweight architecture that integrates liquid neural dynamics for efficient and dynamic visual feature extraction. Central to FVNet is the Fluid Temporal Flow Unit (FTFU), which employs continuous-time equations with learnable time constants to capture spatio-temporal dependencies adaptively. By further stacking these units in a Multi-Phase Fluid Block (MPFB), our model processes features across parallel temporal scales, enabling context-aware feature encoding without incurring excessive computational overhead. Through a discrete closed-form solution, FVNet achieves the representational power of continuous-time models while avoiding the instability and overhead of iterative numerical solvers. Extensive experiments on various vision tasks demonstrate that FVNet achieves superior performance and efficiency over existing state-of-the-art lightweight networks.
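
The continuous-time idea with a closed-form discrete step can be sketched in a few lines. This is only the standard closed-form solution of dh/dt = (-h + f(Wx + b)) / tau with a learnable time constant tau; FVNet's actual FTFU update is not reproduced here, and all names and dimensions are illustrative.

```python
# Sketch of a liquid-style unit: learnable time constants, closed-form step,
# no iterative ODE solver.
import torch
import torch.nn as nn


class LiquidUnit(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden_dim)
        self.log_tau = nn.Parameter(torch.zeros(hidden_dim))  # learnable time constants

    def forward(self, x: torch.Tensor, h: torch.Tensor, dt: float = 1.0):
        tau = torch.exp(self.log_tau)        # keep time constants positive
        target = torch.tanh(self.inp(x))     # input-driven equilibrium state
        decay = torch.exp(-dt / tau)         # closed-form solution over step dt
        return decay * h + (1.0 - decay) * target


unit = LiquidUnit(in_dim=64, hidden_dim=32)
h = torch.zeros(8, 32)
h = unit(torch.randn(8, 64), h)
```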

AAAI Conference 2026 Conference Paper

Points Meet Pixels: Bridging 2D Vision-Language Model and 3D Perception Gaps for Point Cloud Quality Assessment

  • Mingxuan Li
  • Zihao Huang
  • Xiaohui Chu
  • Fazhan Zhang
  • Bohan Fu
  • Runze Hu

Vision-Language Models (VLMs) have demonstrated significant progress in quality assessment tasks. However, a fundamental paradox arises when they are applied to Point Cloud Quality Assessment (PCQA). Existing VLMs, designed for image-text pairs, are inherently incompatible with 3D point cloud data due to the modality gap. While some PCQA research attempts to adapt point clouds to VLMs through 2D projection, this approach inevitably sacrifices the spatial structure information essential for accurate quality assessment. Conversely, directly integrating a dedicated 3D branch into a VLM-based PCQA framework introduces feature-space misalignment and an influx of quality-insensitive information. To bridge these fundamental conflicts hindering VLMs' adaptation to PCQA, we propose the PMP-PCQA framework, which leverages the inherent mapping relationship between points and pixels to seamlessly apply VLMs to PCQA. Our approach introduces three key innovations: a Spatial Awareness Enhancer (SAE) module that enriches image features with spatial coordinate clues to reinforce geometric awareness in 2D visual representations; a Fine-to-coarse Consistency Alignment (FCA) module that bridges the gap between the 2D and 3D modalities by leveraging point-pixel correspondences to construct bridging features; and a Text-Guided Adaptive Miner (TAM) module that dynamically suppresses quality-insensitive features to mine discriminative visual clues for PCQA. Extensive evaluations demonstrate that PMP-PCQA consistently outperforms state-of-the-art methods across multiple benchmarks.
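
The idea of enriching 2D projection features with spatial coordinate clues can be sketched as follows: per-pixel 3D coordinates recorded during projection are concatenated with the image feature map and fused by a 1x1 convolution. Shapes, the fusion choice, and names are assumptions for illustration, not the SAE module's exact design.

```python
# Hypothetical sketch of adding 3D coordinate clues to 2D projection features.
import torch
import torch.nn as nn


class SpatialAwarenessEnhancer(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.fuse = nn.Conv2d(feat_dim + 3, feat_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor, xyz_map: torch.Tensor):
        # feats:   (B, C, H, W) features of a 2D projection of the point cloud
        # xyz_map: (B, 3, H, W) per-pixel 3D coordinates from the projection
        return self.fuse(torch.cat([feats, xyz_map], dim=1))


out = SpatialAwarenessEnhancer(feat_dim=64)(torch.randn(1, 64, 32, 32),
                                            torch.randn(1, 3, 32, 32))
```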

AAAI Conference 2025 Conference Paper

BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution

  • Zihao He
  • Shengchuan Zhang
  • Runze Hu
  • Yunhang Shen
  • Yan Zhang

Super-resolution (SR) techniques are critical for enhancing image quality, particularly in scenarios where high-resolution imagery is essential yet limited by hardware constraints. Existing diffusion models for SR have relied predominantly on Gaussian models for noise generation, which often fall short when dealing with the complex and variable texture inherent in natural scenes. To address these deficiencies, we introduce the Bayesian Uncertainty Guided Diffusion Probabilistic Model (BUFF). BUFF distinguishes itself by incorporating a Bayesian network to generate high-resolution uncertainty masks. These masks guide the diffusion process, allowing noise intensity to be adjusted in a manner that is both context-aware and adaptive. This novel approach not only enhances the fidelity of super-resolved images to their original high-resolution counterparts but also significantly mitigates artifacts and blurring in areas characterized by complex textures and fine details. The model demonstrates exceptional robustness against complex noise patterns and superior adaptability in handling textures and edges within images. Empirical evidence, supported by visual results, illustrates the model's robustness, especially in challenging scenarios, and its effectiveness in addressing common SR issues such as blurring. Experimental evaluations conducted on the DIV2K dataset reveal that BUFF achieves a notable improvement, with a +0.61 increase in SSIM over the baseline on BSD100, surpassing traditional diffusion approaches by an additional +0.20 dB average PSNR gain. These findings underscore the potential of Bayesian methods in enhancing diffusion processes for SR, paving the way for future advancements in the field.
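
One way to picture uncertainty-guided noise is a DDPM-style forward step in which a per-pixel uncertainty mask modulates noise intensity, so uncertain (texture-rich) regions are perturbed more strongly. The modulation rule, the scaling factor, and the variable names below are assumptions; BUFF's exact formulation may differ.

```python
# Illustrative sketch of uncertainty-modulated noise in a DDPM-style forward step.
import torch


def guided_forward_step(x0, t, alphas_cumprod, uncertainty, lam=0.5):
    # x0: (B, C, H, W) clean image; uncertainty: (B, 1, H, W) values in [0, 1]
    noise = torch.randn_like(x0) * (1.0 + lam * uncertainty)   # context-aware noise
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return x_t, noise


alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
x_t, eps = guided_forward_step(torch.randn(2, 3, 64, 64),
                               torch.tensor([10, 500]),
                               alphas_cumprod,
                               torch.rand(2, 1, 64, 64))
```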

AAAI Conference 2025 Conference Paper

Feature Denoising Diffusion Model for Blind Image Quality Assessment

  • Xudong Li
  • Yan Zhang
  • Yunhang Shen
  • Ke Li
  • Runze Hu
  • Xiawu Zheng
  • Sicheng Zhao

Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Current deep learning BIQA methods typically depend on features transferred from high-level tasks. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we take an initial step toward exploring diffusion models for feature denoising in BIQA, namely Perceptual Feature Diffusion for IQA (PFD-IQA), which aims to remove noise from quality-aware features. Specifically, 1) we propose a Perceptual Prior Discovery and Aggregation module that establishes two auxiliary tasks to discover potential low-level features in images, which are used to aggregate perceptual textual prompt conditions for the diffusion model; and 2) we propose a Perceptual Conditional Feature Refinement strategy, which matches noisy features to predefined denoising trajectories and then performs exact feature denoising based on the textual prompt conditions. By incorporating a lightweight denoiser and requiring only a few feature denoising steps (e.g., just five iterations), our PFD-IQA framework achieves superior performance across eight standard BIQA datasets, validating its effectiveness.
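
To make the few-step, prompt-conditioned refinement concrete, here is a deliberately simplified sketch: a lightweight residual denoiser conditioned on a text prompt embedding, applied for only a handful of iterations. This is not the paper's diffusion formulation; the architecture, conditioning, and step count are placeholder assumptions.

```python
# Minimal sketch of a lightweight, prompt-conditioned feature refiner run for a few steps.
import torch
import torch.nn as nn


class FeatureDenoiser(nn.Module):
    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, feat: torch.Tensor, cond: torch.Tensor):
        # Predict a residual correction of the noisy feature given the prompt condition.
        return feat + self.net(torch.cat([feat, cond], dim=-1))


denoiser = FeatureDenoiser(dim=512, cond_dim=256)
feat, cond = torch.randn(4, 512), torch.randn(4, 256)
for _ in range(5):          # only a few refinement iterations, as in the abstract
    feat = denoiser(feat, cond)
```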

ICML Conference 2024 Conference Paper

Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

  • Xudong Li
  • Timin Gao
  • Runze Hu
  • Yan Zhang 0109
  • Shengchuan Zhang
  • Xiawu Zheng
  • Jingyuan Zheng
  • Yunhang Shen

The current state-of-the-art No-Reference Image Quality Assessment (NR-IQA) methods typically rely on feature extraction from upstream semantic backbone networks, assuming that all extracted features are relevant. However, we make a key observation that not all features are beneficial, and some may even be harmful, necessitating careful selection. Empirically, we find that many image pairs with small feature-space distances can have vastly different quality scores, indicating that the extracted features may contain quality-irrelevant noise. To address this issue, we propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) that takes an adversarial perspective to remove harmful semantic noise features from the upstream task. Specifically, QFM-IQM strengthens the model's ability to distinguish semantic noise by treating image pairs with similar quality scores but differing semantic features as adversarial semantic noise, and adaptively adjusts the upstream task's features to reduce their sensitivity to this adversarial perturbation. Furthermore, we utilize a distillation framework to expand the dataset and improve the model's generalization ability. Extensive experiments conducted on eight standard IQA datasets have demonstrated the effectiveness of our proposed QFM-IQM.
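
The pair-mining idea can be illustrated with a hedged sketch: for each image, find another sample in the batch whose quality score is nearly identical but whose features are distant, and penalize any difference in the quality head's predictions for the two. The thresholds, the partner-selection rule, and the loss form are illustrative assumptions, not the paper's objective.

```python
# Sketch of mining "similar quality, different semantics" pairs and penalizing
# the quality head's sensitivity to that semantic difference.
import torch
import torch.nn.functional as F


def semantic_noise_loss(feats, scores, head, score_tol=0.05):
    # feats: (B, D) upstream features; scores: (B,) ground-truth quality
    score_diff = (scores[:, None] - scores[None, :]).abs()      # (B, B)
    feat_dist = torch.cdist(feats, feats)                       # (B, B)
    # Keep only pairs with similar quality; pick the semantically farthest partner.
    feat_dist = feat_dist.masked_fill(score_diff > score_tol, -1.0)
    partner = feat_dist.argmax(dim=1)                           # (B,)
    # Prediction should not change when semantics change but quality does not.
    return F.mse_loss(head(feats).squeeze(-1), head(feats[partner]).squeeze(-1))


head = torch.nn.Linear(128, 1)
loss = semantic_noise_loss(torch.randn(16, 128), torch.rand(16), head)
```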

AAAI Conference 2024 Conference Paper

Cross-Modal Match for Language Conditioned 3D Object Grounding

  • Yachao Zhang
  • Runze Hu
  • Ronghui Li
  • Yanyun Qu
  • Yuan Xie
  • Xiu Li

Language-conditioned 3D object grounding aims to find the object within a 3D scene mentioned by a natural language description, which mainly depends on matching between vision and natural language. Considerable improvement in grounding performance has been achieved by improving the multimodal fusion mechanism or bridging the gap between detection and matching. However, several mismatches are ignored, i.e., the mismatch between local visual representations and global sentence representations, and the mismatch between the visual space and the corresponding label-word space. In this paper, we propose cross-modal match for 3D grounding from the perspective of mitigating these mismatches. Specifically, to match local visual features with the global description sentence, we propose a BEV (bird's-eye-view) based global information embedding module. It projects multiple object-proposal features into the BEV, and the relations between different objects are captured by a visual transformer that can model both positions and features with long-range dependencies. To circumvent the mismatch between the feature spaces of the different modalities, we propose cross-modal consistency learning. It applies cross-modal consistency constraints to convert the visual feature space into the label-word feature space, which makes matching easier. In addition, we introduce a label distillation loss and a global distillation loss to drive this matching in a distillation manner. We evaluate our method in mainstream evaluation settings on three datasets, and the results demonstrate the effectiveness of the proposed method.
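
A rough illustration of the BEV embedding step is to scatter object-proposal features into a bird's-eye-view grid by their centre coordinates, producing tokens a transformer can process for global, position-aware relations. Grid size, coordinate range, and the accumulation rule below are assumptions for illustration only.

```python
# Sketch of scattering proposal features into a BEV grid by centre coordinates.
import torch


def proposals_to_bev(feats, centers, grid=32, extent=10.0):
    # feats: (N, D) proposal features; centers: (N, 3) xyz centres in metres,
    # assumed to lie within [-extent, extent] in x and y.
    bev = torch.zeros(grid, grid, feats.size(1))
    ij = ((centers[:, :2] / (2 * extent) + 0.5) * grid).long().clamp(0, grid - 1)
    for f, (i, j) in zip(feats, ij.tolist()):
        bev[i, j] += f                    # accumulate features landing in the same cell
    return bev.view(grid * grid, -1)      # BEV tokens for a transformer


bev_tokens = proposals_to_bev(torch.randn(20, 64), torch.rand(20, 3) * 8 - 4)
```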

ICML Conference 2024 Conference Paper

Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment

  • Xudong Li
  • Runze Hu
  • Jingyuan Zheng
  • Yan Zhang 0109
  • Shengchuan Zhang
  • Xiawu Zheng
  • Ke Li 0015
  • Yunhang Shen

Blind Image Quality Assessment (BIQA) mirrors subjective quality judgments made by human observers. Generally, humans favor comparing relative qualities over predicting absolute qualities directly. However, current BIQA models focus on mining the "local" context, i.e., the relationship between the information within individual images and the absolute quality of the image, ignoring the "global" context of relative quality contrasts among different images in the training data. In this paper, we present Perceptual Context and Sensitivity BIQA (CSIQA), a novel contrastive learning paradigm that seamlessly integrates "global" and "local" perspectives into BIQA. Specifically, CSIQA comprises two primary components: 1) a Quality Context Contrastive Learning module, which is equipped with different contrastive learning strategies to effectively capture potential quality correlations in the global context of the dataset; and 2) a Quality-aware Mask Attention Module, which employs random masking to ensure consistency with local visual sensitivity, thereby improving the model's perception of local distortions. Extensive experiments on eight standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods.
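
A quality-context contrastive objective can be sketched as follows: within a batch, images whose quality scores are close are treated as positives and pulled together in feature space, while the rest are pushed apart. The temperature, positive threshold, and InfoNCE-style form are assumptions about one plausible instantiation, not the paper's exact strategies.

```python
# Sketch of a supervised, quality-driven contrastive loss over a batch.
import torch
import torch.nn.functional as F


def quality_contrastive_loss(feats, scores, tau=0.1, pos_tol=0.05):
    z = F.normalize(feats, dim=-1)                          # (B, D)
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = (z @ z.t() / tau).masked_fill(eye, float('-inf'))
    pos = ((scores[:, None] - scores[None, :]).abs() < pos_tol).float()
    pos = pos.masked_fill(eye, 0.0)                         # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)               # avoid 0 * (-inf)
    denom = pos.sum(dim=1).clamp(min=1.0)
    return -(pos * log_prob).sum(dim=1).div(denom).mean()


loss = quality_contrastive_loss(torch.randn(8, 256), torch.rand(8))
```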

AAAI Conference 2024 Conference Paper

Semi-Supervised Blind Image Quality Assessment through Knowledge Distillation and Incremental Learning

  • Wensheng Pan
  • Timin Gao
  • Yan Zhang
  • Xiawu Zheng
  • Yunhang Shen
  • Ke Li
  • Runze Hu
  • Yutao Liu

Blind Image Quality Assessment (BIQA) aims to simulate human assessment of image quality. It has a great demand for labeled data, which is often insufficient in practice. Some researchers employ unsupervised methods to address this issue, but these struggle to emulate the human subjective system. To this end, we introduce a unified framework that combines semi-supervised and incremental learning to address the above issue. Specifically, when training data is limited, semi-supervised learning is needed to exploit the extensive unlabeled data. To facilitate semi-supervised learning, we use knowledge distillation to assign pseudo-labels to unlabeled data, preserving analytical capability. To gradually improve the quality of the pseudo-labels, we introduce incremental learning. However, incremental learning can lead to catastrophic forgetting. We employ Experience Replay, selecting representative samples during multiple rounds of semi-supervised learning, to alleviate forgetting and ensure model stability. Experimental results show that the proposed approach achieves state-of-the-art performance across various benchmark datasets. After being trained on the LIVE dataset, our method can be directly transferred to the CSIQ dataset. Compared with other methods, it significantly outperforms unsupervised methods on the CSIQ dataset with only a marginal performance drop (-0.002) on the LIVE dataset. In conclusion, our proposed method demonstrates its potential to tackle the challenges in real-world production processes.
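
One round of the described loop can be sketched compactly: a teacher assigns pseudo-labels to unlabeled images (knowledge distillation), the student trains on labeled plus pseudo-labeled data, and a small replay buffer of samples is carried across rounds to counter forgetting. Buffer size, the selection rule, and the toy models are assumptions for illustration.

```python
# Sketch of one pseudo-labeling round with a simple experience-replay buffer.
import torch
import torch.nn.functional as F


def pseudo_label_round(student, teacher, opt, labeled, unlabeled, replay, buf_size=64):
    with torch.no_grad():
        pseudo = [(x, teacher(x.unsqueeze(0)).squeeze()) for x in unlabeled]
    for x, y in labeled + pseudo + replay:                  # joint training set
        opt.zero_grad()
        F.mse_loss(student(x.unsqueeze(0)).squeeze(), y).backward()
        opt.step()
    replay[:] = pseudo[:buf_size]                           # keep samples for later rounds
    return replay


student, teacher = torch.nn.Linear(16, 1), torch.nn.Linear(16, 1)
opt = torch.optim.SGD(student.parameters(), lr=1e-3)
labeled = [(torch.randn(16), torch.tensor(0.7))]
unlabeled = [torch.randn(16) for _ in range(4)]
replay = pseudo_label_round(student, teacher, opt, labeled, unlabeled, replay=[])
```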

AAAI Conference 2023 Conference Paper

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

  • Guanyi Qin
  • Runze Hu
  • Yutao Liu
  • Xiawu Zheng
  • Haotian Liu
  • Xiu Li
  • Yan Zhang

Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision, which, however, remains unresolved due to complex distortion conditions and diversified image content. To confront this challenge, in this paper we propose a novel BIQA pipeline based on the Transformer architecture, which achieves an efficient quality-aware feature representation with far less data. More specifically, we treat traditional fine-tuning in BIQA as an interpretation of the pre-trained model. In this way, we further introduce a Transformer decoder to refine the perceptual information of the CLS token from different perspectives. This enables our model to establish the quality-aware feature manifold efficiently while attaining strong generalization capability. Meanwhile, inspired by the subjective evaluation behavior of humans, we introduce a novel attention-panel mechanism, which improves model performance and reduces prediction uncertainty simultaneously. The proposed BIQA method maintains a lightweight design with only one decoder layer, yet extensive experiments on eight standard BIQA datasets (both synthetic and authentic) demonstrate its superior performance over state-of-the-art BIQA methods, e.g., achieving SRCC values of 0.875 (vs. 0.859) on LIVEC and 0.980 (vs. 0.969) on LIVE. Checkpoints, logs and code will be available at https://github.com/narthchin/DEIQT.
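
The attention-panel idea can be sketched minimally: several learnable "panel" queries attend to the encoder's tokens through a single decoder layer, each panel member regresses a score, and the scores are averaged. Reading their spread as an uncertainty estimate is an assumption here, as are the dimensions and the number of panel members; this is not the released DEIQT implementation.

```python
# Sketch of a one-layer attention-panel decoder over encoder tokens.
import torch
import torch.nn as nn


class AttentionPanelDecoder(nn.Module):
    def __init__(self, dim=768, num_panel=6, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_panel, dim) * 0.02)
        self.layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                                batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, N, D) encoder output tokens (CLS + patches)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        panel = self.layer(q, tokens)                       # single decoder layer
        scores = self.head(panel).squeeze(-1)               # (B, num_panel)
        return scores.mean(dim=1), scores.std(dim=1)        # quality, spread


quality, spread = AttentionPanelDecoder()(torch.randn(2, 197, 768))
```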