Arrow Research search

Author name cluster

Qiang Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

11

AAAI Conference 2026 Conference Paper

On the Evaluation of Capability Estimation Methods for Large Language Models

  • Qiang Hu
  • Jin Wen
  • Yao Zhang
  • Maxime Cordy
  • Yongqiang Lyu

The emergence of large language models (LLMs) marks a transformative era in artificial intelligence (AI). However, systematically evaluating the capability of LLMs is challenging because it requires large amounts of labeled test data. To tackle this problem, in the conventional AI field, AutoEval has been proposed to estimate the capability of AI models without data labeling effort. Unfortunately, even though multiple AutoEval methods have been proposed, most are constructed for classification tasks and evaluated only on image datasets. As a result, their effectiveness for LLMs is unclear, as LLMs often target generation tasks. In this work, we introduce AEBench, the first AutoEval benchmark specifically designed to estimate the capability of LLMs using unlabeled test data. Besides existing AutoEval methods, AEBench also supports our own method, which exploits the correlation between data uncertainty and model ability for capability estimation. In total, AEBench covers 12 AutoEval methods and 120 method combinations. Based on AEBench, we conducted a comprehensive study to explore the usefulness of AutoEval on LLMs. Experimental results on 10 datasets demonstrated that our uncertainty feature-based methods perform the best, achieving the lowest estimation errors.
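The abstract describes AutoEval as estimating model capability from unlabeled test data. A minimal sketch of that general idea follows, assuming a simple uncertainty feature (mean predictive entropy) and a linear regressor calibrated on a few labeled sets; it illustrates the paradigm only, not the AEBench implementation.

```python
# Minimal AutoEval-style sketch (illustrative, not the AEBench method):
# map an uncertainty feature computed on unlabeled predictions to an accuracy estimate.
import numpy as np
from sklearn.linear_model import LinearRegression

def mean_entropy(probs):
    """Average predictive entropy over per-example probability vectors (n, n_classes)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(np.mean(-(probs * np.log(probs)).sum(axis=1)))

def fit_estimator(probs_per_set, accuracies):
    """Calibrate on labeled sets where the true accuracy is known (hypothetical inputs)."""
    features = np.array([[mean_entropy(p)] for p in probs_per_set])
    return LinearRegression().fit(features, np.array(accuracies))

def estimate_accuracy(estimator, unlabeled_probs):
    """Predict accuracy for a new, unlabeled test set from its uncertainty feature."""
    return float(estimator.predict([[mean_entropy(unlabeled_probs)]])[0])
```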

AAAI Conference 2026 Conference Paper

Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy

  • Qiang Hu
  • Qimei Wang
  • Yingjie Guo
  • Qiang Li
  • Zhiwei Wang

White-Light Imaging (WLI) is the standard for endoscopic cancer screening, but Narrow-Band Imaging (NBI) offers superior diagnostic details. A key challenge is transferring knowledge from NBI to enhance WLI-only models, yet existing methods are critically hampered by their reliance on paired NBI-WLI images of the same lesion, a costly and often impractical requirement that leaves vast amounts of clinical data untapped. In this paper, we break this paradigm by introducing PaGKD, a novel Pairing-free Group-level Knowledge Distillation framework that enables effective cross-modal learning using unpaired WLI and NBI data. Instead of forcing alignment between individual, often semantically mismatched image instances, PaGKD operates at the group level to distill more complete and compatible knowledge across modalities. Central to PaGKD are two complementary modules: (1) Group-level Prototype Distillation (GKD-Pro) distills compact group representations by extracting modality-invariant semantic prototypes via shared lesion-aware queries; (2) Group-level Dense Distillation (GKD-Den) performs dense cross-modal alignment by guiding group-aware attention with activation-derived relation maps. Together, these modules enforce global semantic consistency and local structural coherence without requiring image-level correspondence. Extensive experiments on four clinical datasets demonstrate that PaGKD consistently and significantly outperforms state-of-the-art methods, boosting AUC by 3.3%, 1.1%, 2.8%, and 3.2%, respectively, establishing a new direction for cross-modal learning from unpaired data.
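A hedged sketch of the group-level prototype idea described above: shared learnable queries pool an unpaired batch of each modality's features into group prototypes, which are then aligned across modalities. The module names, loss choice, and teacher/student roles here are assumptions for illustration, not PaGKD's exact design.

```python
# Illustrative group-level prototype distillation sketch (details assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupPrototypes(nn.Module):
    def __init__(self, dim, num_prototypes=8):
        super().__init__()
        # Shared learnable queries standing in for the "lesion-aware queries";
        # dim is assumed divisible by the number of attention heads.
        self.queries = nn.Parameter(torch.randn(num_prototypes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats):                  # feats: (N_images, dim) for one modality
        q = self.queries.unsqueeze(0)          # (1, P, dim)
        kv = feats.unsqueeze(0)                # (1, N, dim)
        protos, _ = self.attn(q, kv, kv)       # pool the group into P prototypes
        return protos.squeeze(0)

def prototype_distillation_loss(pool, wli_feats, nbi_feats):
    """Align group prototypes of unpaired WLI and NBI batches (MSE is an assumption)."""
    p_wli = pool(wli_feats)
    p_nbi = pool(nbi_feats).detach()           # treat the NBI side as the teacher (assumption)
    return F.mse_loss(p_wli, p_nbi)
```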

NeurIPS Conference 2025 Conference Paper

4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming

  • Zihan Zheng
  • Zhenlong Wu
  • Houqiang Zhong
  • Yuan Tian
  • Ning Cao
  • Lan Xu
  • Jiangchao Yao
  • Xiaoyun Zhang

Achieving seamless viewing of high-fidelity volumetric video, comparable to 2D video experiences, remains an open challenge. Existing volumetric video compression methods either lack the flexibility to adjust quality and bitrate within a single model for efficient streaming across diverse networks and devices, or struggle with real-time decoding and rendering on lightweight mobile platforms. To address these challenges, we introduce 4DGCPro, a novel hierarchical 4D Gaussian compression framework that facilitates real-time mobile decoding and high-quality rendering via progressive volumetric video streaming in a single bitstream. Specifically, we propose a perceptually-weighted and compression-friendly hierarchical 4D Gaussian representation with motion-aware adaptive grouping to reduce temporal redundancy, preserve coherence, and enable scalable multi-level detail streaming. Furthermore, we present an end-to-end entropy-optimized training scheme, which incorporates layer-wise rate-distortion (RD) supervision and attribute-specific entropy modeling for efficient bitstream generation. Extensive experiments show that 4DGCPro enables flexible quality and variable bitrate within a single model, achieving real-time decoding and rendering on mobile devices while outperforming existing methods in RD performance across multiple datasets.
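The layer-wise rate-distortion supervision mentioned above can be summarized as each hierarchy level contributing its own distortion and rate terms. Below is a minimal sketch of that loss form, assuming one distortion value, one entropy-model bit estimate, and one trade-off weight per level; the exact terms in 4DGCPro may differ.

```python
# Hedged sketch of layer-wise rate-distortion supervision (loss form assumed).
import torch

def layerwise_rd_loss(distortions, bit_estimates, lambdas):
    """distortions, bit_estimates, lambdas: one scalar tensor per hierarchy level."""
    loss = torch.zeros(())
    for d, r, lam in zip(distortions, bit_estimates, lambdas):
        loss = loss + d + lam * r   # each level pays its distortion plus weighted rate
    return loss
```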

JBHI Journal 2025 Journal Article

Fine-Grained Temporal Site Monitoring in EGD Streams via Visual Time-Aware Embedding and Vision-Text Asymmetric Coworking

  • Fang Peng
  • Hongkuan Shi
  • Shiquan He
  • Qiang Hu
  • Ting Li
  • Fan Huang
  • Xinxia Feng
  • Mei Liu

Esophagogastroduodenoscopy (EGD) requires complete inspection of numerous upper gastrointestinal (UGI) sites for precise cancer screening. Automated temporal site monitoring for EGD assistance is thus in high demand, yet directly applying existing online action detection methods often fails. The key challenges are two-fold: 1) the global camera motion dominates, invalidating the temporal patterns derived from the object optical flows, and 2) the UGI sites are fine-grained, yielding highly homogenized appearances. In this paper, we propose an EGD-customized model, powered by two novel designs, i.e., Visual Time-aware Embedding plus Vision-text Asymmetric Coworking (VTE+VAC), for real-time accurate fine-grained UGI site monitoring. Concretely, VTE learns visual embeddings by differentiating frames via classification losses, and meanwhile by reordering the sampled time-agnostic frames to be temporally coherent via a ranking loss. Such a joint objective encourages VTE to capture the sequential relation without resorting to the inapplicable object optical flows, and thus to provide time-aware frame-wise embeddings. In the subsequent analysis, VAC uses a temporal sliding window, and extracts vision-text multimodal knowledge from each frame and its corresponding textualized prediction via the learned VTE and a frozen BERT. The text embeddings help provide more representative cues, but may also cause misdirection due to prediction errors. Thus, VAC randomly drops or replaces historical predictions to increase error tolerance and avoid collapsing onto the last few predictions. Qualitative and quantitative experiments demonstrate that the proposed method achieves superior performance compared to other state-of-the-art methods, with an average F1-score improvement of at least 7.66%.
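The abstract's error-tolerance mechanism, randomly dropping or replacing historical predictions before they are fed back as text cues, can be sketched as below. The probabilities and replacement scheme are assumptions for illustration, not the paper's exact settings.

```python
# Hedged sketch of history perturbation for error tolerance (hyperparameters assumed).
import random

def perturb_history(pred_history, num_classes, p_drop=0.1, p_replace=0.1):
    """pred_history: list of predicted site labels from earlier frames."""
    out = []
    for label in pred_history:
        r = random.random()
        if r < p_drop:
            continue                               # drop this historical prediction
        if r < p_drop + p_replace:
            label = random.randrange(num_classes)  # replace with a random site label
        out.append(label)
    return out
```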

NeurIPS Conference 2025 Conference Paper

Long-tailed Recognition with Model Rebalancing

  • Jiaan Luo
  • Feng Hong
  • Qiang Hu
  • Xiaofeng Cao
  • Feng Liu
  • Jiangchao Yao

Long-tailed recognition is ubiquitous and challenging in deep learning, and even in the downstream finetuning of foundation models, since the skewed class distribution generally prevents the model from generalizing to the tail classes. Despite the promise of previous methods based on data augmentation, loss rebalancing, decoupled training, etc., consistent improvement in broad scenarios such as multi-label long-tailed recognition remains difficult. In this study, we dive into the impact of model capacity under the long-tailed context, and propose a novel framework, Model Rebalancing (MORE), which mitigates imbalance by directly rebalancing the model's parameter space. Specifically, MORE introduces a low-rank parameter component to mediate the parameter space allocation guided by a tailored loss and sinusoidal reweighting schedule, but without increasing the overall model complexity or inference costs. Extensive experiments on diverse long-tailed benchmarks, spanning multi-class and multi-label tasks, demonstrate that MORE significantly improves generalization, particularly for tail classes, and effectively complements existing imbalance mitigation methods. These results highlight MORE's potential as a robust plug-and-play module in long-tailed settings.
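A hedged sketch of the low-rank parameter component described above: a frozen-shape linear layer is augmented with a trainable low-rank delta whose contribution is reweighted over training. The sinusoidal schedule form and initialization below are assumptions, not MORE's exact recipe.

```python
# Illustrative low-rank model-rebalancing sketch (schedule and loss details assumed).
import math
import torch
import torch.nn as nn

class LowRankRebalanced(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank=4):
        super().__init__()
        self.base = base_linear
        out_f, in_f = base_linear.weight.shape
        self.A = nn.Parameter(torch.zeros(out_f, rank))          # low-rank factors
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x, step, total_steps):
        # Sinusoidal reweighting of the low-rank branch over training (form assumed).
        w = 0.5 * (1.0 - math.cos(math.pi * step / total_steps))
        delta = self.A @ self.B                                   # (out_f, in_f)
        return nn.functional.linear(x, self.base.weight + w * delta, self.base.bias)
```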

AAAI Conference 2025 Conference Paper

MonoBox: Tightness-Free Box-Supervised Polyp Segmentation Using Monotonicity Constraint

  • Qiang Hu
  • Zhenyu Yi
  • Ying Zhou
  • Fan Huang
  • Mei Liu
  • Qiang Li
  • Zhiwei Wang

We propose MonoBox, an innovative box-supervised segmentation method constrained by monotonicity to liberate its training from the user-unfriendly box-tightness assumption. In contrast to conventional box-supervised segmentation, where the box edges must precisely touch the target boundaries, MonoBox leverages imprecisely-annotated boxes to achieve robust pixel-wise segmentation. The linchpin is that, within the noisy zones around box edges, MonoBox discards the traditional misguiding multiple-instance learning loss, and instead optimizes a carefully-designed objective, termed the monotonicity constraint. Along directions transitioning from foreground to background, this new constraint steers responses to adhere to a monotonically decreasing trend. Consequently, the originally unreliable learning within the noisy zones is transformed into a correct and effective monotonicity optimization. Moreover, an adaptive label correction is introduced, enabling MonoBox to enhance the tightness of box annotations using predicted masks from the previous epoch and dynamically shrink the noisy zones as training progresses. We verify MonoBox in the box-supervised segmentation task of polyps, where satisfying box-tightness is challenging due to the vague boundaries between the polyp and normal tissues. Experiments on both public synthetic and in-house real noisy datasets demonstrate that MonoBox outperforms other anti-noise state-of-the-art methods, improving Dice by at least 5.5% and 3.3%, respectively.
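The monotonicity constraint can be illustrated on a 1-D profile of predicted responses sampled from inside the box outwards: any increase between consecutive positions is penalized. This is a minimal sketch of that idea, not MonoBox's exact objective or sampling scheme.

```python
# Hedged sketch of a monotonicity constraint along a foreground-to-background profile.
import torch

def monotonicity_loss(profile):
    """profile: (L,) predicted responses ordered from foreground towards background."""
    increases = torch.relu(profile[1:] - profile[:-1])  # positive only where responses rise
    return increases.mean()
```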

JBHI Journal 2025 Journal Article

TTFNet: Temporal-Frequency Features Fusion Network for Speech Based Automatic Depression Recognition and Assessment

  • Xiyuan Chen
  • Zhuhong Shao
  • Yinan Jiang
  • Runsen Chen
  • Yunlong Wang
  • Bicao Li
  • Mingyue Niu
  • Hongguang Chen

Related studies have revealed that the phonological features of depressed patients differ from those of healthy individuals. With the increasing prevalence of depression, an objective and convenient approach for early screening is necessary. To this end, we propose an automatic depression detection method based on hybrid speech features extracted by deep learning, dubbed TTFNet. Firstly, to effectively mine the intrinsic relationships among multidimensional dynamic features in the frequency domain, the log-Mel spectrogram of raw speech and its related derivatives are encoded into a quaternion representation. Then, the innovatively designed quaternion VisionLSTM is utilized to capture their synergistic effects. Simultaneously, we integrate sLSTM with the pre-trained wav2vec 2.0 model to fully acquire the temporal features. In addition, to further exploit the complementarity between temporal and frequency features, we design an XConformer block for cross-sequence interactions, which ingeniously combines self-attention mechanisms and convolutional modules. Based on this block, the dual-path fusion module exploits the mutual reinforcement of features from different domains, thereby enhancing the generalization capability of the proposed model. Extensive experiments conducted on the AVEC 2013, AVEC 2014, DAIC-WOZ and E-DAIC datasets demonstrate that our method outperforms current state-of-the-art methods in both depression recognition and severity prediction tasks.
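The quaternion encoding mentioned above amounts to packing the log-Mel spectrogram and its derivatives into four components. A minimal sketch follows; the channel ordering and the choice to leave the real part empty are assumptions, not the paper's exact encoding.

```python
# Hedged sketch of packing log-Mel features into a 4-component quaternion-style tensor.
import numpy as np
import librosa

def quaternion_logmel(wave, sr, n_mels=80):
    mel = librosa.feature.melspectrogram(y=wave, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    d1 = librosa.feature.delta(logmel, order=1)       # first-order derivative
    d2 = librosa.feature.delta(logmel, order=2)       # second-order derivative
    real = np.zeros_like(logmel)                      # real part left empty (assumption)
    return np.stack([real, logmel, d1, d2], axis=0)   # (4, n_mels, frames)
```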

AAAI Conference 2025 Conference Paper

VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

  • Qiang Hu
  • Houqiang Zhong
  • Zihan Zheng
  • Xiaoyun Zhang
  • Zhengxue Cheng
  • Li Song
  • Guangtao Zhai
  • Yanfeng Wang

Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression independently or focus on a single fixed rate-distortion (RD) tradeoff. In this paper, we propose VRVVC, a novel end-to-end jointly optimized variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance. Specifically, VRVVC introduces a compact tri-plane implicit residual representation for inter-frame modeling of long-duration dynamic scenes, effectively reducing temporal redundancy. We further propose a variable-rate residual representation compression scheme that leverages a learnable quantization and a tiny MLP-based entropy model. This approach enables variable bitrates by using predefined Lagrange multipliers to manage the quantization error of all latent representations. Finally, we present an end-to-end progressive training strategy combined with a multi-rate-distortion loss function to optimize the entire framework. Extensive experiments demonstrate that VRVVC achieves a wide range of variable bitrates within a single model and surpasses the RD performance of existing methods across various datasets.
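One common way to make quantization trainable end-to-end, as the learnable quantization mentioned above requires, is a straight-through estimator with a learnable step size. The sketch below shows that generic construction only; VRVVC's actual quantization and entropy model may differ.

```python
# Hedged sketch of learnable quantization with a straight-through estimator.
import torch
import torch.nn as nn

class LearnableQuant(nn.Module):
    def __init__(self, init_step=1.0):
        super().__init__()
        self.log_step = nn.Parameter(torch.tensor(float(init_step)).log())

    def forward(self, latent):
        step = self.log_step.exp()                   # positive, trainable step size
        hard = torch.round(latent / step) * step     # rounded latent used in the forward pass
        # Straight-through: forward returns the rounded value, backward sees identity.
        return latent + (hard - latent).detach()
```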

AAAI Conference 2024 Conference Paper

Dynamic Feature Pruning and Consolidation for Occluded Person Re-identification

  • Yuteng Ye
  • Hang Zhou
  • Jiale Cai
  • Chenxing Gao
  • Youjia Zhang
  • Junle Wang
  • Qiang Hu
  • Junqing Yu

Occluded person re-identification (ReID) is a challenging problem due to contamination from occluders. Existing approaches address the issue with prior knowledge cues, such as human body key points and semantic segmentations, which easily fail in the presence of heavy occlusion and other humans as occluders. In this paper, we propose a feature pruning and consolidation (FPC) framework to circumvent explicit human structure parsing. The framework mainly consists of a sparse encoder, a multi-view feature matching module, and a feature consolidation decoder. Specifically, the sparse encoder drops less important image tokens, mostly related to background noise and occluders, based solely on correlation within the class token attention. Subsequently, the matching stage relies on the preserved tokens produced by the sparse encoder to identify k-nearest neighbors in the gallery by measuring the combined image- and patch-level similarity. Finally, we use the feature consolidation module to compensate for pruned features using the identified neighbors, recovering essential information while disregarding disturbance from noise and occlusion. Experimental results demonstrate the effectiveness of our proposed framework on occluded, partial, and holistic Re-ID datasets. In particular, our method outperforms state-of-the-art results by at least 8.6% mAP and 6.0% Rank-1 accuracy on the challenging Occluded-Duke dataset.
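The token-dropping step described above can be sketched as ranking patch tokens by the class token's attention to them and keeping only the top fraction. This is an illustrative sketch; the keep-rate schedule and the layers at which FPC prunes are not reproduced here.

```python
# Hedged sketch of attention-based token pruning (keep ratio is an assumed hyperparameter).
import torch

def prune_tokens(tokens, cls_attn, keep_ratio=0.7):
    """tokens: (B, N, D) patch tokens; cls_attn: (B, N) class-token-to-patch attention."""
    num_keep = max(1, int(tokens.shape[1] * keep_ratio))
    idx = cls_attn.topk(num_keep, dim=1).indices                 # most-attended tokens per image
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])     # (B, num_keep, D)
    return tokens.gather(1, idx)                                 # keep only the selected tokens
```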