Arrow Research search

Author name cluster

Leida Li

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author record

Possible papers (9)

AAAI Conference 2026 · Conference Paper

Fine-grained Image Quality Assessment for Perceptual Image Restoration

  • Xiangfei Sheng
  • Xiaofeng Pan
  • Zhichao Yang
  • Pengfei Chen
  • Leida Li

Recent years have witnessed remarkable achievements in perceptual image restoration (IR), creating an urgent demand for accurate image quality assessment (IQA), which is essential for both performance comparison and algorithm optimization. Unfortunately, the existing IQA metrics exhibit inherent weaknesses for IR tasks, particularly when distinguishing fine-grained quality differences among restored images. To address this dilemma, we contribute the first-of-its-kind fine-grained image quality assessment dataset for image restoration, termed FGRestore, comprising 18,408 restored images across six common IR tasks. Beyond conventional scalar quality scores, FGRestore is also annotated with 30,886 fine-grained pairwise preferences. Based on FGRestore, a comprehensive benchmark was conducted on the existing IQA metrics, which reveals significant inconsistencies between score-based IQA evaluations and fine-grained restoration quality. Motivated by these findings, we further propose FGResQ, a new IQA model specifically designed for image restoration, which features both coarse-grained score regression and fine-grained quality ranking. Extensive experiments and comparisons demonstrate that FGResQ significantly outperforms state-of-the-art IQA metrics.
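
The combination of coarse-grained score regression with fine-grained pairwise ranking described in the abstract can be illustrated with a small training-objective sketch. This is not the authors' released FGResQ code; the loss weighting, margin value, and tensor names below are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def fgresq_style_loss(pred_a, pred_b, mos_a, mos_b, pref_ab, margin=0.1, alpha=1.0):
    """Hypothetical joint objective: coarse score regression + fine-grained ranking.

    pred_a, pred_b : predicted quality scores for two restored images, shape [B]
    mos_a, mos_b   : ground-truth mean opinion scores, shape [B]
    pref_ab        : +1.0 if image A is annotated as better than B, -1.0 otherwise, shape [B]
    """
    # Coarse-grained branch: regress absolute quality scores.
    regression = F.mse_loss(pred_a, mos_a) + F.mse_loss(pred_b, mos_b)
    # Fine-grained branch: enforce the annotated pairwise preference with a margin.
    ranking = F.margin_ranking_loss(pred_a, pred_b, pref_ab, margin=margin)
    return regression + alpha * ranking

# Toy usage with random tensors standing in for model outputs and labels.
B = 4
loss = fgresq_style_loss(torch.rand(B), torch.rand(B),
                         torch.rand(B), torch.rand(B),
                         torch.tensor([1.0, -1.0, 1.0, 1.0]))
```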

AAAI Conference 2026 · Conference Paper

LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations

  • Zhichao Yang
  • Tianjiao Gu
  • Jianjie Wang
  • Feiyu Lin
  • Xiangfei Sheng
  • Pengfei Chen
  • Leida Li

The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate image-text alignment in long-prompt scenarios. However, the existing T2I alignment benchmarks predominantly focus on short-prompt scenarios and only provide mean opinion score (MOS) or Likert-scale annotations. This inherent limitation hinders the development of long T2I evaluators, particularly in terms of the interpretability of alignment. In this study, we contribute LongT2IBench, which comprises 14K long text-image pairs accompanied by graph-structured human annotations. Given the detail-intensive nature of long prompts, we first design a Generate-Refine-Qualify annotation protocol to convert them into textual graph structures that encompass entities, attributes, and relations. Through this transformation, fine-grained alignment annotations are achieved based on these granular elements. Finally, the graph-structured annotations are converted into alignment scores and interpretations to facilitate the design of T2I evaluation models. Based on LongT2IBench, we further propose LongT2IExpert, a long T2I evaluator that enables multi-modal large language models (MLLMs) to provide both quantitative scores and structured interpretations through an instruction-tuning process with Hierarchical Alignment Chain-of-Thought (CoT). Extensive experiments and comparisons demonstrate the superiority of the proposed LongT2IExpert in alignment evaluation and interpretation.
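
To make the idea of reducing graph-structured annotations to an alignment score concrete, here is a minimal sketch in which entities, attributes, and relations parsed from a long prompt are checked against per-element verdicts. The data classes, field names, and scoring rule are assumptions for illustration and are not the benchmark's actual annotation schema.

```python
from dataclasses import dataclass

@dataclass
class PromptGraph:
    """Hypothetical textual graph parsed from a long prompt."""
    entities: list[str]                    # e.g. ["fox", "field"]
    attributes: list[tuple[str, str]]      # (entity, attribute), e.g. ("fox", "red")
    relations: list[tuple[str, str, str]]  # (subject, relation, object)

def alignment_score(graph: PromptGraph, verdicts: dict) -> float:
    """Fraction of graph elements judged as correctly depicted in the image.

    `verdicts` maps each element to True/False, e.g. produced by annotators
    or an MLLM judge.
    """
    elements = graph.entities + graph.attributes + graph.relations
    if not elements:
        return 1.0
    matched = sum(bool(verdicts.get(e, False)) for e in elements)
    return matched / len(elements)

# Toy usage: three of four elements are judged aligned -> score 0.75.
g = PromptGraph(entities=["fox", "field"],
                attributes=[("fox", "red")],
                relations=[("fox", "standing in", "field")])
score = alignment_score(g, {"fox": True, "field": True, ("fox", "red"): False,
                            ("fox", "standing in", "field"): True})
```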

AAAI Conference 2026 · Conference Paper

TuningIQA: Fine-Grained Blind Image Quality Assessment for Livestreaming Camera Tuning

  • Xiangfei Sheng
  • Zhichao Duan
  • Xiaofeng Pan
  • Yipo Huang
  • Zhichao Yang
  • Pengfei Chen
  • Leida Li

Livestreaming has become increasingly prevalent in modern visual communication, where automatic camera quality tuning is essential for delivering superior user Quality of Experience (QoE). Such tuning requires accurate blind image quality assessment (BIQA) to guide parameter optimization decisions. Unfortunately, the existing BIQA models typically only predict an overall coarse-grained quality score, which cannot provide fine-grained perceptual guidance for precise camera parameter tuning. To bridge this gap, we first establish FGLive-10K, a comprehensive fine-grained BIQA database containing 10,185 high-resolution images captured under varying camera parameter configurations across diverse livestreaming scenarios. The dataset features 50,925 multi-attribute quality annotations and 19,234 fine-grained pairwise preference annotations. Based on FGLive-10K, we further develop TuningIQA, a fine-grained BIQA metric for livestreaming camera tuning, which integrates human-aware feature extraction and graph-based camera parameter fusion. Extensive experiments and comparisons demonstrate that TuningIQA significantly outperforms state-of-the-art BIQA methods in both score regression and fine-grained quality ranking, achieving superior performance when deployed for livestreaming camera tuning.

AAAI Conference 2025 · Conference Paper

Asymmetric Hierarchical Difference-aware Interaction Network for Event-guided Motion Deblurring

  • Wen Yang
  • Jinjian Wu
  • Leida Li
  • Weisheng Dong
  • Guangming Shi

Event cameras are bio-inspired sensors capable of capturing motion information with high temporal resolution, and they have recently shown potential for aiding image motion deblurring. Most existing methods indiscriminately handle the feature fusion of the two modalities with symmetric unidirectional/bidirectional interactions at different levels of the feature encoder, while ignoring the different dependencies between cross-modal hierarchical features. To tackle these limitations, we propose a novel Asymmetric Hierarchical Difference-aware Interaction Network (AHDINet) for event-based motion deblurring, which explores the complementarity of the two modalities with differential dependence modeling of cross-modal hierarchical features. Specifically, an event-assisted edge complement module is designed to leverage the event modality to enhance the edge details of image features in the low-level encoder stage, and an image-assisted semantic complement module is developed to transfer the contextual semantics of image features to the event branch in the high-level encoder stage. Benefiting from the proposed differentiated interaction mode, the respective advantages of the image and event modalities are fully exploited. Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance.

IJCAI Conference 2025 · Conference Paper

Towards Region-Adaptive Feature Disentanglement and Enhancement for Small Object Detection

  • Yanchao Bi
  • Yang Ning
  • Xiushan Nie
  • Xiankai Lu
  • Yongshun Gong
  • Leida Li

Current feature fusion strategies often fail to adequately account for the influence of activation intensity across different scales on small object features, which impedes the effective detection of small objects. To address this limitation, we propose the Region-Adaptive Feature Disentanglement and Enhancement (RAFDE) strategy, which improves both downsampling and feature fusion by leveraging activation intensity variations at multiple scales. First, we introduce the Boundary Transitional Region-enhanced Downsampling (BTRD) module, which enhances boundary transitional regions containing both strongly and weakly activated features, thereby mitigating the loss of crucial boundary information for small objects. Second, we present the Regional-Adaptive Feature Fusion (RAFF) module, which adaptively disentangles and fuses co-activated and uni-activated regions from adjacent levels into the current level, effectively reducing the risk of small objects being overwhelmed. Extensive experiments on several public datasets demonstrate that the RAFDE strategy is highly effective and outperforms state-of-the-art methods. The code is available at https://github.com/b-yanchao/RAFDE.git.

NeurIPS Conference 2025 · Conference Paper

Towards Syn-to-Real IQA: A Novel Perspective on Reshaping Synthetic Data Distributions

  • Aobo Li
  • Jinjian Wu
  • Yongxu Liu
  • Leida Li
  • Weisheng Dong

Blind Image Quality Assessment (BIQA) has advanced significantly through deep learning, but the scarcity of large-scale labeled datasets remains a challenge. While synthetic data offers a promising solution, models trained on existing synthetic datasets often show limited generalization ability. In this work, we make a key observation that representations learned from synthetic datasets often exhibit a discrete and clustered pattern that hinders regression performance: features of high-quality images cluster around reference images, while those of low-quality images cluster based on distortion types. Our analysis reveals that this issue stems from the distribution of synthetic data rather than the model architecture. Consequently, we introduce a novel framework, SynDR-IQA, which reshapes the synthetic data distribution to enhance BIQA generalization. Based on theoretical derivations of how sample diversity and redundancy affect the generalization error, SynDR-IQA employs two strategies: distribution-aware diverse content upsampling, which enhances visual diversity while preserving the content distribution, and density-aware redundant cluster downsampling, which balances samples by reducing the density of densely clustered areas. Extensive experiments across three cross-dataset settings (synthetic-to-authentic, synthetic-to-algorithmic, and synthetic-to-synthetic) demonstrate the effectiveness of our method. The code is available at https://github.com/Li-aobo/SynDR-IQA.
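
The density-aware downsampling idea (thinning over-dense clusters of synthetic samples so that no cluster dominates training) can be sketched roughly as follows. The clustering choice (k-means on image features), the per-cluster cap, and all names are assumptions for illustration only; the paper's actual procedure is in its repository.

```python
import numpy as np
from sklearn.cluster import KMeans

def density_aware_downsample(features, n_clusters=20, cap_per_cluster=100, seed=0):
    """Illustrative density-aware downsampling.

    features : array of shape [N, D], one feature vector per synthetic image
    returns  : indices of the retained samples, with dense clusters capped
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(features)
    keep = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if len(members) > cap_per_cluster:
            # Randomly thin over-dense clusters instead of keeping every sample.
            members = rng.choice(members, size=cap_per_cluster, replace=False)
        keep.extend(members.tolist())
    return np.array(sorted(keep))
```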

AAAI Conference 2024 · Conference Paper

Motion Deblurring via Spatial-Temporal Collaboration of Frames and Events

  • Wen Yang
  • Jinjian Wu
  • Jupo Ma
  • Leida Li
  • Guangming Shi

Motion deblurring can be advanced by exploiting informative features from supplementary sensors such as event cameras, which can capture rich motion information asynchronously with high temporal resolution. Existing event-based motion deblurring methods consider neither the modality redundancy in spatial fusion nor the temporal cooperation between events and frames. To tackle these limitations, a novel spatial-temporal collaboration network (STCNet) is proposed for event-based motion deblurring. First, we propose a differential-modality-based cross-modal calibration strategy to suppress redundancy and enhance complementarity, and bimodal spatial fusion is then achieved with an elaborate cross-modal co-attention mechanism that weights the contributions of the two modalities to balance their importance. In addition, we present a frame-event mutual spatio-temporal attention scheme to alleviate the errors of relying only on frames to compute cross-temporal similarities when the motion blur is significant, and the spatio-temporal features from both frames and events are then aggregated with the custom cross-temporal coordinate attention. Extensive experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance. Project website: https://github.com/wyang-vis/STCNet.
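
As a rough illustration of weighting the contributions of frame and event features with a co-attention-style gate, the sketch below blends the two modalities with a learned per-pixel weight. The module design, channel sizes, and names are assumptions for illustration and are not taken from the STCNet implementation.

```python
import torch
import torch.nn as nn

class CrossModalCoAttention(nn.Module):
    """Illustrative gate that balances frame and event features per pixel."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, frame_feat, event_feat):
        # Per-pixel, per-channel weight deciding how much of each modality to keep.
        w = self.gate(torch.cat([frame_feat, event_feat], dim=1))
        return w * frame_feat + (1 - w) * event_feat

# Toy usage with random feature maps of shape [batch, channels, height, width].
fused = CrossModalCoAttention(32)(torch.rand(2, 32, 64, 64), torch.rand(2, 32, 64, 64))
```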

AAAI Conference 2022 · Conference Paper

Robust Depth Completion with Uncertainty-Driven Loss Functions

  • Yufan Zhu
  • Weisheng Dong
  • Leida Li
  • Jinjian Wu
  • Xin Li
  • Guangming Shi

Recovering a dense depth image from sparse LiDAR scans is a challenging task. Despite the popularity of color-guided methods for sparse-to-dense depth completion, they treat pixels equally during optimization, ignoring the uneven distribution characteristics in the sparse depth map and the accumulated outliers in the synthesized ground truth. In this work, we introduce uncertainty-driven loss functions to improve the robustness of depth completion and to handle the uncertainty inherent in the task. Specifically, we propose an explicit uncertainty formulation for robust depth completion with Jeffrey’s prior. A parametric uncertainty-driven loss is introduced and translated into new loss functions that are robust to noisy or missing data. Meanwhile, we propose a multiscale joint prediction model that can simultaneously predict depth and uncertainty maps. The estimated uncertainty map is also used to perform adaptive prediction on the pixels with high uncertainty, leading to a residual map for refining the completion results. Our method has been tested on the KITTI Depth Completion Benchmark and achieves state-of-the-art robustness in terms of the MAE, IMAE, and IRMSE metrics.
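
For readers unfamiliar with uncertainty-driven losses, the sketch below shows a standard heteroscedastic formulation in which the network predicts a per-pixel log-scale alongside the depth, so residuals at uncertain pixels are down-weighted. This is a generic Laplace-style loss used for illustration, not the paper's exact formulation with Jeffrey's prior; the tensor names are assumptions.

```python
import torch

def uncertainty_weighted_l1(pred_depth, pred_log_b, gt_depth, valid_mask):
    """Generic heteroscedastic L1 loss for depth completion.

    pred_depth : predicted dense depth map, shape [B, 1, H, W]
    pred_log_b : predicted per-pixel log scale (uncertainty), same shape
    gt_depth   : sparse/semi-dense ground-truth depth, same shape
    valid_mask : boolean mask of pixels that have ground truth
    """
    residual = torch.abs(pred_depth - gt_depth)
    # Large predicted uncertainty shrinks the residual term but is penalised by
    # the log-scale term, so the model cannot declare every pixel uncertain.
    per_pixel = residual * torch.exp(-pred_log_b) + pred_log_b
    return per_pixel[valid_mask].mean()
```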

IJCAI Conference 2019 · Conference Paper

Incremental Few-Shot Learning for Pedestrian Attribute Recognition

  • Liuyu Xiang
  • Xiaoming Jin
  • Guiguang Ding
  • Jungong Han
  • Leida Li

Pedestrian attribute recognition has received increasing attention due to its important role in video surveillance applications. However, most existing methods are designed for a fixed set of attributes. They are unable to handle the incremental few-shot learning scenario, i.e., adapting a well-trained model to newly added attributes with scarce data, which commonly arises in the real world. In this work, we present a meta-learning-based method to address this issue. The core of our framework is a meta architecture capable of disentangling multiple attribute information and generalizing rapidly to newly added attributes. By conducting extensive experiments on the benchmark datasets PETA and RAP under the incremental few-shot setting, we show that our method is able to perform the task with competitive performance and low resource requirements.