Author name cluster

Zhenming Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

AAAI Conference 2026 Conference Paper

Firing Bits Where It Matters: Spiking-Guided Just Recognizable Distortion Modeling for Machine-Centric Video Coding

Wuyuan Xie
Zhenming Li
Yuwu Lu
Di Lin
Yun Song
Miaohui Wang

Just recognizable distortion (JRD) has emerged as a promising paradigm for machine-centric video coding. However, existing JRD-guided coding methods are limited by coarse annotation granularity and high computational cost, which hinder their deployment. In this paper, we first investigate the impact of different JRD annotation strategies on downstream task performance. By incorporating both instance-level and contextual information, we construct a new JRD dataset with fine-grained annotations compatible with object detection and instance segmentation tasks. To enhance quantization parameter (QP) map prediction while maintaining computational efficiency, we propose a novel spiking neural network (SNN)-based framework that decomposes video frames into spatial structures, channel interactions, and temporal patterns. Furthermore, we introduce a spiking attention mechanism to aggregate task-relevant features and employ adaptive scaling vectors to suppress machine-perceived redundancy, enabling targeted bitrate allocation aligned with task-critical content. Extensive experiments on multiple datasets and backbones demonstrate that our approach consistently outperforms state-of-the-art codec-based and JRD-guided methods in maintaining task performance at ultra-low bitrates, while significantly reducing computational overhead.


AAAI Conference 2026 Conference Paper

The Last Byte: Learning Just Enough for Machine-Oriented Image Compression

Wuyuan Xie
Zhenming Li
Ye Liu
Jian Jin
Yun Song
Miaohui Wang

Just recognizable distortion (JRD) has been introduced for image compression for machines, aiming to quantify the maximum coding distortion that can be tolerated by a specific perception model, thereby defining the upper bound of machine vision redundancy (MVR). However, existing JRD-based redundancy estimation methods face three key challenges: limited dataset annotation accuracy, low prediction efficiency, and insufficient perception accuracy, all of which hinder their practical deployment. To address these limitations, we propose a new MVR-Net, a frame-wise efficient JRD prediction method that generates the optimal encoding quantization map in a single inference pass. Furthermore, we refine the annotation standard for JRD datasets based on experimental insights, enhancing the precision of recognizable redundancy measurement. Compared to state-of-the-art methods, MVR-Net achieves a superior balance between bitrate reduction and perception accuracy in JRD-guided compression, while offering up to a 40,000× speed improvement, demonstrating its practicality and efficiency for real-world applications.


AAAI Conference 2025 Conference Paper

DDJND: Dual Domain Just Noticeable Difference in Multi-Source Content Images with Structural Discrepancy

Miaohui Wang
Zhenming Li
Wuyuan Xie

Most existing just noticeable difference (JND) methods primarily integrate specific masking effects in a single domain. However, these single-domain JND methods struggle with the structural discrepancies in multi-source content images, limiting their effectiveness in visual redundancy estimation. To address this issue, we propose a dual domain encoder that combines spatial and frequency features to comprehensively capture visual patterns. Our design includes spatial pattern balance and frequency detail correction modules to balance global and local patterns and correct low- and high-frequency distributions. Additionally, we develop a dual domain decoder to effectively extract multi-scale pattern redundancies and integrate them with detail redundancies in the frequency domain. Experiments demonstrate the effectiveness and robustness of our proposed method in handling structural discrepancies in multi-source content images.
