Arrow Research

Author name cluster

Yuqing Zhang

Possible papers associated with this exact author name in Arrow. This page groups papers by case-insensitive exact name match; it is not a full identity-disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI Conference 2026 · Conference Paper

JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction

  • Yuhao Zhan
  • Yuqing Zhang
  • Jing Yuan
  • Qixiang Ma
  • Zhiqi Yang
  • Yu Gu
  • Zemin Liu
  • Fei Wu

Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimation in evaluation and restricted model generalization. To address this issue, we introduce the Judge of Edit-Level Validity (JELV), an automated framework that validates correction edits for grammaticality, faithfulness, and fluency. Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as a benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline achieving 90% agreement with human annotators, and a distilled DeBERTa classifier with 85% precision on valid edits. We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric by integrating false-positive decoupling and fluency scoring, achieving state-of-the-art correlation with human judgments. We also apply JELV to filter LLM-generated correction candidates, expanding the BEA19 single-reference dataset of 38,692 source sentences. Retraining top GEC systems on this expanded dataset yields measurable performance gains. JELV provides a scalable solution for enhancing reference diversity and strengthening both evaluation and model generalization.

NeurIPS Conference 2025 · Conference Paper

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

  • Zhaorun Chen
  • Zichen Wen
  • Yichao Du
  • Yiyang Zhou
  • Chenhang Cui
  • Siwei Han
  • Jen Weng
  • Chaoqi Wang

While text-to-image models like GPT-4o-Image and FLUX are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes. To address this issue, we introduce MJ-Bench, a novel benchmark that incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across six key perspectives: alignment, safety, image quality, bias, composition, and visualization. Specifically, we evaluate a large variety of multimodal judges, including smaller-sized CLIP-based scoring models, open-source VLMs, and closed-source VLMs, on each decomposed subcategory of our preference dataset. Experiments reveal that closed-source VLMs generally provide better feedback, with GPT-4o outperforming other judges on average. Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities. Further studies of feedback scales reveal that VLM judges can generally provide more accurate and stable feedback in natural language than on numerical scales. Notably, human evaluations on end-to-end and fine-tuned models using separate feedback from these multimodal judges reach similar conclusions, further confirming the effectiveness of MJ-Bench.

NeurIPS Conference 2025 · Conference Paper

MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study

  • Yuqing Zhang
  • Yue Han
  • Shuanghe Zhu
  • Haoxiang Wu
  • Hangqi Li
  • Shengyu Zhang
  • Junchi Yan
  • Zemin Liu

Analyzing ancient manuscripts has traditionally been a labor-intensive and time-consuming task for philologists. While recent advancements in LMMs have demonstrated their potential across diverse domains, their effectiveness in manuscript study remains underexplored. In this paper, we introduce MS-Bench, the first comprehensive benchmark co-developed with archaeologists, comprising 5,076 high-resolution images from the 4th to the 14th century and 9,982 expert-curated questions across nine sub-tasks aligned with archaeological workflows. Through four prompting strategies, we systematically evaluate 32 LMMs on their effectiveness, robustness, and cultural contextualization. Our analysis reveals scale-driven improvements in performance and reliability, the impact of prompting strategies on performance (CoT has a two-sided effect, while visual retrieval-augmented prompts provide a consistent boost), and task-specific preferences depending on an LMM's visual capabilities. Although current LMMs are not yet capable of replacing domain expertise, they demonstrate promising potential to accelerate manuscript research through future human–AI collaboration.