Arrow Research search

Author name cluster

Shi Qiu

Papers that may be associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full author-disambiguation profile.

9 papers
2 author rows

Possible papers

9

TIST Journal 2025 Journal Article

A Comprehensive Overview of Large Language Models

  • Humza Naveed
  • Asad Ullah Khan
  • Shi Qiu
  • Muhammad Saqib
  • Saeed Anwar
  • Muhammad Usman
  • Naveed Akhtar
  • Nick Barnes

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multimodal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to provide not only a systematic survey but also a quick, comprehensive reference for the researchers and practitioners to draw insights from extensive, informative summaries of the existing works to advance the LLM research.

NeurIPS Conference 2025 Conference Paper

COS3D: Collaborative Open-Vocabulary 3D Segmentation

  • Runsong Zhu
  • Ka-Hei Hui
  • Zhengzhe Liu
  • Qianyi Wu
  • Weiliang Tang
  • Shi Qiu
  • Pheng-Ann Heng
  • Chi-Wing Fu

Open-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, suffering from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that contributes to effectively integrating complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and language field, through a novel instance-to-language feature mapping and designing an efficient two-stage training strategy. During inference, to bridge distinct characteristics of the two fields, we further design an adaptive language-to-instance prompt refinement, promoting high-quality prompt-segmentation inference. Extensive experiments not only demonstrate COS3D's leading performance over existing methods on two widely-used benchmarks but also show its high potential to various applications, i.e., novel image-based 3D segmentation, hierarchical segmentation, and robotics.

IROS Conference 2025 Conference Paper

Gaussian Splatting with Reflectance Regularization for Endoscopic Scene Reconstruction

  • Chengkun Li
  • Kai Chen 0028
  • Shi Qiu
  • Jason Ying-Kuen Chan
  • Qi Dou 0001

Endoscopic reconstruction plays a crucial role in surgical robotics. The dynamic lighting conditions and integrated camera-light source in endoscopic scenes create a distinct reconstruction challenge: shape ambiguity. To mitigate this, we propose a Gaussian Splatting (GS) based framework for endoscopic scene reconstruction, enhanced with reflectance regularization. We embed every 3D Gaussian point with physical reflective attributes and combine this representation with a physically based inverse rendering framework. By jointly training 3DGS for view synthesis with this reflectance regularization, we are able to attain high-quality geometry without changing the volume rendering pipeline. Our experiments demonstrate the superiority in both geometry representation and rendering performance compared to existing GS approaches, making it a practical solution for endoscopic applications. Project is available at: https://med-air.github.io/GSR2.

NeurIPS Conference 2025 Conference Paper

MJ-Video: Benchmarking and Rewarding Video Generation with Fine-Grained Video Preference

  • Haibo Tong
  • Zhaoyang Wang
  • Zhaorun Chen
  • Haonian Ji
  • Shi Qiu
  • Siwei Han
  • Kexin Geng
  • Zhongkai Xue

Recent advancements in video generation have significantly improved the ability to synthesize videos from text instructions. However, existing models still struggle with key challenges such as instruction misalignment, content hallucination, safety concerns, and generation bias. To address these limitations, we introduce MJ-BENCH-VIDEO, a large-scale video preference benchmark designed to evaluate video generation across five critical aspects: Alignment, Safety, Fineness, Coherence & Consistency, and Bias & Fairness. This benchmark further incorporates 28 fine-grained criteria to provide a comprehensive evaluation of video preference. Building upon this dataset, we propose MJ-VIDEO, a Mixture-of-Experts (MoE)-based video reward model designed to deliver fine-grained reward. MJ-VIDEO can dynamically select relevant experts to accurately judge the preference based on the input text-video pair. This architecture enables more precise and adaptable preference judgments. Through extensive benchmarking on MJ-BENCH-VIDEO, we analyze the limitations of existing video reward models and demonstrate the superior performance of MJ-VIDEO in video preference assessment, achieving 17.58% and 15.87% improvements in overall and fine-grained preference judgments, respectively. Additionally, MJ-VIDEO is able to improve the alignment performance in video generation via preference fine-tuning.

NeurIPS Conference 2025 Conference Paper

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

  • Shi Qiu
  • Shaoyang Guo
  • Zhuo-Yang Song
  • Yunbo Sun
  • Zeyu Cai
  • Jiashen Wei
  • Tianyu Luo
  • Yixuan Yin

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.
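The abstract contrasts the EED Score with binary (exact-match) scoring but does not specify the metric's construction. As an illustration only, a minimal sketch of the general idea — grading a predicted expression by how far it is from the reference under edit distance, rather than pass/fail — might look like the following; the token-level Levenshtein distance and the linear score mapping here are assumptions for illustration, not the paper's actual definition (which operates on expression structure):

```python
def levenshtein(a, b):
    """Classic single-row dynamic-programming Levenshtein distance
    between two sequences (token lists or strings)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                      # deletion
                dp[j - 1] + 1,                  # insertion
                prev + (a[i - 1] != b[j - 1]),  # substitution (free if equal)
            )
            prev = cur
    return dp[n]

def eed_style_score(pred_tokens, ref_tokens):
    """Hypothetical continuous score in [0, 1]: 1.0 for an exact match,
    decreasing linearly with edit distance relative to reference length."""
    d = levenshtein(pred_tokens, ref_tokens)
    return max(0.0, 1.0 - d / max(len(ref_tokens), 1))
```

Compared with binary scoring, a near-miss answer (e.g. one wrong token) still earns partial credit, which is what allows a distance-based metric to extract more signal per problem.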

NeurIPS Conference 2025 Conference Paper

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

  • Xeron Du
  • Yifan Yao
  • Kaijing Ma
  • Bingli Wang
  • Tianyu Zheng
  • Minghao Liu
  • Yiming Liang
  • Xiaolong Jin

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields (particularly in light industry, agriculture, and service-oriented disciplines) remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model Gemini-2.5-Pro achieved the highest accuracy of 63.56% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.