Author name cluster

Weiqi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI Conference 2026 Conference Paper

360Explorer: Exploring 4D Controllable World in Panoramic Videos

  • Xinhua Cheng
  • Haiyang Zhou
  • Wangbo Yu
  • Tanghui Jia
  • Bin Lin
  • Yunyang Ge
  • Weiqi Li
  • Li Yuan

We present 360Explorer, a novel approach for generating 4D controllable panoramic videos conditioned on user-provided 3D instructions for exploring and manipulating dynamic worlds. Whereas existing perspective-based methods struggle to maintain spatial consistency during in-place camera rotation, we introduce the panoramic view into controllable video generation models to inherently preserve view-recall consistency. By introducing dynamic point clouds as the 4D scene representation, 360Explorer unifies the modeling of camera transformations and object movements as incomplete renders that describe precise control instructions in 3D worlds. To tackle the data limitation in acquiring multi-viewpoint panoramic videos, we further propose a reverse warping strategy that constructs the training dataset from easily accessible monocular panoramic videos. Extensive experiments demonstrate that 360Explorer achieves superior performance in creating 4D controllable panoramic videos whose camera transformations and object movements align with diverse provided instructions.
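
Illustrative note: the reverse warping idea above hinges on moving between an equirectangular panorama (with per-pixel depth) and a 3D point cloud. The sketch below covers only that projection geometry, under assumed conventions (y-up axes, longitude measured from +z); the function names and toy depth map are hypothetical and not the authors' code.

    # Hypothetical sketch: back-project an equirectangular frame with depth into a
    # point cloud, express the points in a shifted camera frame, and re-project.
    import numpy as np

    def equirect_to_points(depth):
        """depth: (H, W) metric depth per pixel of an equirectangular image."""
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
        lon = u / W * 2.0 * np.pi - np.pi            # [-pi, pi)
        lat = np.pi / 2.0 - v / H * np.pi            # [pi/2, -pi/2]
        dirs = np.stack([np.cos(lat) * np.sin(lon),  # x
                         np.sin(lat),                # y (up)
                         np.cos(lat) * np.cos(lon)], # z (forward)
                        axis=-1)
        return dirs * depth[..., None]               # (H, W, 3) camera-frame points

    def points_to_equirect(points, H, W):
        """Re-project 3D points (already in the target camera frame) to pixels."""
        x, y, z = points[..., 0], points[..., 1], points[..., 2]
        r = np.linalg.norm(points, axis=-1) + 1e-8
        lon = np.arctan2(x, z)
        lat = np.arcsin(np.clip(y / r, -1.0, 1.0))
        u = (lon + np.pi) / (2.0 * np.pi) * W
        v = (np.pi / 2.0 - lat) / np.pi * H
        return u, v, r                               # pixel coords plus new depth

    # Toy usage: warp a constant-depth panorama to a camera shifted 0.2 m sideways.
    depth = np.full((256, 512), 3.0)
    pts = equirect_to_points(depth)
    R, t = np.eye(3), np.array([0.2, 0.0, 0.0])      # target camera pose in source frame
    pts_target = (pts - t) @ R                       # equivalent to R.T @ (p - t) per point
    u, v, r = points_to_equirect(pts_target, 256, 512)

How such rendered pairs are composed into training data is specific to the paper; the sketch only shows the projection step.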

AAAI Conference 2026 Conference Paper

VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning

  • Xuanyu Zhang
  • Weiqi Li
  • Shijie Zhao
  • Junlin Li
  • Li Zhang
  • Jian Zhang

Recent advances in AI-generated content (AIGC) have led to the emergence of powerful text-to-video generation models. Despite these successes, evaluating the quality of AIGC-generated videos remains challenging due to limited generalization, lack of temporal awareness, heavy reliance on large-scale annotated datasets, and the lack of effective interaction with generation models. Most current approaches rely on supervised fine-tuning of vision-language models (VLMs), which often require large-scale annotated datasets and tend to decouple understanding and generation. To address these shortcomings, we propose VQ-Insight, a novel reasoning-style VLM framework for AIGC video quality assessment. Our approach features: (1) a progressive video quality learning scheme that combines image quality warm-up, general task-specific temporal learning, and joint optimization with the video generation model; (2) the design of multi-dimension scoring rewards, preference comparison rewards, and temporal modeling rewards to enhance both generalization and specialization in video quality evaluation. Extensive experiments demonstrate that VQ-Insight consistently outperforms state-of-the-art baselines in preference comparison, multi-dimension scoring, and natural video scoring, bringing significant improvements for video generation tasks.
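
Illustrative note: the reward design described above can be pictured with toy scalar rewards. The snippet below is a minimal sketch assuming the rewards reduce to a per-dimension score check and a preference-agreement check; the function names, tolerance, and 0.7/0.3 weighting are invented for illustration and are not VQ-Insight's implementation.

    # Toy rewards for RL-style fine-tuning of a video-quality VLM (illustrative only).
    def score_reward(pred_scores, gt_scores, tol=0.5):
        """Multi-dimension scoring reward: fraction of dimensions whose predicted
        score falls within `tol` of the annotated score."""
        hits = sum(abs(p - g) <= tol for p, g in zip(pred_scores, gt_scores))
        return hits / len(gt_scores)

    def preference_reward(pred_choice, human_choice):
        """Preference comparison reward: 1 if the model prefers the same video
        (e.g. 'A' or 'B') as the human annotator, else 0."""
        return 1.0 if pred_choice == human_choice else 0.0

    # Example: three rated dimensions (visual quality, temporal smoothness, text alignment).
    r = 0.7 * score_reward([3.2, 4.1, 2.8], [3.0, 4.5, 3.0]) \
        + 0.3 * preference_reward("A", "A")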

NeurIPS Conference 2025 Conference Paper

AlignedGen: Aligning Style Across Generated Images

  • Jiexuan Zhang
  • Yiheng Du
  • Qian Wang
  • Weiqi Li
  • Yu Gu
  • Jian Zhang

Diffusion-based generative models struggle to maintain high style consistency across images generated from text descriptions. Although several style-aligned image generation methods have been proposed to address this issue, they exhibit suboptimal performance and are primarily built upon the U-Net architecture, limiting their compatibility with DiT diffusion models such as Flux, which have emerged as predominant models in the field of image generation. To address these limitations, we propose AlignedGen, a novel training-free style-aligned image generation method for DiT models that significantly enhances style consistency across generated images. Specifically, AlignedGen incorporates two key components to achieve this: Shifted Position Embedding (ShiftPE) and Advanced Attention Sharing (AAS). ShiftPE alleviates the text controllability degradation observed in prior methods when applied to DiT models through its non-overlapping position indices design, while AAS comprises three specialized techniques to unleash the full potential of DiT for style-aligned generation. Furthermore, to broaden the applicability of our method, we present an efficient query, key, and value feature extraction algorithm, enabling our method to seamlessly incorporate external images as style references. Extensive experimental results validate that our method effectively enhances style consistency across generated images while maintaining favorable text controllability. Code: https://github.com/Jiexuanz/AlignedGen.
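
Illustrative note: the non-overlapping position indices behind ShiftPE can be pictured as giving each image in a jointly attended set its own block of 2D token coordinates. The toy sketch below assumes a simple horizontal shift per image; the offsets, grid size, and function name are illustrative and not AlignedGen's actual scheme.

    # Toy position-id layout: no two images reuse the same (row, col) index,
    # so their position embeddings cannot collide when attention is shared.
    import torch

    def shifted_position_ids(num_images, h_tokens, w_tokens):
        ids = []
        for i in range(num_images):
            ys, xs = torch.meshgrid(torch.arange(h_tokens),
                                    torch.arange(w_tokens), indexing="ij")
            # shift this image's column indices past the previous image's range
            ids.append(torch.stack([ys, xs + i * w_tokens], dim=-1).reshape(-1, 2))
        return torch.stack(ids)              # (num_images, h_tokens * w_tokens, 2)

    pos = shifted_position_ids(num_images=4, h_tokens=32, w_tokens=32)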

NeurIPS Conference 2025 Conference Paper

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

  • Weiqi Li
  • Xuanyu Zhang
  • Shijie Zhao
  • Yabin Zhang
  • Junlin Li
  • Li Zhang
  • Jian Zhang

Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. The rapid advancement of multi-modal large language models (MLLMs) has significantly broadened the scope of IQA, moving toward comprehensive image quality understanding that incorporates content analysis, degradation perception, and comparison reasoning beyond mere numerical scoring. Previous MLLM-based methods typically either generate numerical scores lacking interpretability or heavily rely on supervised fine-tuning (SFT) using large-scale annotated datasets to provide descriptive assessments, limiting their flexibility and applicability. In this paper, we propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited amount of rating scores and degradation labels. By jointly optimizing score regression and degradation perception tasks with carefully designed reward functions, our approach effectively exploits their mutual benefits for enhanced performance. Extensive experiments demonstrate that Q-Insight substantially outperforms existing state-of-the-art methods on both score regression and degradation perception tasks, while exhibiting impressive zero-shot generalization and superior comparison reasoning capability. The code and models are available at https://github.com/bytedance/Q-Insight.
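
Illustrative note: GRPO scores a group of sampled responses for the same prompt and normalizes each reward against the group's mean and standard deviation. The sketch below shows that group-relative advantage together with an invented reward mixing score regression and degradation classification; the Gaussian score reward, the 0.5/0.5 weights, and the toy samples are assumptions, not the paper's reward functions.

    # Group-relative advantages (as in GRPO) over an illustrative combined reward.
    import numpy as np

    def combined_reward(pred_score, gt_score, pred_deg, gt_deg, sigma=1.0):
        score_r = float(np.exp(-((pred_score - gt_score) ** 2) / (2 * sigma ** 2)))
        deg_r = 1.0 if pred_deg == gt_deg else 0.0
        return 0.5 * score_r + 0.5 * deg_r

    def group_relative_advantages(rewards):
        """Normalize each sampled response's reward against its own group."""
        r = np.asarray(rewards, dtype=float)
        return (r - r.mean()) / (r.std() + 1e-8)

    # e.g. four sampled answers for one image with MOS 3.4 and degradation "blur"
    samples = [(3.5, "blur"), (2.1, "noise"), (3.9, "blur"), (4.8, "jpeg")]
    rewards = [combined_reward(s, 3.4, d, "blur") for s, d in samples]
    adv = group_relative_advantages(rewards)   # used to weight the policy update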

IROS Conference 2024 Conference Paper

Automatic Field of View Adjustment of an RCM Constraint-Free Continuum Laparoscopic Robot

  • Jing Zhang 0109
  • Baichuan Wang
  • Zhijie Pan
  • Weiqi Li
  • Mengtang Li

Automatic laparoscopic field of view (FOV) adjustment can effectively assist surgeons in minimally invasive surgery (MIS). However, existing work based on rod-shaped laparoscopes is inevitably constrained by the remote center of motion (RCM) during the process of FOV adjustment. The RCM limits laparoscopic movement and makes modeling and control more complex. This paper proposes a novel tendon-driven continuum laparoscope that is not affected by the RCM constraint. Furthermore, an automatic FOV adjustment method is designed for the proposed laparoscope robot, which considers the surgical instrument position and size in the image, as well as eye-hand consistency. Two simulation platforms are developed using MATLAB and Webots to intuitively study and optimize the proposed adjustment method. The convergence time of surgical tool tracking with a complex 3D trajectory is only 1 s, the average tracking error after stabilization is about 9.97 pixels, and the maximum eye-hand error is only 0.04°. A first-generation prototype is built to verify the tracking performance of the proposed tendon-driven continuum laparoscope. The experimental results show that the proposed system can perform real-time laparoscopic FOV adjustment without being constrained by the RCM.
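
Illustrative note: at its simplest, this kind of image-based FOV adjustment can be viewed as a proportional law that recenters the tracked instrument tip and keeps its apparent size near a target. The sketch below is only that simplified view; the gains, targets, and the mapping from these rates onto the tendon-driven continuum actuators are hypothetical and not the paper's controller.

    # Toy proportional FOV-adjustment step (illustrative, not the paper's method).
    def fov_adjustment_step(tip_px, tip_size_px, img_w, img_h,
                            target_size_px=80.0,
                            k_pan=0.002, k_tilt=0.002, k_zoom=0.001):
        err_x = tip_px[0] - img_w / 2.0       # + means the tip is right of center
        err_y = tip_px[1] - img_h / 2.0       # + means the tip is below center
        err_s = target_size_px - tip_size_px  # + means the instrument looks too small
        # commanded rates for the laparoscope tip: pan, tilt, advance/retract
        return -k_pan * err_x, -k_tilt * err_y, k_zoom * err_s

    pan_rate, tilt_rate, zoom_rate = fov_adjustment_step((720, 310), 55.0, 1280, 720)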

AAAI Conference 2024 Conference Paper

Long-Tailed Learning as Multi-Objective Optimization

  • Weiqi Li
  • Fan Lyu
  • Fanhua Shang
  • Liang Wan
  • Wei Feng

Real-world data is extremely imbalanced and presents a long-tailed distribution, resulting in models biased towards classes with sufficient samples and performing poorly on rare classes. Recent methods propose to rebalance classes, but they face a seesaw dilemma (increasing performance on tail classes may decrease that of head classes, and vice versa). In this paper, we argue that the seesaw dilemma derives from the gradient imbalance of different classes, in which the gradients of inappropriate classes are weighted too heavily during updates, making models prone to overcompensation or undercompensation on tail classes. To achieve ideal compensation, we formulate long-tailed recognition as a multi-objective optimization problem, which fairly respects the contributions of head and tail classes simultaneously. For efficiency, we propose a Gradient-Balancing Grouping (GBG) strategy to gather the classes with similar gradient directions, thus approximately making every update follow a Pareto descent direction. Our GBG method drives classes with similar gradient directions to form a more representative gradient and provides ideal compensation to the tail classes. Moreover, we conduct extensive experiments on commonly used benchmarks in long-tailed learning and demonstrate the superiority of our method over existing SOTA methods. Our code is released at https://github.com/WickyLee1998/GBG_v1.
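
Illustrative note: the grouping step can be pictured as clustering classes whose per-class gradients point in similar directions, then balancing the update across the resulting groups. The sketch below uses a greedy cosine-similarity threshold and an equal-weight group average, which are assumptions made for illustration rather than the exact GBG procedure or its Pareto-descent analysis.

    # Greedy grouping of per-class gradients by direction, then a group-balanced update.
    import torch
    import torch.nn.functional as F

    def group_by_gradient_direction(class_grads, sim_thresh=0.8):
        """class_grads: (C, D) tensor, one flattened gradient per class."""
        groups, centroids = [], []       # class ids per group, running mean gradient per group
        for c, g in enumerate(class_grads):
            placed = False
            for k, centroid in enumerate(centroids):
                if F.cosine_similarity(g, centroid, dim=0) > sim_thresh:
                    groups[k].append(c)
                    centroids[k] = (centroid * (len(groups[k]) - 1) + g) / len(groups[k])
                    placed = True
                    break
            if not placed:
                groups.append([c])
                centroids.append(g.clone())
        return groups, centroids

    def balanced_update(class_grads, groups):
        """Average gradients inside each group, then weight every group equally,
        so head-class groups cannot drown out tail-class groups."""
        per_group = [class_grads[idx].mean(dim=0) for idx in groups]
        return torch.stack(per_group).mean(dim=0)

    grads = torch.randn(100, 512)                  # toy per-class gradients
    groups, _ = group_by_gradient_direction(grads)
    update = balanced_update(grads, groups)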

NeurIPS Conference 2022 Conference Paper

Cross-Image Context for Single Image Inpainting

  • Tingliang Feng
  • Wei Feng
  • Weiqi Li
  • Di Lin

Visual context is of crucial importance for image inpainting. The contextual information captures the appearance and semantic correlation between image regions, helping to propagate the information of the complete regions for reasoning about the content of the corrupted regions. Many inpainting methods compute the visual context only from regions within a single image. In this paper, we propose the Cross-Image Context Memory (CICM) for learning and using cross-image context to recover the corrupted regions. CICM consists of multiple sets of cross-image representations learned from image regions with different visual patterns. The regional representations are learned across different images, thus providing richer context that benefits the inpainting task. The experimental results demonstrate the effectiveness and generalization of CICM, which achieves state-of-the-art performance on various datasets for single image inpainting.
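
Illustrative note: a cross-image memory of regional representations can be pictured as a bank of feature prototypes updated from complete regions across many images and queried by corrupted regions at inference time. The toy class below (slot count, momentum update, top-k read) is an invented approximation for illustration, not the actual CICM design.

    # Toy cross-image memory bank of regional feature prototypes (illustrative only).
    import torch
    import torch.nn.functional as F

    class CrossImageMemory:
        def __init__(self, num_slots=1024, dim=256, momentum=0.99):
            self.slots = F.normalize(torch.randn(num_slots, dim), dim=1)
            self.momentum = momentum

        @torch.no_grad()
        def update(self, region_feats):
            """Move the best-matching slot toward each observed regional feature."""
            feats = F.normalize(region_feats, dim=1)
            nearest = (feats @ self.slots.T).argmax(dim=1)
            for i, j in enumerate(nearest):
                self.slots[j] = F.normalize(
                    self.momentum * self.slots[j] + (1 - self.momentum) * feats[i], dim=0)

        def read(self, query_feats, topk=4):
            """Return the average of the top-k closest prototypes per query region."""
            q = F.normalize(query_feats, dim=1)
            _, idx = (q @ self.slots.T).topk(topk, dim=1)
            return self.slots[idx].mean(dim=1)     # (N, dim) retrieved cross-image context

    mem = CrossImageMemory()
    mem.update(torch.randn(32, 256))               # features from complete regions
    ctx = mem.read(torch.randn(8, 256))            # extra context for corrupted regions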