Arrow Research search

Author name cluster

Wenqi Liu

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

5 papers
2 author rows

Possible papers

5

AAAI Conference 2026 Conference Paper

TIME: Temporal-Sensitive Multi-Dimensional Instruction Tuning and Robust Benchmarking for Video-LLMs

  • Yunxiao Wang
  • Meng Liu
  • Wenqi Liu
  • Xuemeng Song
  • Bin Wen
  • Fan Yang
  • Tingting Gao
  • Di Zhang

Video large language models have achieved remarkable performance in tasks such as video question answering; however, their temporal understanding remains suboptimal. To address this limitation, we curate a dedicated instruction fine-tuning dataset that focuses on enhancing temporal comprehension across five key dimensions. To reduce reliance on costly temporal annotations, we introduce a multi-task prompt fine-tuning approach that seamlessly integrates temporal-sensitive tasks into existing instruction datasets without requiring additional annotations. Furthermore, we develop a novel benchmark for temporal-sensitive video understanding that not only fills the gaps in dimension coverage left by existing benchmarks but also rigorously filters out potential shortcuts, ensuring a more accurate evaluation. Extensive experimental results demonstrate that our approach significantly enhances the temporal understanding of video-LLMs while avoiding reliance on shortcuts.

IROS Conference 2025 Conference Paper

GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps

  • Yan Rui Tan
  • Wenqi Liu
  • Wai Lun Leong
  • John Guan Zhong Tan
  • Wayne Wen Huei Yong
  • Shaohui Foong
  • Fan Shi
  • Rodney Swee Huat Teo

Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo-3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and their ability to overcome local minima. Finally, we validate GO-Flock in obstacle-filled environments and in hardware-in-the-loop experiments, in which we successfully flocked a team of nine drones (six physical and three virtual) in a forest environment.
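For context, the reactive APF control that GO-Flock builds on combines an attractive pull toward the goal with repulsion from nearby obstacles. The sketch below is illustrative only, not code from the paper; the function name and the gains `k_att`, `k_rep`, and influence radius `d0` are hypothetical.

```python
import math

def apf_velocity(pos, goal, obstacles, k_att=1.0, k_rep=2.0, d0=2.0):
    """Classic artificial-potential-field velocity command in 3D:
    linear attraction toward the goal plus repulsion from each
    obstacle inside the influence radius d0."""
    # Attractive term: proportional pull toward the goal.
    v = [k_att * (g - p) for p, g in zip(pos, goal)]
    for obs in obstacles:
        diff = [p - o for p, o in zip(pos, obs)]
        d = math.sqrt(sum(c * c for c in diff))
        if 0.0 < d < d0:  # repulsion acts only within radius d0
            gain = k_rep * (1.0 / d - 1.0 / d0) / d**2
            v = [vi + gain * c / d for vi, c in zip(v, diff)]
    return v
```

The local-minimum failure mode the paper targets is visible here: when the attractive and repulsive terms cancel, the commanded velocity vanishes and a purely reactive agent stalls.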

NeurIPS Conference 2025 Conference Paper

Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

  • Wenqi Liu
  • Xuemeng Song
  • Jiaxi Li
  • Yinwei Wei
  • Na Zheng
  • Jianhua Yin
  • Liqiang Nie

Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Although existing methods have made significant progress by using vision-oriented contrastive objectives to enhance MLLMs' attention to visual inputs and hence reduce hallucination, they suffer from non-rigorous optimization objectives and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding, while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss that quantitatively regulates the preference gap between symmetric preference pairs. Comprehensive evaluation across five benchmarks demonstrates SymMPO's superior performance, validating its effectiveness in mitigating hallucination in MLLMs.
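To make the two ingredients concrete, the sketch below pairs a standard DPO term with a squared penalty that keeps the preference margins of a symmetric pair close. This is a generic illustration of the idea, not the paper's exact SymMPO formulation; the function names, the weight `lam`, and the way the symmetric pair is formed are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO term: push the policy's chosen/rejected log-ratio
    above the reference model's. Returns (loss, implied margin)."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -math.log(sigmoid(margin)), margin

def symmetric_preference_loss(fwd, sym, lam=1.0):
    """fwd / sym: (logp_w, logp_l, ref_w, ref_l) tuples for the original
    and the symmetric (role-swapped) preference pair. A margin-consistency
    penalty regulates the gap between the two preference margins."""
    l1, m1 = dpo_loss(*fwd)
    l2, m2 = dpo_loss(*sym)
    return l1 + l2 + lam * (m1 - m2) ** 2
```

When the two margins agree, the consistency penalty vanishes and the objective reduces to the sum of two ordinary DPO losses.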

TIST Journal 2025 Journal Article

Multimodal Large Language Model with LoRA Fine-Tuning for Multimodal Sentiment Analysis

  • Jie Mu
  • Wei Wang
  • Wenqi Liu
  • Tiantian Yan
  • Guanglu Wang

Multimodal sentiment analysis has become a popular research topic in recent years. However, existing methods have two unaddressed limitations: (1) they train models with limited supervised labels, which prevents the models from fully learning the sentiment expressed in different modalities; (2) they extract modal features with text and image models pre-trained on separate unimodal tasks, so the extracted features cannot capture the interactive information between image and text. To solve these problems, in this paper we propose a Vision-Language Contrastive Learning network (VLCLNet). First, we introduce a pre-trained Large Language Model (LLM) that, having been trained on vast quantities of multimodal data, understands image and text content well and can therefore be applied effectively to different tasks with only a small amount of labelled training data. Second, we adapt a Multimodal Large Language Model (MLLM), the BLIP-2 (Bootstrapping Language-Image Pre-training) network, to extract multimodal fusion features; such an MLLM fully considers the correlation between images and texts when extracting features. In addition, because of the discrepancy between the pre-training task and the sentiment analysis task, the pre-trained model outputs suboptimal predictions. We use a Low-Rank Adaptation (LoRA) fine-tuning strategy to update the model parameters on the sentiment analysis task, which avoids the mismatch between the pre-training task and the downstream task. Experiments verify that the proposed VLCLNet is superior to other strong baselines.
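LoRA, as used above, freezes the pre-trained weights and trains only a low-rank additive update, so the effective weight is W' = W + (alpha / r) * B @ A. The sketch below shows the standard LoRA convention in pure Python; it is a generic illustration, not the paper's code, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Naive matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_update(W, A, B, alpha=16.0, r=4):
    """Effective weight after LoRA: W' = W + (alpha / r) * B @ A.
    W (d_out x d_in) is frozen; only the low-rank factors
    A (r x d_in) and B (d_out x r) receive gradients."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Because B is conventionally initialized to zero, the adapted model starts out identical to the pre-trained one, and fine-tuning touches only r * (d_in + d_out) parameters per adapted matrix instead of d_in * d_out.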