Arrow Research search

Author name cluster

Wang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

AAAI Conference 2026 Conference Paper

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

  • Yongdong Luo
  • Wang Chen
  • Weizhong Huang
  • Shukang Yin
  • Haojia Lin
  • Jinfa Huang
  • Chaoyou Fu
  • Jiayi Ji

Recent advances in long video understanding typically mitigate visual redundancy through visual token pruning based on attention distribution. However, while existing methods employ post-hoc low-response token pruning in decoder layers, they overlook the input-level semantic correlation between visual tokens and instructions (query). In this paper, we propose QuoTA, an ante-hoc training-free modular that extends existing large video-language models (LVLMs) for visual token assignment based on query-oriented frame-level importance assessment. The query-oriented token selection is crucial as it aligns visual processing with task-specific requirements, optimizing token budget utilization while preserving semantically relevant content. Specifically, (i) QuoTA strategically allocates frame-level importance scores based on query relevance, enabling one-time visual token assignment before cross-modal interactions in decoder layers, (ii) we decouple the query through Chain-of-Thoughts reasoning to facilitate more precise LVLM-based frame importance scoring, and (iii) QuoTA offers a plug-and-play functionality that extends to existing LVLMs. Extensive experimental results demonstrate that implementing QuoTA with LLaVA-Video-7B yields an average performance improvement of 3.2% across six benchmarks (including Video-MME and MLVU) while operating within an identical visual token budget as the baseline.

ICLR Conference 2025 Conference Paper

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

  • Xin Mao
  • Huimin Xu
  • Feng-Lin Li
  • Ziqi Jin
  • Wang Chen
  • Wei Zhang 0218
  • Anh Tuan Luu

Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO), eliminating the need for reward models and online sampling. Despite these benefits, DPO and its variants remain sensitive to hyper-parameters and prone to instability, particularly on mathematical datasets. We argue that these issues arise from the unidirectional likelihood-derivative negative feedback inherent in the log-likelihood loss function. To address this, we propose a novel LLM alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization. Our proposed BNF loss eliminates the need for pairwise contrastive losses and does not require any extra tunable hyper-parameters or pairwise preference data, streamlining the alignment pipeline to be as simple as supervised fine-tuning. We conduct extensive experiments across two challenging QA benchmarks and four reasoning benchmarks. The experimental results show that BNF achieves comparable performance to the best methods on QA benchmarks, while its performance decrease on the four reasoning benchmarks is significantly lower compared to the best methods, thus striking a better balance between value alignment and reasoning ability. In addition, we further validate the performance of BNF on non-pairwise datasets, and conduct in-depth analysis of log-likelihood and logit shifts across different preference optimization methods. We will release all the source code, checkpoints, and datasets on GitHub.

ICRA Conference 2024 Conference Paper

Parameter-efficient Prompt Learning for 3D Point Cloud Understanding

  • Hongyu Sun 0006
  • Yongcai Wang
  • Wang Chen
  • Haoran Deng
  • Deying Li 0001

This paper presents a parameter-efficient prompt tuning method, named PPT, to adapt a large multi-modal model for 3D point cloud understanding. Existing strategies are quite expensive in computation and storage, and depend on timeconsuming prompt engineering. We address the problems from three aspects. Firstly, a PromptLearner module is devised to replace hand-crafted prompts with learnable contexts to automate the prompt tuning process. Then, we lock the pre-trained backbone instead of adopting the full fine-tuning paradigm to substantially improve the parameter efficiency. Finally, a lightweight PointAdapter module is arranged near target tasks to enhance prompt tuning for 3D point cloud understanding. Comprehensive experiments are conducted to demonstrate the superior parameter and data efficiency of the proposed method. Meanwhile, we obtain new records on 4 public datasets and multiple 3D tasks, i. e. , point cloud recognition, few-shot learning, and part segmentation. The implementation is available at https://github.com/auniquesun/PPT.

NeurIPS Conference 2024 Conference Paper

Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis

  • Hongyu Sun
  • Qiuhong Ke
  • Yongcai Wang
  • Wang Chen
  • Kang Yang
  • Deying Li
  • Jianfei Cai

This paper investigates the 3D domain generalization (3DDG) ability of large 3D models based on prevalent prompt learning. Recent works demonstrate the performances of 3D point cloud recognition can be boosted remarkably by parameter-efficient prompt tuning. However, we observe that the improvement on downstream tasks comes at the expense of a severe drop in 3D domain generalization. To resolve this challenge, we present a comprehensive regulation framework that allows the learnable prompts to actively interact with the well-learned general knowledge in large 3D models to maintain good generalization. Specifically, the proposed framework imposes multiple explicit constraints on the prompt learning trajectory by maximizing the mutual agreement between task-specific predictions and task-agnostic knowledge. We design the regulation framework as a plug-and-play module to embed into existing representative large 3D models. Surprisingly, our method not only realizes consistently increasing generalization ability but also enhances task-specific 3D recognition performances across various 3DDG benchmarks by a clear margin. Considering the lack of study and evaluation on 3DDG, we also create three new benchmarks, namely base-to-new, cross-dataset and few-shot generalization benchmarks, to enrich the field and inspire future research. Code and benchmarks are available at \url{https: //github. com/auniquesun/Point-PRC}.

AAAI Conference 2023 Conference Paper

Global Mixup: Eliminating Ambiguity with Clustering

  • Xiangjin Xie
  • Li Yangning
  • Wang Chen
  • Kai Ouyang
  • Zuotong Xie
  • Hai-Tao Zheng

Data augmentation with Mixup has been proven an effective method to regularize the current deep neural networks. Mixup generates virtual samples and corresponding labels simultaneously by linear interpolation. However, the one-stage generation paradigm and the use of linear interpolation have two defects: (1) The label of the generated sample is simply combined from the labels of the original sample pairs without reasonable judgment, resulting in ambiguous labels. (2) Linear combination significantly restricts the sampling space for generating samples. To address these issues, we propose a novel and effective augmentation method, Global Mixup, based on global clustering relationships. Specifically, we transform the previous one-stage augmentation process into two-stage by decoupling the process of generating virtual samples from the labeling. And for the labels of the generated samples, relabeling is performed based on clustering by calculating the global relationships of the generated samples. Furthermore, we are no longer restricted to linear relationships, which allows us to generate more reliable virtual samples in a larger sampling space. Extensive experiments for CNN, LSTM, and BERT on five tasks show that Global Mixup outperforms previous baselines. Further experiments also demonstrate the advantage of Global Mixup in low-resource scenarios.

IJCAI Conference 2019 Conference Paper

Difficulty Controllable Generation of Reading Comprehension Questions

  • Yifan Gao
  • Lidong Bing
  • Wang Chen
  • Michael Lyu
  • Irwin King

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i. e. , answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, and meanwhile the generation is under the control of specified difficulty labels---the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the question generated by our framework not only have better quality under the metrics like BLEU, but also comply with the specified difficulty labels.

AAAI Conference 2019 Conference Paper

Title-Guided Encoding for Keyphrase Generation

  • Wang Chen
  • Yifan Gao
  • Jiani Zhang
  • Irwin King
  • Michael R. Lyu

Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently, several attempts are made under the generative setting using deep neural networks. However, the state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role of the title to the overall document. To solve this problem, we introduce a new model called Title-Guided Network (TG-Net) for automatic keyphrase generation task based on the encoderdecoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a titleguided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-ofthe-art models with a large margin, especially for documents with either very low or very high title length ratios.