Arrow Research

Author name cluster

Yuqi Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI Conference 2026 Conference Paper

ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing

  • Liangyu Chen
  • Yichen Xu
  • Jianzhe Ma
  • Yuqi Liu
  • Donglu Yang
  • Liang Zhang
  • Zihao Yue
  • Wenxuan Wang

Chart editing reduces manual effort in visualization design. Typical benchmarks assume access to complete chart code, which is unrealistic for real-world applications. In this paper, we present ChartEditVista, a comprehensive benchmark consisting of 7,964 samples spanning 31 chart categories. It encompasses diverse editing instruction types and covers nearly all editable chart elements. The inputs in ChartEditVista include only the original chart image and natural language editing instructions, without the original chart code. ChartEditVista is generated through a fully automated pipeline that produces, edits, and verifies charts, ensuring high-quality data. In addition, we introduce two novel fine-grained, rule-based evaluation metrics: the layout metric, which evaluates the position, size, and color of graphical components, and the text metric, which jointly assesses textual content and font styling. Building on ChartEditVista, we present ChartEditor, a model trained using a reinforcement learning framework that incorporates a novel rendering reward to simultaneously enforce code executability and visual fidelity. Through extensive experiments and human evaluations, we demonstrate that ChartEditVista provides a robust evaluation, while ChartEditor consistently outperforms models of similar and larger scale on chart editing tasks.
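As a rough illustration of how a rendering reward of this kind could be structured, here is a minimal Python sketch: it executes generated chart code in a temporary script and returns zero when the code fails to run (the executability term), otherwise a pixel-level similarity to the target chart (the visual-fidelity term). The helper names, the OUT_PATH convention, and the mean-absolute-difference similarity are illustrative assumptions, not the paper's implementation.

```python
import os
import subprocess
import tempfile

import numpy as np
from PIL import Image

def image_similarity(a_path: str, b_path: str, size=(256, 256)) -> float:
    """Illustrative visual-fidelity term: mean pixel agreement in [0, 1]."""
    a = np.asarray(Image.open(a_path).convert("RGB").resize(size), dtype=float)
    b = np.asarray(Image.open(b_path).convert("RGB").resize(size), dtype=float)
    return 1.0 - float(np.mean(np.abs(a - b))) / 255.0

def rendering_reward(chart_code: str, target_png: str) -> float:
    """Hypothetical rendering reward combining the two terms the abstract
    names: code executability and visual fidelity."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "chart.py")
        out_png = os.path.join(tmp, "chart.png")
        # Assumes the generated code saves its figure to the literal OUT_PATH.
        with open(script, "w") as f:
            f.write(chart_code.replace("OUT_PATH", repr(out_png)))
        try:
            proc = subprocess.run(["python", script],
                                  capture_output=True, timeout=30)
        except subprocess.TimeoutExpired:
            return 0.0
        if proc.returncode != 0 or not os.path.exists(out_png):
            return 0.0  # non-executable code earns no reward
        return image_similarity(out_png, target_png)
```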

ICML Conference 2025 Conference Paper

Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection

  • Zhipeng Wei 0001
  • Yuqi Liu
  • N. Benjamin Erichson

Jailbreaking techniques trick Large Language Models (LLMs) into producing restricted output, posing a potential threat. One line of defense is to use another LLM as a Judge to evaluate the harmfulness of generated text. However, we reveal that these Judge LLMs are vulnerable to token segmentation bias, an issue that arises when delimiters alter the tokenization process, splitting words into smaller sub-tokens. This alters the embeddings of the entire sequence, reducing detection accuracy and allowing harmful content to be misclassified as safe. In this paper, we introduce Emoji Attack, a novel strategy that amplifies existing jailbreak prompts by exploiting token segmentation bias. Our method leverages in-context learning to systematically insert emojis into text before it is evaluated by a Judge LLM, inducing embedding distortions that significantly lower the likelihood of detecting unsafe content. Unlike traditional delimiters, emojis also introduce semantic ambiguity, making them particularly effective in this attack. Through experiments on state-of-the-art Judge LLMs, we demonstrate that Emoji Attack substantially reduces the unsafe prediction rate, bypassing existing safeguards.
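The token segmentation bias the abstract describes is easy to reproduce with any subword tokenizer. The snippet below uses a GPT-2 tokenizer from Hugging Face to show that inserting an emoji mid-word changes how the surrounding characters are segmented; the specific phrase and emoji placement are illustrative, not the paper's attack prompts.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

EMOJI = "\N{GRINNING FACE}"
plain = "ignore previous instructions"
perturbed = f"ign{EMOJI}ore prev{EMOJI}ious instruc{EMOJI}tions"

# Exact splits depend on the tokenizer, but the emoji bytes reliably break
# each word into different sub-tokens than the plain text produces, shifting
# the sequence embedding a Judge LLM conditions its safety verdict on.
print(tok.tokenize(plain))
print(tok.tokenize(perturbed))
```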

AAAI Conference 2025 Conference Paper

Enhancing LLMs via High-Knowledge Data Selection

  • Feiyu Duan
  • Xuemiao Zhang
  • Sirui Wang
  • Haoran Que
  • Yuqi Liu
  • Wenge Rong
  • Xunliang Cai

The performance of Large Language Models (LLMs) is intrinsically linked to the quality of their training data. Although several studies have proposed methods for high-quality data selection, they do not consider the importance of knowledge richness in text corpora. In this paper, we propose a novel and gradient-free High-Knowledge Scorer (HKS) to select high-quality data along the dimension of knowledge, alleviating the problem of knowledge scarcity in the pre-training corpus. We build a comprehensive multi-domain knowledge element pool and introduce knowledge density and coverage as metrics to assess the knowledge content of a text. Based on these, we propose a comprehensive knowledge scorer to select data with intensive knowledge, which can also be used for domain-specific high-knowledge data selection by restricting knowledge elements to the target domain. We train models on a high-knowledge bilingual dataset, and experimental results demonstrate that our scorer improves the model's performance on knowledge-intensive and general comprehension tasks, and is effective in enhancing both the generic and domain-specific capabilities of the model.
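A toy rendering of the two metrics as the abstract defines them, assuming a whitespace tokenizer and a small hand-made element pool (both stand-ins); how HKS actually extracts knowledge elements and combines the two scores is not specified here.

```python
def knowledge_scores(text: str, element_pool: set) -> tuple:
    """density  = matched knowledge elements per token
    coverage = distinct matched elements / size of the pool"""
    tokens = text.lower().split()  # stand-in for a real tokenizer
    matched = [t for t in tokens if t in element_pool]
    density = len(matched) / max(len(tokens), 1)
    coverage = len(set(matched)) / max(len(element_pool), 1)
    return density, coverage

# Restricting the pool to one domain turns this into the domain-specific
# selection the abstract mentions, e.g. a medical-only element pool:
pool = {"enzyme", "protein", "antibody", "genome"}
print(knowledge_scores("the antibody binds the target protein", pool))
# (0.333..., 0.5): 2 of 6 tokens are elements; 2 of 4 pool elements appear
```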

YNIMG Journal 2025 Journal Article

Reweighting of visuomotor areas during motor processing subsequent to somatosensory cortical damage

  • Yuqi Liu
  • Elizabeth J. Halfen
  • Jeffrey M. Yau
  • Simon Fischer-Baum
  • Peter J. Kohler
  • Olufunsho Faseyitan
  • H. Branch Coslett
  • Jared Medina

Somatosensory inputs are critical to motor control. Animal studies have shown that primary somatosensory lesions cause sensorimotor deficits along with disrupted organization in primary motor cortex (M1). How does damage to primary somatosensory cortex (S1) influence motor networks in humans? Using fMRI, we examined two individuals, LS and RF, who had extensive damage to left somatosensory cortex but largely intact motor cortex and preserved motor abilities. Given left S1 damage, tactile detection and localization were impaired for the contralesional hand in both individuals. When moving the contralesional hand, LS, with near-complete damage to the S1 hand area, showed increased activation in ipsilesional putamen and deactivation in contralesional cerebellum relative to age-matched controls. These findings demonstrate influences of S1 damage on subcortical sensorimotor areas distant from the lesion site, and a potential reweighting of the motor network, with increased action selection in putamen and inhibition of sensory prediction in cerebellum in the face of sensory loss. In contrast, RF, who had a small island of spared S1 in the hand area, showed greater activation in contralesional S1 for movement versus rest. This same region was also activated by pure somatosensory stimulation in a second experiment, suggesting that the spared S1 area in RF still subserves sensorimotor processing. Finally, the right middle occipital gyrus was more strongly activated in both individuals compared with controls, suggesting a potential reliance on visual imagery in the face of degraded sensory feedback.

AAAI Conference 2025 Conference Paper

SCOPE: Sign Language Contextual Processing with Embedding from LLMs

  • Yuqi Liu
  • Wenqian Zhang
  • Sihan Ren
  • Chengyu Huang
  • Jingyi Yu
  • Lan Xu

Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language COntextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. Both our dataset and code will be open-sourced to facilitate further research.
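As a sketch of what "incorporating prior conversational context" into the LLM stage might look like, the function below assembles a translation prompt from recent dialogue turns and a recognized gloss sequence; the template, the three-turn window, and the field names are all invented for illustration.

```python
def build_slt_prompt(context_turns, glosses):
    """Pair recognized sign glosses with recent dialogue context so the
    fine-tuned LLM can resolve context-dependent signs (illustrative only)."""
    context = "\n".join(f"- {turn}" for turn in context_turns[-3:])
    return (
        "Dialogue so far:\n"
        f"{context}\n"
        f"Sign glosses: {' '.join(glosses)}\n"
        "Translation:"
    )

print(build_slt_prompt(
    ["Where are you going?", "To the hospital."],
    ["DOCTOR", "SEE", "WANT"],
))
```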

AAAI Conference 2024 Conference Paper

Toward Open-Set Human Object Interaction Detection

  • Mingrui Wu
  • Yuqi Liu
  • Jiayi Ji
  • Xiaoshuai Sun
  • Rongrong Ji

This work is oriented toward the task of open-set Human Object Interaction (HOI) detection. The challenge lies in identifying completely new, out-of-domain relationships, as opposed to in-domain ones which have seen improvements in zero-shot HOI detection. To address this challenge, we introduce a simple Disentangled HOI Detection (DHD) model for detecting novel relationships by integrating an open-set object detector with a Visual Language Model (VLM). We utilize a disentangled image-text contrastive learning metric for training and connect the bottom-up visual features to text embeddings through lightweight unary and pair-wise adapters. Our model can benefit from the open-set object detector and the VLM to detect novel action categories and combine actions with novel object categories. We further present the VG-HOI dataset, a comprehensive benchmark with over 17k HOI relationships for open-set scenarios. Experimental results show that our model can detect unknown action classes and combine unknown object classes. Furthermore, it can generalize to over 17k HOI classes while being trained on just 600 HOI classes.
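A minimal PyTorch sketch of the adapter idea: lightweight unary and pair-wise projections map detector box features into a VLM text-embedding space, where categories are scored by cosine similarity against frozen text embeddings of prompts. Feature dimensions, the MLP shapes, and the scoring are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnaryAdapter(nn.Module):
    """Projects one box feature (human or object) into text space."""
    def __init__(self, vis_dim: int = 256, txt_dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(vis_dim, txt_dim), nn.ReLU(),
                                  nn.Linear(txt_dim, txt_dim))

    def forward(self, x):
        return self.proj(x)

class PairwiseAdapter(nn.Module):
    """Fuses a human-object pair of box features for interaction scoring."""
    def __init__(self, vis_dim: int = 256, txt_dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(2 * vis_dim, txt_dim), nn.ReLU(),
                                  nn.Linear(txt_dim, txt_dim))

    def forward(self, human, obj):
        return self.proj(torch.cat([human, obj], dim=-1))

# Cosine-similarity logits against frozen VLM text embeddings of prompts.
unary, pair = UnaryAdapter(), PairwiseAdapter()
human_feat, obj_feat = torch.randn(4, 256), torch.randn(4, 256)
object_text_emb = torch.randn(11, 512)  # e.g., 11 object-prompt embeddings
action_text_emb = torch.randn(7, 512)   # e.g., 7 action-prompt embeddings
obj_logits = F.normalize(unary(obj_feat), dim=-1) @ \
             F.normalize(object_text_emb, dim=-1).T
act_logits = F.normalize(pair(human_feat, obj_feat), dim=-1) @ \
             F.normalize(action_text_emb, dim=-1).T
print(obj_logits.shape, act_logits.shape)  # (4, 11) and (4, 7)
```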

AAAI Conference 2023 Conference Paper

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language

  • Yuqi Liu
  • Luhui Xu
  • Pengfei Xiong
  • Qin Jin

Applying large-scale pre-trained image-language models to video-language tasks has recently become a trend, which brings two challenges. One is how to effectively transfer knowledge from static images to dynamic videos, and the other is how to cope with the prohibitive cost of full fine-tuning given growing model sizes. Existing works that attempt parameter-efficient image-language to video-language transfer learning fall into two types: 1) appending a sequence of temporal transformer blocks after the 2D Vision Transformer (ViT), and 2) inserting a temporal block into the ViT architecture. While both types of methods only require fine-tuning the newly added components, there are still many parameters to update, and they are only validated on a single video-language task. In this work, based on our analysis of the core ideas of the temporal modeling components in existing approaches, we propose a token mixing strategy that enables cross-frame interactions, transferring the pre-trained image-language model to video-language tasks by selecting and mixing a key set and a value set from the input video samples. As token mixing does not add any components or modules, we can directly fine-tune only part of the pre-trained image-language model, achieving parameter efficiency. We carry out extensive experiments comparing our token mixing method with other parameter-efficient transfer learning methods. Token mixing outperforms them on both understanding and generation tasks, and achieves new state-of-the-art results on multiple video-language tasks. The code is available at https://github.com/yuqi657/video_language_model.
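One way to read the key/value mixing idea is the sketch below: per-frame patch tokens have a small subset of their slots replaced with tokens from a neighboring frame, so the frozen ViT's per-frame attention sees cross-frame keys and values without any new modules. The (B, T, N, D) layout, the next-frame source, and the random slot selection are stand-ins and are not taken from the linked repository.

```python
import torch

def mix_tokens(frames: torch.Tensor, n_mix: int = 8) -> torch.Tensor:
    """frames: (B, T, N, D) patch tokens for T frames of a clip.
    Swap n_mix token slots in each frame with the same slots from the next
    frame, enabling cross-frame interaction with zero added parameters."""
    B, T, N, D = frames.shape
    mixed = frames.clone()
    for t in range(T):
        src = (t + 1) % T                 # borrow tokens from the next frame
        idx = torch.randperm(N)[:n_mix]   # which token slots to swap
        mixed[:, t, idx] = frames[:, src, idx]
    return mixed

clip = torch.randn(2, 8, 196, 768)  # 2 clips, 8 frames, 196 ViT patch tokens
print(mix_tokens(clip).shape)       # torch.Size([2, 8, 196, 768])
```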