Arrow Research search

Author name cluster

Mingyu Jin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers


ICLR Conference 2025 Conference Paper

From Commands to Prompts: LLM-based Semantic File System for AIOS

  • Zeru Shi
  • Kai Mei
  • Mingyu Jin
  • Yongye Su
  • Chaoji Zuo
  • Wenyue Hua
  • Wujiang Xu
  • Yujie Ren

Large language models (LLMs) have demonstrated significant potential in the development of intelligent LLM-based agents. However, when users use these agent applications to perform file operations, their interaction with the file system still follows the traditional paradigm: reliant on manual navigation through precise commands. This paradigm poses a bottleneck to the usability of these systems, as users are required to navigate complex folder hierarchies and remember cryptic file names. To address this limitation, we propose an LLM-based Semantic File System (LSFS) for prompt-driven file management in the LLM Agent Operating System (AIOS). Unlike conventional approaches, LSFS incorporates LLMs to enable users or agents to interact with files through natural language prompts, facilitating semantic file management. At the macro level, we develop a comprehensive API set to achieve semantic file management functionalities, such as semantic file retrieval, file update summarization, and semantic file rollback. At the micro level, we store files by constructing semantic indexes for them, and design and implement syscalls for different semantic operations, e.g., CRUD (create, read, update, delete), group by, and join. Our experiments show that LSFS can achieve at least 15% retrieval accuracy improvement with 2.1× higher retrieval speed in the semantic file retrieval task compared with the traditional file system. In the traditional keyword-based file retrieval task (i.e., retrieving by string-matching), LSFS also performs stably well, i.e., over 89% F1-score with improved usability, especially when the keyword conditions become more complex. Additionally, LSFS supports more advanced file management operations, i.e., semantic file rollback and file sharing, and achieves 100% success rates in these tasks, further suggesting the capability of LSFS. The code is available at https://github.com/agiresearch/AIOS-LSFS.
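The idea of storing files behind a semantic index and retrieving them by prompt can be illustrated with a minimal sketch. This is not the LSFS implementation or its API: the `SemanticIndex` class, the `embed` function, and the file paths are hypothetical, and a bag-of-words count vector stands in for the LLM embeddings a real system would presumably use.

```python
import math
from collections import Counter


def embed(text):
    # Assumption: a bag-of-words count vector stands in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticIndex:
    """Hypothetical index mapping file paths to vectors of their content."""

    def __init__(self):
        self._entries = {}

    def add(self, path, content):
        self._entries[path] = embed(content)

    def retrieve(self, prompt, k=1):
        # Rank stored files by similarity to the natural-language prompt.
        query = embed(prompt)
        ranked = sorted(
            self._entries,
            key=lambda p: cosine(query, self._entries[p]),
            reverse=True,
        )
        return ranked[:k]


index = SemanticIndex()
index.add("notes/a1.txt", "quarterly budget report for the finance team")
index.add("notes/b2.txt", "hiking trip packing list and trail map")
result = index.retrieve("find the budget report")
```

The caller never names `notes/a1.txt`; the prompt alone selects it, which is the usability point the abstract makes about cryptic file names.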

ICML Conference 2025 Conference Paper

Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

  • Mingyu Jin
  • Kai Mei
  • Wujiang Xu
  • Mingjie Sun
  • Ruixiang Tang
  • Mengnan Du
  • Zirui Liu 0001
  • Yongfeng Zhang 0003

Large language models (LLMs) have achieved remarkable success in contextual knowledge understanding. In this paper, we show for the first time that concentrated massive values consistently emerge in specific regions of attention queries (Q) and keys (K), while no such patterns appear in values (V), across various modern transformer-based LLMs. Through extensive experiments, we further demonstrate that these massive values play a critical role in interpreting contextual knowledge (i.e., knowledge obtained from the current context window) rather than in retrieving parametric knowledge stored within the model’s parameters. Our further investigation of quantization strategies reveals that ignoring these massive values leads to a pronounced drop in performance on tasks requiring rich contextual understanding, aligning with our analysis. Finally, we trace the emergence of concentrated massive values and find that such concentration is caused by Rotary Positional Encoding (RoPE) and appears from the very first layers. These findings shed new light on how Q and K operate in LLMs and offer practical insights for model design and optimization. The code is available at https://github.com/MingyuJ666/Rope_with_LLM.
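Since the abstract attributes the concentration to RoPE, a small sketch of what RoPE does to a query or key vector may help. This is a generic illustration and not code from the paper's repository. It assumes the interleaved pairing of the original rotary formulation (many LLM implementations pair dimensions differently), and the function names and sample vectors are made up for the example.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that depend on `pos`."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        # Pair i/2 uses frequency base^(-2*(i/2)/d) = base^(-i/d).
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Both pairs of positions are two tokens apart.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

The two scores agree up to floating-point error because the rotated dot product depends only on the relative offset between positions. Only Q and K are rotated in this scheme, which matches the abstract's observation that V shows no such pattern.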

ICLR Conference 2025 Conference Paper

Visual Agents as Fast and Slow Thinkers

  • Guangyan Sun
  • Mingyu Jin
  • Zhenting Wang
  • Cheng-Long Wang
  • Siqi Ma
  • Qifan Wang 0001
  • Tong Geng
  • Ying Nian Wu

Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident responses. To address the challenge, we introduce FaST, which incorporates the Fast and Slow Thinking mechanism into visual agents. FaST employs a switch adapter to dynamically select between System 1 and System 2 modes, tailoring the problem-solving approach to different task complexity. It tackles uncertain and unseen objects by adjusting model confidence and integrating new contextual data. With this novel design, we advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which contribute to its ability to emulate human-like cognitive processes in visual intelligence. Empirical results demonstrate that FaST outperforms various well-known baselines, achieving 80.8% accuracy on VQA v2 for visual question answering and a 48.7% GIoU score on ReasonSeg for reasoning segmentation. Extensive testing validates the efficacy and robustness of FaST's core components, showcasing its potential to advance the development of cognitive visual agents in AI systems.

AAAI Conference 2024 Conference Paper

MathAttack: Attacking Large Language Models towards Math Solving Ability

  • Zihao Zhou
  • Qiufeng Wang
  • Mingyu Jin
  • Jie Yao
  • Jianan Ye
  • Wei Liu
  • Wei Wang
  • Xiaowei Huang

With the boom of Large Language Models (LLMs), research on solving Math Word Problems (MWPs) has recently made great progress. However, few studies examine the robustness of LLMs in math solving ability. Instead of attacking prompts in the use of LLMs, we propose a MathAttack model to attack MWP samples, which is closer to the essence of robustness in solving math problems. Compared to traditional text adversarial attacks, it is essential to preserve the mathematical logic of the original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entities, which are then frozen. Subsequently, the remaining text is attacked by adopting a word-level attacker. Furthermore, we propose a new dataset, RobustMath, to evaluate the robustness of LLMs in math solving ability. Extensive experiments on our RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) our adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transfer from larger to smaller-size LLMs, or from few-shot to zero-shot prompts); (2) complex MWPs (such as those with more solving steps, longer text, or more numbers) are more vulnerable to attack; (3) we can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important attempt towards enhancing the robustness of LLMs in math solving ability. The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
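The freeze-then-perturb idea can be sketched in a few lines. This is a toy illustration and not the MathAttack code: it assumes numbers are the only frozen tokens, and the `SYNONYMS` table is hypothetical, whereas an actual word-level attacker would pick substitutions by querying the victim model.

```python
import re

# Hypothetical substitution table for illustration only.
SYNONYMS = {"buys": "purchases", "gives": "hands", "left": "remaining"}

# Assumption: numeric tokens stand in for the frozen logical entities.
FROZEN = re.compile(r"^\d+(\.\d+)?$")


def perturb(problem):
    """Swap unfrozen words for synonyms while leaving numbers untouched."""
    out = []
    for token in problem.split():
        core = token.strip(".,?!")  # ignore trailing punctuation when matching
        if FROZEN.match(core):
            out.append(token)  # frozen: keep the number exactly as written
        elif core.lower() in SYNONYMS:
            out.append(token.replace(core, SYNONYMS[core.lower()]))
        else:
            out.append(token)
    return " ".join(out)


original = "Tom buys 3 apples and gives 2 to Ann. How many apples are left?"
attacked = perturb(original)
```

Here `attacked` reads "Tom purchases 3 apples and hands 2 to Ann. How many apples are remaining?", so the arithmetic of the problem is unchanged while the surface wording differs.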

ECAI Conference 2024 Conference Paper

Target-driven Attack for Large Language Models

  • Chong Zhang 0006
  • Mingyu Jin
  • Dong Shu
  • Taowen Wang
  • Dongfang Liu
  • Xiaobo Jin

Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. Many users can easily inject adversarial text or instructions through the user interface, creating security challenges for LLMs, such as the model failing to give the correct answer. Although there is currently a large amount of research on black-box attacks, most of these attacks use random and heuristic strategies. It is unclear how these strategies relate to the success rate of attacks and thus how they can effectively improve model robustness. To solve this problem, we redefine the attack’s goal and propose a target-driven black-box attack method that maximizes the KL divergence between the conditional probabilities of the clean text and the attack text. Based on this goal, we transform the distance maximization problem into two convex optimization problems, which solve for the attack text and estimate the covariance. Furthermore, the projected gradient descent algorithm solves for the vector corresponding to the attack text. Our target-driven black-box attack approach includes two attack strategies: token manipulation and misinformation attack. Experimental results on multiple large language models and datasets demonstrate the effectiveness of our attack method.
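To make the attack objective concrete, the sketch below computes the KL divergence between two discrete output distributions and picks the candidate that maximizes it. It is an illustration only: the paper solves a convex optimization problem with projected gradient descent, while this example simply compares two made-up candidates. The distributions, the candidate names, and the direction of the divergence are all assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical model output distribution for the clean text.
clean = [0.7, 0.2, 0.1]

# Hypothetical output distributions after two candidate attack texts.
candidates = {
    "attack_a": [0.6, 0.3, 0.1],
    "attack_b": [0.1, 0.2, 0.7],
}

# Choose the candidate whose output distribution is farthest from the clean one.
best = max(candidates, key=lambda name: kl_divergence(clean, candidates[name]))
```

In this example `attack_b` wins because it moves most of the probability mass away from the answer the model favored on the clean text.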