Arrow Research search

Author name cluster

Qi He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

TMLR Journal 2026 Journal Article

DiffKGW: Stealthy and Robust Diffusion Model Watermarking

  • Tianxin Wei
  • Ruizhong Qiu
  • Yifan Chen
  • Yunzhe Qi
  • Jiacheng Lin
  • Wenxuan Bao
  • Wenju Xu
  • Sreyashi Nag

Diffusion models are known for their supreme capability to generate realistic images. However, ethical concerns, such as copyright protection and the generation of inappropriate content, pose significant challenges for the practical deployment of diffusion models. Recent work has proposed a flurry of watermarking techniques that inject artificial patterns into initial latent representations of diffusion models, offering a promising solution to these issues. However, enforcing a specific pattern on selected elements can disrupt the Gaussian distribution of the initial latent representation. Inspired by watermarks for large language models (LLMs), we generalize the LLM KGW watermark to image diffusion models and propose a stealthy probability adjustment approach DiffKGW that preserves the Gaussian distribution of initial latent representation. In addition, we dissect the design principles of state-of-the-art watermarking techniques and introduce a unified framework. We identify a set of dimensions that explain the manipulation enforced by watermarking methods, including the distribution of individual elements, the specification of watermark shapes within each channel, and the choice of channels for watermark embedding. Through the empirical studies on regular text-to-image applications and the first systematic attempt at watermarking image-to-image diffusion models, we thoroughly verify the effectiveness of our proposed framework through comprehensive evaluations. On all the diffusion models, including Stable Diffusion, our approach induced from the proposed framework not only preserves image quality but also outperforms existing methods in robustness against a wide range of attacks.

AAAI Conference 2026 Conference Paper

Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models

  • Yu Fu
  • Haz Sameen Shahgir
  • Hui Liu
  • Xianfeng Tang
  • Qi He
  • Yue Dong

Recent advances in long-context language models (LCLMs), designed to handle extremely long contexts, primarily focus on utilizing external contextual information, often leaving the influence of language models' parametric knowledge underexplored. In this work, we firstly investigate how this parametric knowledge affects content generation and demonstrate that its impact becomes increasingly pronounced as context length extends. Furthermore, we show that the model’s ability to utilize parametric knowledge, which we call parametric recall ability, does not improve simultaneously with its ability to leverage contextual knowledge through extrinsic retrieval ability. Moreover, better extrinsic retrieval ability can interfere with the model’s parametric recall ability, limiting its full potential. To bridge this gap, we design a simple yet effective Hybrid Needle-in-a-Haystack test that evaluates models based on their capabilities across both abilities, rather than solely emphasizing extrinsic retrieval ability. Our experimental results reveal that Qwen-2.5 models significantly outperform Llama-3.1 models, demonstrating superior potential to combine various abilities. Moreover, even the more powerful Llama-3.1-70B-Instruct model fails to exhibit better performance, highlighting the importance of evaluating models from a dual-ability perspective.

TIST Journal 2025 Journal Article

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

  • Fali Wang
  • Zhiwei Zhang
  • Xianren Zhang
  • Zongyu Wu
  • TzuHao Mo
  • Qiuhao Lu
  • Wanjing Wang
  • Rui Li

Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. Despite their proficiency in various tasks, LLMs like PaLM 540B and Llama-3.1 405B face limitations due to large parameter sizes and computational demands, often requiring cloud API use, which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. Additionally, LLMs often underperform in specialized domains such as healthcare and law due to insufficient domain-specific knowledge, necessitating specialized models. Therefore, Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. These models are particularly well-suited for resource-limited environments and domain knowledge acquisition, addressing LLMs’ challenges and proving ideal for applications that require localized data handling for privacy, minimal inference latency for efficiency, and domain knowledge acquisition through lightweight fine-tuning. The rising demand for SLMs has spurred extensive research and development. However, a comprehensive survey investigating issues related to the definition, acquisition, application, enhancement, and reliability of SLM remains lacking, prompting us to conduct a detailed survey on these topics. The definition of SLMs varies widely; thus, to standardize, we propose defining SLMs by their capability to perform specialized tasks and suitability for resource-constrained settings, setting boundaries based on the minimal size for emergent abilities and the maximum size sustainable under resource constraints. For other aspects, we provide a taxonomy of relevant models/methods and develop general frameworks for each category to enhance and utilize SLMs effectively. We have compiled the collected SLM models and related methods on GitHub: https://github.com/FairyFali/SLMs-Survey.

NeurIPS Conference 2025 Conference Paper

AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

  • Fali Wang
  • Hui Liu
  • Zhenwei Dai
  • Jingying Zeng
  • Zhiwei Zhang
  • Zongyu Wu
  • Chen Luo
  • Zhen Li

Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks; while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks with each subtask requires LLM of specific capability. Therefore, we study a novel problem: the test-time compute-optimal scaling in multi-stage complex tasks, aiming to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) The combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical. (ii) The optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights characterizing the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes and enhanced interpretability.

NeurIPS Conference 2025 Conference Paper

Bilevel ZOFO: Efficient LLM Fine-Tuning and Meta-Training

  • Reza Shirkavand
  • Peiran Yu
  • Qi He
  • Heng Huang

Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) optimizers presents significant computational challenges. Parameter-Efficient Fine-Tuning~(PEFT) methods have been proposed to address these challenges by freezing most model parameters and training only a small subset. While PEFT is efficient, it may not outperform full fine-tuning when high task-specific performance is required. Zeroth-Order (ZO) methods offer an alternative for fine-tuning the entire pre-trained model by approximating gradients using only the forward pass, thus eliminating the computational burden of back-propagation, % in first-order methods, but they converge painfully slowly and are very sensitive to the choice of task prompts. We bridge these worlds with Bilevel‑ZOFO, a penalty‑based bilevel formulation that treats adapter parameters as a lower‑level learner coupled to an upper‑level ZO optimizer of the full backbone. This double-loop optimization strategy only requires the gradient of the PEFT model and the forward pass of the base model. We provide theoretical convergence guarantees for Bilevel ZOFO. Empirically, we demonstrate that Bilevel-ZOFO significantly outperforms existing ZO methods, achieves 2–4$\times$ faster training, and reduces sensitivity to prompts. Bilevel-ZOFO also outperforms FO PEFT methods while maintaining similar memory efficiency. Additionally, we show its strong potential for meta learning.

NeurIPS Conference 2025 Conference Paper

Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy

  • Jie Ren
  • Zhenwei Dai
  • Xianfeng Tang
  • Yue Xing
  • Shenglai Zeng
  • Jingying Zeng
  • Qiankun Peng
  • Samarth Varshney

Although Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks, growing concerns have emerged over the misuse of sensitive, copyrighted, or harmful data during training. To address these concerns, unlearning techniques have been developed to remove the influence of specific data without retraining from scratch. However, this paper reveals a critical vulnerability in fine-tuning-based unlearning: a malicious user can craft a manipulated forgetting request that stealthily degrades the model’s utility for benign users. We demonstrate this risk through a red-teaming Stealthy Attack (SA), which is inspired by two key limitations of existing unlearning—the inability to constrain the scope of unlearning effect and the failure to distinguish benign tokens from unlearning signals. Prior work has shown that unlearned models tend to memorize forgetting data as unlearning signals, and respond with hallucinations or feigned ignorance when unlearning signals appear in the input. By subtly increasing the presence of common benign tokens in the forgetting data, SA enhances the connection between benign tokens and unlearning signals. As a result, when normal users include such tokens in their prompts, the model exhibits unlearning behaviors, leading to unintended utility degradation. To address this vulnerability, we propose Scope-aware Unlearning (SU), a lightweight enhancement that introduces a scope term into the unlearning objective, encouraging the model to localize the forgetting effect. Our method requires no additional data processing, integrates seamlessly with existing fine-tuning frameworks, and significantly improves robustness against SA. Extensive experiments validate the effectiveness of both SA and SU.

ICML Conference 2025 Conference Paper

Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness

  • Qi He
  • Peiran Yu
  • Ziyi Chen 0002
  • Heng Huang 0001

Shuffling-type gradient methods are favored in practice for their simplicity and rapid empirical performance. Despite extensive development of convergence guarantees under various assumptions in recent years, most require the Lipschitz smoothness condition, which is often not met in common machine learning models. We highlight this issue with specific counterexamples. To address this gap, we revisit the convergence rates of shuffling-type gradient methods without assuming Lipschitz smoothness. Using our stepsize strategy, the shuffling-type gradient algorithm not only converges under weaker assumptions but also match the current best-known convergence rates, thereby broadening its applicability. We prove the convergence rates for nonconvex, strongly convex, and non-strongly convex cases, each under both random reshuffling and arbitrary shuffling schemes, under a general bounded variance condition. Numerical experiments further validate the performance of our shuffling-type gradient algorithm, underscoring its practical efficacy.

NeurIPS Conference 2025 Conference Paper

ToolRL: Reward is All Tool Learning Needs

  • Cheng Qian
  • Emre Can Acikgoz
  • Qi He
  • Hongru WANG
  • Xiusi Chen
  • Dilek Hakkani-Tur
  • Gokhan Tur
  • Heng Ji

Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement learning (RL), particularly with R1-like models, have demonstrated promising reasoning and generalization abilities. Yet, reward design for tool use presents unique challenges: multiple tools may be invoked with diverse parameters, and coarse-grained reward signals, such as answer matching, fail to offer the finegrained feedback required for effective learning. In this work, we present the first comprehensive study on reward design for tool selection and application tasks within the RL paradigm. We systematically explore a wide range of reward strategies, analyzing their types, scales, granularity, and temporal dynamics. Building on these insights, we propose a principled reward design tailored for tool use tasks and apply it to train LLMs using RL methods. Empirical evaluations across diverse benchmarks demonstrate that our approach yields robust, scalable, and stable training, achieving a 17\% improvement over base models and a 15\% gain over SFT models. These results highlight the critical role of thoughtful reward design in enhancing the tool use capabilities and generalization performance of LLMs. All the codes are released to facilitate future research.

TCS Journal 2025 Journal Article

Vertex-independent spanning trees in complete Josephus cubes

  • Qi He
  • Yan Wang
  • Jianxi Fan
  • Baolei Cheng

Vertex-independent spanning trees (short for VISTs) serve as pivotal constructs in numerous network algorithms and have been the subject of extensive research for three decades. The n-dimensional complete Josephus cube C J C n, derived from the Josephus cube, was first proposed to achieve better fault tolerance while maximizing routing efficiency (no sacrificing routing efficiency). Compared to the Josephus cube, it exhibits enhanced symmetry, improved connectivity, and better fault tolerance while maintaining efficient embedding, incremental scalability, and short diameter ( ⌈ n 2 ⌉ ). This paper studies the existence and construction of n + 2 VISTs in C J C n rooted at an arbitrary vertex. To determine the specific connection edge between vertex v and its parent in the spanning tree T i, three algorithms were first proposed to calculate the values of F v, i, M v, i, and H v, i, respectively, where v ∈ V ( C J C n ) and i ∈ { 0, 1, ⋯, n + 1 }. Based on these algorithms, a parallel algorithm is proposed to construct n + 2 ( n ≥ 4 ) VISTs in C J C n using 2 n processors. As C J C n is ( n + 2 ) -connected, our algorithm is designed to yield the optimal number of resulting VISTs for n ≥ 4. Finally, we present the theoretical proof of the parallel algorithm and demonstrate that its time complexity is O ( n ).

JBHI Journal 2024 Journal Article

MonoLoT: Self-Supervised Monocular Depth Estimation in Low-Texture Scenes for Automatic Robotic Endoscopy

  • Qi He
  • Guang Feng
  • Sophia Bano
  • Danail Stoyanov
  • Siyang Zuo

The self-supervised monocular depth estimation framework is well-suited for medical images that lack ground-truth depth, such as those from digestive endoscopes, facilitating navigation and 3D reconstruction in the gastrointestinal tract. However, this framework faces several limitations, including poor performance in low-texture environments, limited generalisation to real-world datasets, and unclear applicability in downstream tasks like visual servoing. To tackle these challenges, we propose MonoLoT, a self-supervised monocular depth estimation framework featuring two key innovations: point matching loss and batch image shuffle. Extensive ablation studies on two publicly available datasets, namely C3VD and SimCol, have shown that methods enabled by MonoLoT achieve substantial improvements, with accuracies of 0. 944 on C3VD and 0. 959 on SimCol, surpassing both depth-supervised and self-supervised baselines on C3VD. Qualitative evaluations on real-world endoscopic data underscore the generalisation capabilities of our methods, outperforming both depth-supervised and self-supervised baselines. To demonstrate the feasibility of using monocular depth estimation for visual servoing, we have successfully integrated our method into a proof-of-concept robotic platform, enabling real-time automatic intervention and control in digestive endoscopy. In summary, our method represents a significant advancement in monocular depth estimation for digestive endoscopy, overcoming key challenges and opening promising avenues for medical applications.