Arrow Research search

Author name cluster

Haotian Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

TMLR Journal 2026 Journal Article

Graph Concept Bottleneck Models

  • Haotian Xu
  • Tsui-Wei Weng
  • Lam M. Nguyen
  • Tengfei Ma

Concept Bottleneck Models (CBMs) have emerged as a prominent framework for interpretable deep learning, providing human-understandable intermediate concepts that enable transparent reasoning and direct intervention. However, existing CBMs typically assume conditional independence among concepts given the label, overlooking the intrinsic dependencies and correlations that often exist among them. In practice, concepts are rarely isolated: modifying one concept may inherently influence others. Ignoring these relationships can lead to oversimplified representations and weaken interpretability. To address this limitation, we introduce Graph CBMs, a novel variant of CBMs that explicitly models the relational structure among concepts through a latent concept graph. Our approach can be seamlessly integrated into existing CBMs as a lightweight, plug-and-play module, enriching their reasoning capability without sacrificing interpretability. Experimental results on multiple real-world image classification benchmarks demonstrate that Graph CBMs (1) achieve higher predictive accuracy while revealing meaningful concept structures, (2) enable more effective and robust concept-level interventions, and (3) maintain stable performance across diverse architectures and training setups.
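The core idea, refining concept activations through a latent concept graph before label prediction, can be sketched in a few lines. This is an illustrative sketch rather than the paper's implementation: `graph_refine`, the fixed adjacency matrix, and the residual weight `alpha` are hypothetical stand-ins for the learned components the abstract describes.

```python
def graph_refine(concepts, adjacency, alpha=0.5):
    """One round of message passing over a concept graph.

    concepts:  list of floats, raw concept activations
    adjacency: row-normalized list of lists; adjacency[i][j] is the
               influence of concept j on concept i
    alpha:     residual weight kept on the original activation
    """
    n = len(concepts)
    refined = []
    for i in range(n):
        neighbor_msg = sum(adjacency[i][j] * concepts[j] for j in range(n))
        refined.append(alpha * concepts[i] + (1 - alpha) * neighbor_msg)
    return refined

# Two correlated concepts ("has wings", "can fly") and one independent one.
A = [
    [0.0, 1.0, 0.0],  # "has wings" listens to "can fly"
    [1.0, 0.0, 0.0],  # "can fly" listens to "has wings"
    [0.0, 0.0, 1.0],  # independent concept keeps its own value
]
c = [0.9, 0.2, 0.5]
print(graph_refine(c, A))  # correlated concepts pull toward each other
```

Intervening on one concept now propagates to its neighbors through the graph, which is the behavior the abstract argues a conditionally independent CBM cannot capture.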

AAAI Conference 2026 Conference Paper

LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

  • Junlong Jia
  • Xing Wu
  • Chaochen Gao
  • Ziyang Chen
  • Zijia Lin
  • Zhongzhi Li
  • Weinong Wang
  • Haotian Xu

High-quality long-context data is essential for training large language models (LLMs) capable of processing extensive documents, yet existing synthesis approaches using relevance-based aggregation face challenges of computational efficiency. We present LiteLong, a resource-efficient method for synthesizing long-context data through structured topic organization and multi-agent debate. Our approach leverages the BISAC book classification system to provide a comprehensive hierarchical topic organization, and then employs a debate mechanism with multiple LLMs to generate diverse, high-quality topics within this structure. For each topic, we use lightweight BM25 retrieval to obtain relevant documents and concatenate them into 128K-token training samples. Experiments on HELMET and Ruler benchmarks demonstrate that LiteLong achieves competitive long-context performance and can seamlessly integrate with other long-dependency enhancement methods. LiteLong makes high-quality long-context data synthesis more accessible by reducing both computational and data engineering costs, facilitating further research in long-context language training.
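The retrieve-and-concatenate step described above can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and a tiny word-count budget; the paper uses full-scale BM25 retrieval and a 128K-token budget, and `bm25_scores`/`pack_sample` are hypothetical helper names.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n_docs = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def pack_sample(topic, docs, budget):
    """Concatenate highest-scoring docs until the (word-count) budget is hit."""
    ranked = sorted(zip(bm25_scores(topic, docs), docs), reverse=True)
    sample, used = [], 0
    for _, doc in ranked:
        n = len(doc.split())
        if used + n > budget:
            break
        sample.append(doc)
        used += n
    return " ".join(sample)

docs = [
    "gardening tips for tomato plants in raised beds",
    "tomato plants need regular watering and full sun",
    "quarterly earnings rose on strong cloud revenue",
]
print(pack_sample("tomato plants", docs, budget=15))
```

The point of the design is that retrieval stays lightweight: BM25 needs only term statistics, so packing topic-coherent long samples costs far less than embedding-based relevance aggregation.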

NeurIPS Conference 2025 Conference Paper

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

  • Xinji Mai
  • Haotian Xu
  • Xing Wu
  • Weinong Wang
  • Yingying Zhang
  • Wenqiang Zhang

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is demonstrating that, as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations where increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable relationship between computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. We implement a robust framework featuring a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks. Experiments show ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks. Our findings provide a foundational understanding of how autonomous tool use is acquired and scales within Agent RL, offering a reproducible benchmark for future studies. Code is released at https://github.com/yyht/openrlhf_async_pipline.

NeurIPS Conference 2025 Conference Paper

Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

  • Yi Hu
  • Shijia Kang
  • Haotong Yang
  • Haotian Xu
  • Muhan Zhang

Length generalization—the ability to solve problems longer than those seen during training—remains a critical challenge for large language models (LLMs). Previous work modifies positional encodings (PEs) and data formats to improve length generalization on specific symbolic tasks such as addition and sorting. However, these approaches are fundamentally limited to special tasks, often degrading general language performance. Furthermore, they are typically evaluated on small transformers trained from scratch on single tasks, and can cause performance drops when applied during the post-training stage of practical LLMs with general capabilities. Hu et al. (2024) proposed Rule-Following Fine-Tuning (RFFT) to improve length generalization in the post-training stage of LLMs. Despite its compatibility with practical models and strong performance, RFFT is likewise limited to single tasks, requiring re-training for each individual task with extensive examples. In this paper, we study length generalization in multi-task settings and propose Meta Rule-Following Fine-Tuning (Meta-RFFT), the first framework enabling robust cross-task length generalization. As our first contribution, we construct a large length generalization dataset containing 86 tasks spanning code execution, number processing, symbolic, and logical reasoning tasks, beyond the common addition or multiplication tasks. Secondly, we show that cross-task length generalization is possible with Meta-RFFT—after training on a large number of tasks and instances, the models achieve remarkable length generalization ability on unseen tasks with minimal fine-tuning or one-shot prompting. For example, after fine-tuning on 1- to 5-digit addition, our 32B model achieves 95% accuracy on 30-digit addition, significantly outperforming state-of-the-art reasoning models (DeepSeek-R1-671B: 72%; QwQ-32B: 32%), despite never seeing this task during RF-pretraining.
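The rule-following idea, executing an explicit procedure rather than answering in one step, is why length generalization becomes possible: the same rule works at any length. A minimal sketch with grade-school addition (the function name and setup are illustrative, not from the paper):

```python
def add_by_rule(a, b):
    """Add two decimal strings by the grade-school carry rule, digit by digit."""
    a, b = a[::-1], b[::-1]          # process least-significant digit first
    digits, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        carry, d = divmod(da + db + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# The same loop handles 30-digit operands as easily as 3-digit ones:
print(add_by_rule("123456789012345678901234567890",
                  "987654321098765432109876543210"))
```

A pattern-matching model trained only on short additions has no such guarantee; a model that has internalized the rule can, in principle, execute it at any length.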

IJCAI Conference 2025 Conference Paper

DriftRemover: Hybrid Energy Optimizations for Anomaly Images Synthesis and Segmentation

  • Siyue Yao
  • Haotian Xu
  • Mingjie Sun
  • Siyue Yu
  • Jimin Xiao
  • Eng Gee Lim

This paper tackles the challenge of anomaly image synthesis and segmentation, generating diverse anomaly images and their segmentation labels to mitigate data scarcity. Existing approaches employ precise masks to guide generation, relying on additional mask generators, which increases computational cost and limits anomaly diversity. Although a few works use coarse masks as guidance to expand diversity, they lack effective label generation for synthetic images, reducing their practicality. Therefore, our proposed method simultaneously generates anomaly images and their corresponding masks by utilizing coarse masks and anomaly categories. The framework utilizes attention maps from the synthesis process as mask labels and employs two optimization modules to tackle drift challenges, which are mismatches between synthetic results and real situations. Our evaluation demonstrates that our method improves pixel-level AP by 1.3% and F1-MAX by 1.8% in anomaly detection tasks on the MVTec dataset. Additionally, its successful application in practical scenarios highlights its effectiveness, improving IoU by 37.2% and F-measure by 25.1% on the Floor Dirt dataset. The code is available at https://github.com/JJessicaYao/DriftRemover.

IROS Conference 2025 Conference Paper

JPDS-NN: Reinforcement Learning-Based Dynamic Task Allocation for Agricultural Vehicle Routing Optimization

  • Yixuan Fan
  • Haotian Xu
  • Mengqiao Liu
  • Qing Zhuo
  • Tao Zhang

The Entrance Dependent Vehicle Routing Problem (EDVRP) is a variant of the Vehicle Routing Problem (VRP) where the scale of cities influences routing outcomes, necessitating consideration of their entrances. This paper addresses EDVRP in agriculture, focusing on multi-parameter vehicle planning for irregularly shaped fields. To address the limitations of traditional methods, such as heuristic approaches, which often overlook field geometry and entrance constraints, we propose a Joint Probability Distribution Sampling Neural Network (JPDS-NN) to effectively solve the EDVRP. The network uses an encoder-decoder architecture with graph transformers and attention mechanisms to model routing as a Markov Decision Process, and is trained via reinforcement learning for efficient and rapid end-to-end planning. Experimental results indicate that JPDS-NN reduces travel distances by 48.4–65.4%, lowers fuel consumption by 14.0–17.6%, and computes two orders of magnitude faster than baseline methods, while demonstrating 15–25% superior performance in dynamic arrangement scenarios. Ablation studies validate the necessity of cross-attention and pre-training. The framework enables scalable, intelligent routing for large-scale farming under dynamic constraints.

IROS Conference 2025 Conference Paper

Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

  • Jun Zhu
  • Zihao Du
  • Haotian Xu
  • Fengbo Lan
  • Zilong Zheng
  • Bo Ma
  • Shengjie Wang
  • Tao Zhang

Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot’s pose. However, the robot’s orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerator door). Humans intuitively navigate to objects with the right orientation using semantics and common sense. For instance, when opening a refrigerator, we naturally stand in front of it rather than to the side. Recent advances suggest that Vision-Language Models (VLMs) can provide robots with similar common sense. Therefore, we develop a VLM-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. This method uses the VLM to score and select the best pose from numerous candidates automatically. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches by precisely determining the optimal orientation relative to target objects, resulting in a 68.8% reduction in Distance to Goal (DTG). Real-world video demonstrations can be found on the supplementary website.

NeurIPS Conference 2025 Conference Paper

SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models

  • Haotian Xu
  • Qingsong Peng
  • Jie Shi
  • Huadi Zheng
  • Yu Li
  • Cheng Zhuo

The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security issues. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs)—which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation—have received far less attention. Existing BFA methods suffer from a key limitation: they fail to balance performance degradation and output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs, given their variable output lengths and vast output space. Unlike prior approaches that rely on output perplexity for the attack loss formulation, which inevitably degrades output naturalness, we reformulate the attack objective by leveraging key output tokens as targets for suppression, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.
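To see why single bit flips are so potent against model weights, consider how one flipped exponent bit changes an IEEE-754 float32 parameter. This toy demonstrates the attack surface only; it is not SilentStriker's search procedure.

```python
import struct

def flip_bit(value, bit):
    """Flip bit `bit` (0 = least significant) of a float32 value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped

w = 0.5
print(flip_bit(w, 0))   # mantissa LSB: tiny perturbation
print(flip_bit(w, 30))  # exponent MSB: the weight explodes to ~1.7e38
```

A flip in the low mantissa bits is nearly invisible, while a high exponent bit turns an ordinary weight into an astronomically large one, which is why naive BFAs destroy output naturalness and why a stealthy attack must choose its target bits carefully.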

ICLR Conference 2025 Conference Paper

ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning

  • Zihan Ye
  • Shreyank N. Gowda
  • Shiming Chen 0002
  • Xiaowei Huang 0001
  • Haotian Xu
  • Fahad Shahbaz Khan
  • Yaochu Jin
  • Kaizhu Huang

Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.

IJCAI Conference 2024 Conference Paper

FedFa: A Fully Asynchronous Training Paradigm for Federated Learning

  • Haotian Xu
  • Zhaorui Zhang
  • Sheng Di
  • Benben Liu
  • Khalid Ayed Alharthi
  • Jiannong Cao

Federated learning has been identified as an efficient decentralized training paradigm for scaling machine learning model training across a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, promising to mitigate the effect of heterogeneous data across clients and to guarantee convergence. However, the synchronization barrier for parameter updates in each communication round forces significant waiting time, slowing down the training procedure. Therefore, recent state-of-the-art solutions propose semi-asynchronous approaches to mitigate the waiting-time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a fully asynchronous training paradigm called FedFa, which guarantees model convergence and completely eliminates waiting time for federated learning by using a few buffered results on the server for parameter updating. Further, we provide a theoretical proof of the convergence rate of our proposed FedFa. Extensive experimental results indicate that our approach achieves up to 6x and 4x speedups over state-of-the-art synchronous and semi-asynchronous strategies, respectively, while retaining high accuracy in both IID and non-IID scenarios.
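The buffered fully-asynchronous update can be sketched as a server that applies an update the moment a small buffer of client results fills, with no synchronization barrier. This is an illustrative simulation with scalar "models" and a simple buffer-mean step; FedFa's actual update weighting and convergence analysis are in the paper.

```python
from collections import deque

class AsyncServer:
    def __init__(self, init_model, buffer_size=2, lr=0.5):
        self.model = init_model
        self.buffer = deque()
        self.buffer_size = buffer_size
        self.lr = lr

    def receive(self, client_update):
        """Called whenever any client finishes; no waiting for stragglers."""
        self.buffer.append(client_update)
        if len(self.buffer) >= self.buffer_size:
            avg = sum(self.buffer) / len(self.buffer)
            self.model += self.lr * (avg - self.model)  # step toward buffer mean
            self.buffer.clear()
        return self.model

server = AsyncServer(init_model=0.0)
# Clients report in arbitrary order, at arbitrary times:
for update in [1.0, 3.0, 2.0, 2.0, 5.0]:
    server.receive(update)
print(server.model)
```

Contrast with FedAvg: there, the server would block until every selected client of the round reports, so one slow device stalls all the others; here the slowest client merely lands in a later buffer.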

NeurIPS Conference 2023 Conference Paper

Change point detection and inference in multivariate non-parametric models under mixing conditions

  • Carlos Misael Madrid Padilla
  • Haotian Xu
  • Daren Wang
  • Oscar Hernan Madrid Padilla
  • Yi Yu

This paper addresses the problem of localizing and inferring multiple change points, in non-parametric multivariate time series settings. Specifically, we consider a multivariate time series with potentially short-range dependence, whose underlying distributions have Hölder smooth densities and can change over time in a piecewise-constant manner. The change points, which correspond to the times when the distribution changes, are unknown. We present the limiting distributions of the change point estimators under the scenarios where the minimal jump size vanishes or remains constant. Such results have not been revealed in the literature in non-parametric change point settings. As byproducts, we develop a sharp estimator that can accurately localize the change points in multivariate non-parametric time series, and a consistent block-type long-run variance estimator. Numerical studies are provided to complement our theoretical findings.
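For intuition, single change point localization in one dimension can be done with a classic CUSUM-style scan: pick the split maximizing the scaled difference of sample means. This generic sketch is not the paper's multivariate non-parametric estimator, which handles multiple change points, Hölder smooth densities, and short-range dependence.

```python
import math

def locate_change(x):
    """Return the split index maximizing the CUSUM statistic for a 1-D series."""
    n = len(x)
    best_t, best_stat = None, -1.0
    total = sum(x)
    left = 0.0
    for t in range(1, n):              # split into x[:t] and x[t:]
        left += x[t - 1]
        mean_l = left / t
        mean_r = (total - left) / (n - t)
        stat = math.sqrt(t * (n - t) / n) * abs(mean_l - mean_r)
        if stat > best_stat:
            best_stat, best_t = stat, t
    return best_t

# Mean jumps from about 0 to about 4 at index 5:
series = [0.1, -0.2, 0.0, 0.3, -0.1, 4.2, 3.9, 4.1, 4.0, 3.8]
print(locate_change(series))  # → 5
```

The paper's contribution is precisely what this sketch lacks: limiting distributions for such estimators (under vanishing or constant jump sizes), which make inference, not just localization, possible.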

IROS Conference 2023 Conference Paper

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

  • Haotian Xu
  • Shengjie Wang
  • Zhaolei Wang
  • Yunzhe Zhang
  • Qing Zhuo
  • Yang Gao 0029
  • Tao Zhang

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety is essential for ensuring the effectiveness of learning-based controllers. Current methods enforce the same constraints throughout training, resulting in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training progresses, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit's demand in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Notably, it achieves substantial performance improvements under the same cost limit compared with baselines.
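The annealing idea, loosening the cost constraint early and tightening it over training, can be sketched with a simple schedule. The linear form below is an assumption for illustration; the paper derives its loosening from a proposed metric rather than a fixed schedule, and `effective_cost_limit` is a hypothetical name.

```python
def effective_cost_limit(step, total_steps, cost_limit, extra_budget):
    """Linearly anneal an extra safety budget to zero over training."""
    frac_remaining = max(0.0, 1.0 - step / total_steps)
    return cost_limit + extra_budget * frac_remaining

# Early training tolerates more constraint violation; by the end the
# policy must satisfy the true cost limit.
limits = [effective_cost_limit(t, 100, 25.0, 50.0) for t in (0, 50, 100)]
print(limits)  # → [75.0, 50.0, 25.0]
```

The constrained-RL objective itself (e.g., CPO's trust-region update) is unchanged; only the cost limit fed into it varies with training progress.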