Arrow Research search

Author name cluster

Haotian Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

TMLR Journal 2026 Journal Article

Graph Concept Bottleneck Models

  • Haotian Xu
  • Tsui-Wei Weng
  • Lam M. Nguyen
  • Tengfei Ma

Concept Bottleneck Models (CBMs) have emerged as a prominent framework for interpretable deep learning, providing human-understandable intermediate concepts that enable transparent reasoning and direct intervention. However, existing CBMs typically assume conditional independence among concepts given the label, overlooking the intrinsic dependencies and correlations that often exist among them. In practice, concepts are rarely isolated: modifying one concept may inherently influence others. Ignoring these relationships can lead to oversimplified representations and weaken interpretability. To address this limitation, we introduce Graph CBMs, a novel variant of CBMs that explicitly models the relational structure among concepts through a latent concept graph. Our approach can be seamlessly integrated into existing CBMs as a lightweight, plug-and-play module, enriching their reasoning capability without sacrificing interpretability. Experimental results on multiple real-world image classification benchmarks demonstrate that Graph CBMs (1) achieve higher predictive accuracy while revealing meaningful concept structures, (2) enable more effective and robust concept-level interventions, and (3) maintain stable performance across diverse architectures and training setups.
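The core idea, refining concept activations through a latent concept graph before label prediction, can be sketched in a few lines. This is an illustrative sketch rather than the paper's implementation: `graph_refine`, the fixed adjacency matrix, and the residual weight `alpha` are hypothetical stand-ins for the learned components the abstract describes.

```python
def graph_refine(concepts, adjacency, alpha=0.5):
    """One round of message passing over a concept graph.

    concepts:  list of floats, raw concept activations
    adjacency: row-normalized list of lists; adjacency[i][j] is the
               influence of concept j on concept i
    alpha:     residual weight kept on the original activation
    """
    n = len(concepts)
    refined = []
    for i in range(n):
        neighbor_msg = sum(adjacency[i][j] * concepts[j] for j in range(n))
        refined.append(alpha * concepts[i] + (1 - alpha) * neighbor_msg)
    return refined

# Two correlated concepts ("has wings", "can fly") and one independent one.
A = [
    [0.0, 1.0, 0.0],  # "has wings" listens to "can fly"
    [1.0, 0.0, 0.0],  # "can fly" listens to "has wings"
    [0.0, 0.0, 1.0],  # independent concept keeps its own value
]
c = [0.9, 0.2, 0.5]
print(graph_refine(c, A))  # correlated concepts pull toward each other
```

Intervening on one concept now propagates to its neighbors through the graph, which is the behavior the abstract argues a conditionally independent CBM cannot capture.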

AAAI Conference 2026 Conference Paper

LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

  • Junlong Jia
  • Xing Wu
  • Chaochen Gao
  • Ziyang Chen
  • Zijia Lin
  • Zhongzhi Li
  • Weinong Wang
  • Haotian Xu

High-quality long-context data is essential for training large language models (LLMs) capable of processing extensive documents, yet existing synthesis approaches using relevance-based aggregation face challenges of computational efficiency. We present LiteLong, a resource-efficient method for synthesizing long-context data through structured topic organization and multi-agent debate. Our approach leverages the BISAC book classification system to provide a comprehensive hierarchical topic organization, and then employs a debate mechanism with multiple LLMs to generate diverse, high-quality topics within this structure. For each topic, we use lightweight BM25 retrieval to obtain relevant documents and concatenate them into 128K-token training samples. Experiments on HELMET and Ruler benchmarks demonstrate that LiteLong achieves competitive long-context performance and can seamlessly integrate with other long-dependency enhancement methods. LiteLong makes high-quality long-context data synthesis more accessible by reducing both computational and data engineering costs, facilitating further research in long-context language training.
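The retrieve-and-concatenate step described above can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and a tiny word-count budget; the paper uses full-scale BM25 retrieval and a 128K-token budget, and `bm25_scores`/`pack_sample` are hypothetical helper names.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n_docs = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def pack_sample(topic, docs, budget):
    """Concatenate highest-scoring docs until the (word-count) budget is hit."""
    ranked = sorted(zip(bm25_scores(topic, docs), docs), reverse=True)
    sample, used = [], 0
    for _, doc in ranked:
        n = len(doc.split())
        if used + n > budget:
            break
        sample.append(doc)
        used += n
    return " ".join(sample)

docs = [
    "gardening tips for tomato plants in raised beds",
    "tomato plants need regular watering and full sun",
    "quarterly earnings rose on strong cloud revenue",
]
print(pack_sample("tomato plants", docs, budget=15))
```

The point of the design is that retrieval stays lightweight: BM25 needs only term statistics, so packing topic-coherent long samples costs far less than embedding-based relevance aggregation.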

NeurIPS Conference 2025 Conference Paper

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

  • Xinji Mai
  • Haotian Xu
  • Xing Wu
  • Weinong Wang
  • Yingying Zhang
  • Wenqiang Zhang

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is demonstrating that, as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations where increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable relationship between computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. We implement a robust framework featuring a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks. Experiments show ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks. Our findings provide a foundational understanding of how autonomous tool use is acquired and scales within Agent RL, offering a reproducible benchmark for future studies. Code is released at https://github.com/yyht/openrlhf_async_pipline.

NeurIPS Conference 2025 Conference Paper

Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

  • Yi Hu
  • Shijia Kang
  • Haotong Yang
  • Haotian Xu
  • Muhan Zhang

Length generalization—the ability to solve problems longer than those seen during training—remains a critical challenge for large language models (LLMs). Previous work modifies positional encodings (PEs) and data formats to improve length generalization on specific symbolic tasks such as addition and sorting. However, these approaches are fundamentally limited to special tasks, often degrading general language performance. Furthermore, they are typically evaluated on small transformers trained from scratch on single tasks, and can cause performance drops when applied during the post-training stage of practical LLMs with general capabilities. Hu et al. (2024) proposed Rule-Following Fine-Tuning (RFFT) to improve length generalization in the post-training stage of LLMs. Despite its compatibility with practical models and strong performance, RFFT is likewise limited to single tasks, requiring re-training for each individual task with extensive examples. In this paper, we study length generalization in multi-task settings and propose Meta Rule-Following Fine-Tuning (Meta-RFFT), the first framework enabling robust cross-task length generalization. As our first contribution, we construct a large length generalization dataset containing 86 tasks spanning code execution, number processing, symbolic, and logical reasoning tasks, beyond the common addition or multiplication tasks. Secondly, we show that cross-task length generalization is possible with Meta-RFFT—after training on a large number of tasks and instances, the models achieve remarkable length generalization ability on unseen tasks with minimal fine-tuning or one-shot prompting. For example, after fine-tuning on 1- to 5-digit addition, our 32B model achieves 95% accuracy on 30-digit addition, significantly outperforming state-of-the-art reasoning models (DeepSeek-R1-671B: 72%; QwQ-32B: 32%), despite never seeing this task during RF-pretraining.
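The rule-following idea, executing an explicit procedure rather than answering in one step, is why length generalization becomes possible: the same rule works at any length. A minimal sketch with grade-school addition (the function name and setup are illustrative, not from the paper):

```python
def add_by_rule(a, b):
    """Add two decimal strings by the grade-school carry rule, digit by digit."""
    a, b = a[::-1], b[::-1]          # process least-significant digit first
    digits, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        carry, d = divmod(da + db + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# The same loop handles 30-digit operands as easily as 3-digit ones:
print(add_by_rule("123456789012345678901234567890",
                  "987654321098765432109876543210"))
```

A pattern-matching model trained only on short additions has no such guarantee; a model that has internalized the rule can, in principle, execute it at any length.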

IJCAI Conference 2025 Conference Paper

DriftRemover: Hybrid Energy Optimizations for Anomaly Images Synthesis and Segmentation

  • Siyue Yao
  • Haotian Xu
  • Mingjie Sun
  • Siyue Yu
  • Jimin Xiao
  • Eng Gee Lim

This paper tackles the challenge of anomaly image synthesis and segmentation, generating diverse anomaly images and their segmentation labels to mitigate data scarcity. Existing approaches employ precise masks to guide generation, relying on additional mask generators, which increases computational cost and limits anomaly diversity. Although a few works use coarse masks as guidance to expand diversity, they lack effective label generation for synthetic images, reducing their practicality. Therefore, our proposed method simultaneously generates anomaly images and their corresponding masks by utilizing coarse masks and anomaly categories. The framework utilizes attention maps from the synthesis process as mask labels and employs two optimization modules to tackle drift challenges, which are mismatches between synthetic results and real situations. Our evaluation demonstrates that our method improves pixel-level AP by 1.3% and F1-MAX by 1.8% in anomaly detection tasks on the MVTec dataset. Additionally, its successful application in practical scenarios highlights its effectiveness, improving IoU by 37.2% and F-measure by 25.1% on the Floor Dirt dataset. The code is available at https://github.com/JJessicaYao/DriftRemover.

IROS Conference 2025 Conference Paper

JPDS-NN: Reinforcement Learning-Based Dynamic Task Allocation for Agricultural Vehicle Routing Optimization

  • Yixuan Fan
  • Haotian Xu
  • Mengqiao Liu
  • Qing Zhuo
  • Tao Zhang

The Entrance Dependent Vehicle Routing Problem (EDVRP) is a variant of the Vehicle Routing Problem (VRP) where the scale of cities influences routing outcomes, necessitating consideration of their entrances. This paper addresses EDVRP in agriculture, focusing on multi-parameter vehicle planning for irregularly shaped fields. To address the limitations of traditional methods, such as heuristic approaches, which often overlook field geometry and entrance constraints, we propose a Joint Probability Distribution Sampling Neural Network (JPDS-NN) to effectively solve the EDVRP. The network uses an encoder-decoder architecture with graph transformers and attention mechanisms to model routing as a Markov Decision Process, and is trained via reinforcement learning for efficient and rapid end-to-end planning. Experimental results indicate that JPDS-NN reduces travel distances by 48.4–65.4%, lowers fuel consumption by 14.0–17.6%, and computes two orders of magnitude faster than baseline methods, while demonstrating 15–25% superior performance in dynamic arrangement scenarios. Ablation studies validate the necessity of cross-attention and pre-training. The framework enables scalable, intelligent routing for large-scale farming under dynamic constraints.

IROS Conference 2025 Conference Paper

Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

  • Jun Zhu
  • Zihao Du
  • Haotian Xu
  • Fengbo Lan
  • Zilong Zheng
  • Bo Ma
  • Shengjie Wang
  • Tao Zhang

Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot’s pose. However, the robot’s orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerator door). Humans intuitively navigate to objects with the right orientation using semantics and common sense. For instance, when opening a refrigerator, we naturally stand in front of it rather than to the side. Recent advances suggest that Vision-Language Models (VLMs) can provide robots with similar common sense. Therefore, we develop a VLM-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. This method uses the VLM to score and select the best pose from numerous candidates automatically. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches by precisely determining the optimal orientation relative to target objects, resulting in a 68.8% reduction in Distance to Goal (DTG). Real-world video demonstrations can be found on the supplementary website.

NeurIPS Conference 2025 Conference Paper

SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models

  • Haotian Xu
  • Qingsong Peng
  • Jie Shi
  • Huadi Zheng
  • Yu Li
  • Cheng Zhuo

The rapid adoption of large language models (LLMs) in critical domains has spurred extensive research into their security issues. While input manipulation attacks (e.g., prompt injection) have been well studied, Bit-Flip Attacks (BFAs)—which exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation—have received far less attention. Existing BFA methods suffer from a key limitation: they fail to balance performance degradation and output naturalness, making them prone to discovery. In this paper, we introduce SilentStriker, the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness. Our core contribution lies in addressing the challenge of designing effective loss functions for LLMs, given their variable output lengths and vast output space. Unlike prior approaches that rely on output perplexity for the attack loss formulation, which inevitably degrades output naturalness, we reformulate the attack objective by leveraging key output tokens as targets for suppression, enabling effective joint optimization of attack effectiveness and stealthiness. Additionally, we employ an iterative, progressive search strategy to maximize attack efficacy. Experiments show that SilentStriker significantly outperforms existing baselines, achieving successful attacks without compromising the naturalness of generated text.
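To see why single bit flips are so potent against model weights, consider how one flipped exponent bit changes an IEEE-754 float32 parameter. This toy demonstrates the attack surface only; it is not SilentStriker's search procedure.

```python
import struct

def flip_bit(value, bit):
    """Flip bit `bit` (0 = least significant) of a float32 value."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped

w = 0.5
print(flip_bit(w, 0))   # mantissa LSB: tiny perturbation
print(flip_bit(w, 30))  # exponent MSB: the weight explodes to ~1.7e38
```

A flip in the low mantissa bits is nearly invisible, while a high exponent bit turns an ordinary weight into an astronomically large one, which is why naive BFAs destroy output naturalness and why a stealthy attack must choose its target bits carefully.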

ICLR Conference 2025 Conference Paper

ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning

  • Zihan Ye
  • Shreyank N. Gowda
  • Shiming Chen 0002
  • Xiaowei Huang 0001
  • Haotian Xu
  • Fahad Shahbaz Khan
  • Yaochu Jin
  • Kaizhu Huang

Zero-shot Learning (ZSL) aims to enable classifiers to identify unseen classes. This is typically achieved by generating visual features for unseen classes based on learned visual-semantic correlations from seen classes. However, most current generative approaches heavily rely on having a sufficient number of samples from seen classes. Our study reveals that a scarcity of seen class samples results in a marked decrease in performance across many generative ZSL techniques. We argue, quantify, and empirically demonstrate that this decline is largely attributable to spurious visual-semantic correlations. To address this issue, we introduce ZeroDiff, an innovative generative framework for ZSL that incorporates diffusion mechanisms and contrastive representations to enhance visual-semantic correlations. ZeroDiff comprises three key components: (1) Diffusion augmentation, which naturally transforms limited data into an expanded set of noised data to mitigate generative model overfitting; (2) Supervised-contrastive (SC)-based representations that dynamically characterize each limited sample to support visual feature generation; and (3) Multiple feature discriminators employing a Wasserstein-distance-based mutual learning approach, evaluating generated features from various perspectives, including pre-defined semantics, SC-based representations, and the diffusion process. Extensive experiments on three popular ZSL benchmarks demonstrate that ZeroDiff not only achieves significant improvements over existing ZSL methods but also maintains robust performance even with scarce training data. Our codes are available at https://github.com/FouriYe/ZeroDiff_ICLR25.

IJCAI Conference 2024 Conference Paper

FedFa: A Fully Asynchronous Training Paradigm for Federated Learning

  • Haotian Xu
  • Zhaorui Zhang
  • Sheng Di
  • Benben Liu
  • Khalid Ayed Alharthi
  • Jiannong Cao

Federated learning has been identified as an efficient decentralized training paradigm for scaling machine learning model training across a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, promising to mitigate the effect of heterogeneous data across clients and to guarantee convergence. However, the synchronization barrier for parameter updates in each communication round forces significant waiting time, slowing down the training procedure. Therefore, recent state-of-the-art solutions propose semi-asynchronous approaches to mitigate the waiting-time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a fully asynchronous training paradigm called FedFa, which guarantees model convergence and completely eliminates waiting time for federated learning by using a few buffered results on the server for parameter updating. Further, we provide a theoretical proof of the convergence rate of our proposed FedFa. Extensive experimental results indicate that our approach achieves up to 6x and 4x speedups over state-of-the-art synchronous and semi-asynchronous strategies, respectively, while retaining high accuracy in both IID and non-IID scenarios.
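The buffered fully-asynchronous update can be sketched as a server that applies an update the moment a small buffer of client results fills, with no synchronization barrier. This is an illustrative simulation with scalar "models" and a simple buffer-mean step; FedFa's actual update weighting and convergence analysis are in the paper.

```python
from collections import deque

class AsyncServer:
    def __init__(self, init_model, buffer_size=2, lr=0.5):
        self.model = init_model
        self.buffer = deque()
        self.buffer_size = buffer_size
        self.lr = lr

    def receive(self, client_update):
        """Called whenever any client finishes; no waiting for stragglers."""
        self.buffer.append(client_update)
        if len(self.buffer) >= self.buffer_size:
            avg = sum(self.buffer) / len(self.buffer)
            self.model += self.lr * (avg - self.model)  # step toward buffer mean
            self.buffer.clear()
        return self.model

server = AsyncServer(init_model=0.0)
# Clients report in arbitrary order, at arbitrary times:
for update in [1.0, 3.0, 2.0, 2.0, 5.0]:
    server.receive(update)
print(server.model)
```

Contrast with FedAvg: there, the server would block until every selected client of the round reports, so one slow device stalls all the others; here the slowest client merely lands in a later buffer.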

NeurIPS Conference 2023 Conference Paper

Change point detection and inference in multivariate non-parametric models under mixing conditions

  • Carlos Misael Madrid Padilla
  • Haotian Xu
  • Daren Wang
  • Oscar Hernan Madrid Padilla
  • Yi Yu

This paper addresses the problem of localizing and inferring multiple change points, in non-parametric multivariate time series settings. Specifically, we consider a multivariate time series with potentially short-range dependence, whose underlying distributions have Hölder smooth densities and can change over time in a piecewise-constant manner. The change points, which correspond to the times when the distribution changes, are unknown. We present the limiting distributions of the change point estimators under the scenarios where the minimal jump size vanishes or remains constant. Such results have not been revealed in the literature in non-parametric change point settings. As byproducts, we develop a sharp estimator that can accurately localize the change points in multivariate non-parametric time series, and a consistent block-type long-run variance estimator. Numerical studies are provided to complement our theoretical findings.
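For intuition, single change point localization in one dimension can be done with a classic CUSUM-style scan: pick the split maximizing the scaled difference of sample means. This generic sketch is not the paper's multivariate non-parametric estimator, which handles multiple change points, Hölder smooth densities, and short-range dependence.

```python
import math

def locate_change(x):
    """Return the split index maximizing the CUSUM statistic for a 1-D series."""
    n = len(x)
    best_t, best_stat = None, -1.0
    total = sum(x)
    left = 0.0
    for t in range(1, n):              # split into x[:t] and x[t:]
        left += x[t - 1]
        mean_l = left / t
        mean_r = (total - left) / (n - t)
        stat = math.sqrt(t * (n - t) / n) * abs(mean_l - mean_r)
        if stat > best_stat:
            best_stat, best_t = stat, t
    return best_t

# Mean jumps from about 0 to about 4 at index 5:
series = [0.1, -0.2, 0.0, 0.3, -0.1, 4.2, 3.9, 4.1, 4.0, 3.8]
print(locate_change(series))  # → 5
```

The paper's contribution is precisely what this sketch lacks: limiting distributions for such estimators (under vanishing or constant jump sizes), which make inference, not just localization, possible.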

IROS Conference 2023 Conference Paper

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

  • Haotian Xu
  • Shengjie Wang
  • Zhaolei Wang
  • Yunzhe Zhang
  • Qing Zhuo
  • Yang Gao 0029
  • Tao Zhang

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety is essential for ensuring the effectiveness of learning-based controllers. Current methods enforce the same constraints throughout training, resulting in inefficient exploration in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training progresses, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit's demand in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of both safety and optimality. Notably, it achieves substantial performance improvements under the same cost limit compared with baselines.
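The annealing idea, loosening the cost constraint early and tightening it over training, can be sketched with a simple schedule. The linear form below is an assumption for illustration; the paper derives its loosening from a proposed metric rather than a fixed schedule, and `effective_cost_limit` is a hypothetical name.

```python
def effective_cost_limit(step, total_steps, cost_limit, extra_budget):
    """Linearly anneal an extra safety budget to zero over training."""
    frac_remaining = max(0.0, 1.0 - step / total_steps)
    return cost_limit + extra_budget * frac_remaining

# Early training tolerates more constraint violation; by the end the
# policy must satisfy the true cost limit.
limits = [effective_cost_limit(t, 100, 25.0, 50.0) for t in (0, 50, 100)]
print(limits)  # → [75.0, 50.0, 25.0]
```

The constrained-RL objective itself (e.g., CPO's trust-region update) is unchanged; only the cost limit fed into it varies with training progress.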