Arrow Research search

Author name cluster

Haoran Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

AAAI Conference 2026 Conference Paper

MCPTox: A Benchmark for Tool Poisoning on Real-World MCP Servers

  • Zhiqiang Wang
  • Yichao Gao
  • Yanting Wang
  • Suyuan Liu
  • Haifeng Sun
  • Haoran Cheng
  • Guanquan Shi
  • Haohua Du

By providing a standardized interface for LLM agents to interact with external tools, the Model Context Protocol (MCP) is quickly becoming a cornerstone of the modern autonomous agent ecosystem. However, it creates novel attack surfaces due to untrusted external tools. While prior work has focused on attacks injected through external tool outputs, we investigate a more fundamental vulnerability: Tool Poisoning, where malicious instructions are embedded within a tool's metadata at the registration stage. To date, this threat has been primarily demonstrated through isolated cases, lacking a systematic, large-scale evaluation. We introduce MCPTox, the first benchmark to systematically evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox is constructed upon 45 live, real-world MCP servers and 353 authentic tools. To achieve this, we design three distinct attack templates to generate a comprehensive suite of 1348 malicious test cases by few-shot learning, covering 10 categories of potential risks. Our evaluation on 20 prominent LLM agents setting reveals a widespread vulnerability to Tool Poisoning, with GPT-o1-mini, achieving an attack success rate of 72.8%. We find that more capable models are often more susceptible, as the attack exploits their superior instruction-following abilities. Finally, the failure case analysis reveals that agents rarely refuse these attacks, with the highest refused rate (Claude-3.7-Sonnet) less than 3%, demonstrating that existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operation. Our findings create a crucial empirical baseline for understanding and mitigating this widespread threat, and we release MCPTox for the development of verifiably safer AI agents.

NeurIPS Conference 2025 Conference Paper

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

  • Liang Peng
  • Boxi Wu
  • Haoran Cheng
  • Yibo Zhao
  • Xiaofei He

Direct preference optimization (DPO) is an effective method for aligning generative models with human preferences and has been successfully applied to fine‑tune text‑to‑image diffusion models. Its practical adoption, however, is hindered by a labor‑intensive pipeline that first produces a large set of candidate images and then requires humans to rank them pairwise. We address this bottleneck with self‑supervised direct preference optimization, a new paradigm that removes the need for any pre‑generated images or manual ranking. During training, we create preference pairs on the fly through self‑supervised image transformations, allowing the model to learn from fresh and diverse comparisons at every iteration. This online strategy eliminates costly data collection and annotation while remaining plug‑and‑play for any text‑to‑image diffusion method. Surprisingly, the on‑the‑fly pairs produced by the proposed method not only match but exceed the effectiveness of conventional DPO, which we attribute to the greater diversity of preferences sampled during training. Extensive experiments with Stable Diffusion 1. 5 and Stable Diffusion XL confirm that our method delivers substantial gains.

AAAI Conference 2024 Conference Paper

Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

  • Junkai Xu
  • Liang Peng
  • Haoran Cheng
  • Linxuan Xia
  • Qi Zhou
  • Dan Deng
  • Wei Qian
  • Wenxiao Wang

Multi-camera perception tasks have gained significant attention in the field of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of volume rendering. Specifically, we employ volume rendering to process the dense 3D features to obtain corresponding 2D features (e.g., depth maps, semantic maps), which are supervised by associated labels in the training. This manner regulates the generation of dense 3D features on the feature level, providing appropriate dense and unified features for multiple perception tasks. Therefore, our approach is termed Vampire, stands for ``Volume rendering As Multi-camera Perception Intermediate feature REgulator''. Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features, and is competitive with existing SOTA methods across diverse downstream perception tasks like 3D occupancy prediction, LiDAR segmentation and 3D objection detection, while utilizing moderate GPU resources. We provide a video demonstration in the supplementary materials and Codes are available at github.com/cskkxjk/Vampire.

IJCAI Conference 2023 Conference Paper

Coupled Point Process-based Sequence Modeling for Privacy-preserving Network Alignment

  • Dixin Luo
  • Haoran Cheng
  • Qingbin Li
  • Hongteng Xu

Network alignment aims at finding the correspondence of nodes across different networks, which is significant for many applications, e. g. , fraud detection and crime network tracing across platforms. In practice, however, accessing the topological information of different networks is often restricted and even forbidden, considering privacy and security issues. Instead, what we observed might be the event sequences of the networks' nodes in the continuous-time domain. In this study, we develop a coupled neural point process-based (CPP) sequence modeling strategy, which provides a solution to privacy-preserving network alignment based on the event sequences. Our CPP consists of a coupled node embedding layer and a neural point process module. The coupled node embedding layer embeds one network's nodes and explicitly models the alignment matrix between the two networks. Accordingly, it parameterizes the node embeddings of the other network by the push-forward operation. Given the node embeddings, the neural point process module jointly captures the dynamics of the two networks' event sequences. We learn the CPP model in a maximum likelihood estimation framework with an inverse optimal transport (IOT) regularizer. Experiments show that our CPP is compatible with various point process backbones and is robust to the model misspecification issue, which achieves encouraging performance on network alignment. The code is available at https: //github. com/Dixin-s-Lab/CNPP.