Author name cluster

Xingjun Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

38 papers

2 author rows

AAAI Conference 2026 Conference Paper

Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models

Xueqi Ma
Xingjun Ma
Sarah Monazam Erfani
Danilo Mandic
James Bailey

Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, even though real-world applications, especially high-stakes settings such as fraud detection and medical diagnosis, demand deeper insight into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To answer this question, we introduce a Coarse-to-Fine open-set Classification (CFC) method that leverages large language models (LLMs) for text-attributed graphs. CFC consists of three key components: (1) a coarse classifier that uses LLM prompts for OOD detection and outlier label generation; (2) a GNN-based fine classifier trained with the OOD samples from (1) for enhanced OOD detection and ID classification; and (3) refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods relying on synthetic or auxiliary OOD samples, CFC employs semantic OOD data, i.e., instances that are genuinely out-of-distribution based on their inherent meaning, thus improving interpretability and practical utility. CFC improves OOD detection by 10% over state-of-the-art approaches on text-attributed graphs and in the text domain, while achieving up to 70% accuracy in OOD classification on graph datasets.


AAAI Conference 2026 Conference Paper

SIDE: Surrogate Conditional Data Extraction from Diffusion Models

Yunhao Chen
Shujie Wang
Difan Zou
Xingjun Ma

As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior research finds conditional DPMs highly susceptible to data extraction attacks using explicit prompts, unconditional models are often assumed to be safe. We challenge this view by introducing Surrogate condItional Data Extraction (SIDE), a general framework that constructs data-driven surrogate conditions to enable targeted extraction from any DPM. Through extensive experiments on CIFAR-10, CelebA, ImageNet, and LAION-5B, we show that SIDE can successfully extract training data from so-called safe unconditional models, outperforming baseline attacks even on conditional models. Complementing these findings, we present a unified theoretical framework based on informative labels, demonstrating that all forms of conditioning, explicit or surrogate, amplify memorization. Our work redefines the threat landscape for DPMs, establishing precise conditioning as a fundamental vulnerability and setting a new, stronger benchmark for model privacy evaluation.


AAAI Conference 2025 Conference Paper

AIM: Additional Image Guided Generation of Transferable Adversarial Attacks

Teng Li
Xingjun Ma
Yu-Gang Jiang

Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. While there have been notable advancements in untargeted transferable attacks, targeted transferable attacks remain a significant challenge. In this work, we focus on generative approaches for targeted transferable attacks. Current generative attacks focus on reducing overfitting to surrogate models and the source data domain, but they often overlook the importance of enhancing transferability through additional semantics. To address this issue, we introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability. Specifically, we propose a Semantic Injection Module (SIM) that utilizes the semantics contained in an additional guiding image to improve transferability. The guiding image provides a simple yet effective method to incorporate target semantics from the target class to create targeted and highly transferable attacks. Additionally, we propose new loss formulations that can integrate the semantic injection module more effectively for both targeted and untargeted attacks. We conduct comprehensive experiments under both targeted and untargeted attack settings to demonstrate the efficacy of our proposed approach.


NeurIPS Conference 2025 Conference Paper

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

Yige Li
Hanxun Huang
Yunhan Zhao
Xingjun Ma
Jun Sun

Generative large language models (LLMs) have achieved state-of-the-art results on a wide range of tasks, yet they remain susceptible to backdoor attacks: carefully crafted triggers in the input can manipulate the model into producing adversary-specified outputs. While prior research has predominantly focused on backdoor risks in vision and classification settings, the vulnerability of LLMs in open-ended text generation remains underexplored. To fill this gap, we introduce BackdoorLLM, the first comprehensive benchmark for systematically evaluating backdoor threats in text-generation LLMs. (Our BackdoorLLM benchmark was awarded First Prize in the SafetyBench competition, https://www.mlsafety.org/safebench/winners, organized by the Center for AI Safety, https://safe.ai/.) BackdoorLLM provides: (i) a unified repository of benchmarks with a standardized training and evaluation pipeline; (ii) a diverse suite of attack modalities, including data poisoning, weight poisoning, hidden-state manipulation, and chain-of-thought hijacking; (iii) over 200 experiments spanning 8 distinct attack strategies, 7 real-world scenarios, and 6 model architectures; (iv) key insights into the factors that govern backdoor effectiveness and failure modes in LLMs; and (v) a defense toolkit encompassing 7 representative mitigation techniques. Our code and datasets are available at https://github.com/bboylyg/BackdoorLLM. We will continuously incorporate emerging attack and defense methodologies to support research on advancing the safety and reliability of LLMs.


ICLR Conference 2025 Conference Paper

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Yunhan Zhao
Xiang Zheng
Lin Luo
Yige Li
Xingjun Ma
Yu-Gang Jiang 0001

In this paper, we focus on black-box defense of vision-language models (VLMs) against jailbreak attacks. Existing black-box defense methods are either unimodal or bimodal. Unimodal methods enhance either the vision or the language module of the VLM, while bimodal methods robustify the model through text-image representation realignment. However, these methods suffer from two limitations: 1) they fail to fully exploit cross-modal information, or 2) they degrade model performance on benign inputs. To address these limitations, we propose BlueSuffix, a novel blue-team method that defends target VLMs against jailbreak attacks in the black-box setting without compromising their performance. BlueSuffix includes three key components: 1) a visual purifier against jailbreak images, 2) a textual purifier against jailbreak texts, and 3) a blue-team suffix generator trained with reinforcement fine-tuning to enhance cross-modal robustness. We empirically show on four VLMs (LLaVA, MiniGPT-4, InstructBLIP, and Gemini) and four safety benchmarks (Harmful Instruction, AdvBench, MM-SafetyBench, and RedTeam-2K) that BlueSuffix outperforms the baseline defenses by a significant margin. BlueSuffix opens up a promising direction for defending VLMs against jailbreak attacks. Code is available at https://github.com/Vinsonzyh/BlueSuffix.