Author name cluster

Xi Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

AAAI Conference 2026 Conference Paper

MacPrompt: Maraconic-Guided Jailbreak Against Text-to-Image Models

Xi Ye
Yiwen Liu
Lina Wang
Run Wang
Geying Yang
Yufei Hou
Jiayi Yu

Text-to-image (T2I) models have raised increasing safety concerns due to their capacity to generate NSFW and other banned objects. To mitigate these risks, safety filters and concept removal techniques have been introduced to block inappropriate prompts or erase sensitive concepts from the models. However, all the existing defense methods are not well prepared to handle diverse adversarial prompts. In this work, we introduce MacPrompt, a novel black-box and cross-lingual attack that reveals previously overlooked vulnerabilities in T2I safety mechanisms. Unlike existing attacks that rely on synonym substitution or prompt obfuscation, MacPrompt constructs macaronic adversarial prompts by performing cross-lingual character-level recombination of harmful terms, enabling fine-grained control over both semantics and appearance. By leveraging this design, MacPrompt crafts prompts with high semantic similarity to the original harmful inputs (up to 0.96) while bypassing major safety filters (up to 100%). More critically, it achieves attack success rates as high as 92% for sex-related content and 90% for violence, effectively breaking even state-of-the-art concept removal defenses. These results underscore the pressing need to reassess the robustness of existing T2I safety mechanisms against linguistically diverse and fine-grained adversarial strategies. Warning: This paper includes sensitive examples (e.g., adult, violent, or illegal content). Unsafe images are masked but may still be disturbing.

IJCAI Conference 2025 Conference Paper

HIPP: Protecting Image Privacy via High-Quality Reversible Protected Version

Xi Ye
Lina Wang
Run Wang
Jiatong Liu
Geying Yang

With the rapid development of the internet, sharing photos through Social Network Platforms (SNPs) has become a new way for people to socialize, which poses serious threats to personal privacy. Recently, a thumbnail-preserving image privacy protection technique has emerged and garnered widespread attention. However, the existing schemes based on this technique often introduce noticeable noise into the protected image, resulting in poor visual quality. Motivated by the observation that a latent vector can be decoupled into the detail and contour components, in this paper, we propose HIPP, a thumbnail-preserving image privacy protection scheme that decouples the detail and contour information contained in the latent vector corresponding to the original image and reconstructs details by generation model. As a result, the generated protected image appears natural and has a thumbnail similar to the original one. Moreover, the protected images can be restored to versions that are indistinguishable from the original images. Experiments on CelebA, Helen, and LSUN datasets show that the SSIM between the restored and original images achieves 0.9899. Furthermore, compared to the previous works, HIPP achieves the lowest runtime and file expansion rate, with values of 0.07 seconds and 1.1046, respectively.

IROS Conference 2025 Conference Paper

RA-DP: Rapid Adaptive Diffusion Policy for Training-Free High-frequency Robotics Replanning

Xi Ye
Rui Heng Yang
Jun Jin
Yinchuan Li
Amir Rasouli

Diffusion models exhibit impressive scalability in robotic task learning, yet they struggle to adapt to novel, highly dynamic environments. This limitation primarily stems from their constrained replanning ability: they either operate at a low frequency due to a time-consuming iterative sampling process, or are unable to adapt to unforeseen feedback in case of rapid replanning. To address these challenges, we propose RA-DP, a novel diffusion policy framework with training-free high-frequency replanning ability that solves the above limitations by adapting to unforeseen dynamic environments. Specifically, our method integrates guidance signals, which are often easily obtained in the new environment during the diffusion sampling process, and utilizes a novel action queue mechanism to generate replanned actions at every denoising step without retraining, thus forming a complete training-free framework for robot motion adaptation in unseen environments. We conduct extensive evaluations in both common simulation benchmarks and real-world environments. Our results indicate that RA-DP outperforms the state-of-the-art diffusion-based methods in terms of replanning frequency and success rate. At the end, we show that our framework is theoretically compatible with any training-free guidance signal, hence increasing its applicability to a wide range of robotics tasks.

EAAI Journal 2025 Journal Article

Topology-joint Curvilinear Segmentation Network using Confidence-based Bezier Topological Representation

Jianwei Li
Yuchun Huang
Xi Ye
He Yang

Segmenting continuous curve structures is crucial in medical lesion analysis, crack detection, and remote sensing image processing. Current semantic segmentation methods often neglect pixel-wise topological relationships, lack topological feature extraction, and lose fine curve details during upsampling, resulting in potential discontinuities. The discontinuity of the curve significantly reduces its analysis and application. Thus, this paper presents the Topology-joint Curvilinear Segmentation Network (TCSN), comprising Confidence-based Bezier Topological Representation (CBTR), Topology Enhanced Block (TEB), and Topology Supplement Block (TSB) to extract more continuous curve structure. CBTR establishes differentiable multiscale bezier representations, defining a topological loss function to guide network learning. TEB utilizes dilated convolutions to map features to bezier space, introducing topological encoding to enhance topology perception. TSB introduces continuous skeleton topological ground truth, using residual connections for topological repair and promoting segmentation of more continuous curves. Experiments on three datasets from medical analysis, crack detection, and remote sensing image processing domains yield topological segmentation accuracy (Quality) of 35.21%, 23.53%, and 59.19%, respectively. These results surpass the recent state-of-the-art methods and demonstrate that TCSN possesses excellent topological preservation and cross-field application capabilities. To the best of our knowledge, this is the first study of introducing bezier into curve structure segmentation, demonstrating potential applications for other domains of curvilinear structures.

AAAI Conference 2025 Conference Paper

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

Xinlu Zhang
Zhiyu Zoey Chen
Xi Ye
Xianjun Yang
Lichang Chen
William Yang Wang
Linda Ruth Petzold

Instruction Fine-Tuning (IFT) significantly enhances the zero-shot capabilities of pretrained Large Language Models (LLMs). While coding data is known to boost LLM reasoning abilities during pretraining, its role in activating internal reasoning capacities during IFT remains understudied. This paper investigates a key question: How does coding data impact LLMs' reasoning capacities during IFT stage? To explore this, we thoroughly examine the impact of coding data across different coding data proportions, model families, sizes, and reasoning domains, from various perspectives. Specifically, we create three IFT datasets with increasing coding data proportions, fine-tune six LLM backbones across different families and scales on these datasets, evaluate the tuned models' performance across twelve tasks in three reasoning domains, and analyze the outcomes from three broad-to-granular perspectives: overall, domain-level, and task-specific. Our holistic analysis provides valuable insights into each perspective. First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales. Moreover, while the impact of coding data varies by domain, it shows consistent trends within each domain across different model families and scales. Additionally, coding data generally provides comparable task-specific benefits across model families, with optimal proportions in IFT datasets being task-dependent.

NeurIPS Conference 2024 Conference Paper

LoFiT: Localized Fine-tuning on LLM Representations

Fangcong Yin
Xi Ye
Greg Durrett

Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. We introduce a framework called Localized Fine-Tuning on LLM Representations (LoFiT), which identifies a subset of attention heads that are most important for learning a specific task, then trains offset vectors to add to the model's hidden representations at those selected heads. LoFiT localizes to a sparse set of heads (3%-10%) and learns the offset vectors from limited training data, comparable to the settings used for representation intervention. For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention. We also find that the localization step is important: selecting a task-specific set of attention heads can lead to higher performance than intervening on heads selected for a different task. Finally, across 7 tasks we study, LoFiT achieves comparable performance to other parameter-efficient fine-tuning methods such as LoRA, despite modifying 20x-200x fewer parameters than these methods.

AAAI Conference 2024 Conference Paper

STDiff: Spatio-Temporal Diffusion for Continuous Stochastic Video Prediction

Xi Ye
Guillaume-Alexandre Bilodeau

Predicting future frames of a video is challenging because it is difficult to learn the uncertainty of the underlying factors influencing their contents. In this paper, we propose a novel video prediction model, which has infinite-dimensional latent variables over the spatio-temporal domain. Specifically, we first decompose the video motion and content information, then take a neural stochastic differential equation to predict the temporal motion information, and finally, an image diffusion model autoregressively generates the video frame by conditioning on the predicted motion feature and the previous frame. The better expressiveness and stronger stochasticity learning capability of our model lead to state-of-the-art video prediction performances. As well, our model is able to achieve temporal continuous prediction, i.e., predicting in an unsupervised way the future video frames with an arbitrarily high frame rate. Our code is available at https://github.com/XiYe20/STDiffProject.

YNICL Journal 2023 Journal Article

Aberrant single-subject morphological cerebellar connectome in chronic insomnia

Yuqin Ma
Shishun Fu
Xi Ye
Yuping Yang
Yi Yin
Guang Xu
Mengchen Liu
Guihua Jiang

BACKGROUND: To systematically investigate the topological organisation of morphological networks of the cerebellum using structural MRI and examine their clinical relevance in chronic insomnia (CI). METHODS: One hundred and one patients with CI and 102 healthy controls (HCs) were recruited in this study. Individual morphological networks of the cerebellum were constructed based on regional grey matter volume, and topologically characterised using weighted graph theory-based network approaches. Between-group comparisons were performed using permutation tests, and Spearman's correlation was used to examine the relationships between topological alterations and clinical variables. RESULTS: Compared with HCs, patients with CI exhibited a lower normalised clustering coefficient. Locally, CI patients exhibited lower nodal efficiency in the cerebellar lobule VIIb and vermis regions, but higher nodal efficiency in the right cerebellar lobule VIIIa regions. No correlations were observed between network alterations and clinical variables. CONCLUSIONS: Individual morphological network analysis provides a new strategy for investigating cerebellar morphometric changes in CI, and our findings may have important implications in establishing diagnostic and categorical biomarkers.

NeurIPS Conference 2023 Conference Paper

SatLM: Satisfiability-Aided Language Models Using Declarative Prompting

Xi Ye
Qiaochu Chen
Isil Dillig
Greg Durrett

Prior work has combined chain-of-thought prompting in large language models (LLMs) with programmatic representations to perform effective and transparent reasoning. While such an approach works well for tasks that only require forward reasoning (e.g., straightforward arithmetic), it is less effective for constraint solving problems that require more sophisticated planning and search. In this paper, we propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of LLMs. We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer. This approach has two key advantages. The declarative specification is closer to the problem description than the reasoning steps are, so the LLM can parse it out of the description more accurately. Furthermore, by offloading the actual reasoning task to an automated theorem prover, our approach can guarantee the correctness of the answer with respect to the parsed specification and avoid planning errors in the solving process. We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm. In particular, SATLM outperforms program-aided LMs by 23% on a challenging subset of the GSM arithmetic reasoning dataset; SATLM also achieves a new SoTA on LSAT and BoardgameQA, surpassing previous models that are trained on the respective training sets.

NeurIPS Conference 2022 Conference Paper

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning

Xi Ye
Greg Durrett

Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning? We study this question on two NLP tasks that involve reasoning over text, namely question answering and natural language inference. We test the performance of four LLMs on three textual reasoning datasets using prompts that include explanations in multiple different styles. For these tasks, we find that including explanations in the prompts for OPT, GPT-3 (davinci), and InstructGPT (text-davinci-001) only yields small to moderate accuracy improvements over standard few-shot learning. However, text-davinci-002 is able to benefit more substantially. We further show that explanations generated by the LLMs may not entail the models’ predictions nor be factually grounded in the input, even on simple tasks with extractive explanations. However, these flawed explanations can still be useful as a way to verify LLMs’ predictions post-hoc. Through analysis in our three settings, we show that explanations judged by humans to be good—logically consistent with the input and the prediction—more likely cooccur with accurate predictions. Following these observations, we train calibrators using automatically extracted scores that assess the reliability of explanations, allowing us to improve performance post-hoc across all of our datasets.
