
Author name cluster

Jiale Cai

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity disambiguation profile.

6 papers · 1 author row

Possible papers (6)

AAAI 2026 · Conference Paper

Graph Domain Adaptation via Homophily-Agnostic Reconstructing Structure

  • Ruiyi Fang
  • Shuo Wang
  • Ruizhi Pu
  • Qiuhao Zeng
  • Hao Zheng
  • Ziyan Wang
  • Jiale Cai
  • Zhimin Mei

Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. However, existing GDA methods typically assume that both source and target graphs exhibit homophily, which causes them to perform poorly when heterophily is present. Furthermore, the lack of labels in the target graph makes it impossible to assess its homophily level beforehand. To address this challenge, we propose a novel homophily-agnostic approach that effectively transfers knowledge between graphs with varying degrees of homophily. Specifically, we adopt a divide-and-conquer strategy that first reconstructs highly homophilic and heterophilic variants of both the source and target graphs, and then performs knowledge alignment separately between corresponding graph variants. Extensive experiments conducted on five benchmark datasets demonstrate the superior performance of our approach, particularly highlighting its substantial advantages on heterophilic graphs.
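
Since this page carries only the abstract, the following is a minimal, hypothetical sketch of the divide-and-conquer idea, assuming PyTorch: edges are split into a homophilic and a heterophilic variant by endpoint feature similarity. The function name, the cosine criterion, and the threshold are illustrative inventions, not the authors' method.

    # Hypothetical sketch: partition edges by endpoint feature similarity.
    import torch
    import torch.nn.functional as F

    def split_by_similarity(x, edge_index, threshold=0.5):
        """Split edges into a 'homophilic' set (similar endpoints) and a
        'heterophilic' set (dissimilar endpoints) via cosine similarity."""
        src, dst = edge_index                      # edge_index: (2, E)
        sim = F.cosine_similarity(x[src], x[dst])  # per-edge similarity
        homo_mask = sim >= threshold
        return edge_index[:, homo_mask], edge_index[:, ~homo_mask]

    # Toy usage: 5 nodes with 16-dim features, 4 directed edges.
    x = torch.randn(5, 16)
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
    homo_edges, hetero_edges = split_by_similarity(x, edge_index)

A pipeline in this spirit would then encode each variant with its own GNN and align source and target features variant-by-variant, e.g. with a discrepancy loss such as MMD.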

NeurIPS 2025 · Conference Paper

Versatile Transferable Unlearnable Example Generator

  • Zhihao Li
  • Jiale Cai
  • Gezheng Xu
  • Hao Zheng
  • Qiuyue Li
  • Fan Zhou
  • Shichun Yang
  • Charles Ling

The rapid growth of publicly available data has fueled deep learning advancements but also raises concerns about unauthorized data usage. Unlearnable Examples (UEs) have emerged as a data protection strategy that introduces imperceptible perturbations to prevent unauthorized learning. However, most existing UE methods produce perturbations strongly tied to specific training sets, leading to a significant drop in unlearnability when applied to unseen data or tasks. In this paper, we argue that for broad applicability, UEs should maintain their effectiveness across diverse application scenarios. To this end, we conduct the first comprehensive study on the transferability of UEs across diverse and practical yet demanding settings. Specifically, we identify key scenarios that pose significant challenges for existing UE methods, including varying styles, out-of-distribution classes, resolutions, and architectures. Moreover, we propose the Versatile Transferable Generator (VTG), a transferable generator designed to safeguard data across various conditions. VTG integrates Adversarial Domain Augmentation (ADA) into the generator's training process to synthesize out-of-distribution samples, thereby improving its generalizability to unseen scenarios. Furthermore, we propose a Perturbation-Label Coupling (PLC) mechanism that leverages contrastive learning to directly align perturbations with class labels. This approach reduces the generator's reliance on data semantics, allowing VTG to produce unlearnable perturbations in a distribution-agnostic manner. Extensive experiments demonstrate the effectiveness and broad applicability of our approach. Code is available at https://github.com/zhli-cs/VTG.
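
As a loose illustration of what a perturbation-label coupling objective could look like, here is an InfoNCE-style sketch, assuming PyTorch; plc_loss and all shapes are hypothetical stand-ins, not the paper's actual formulation.

    # Hypothetical contrastive loss aligning perturbations with class labels.
    import torch
    import torch.nn.functional as F

    def plc_loss(pert_feat, label_emb, labels, tau=0.1):
        """Pull each perturbation's feature toward its own class embedding
        and away from the other classes (InfoNCE over class prototypes)."""
        pert_feat = F.normalize(pert_feat, dim=1)   # (B, D)
        label_emb = F.normalize(label_emb, dim=1)   # (C, D), one row per class
        logits = pert_feat @ label_emb.t() / tau    # (B, C) similarities
        return F.cross_entropy(logits, labels)

    # Toy usage: batch of 8 perturbation features, 10 classes, 32-dim space.
    loss = plc_loss(torch.randn(8, 32), torch.randn(10, 32),
                    torch.randint(0, 10, (8,)))

Training a generator against a loss of this shape ties each perturbation to its class rather than to the semantics of a particular training set, which is the intuition behind the distribution-agnostic claim.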

AAAI 2025 · Conference Paper

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model

  • Hang Zhou
  • Jiale Cai
  • Yuteng Ye
  • Yonghui Feng
  • Chenxing Gao
  • Junqing Yu
  • Zikai Song
  • Wei Yang

A recent line of work in one-class video anomaly detection leverages diffusion models and poses the task as a generation problem, where the diffusion model is trained to recover normal patterns exclusively, thus reporting abnormal patterns as outliers. Yet, existing attempts neglect the varied forms anomalies can take and predict normal samples at the feature level, even though abnormal objects in surveillance videos are often relatively small. To address this, a novel patch-based diffusion model is proposed, specifically engineered to capture fine-grained local information. We further observe that anomalies in videos manifest as deviations in both appearance and motion. Therefore, we argue that a comprehensive solution must consider both aspects simultaneously to achieve accurate frame prediction. To this end, we introduce motion and appearance conditions that are seamlessly integrated into our patch diffusion model. These conditions guide the model toward coherent and contextually appropriate predictions for both semantic content and motion relations. Experimental results on four challenging video anomaly detection datasets substantiate the efficacy of our approach, demonstrating that it consistently outperforms most existing methods in detecting abnormal behaviors.
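
To make the patch-level intuition concrete, here is a hypothetical scoring sketch (not the proposed model), assuming PyTorch: frames are scored by their worst-reconstructed patch rather than a frame-level average, so small abnormal objects are not washed out.

    # Hypothetical patch-level anomaly scoring from prediction error.
    import torch

    def patch_anomaly_scores(pred, target, patch=16):
        """Split frames into non-overlapping patches and return the max
        per-patch MSE, so small abnormal regions dominate the score."""
        err = (pred - target) ** 2                               # (B, C, H, W)
        err = err.unfold(2, patch, patch).unfold(3, patch, patch)
        per_patch = err.mean(dim=(1, 4, 5))                      # (B, H/p, W/p)
        return per_patch.flatten(1).max(dim=1).values            # (B,)

    scores = patch_anomaly_scores(torch.randn(2, 3, 64, 64),
                                  torch.randn(2, 3, 64, 64))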

AAAI 2024 · Conference Paper

Attacking Transformers with Feature Diversity Adversarial Perturbation

  • Chenxing Gao
  • Hang Zhou
  • Junqing Yu
  • Yuteng Ye
  • Jiale Cai
  • Junle Wang
  • Wei Yang

Understanding the mechanisms behind Vision Transformers (ViTs), particularly their vulnerability to adversarial perturbations, is crucial for addressing challenges in their real-world applications. Existing adversarial attacks on ViTs rely on labels to calculate the gradient for perturbation and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, even models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose a feature diversity attacker that naturally accelerates this process and achieves remarkable performance and transferability.
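
As an illustration of the label-free idea, the sketch below (assuming PyTorch) perturbs an input so that intermediate token features lose diversity, nudging them toward collapse. model_features, the step sizes, and the toy extractor are hypothetical stand-ins, not the paper's attacker.

    # Hypothetical label-free step that pushes features toward collapse.
    import torch

    def diversity(tokens):
        """Mean squared deviation of token features (B, N, D) from their
        average token; driving this to zero encourages feature collapse."""
        return ((tokens - tokens.mean(dim=1, keepdim=True)) ** 2).mean()

    def feature_collapse_step(x, model_features, step=1.0/255, eps=8.0/255):
        x_adv = x.clone().requires_grad_(True)
        loss = diversity(model_features(x_adv))    # no labels involved
        loss.backward()
        with torch.no_grad():                      # FGSM-style signed step
            x_adv = x_adv - step * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)
        return x_adv.clamp(0, 1)

    # Toy usage with a stand-in "feature extractor".
    model_features = lambda x: x.flatten(2).transpose(1, 2)  # (B, HW, C)
    x_adv = feature_collapse_step(torch.rand(1, 3, 32, 32), model_features)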

AAAI 2024 · Conference Paper

Dynamic Feature Pruning and Consolidation for Occluded Person Re-identification

  • Yuteng Ye
  • Hang Zhou
  • Jiale Cai
  • Chenxing Gao
  • Youjia Zhang
  • Junle Wang
  • Qiang Hu
  • Junqing Yu

Occluded person re-identification (ReID) is a challenging problem due to contamination from occluders. Existing approaches address the issue with prior knowledge cues, such as human body key points and semantic segmentation, which easily fail in the presence of heavy occlusion and other humans as occluders. In this paper, we propose a feature pruning and consolidation (FPC) framework to circumvent explicit human structure parsing. The framework mainly consists of a sparse encoder, a multi-view feature matching module, and a feature consolidation decoder. Specifically, the sparse encoder drops less important image tokens, mostly related to background noise and occluders, based solely on the correlation within the class token attention. Subsequently, the matching stage relies on the preserved tokens produced by the sparse encoder to identify k-nearest neighbors in the gallery by measuring combined image- and patch-level similarity. Finally, we use the feature consolidation module to compensate for pruned features using the identified neighbors, recovering essential information while disregarding disturbance from noise and occlusion. Experimental results demonstrate the effectiveness of our proposed framework on occluded, partial, and holistic ReID datasets. In particular, our method outperforms state-of-the-art results by at least 8.6% mAP and 6.0% Rank-1 accuracy on the challenging Occluded-Duke dataset.
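
The pruning step can be pictured with a small hypothetical sketch (assuming PyTorch): keep only the patch tokens that receive the most class-token attention. prune_tokens, the keep ratio, and all shapes are illustrative, not the FPC implementation.

    # Hypothetical attention-guided token pruning.
    import torch

    def prune_tokens(tokens, cls_attn, keep_ratio=0.7):
        """tokens: (B, N, D) patch tokens; cls_attn: (B, N) attention of the
        class token over patches. Keeps the top-scoring tokens per image."""
        k = max(1, int(tokens.size(1) * keep_ratio))
        idx = cls_attn.topk(k, dim=1).indices                 # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(2))
        return tokens.gather(1, idx)                          # (B, k, D)

    kept = prune_tokens(torch.randn(2, 196, 768), torch.rand(2, 196))

Matching would then compare the kept tokens against gallery features to find k-nearest neighbors, and a decoder would consolidate the pruned information from those neighbors.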

AAAI 2024 · Conference Paper

Progressive Text-to-Image Diffusion with Soft Latent Direction

  • Yuteng Ye
  • Jiale Cai
  • Hang Zhou
  • Guanwen Li
  • Youjia Zhang
  • Zikai Song
  • Chenxing Gao
  • Junqing Yu

In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges. This paper introduces an innovative progressive synthesis and editing operation that systematically incorporates entities into the target image, ensuring their adherence to spatial and relational constraints at each sequential step. Our key insight stems from the observation that while a pre-trained text-to-image diffusion model adeptly handles one or two entities, it often falters when dealing with a greater number. To address this limitation, we propose harnessing the capabilities of a Large Language Model (LLM) to decompose intricate and protracted text descriptions into coherent directives adhering to stringent formats. To facilitate the execution of directives involving distinct semantic operations—namely insertion, editing, and erasing—we formulate the Stimulus, Response, and Fusion (SRF) framework. Within this framework, latent regions are gently stimulated in alignment with each operation, followed by the fusion of the responsive latent components to achieve cohesive entity manipulation. Our proposed framework yields notable advancements in object synthesis, particularly when confronted with intricate and lengthy textual inputs. Consequently, it establishes a new benchmark for text-to-image generation tasks, further elevating the field's performance standards.
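
To illustrate the progressive decomposition alone (not the SRF framework), here is a toy Python sketch in which a long prompt has already been broken, LLM-style, into ordered directives with strict fields. Directive, edit_latent, and the example boxes are invented for illustration.

    # Toy directive schema for progressive, step-by-step image editing.
    from dataclasses import dataclass

    @dataclass
    class Directive:
        op: str        # "insert", "edit", or "erase"
        entity: str    # which entity this step manipulates
        region: tuple  # (x0, y0, x1, y1) target box in the image

    def edit_latent(latent, d):
        print(f"{d.op} '{d.entity}' in region {d.region}")
        return latent  # a real system would stimulate/fuse latent regions

    directives = [
        Directive("insert", "a red chair", (10, 40, 60, 90)),
        Directive("insert", "a cat on the chair", (20, 20, 55, 45)),
        Directive("edit", "make the chair wooden", (10, 40, 60, 90)),
    ]
    latent = None
    for d in directives:  # entities enter the scene one at a time
        latent = edit_latent(latent, d)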