Arrow Research Search

Author name cluster

Kejiang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction

  • Chao Wang
  • Zijin Yang
  • Yaofei Wang
  • Weiming Zhang
  • Kejiang Chen

The rapid advancement of image-generation technologies has made it possible for anyone to create photorealistic images using generative models, raising significant security concerns. To mitigate malicious use, tracing the origin of such images is essential. Reconstruction-based attribution methods offer a promising solution, but they often suffer from reduced accuracy and high computational costs when applied to state-of-the-art (SOTA) models. To address these challenges, we propose AEDR (AutoEncoder Double-Reconstruction), a novel training-free attribution method designed for generative models with continuous autoencoders. Unlike existing reconstruction-based approaches that rely on the value of a single reconstruction loss, AEDR performs two consecutive reconstructions using the model's autoencoder and adopts the ratio of the two reconstruction losses as the attribution signal. This signal is further calibrated with an image-homogeneity metric, which inherently cancels out absolute biases caused by image complexity, while the autoencoder-based reconstruction ensures superior computational efficiency. Experiments on eight top latent diffusion models show that AEDR achieves 25.5% higher attribution accuracy than existing reconstruction-based methods while requiring only 1% of the computational time.
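For illustration, here is a minimal sketch of the double-reconstruction signal described above. The `ae_encode`, `ae_decode`, and `homogeneity` callables are placeholders for a candidate model's autoencoder and a homogeneity metric; the exact calibration and decision rule are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aedr_signal(x, ae_encode, ae_decode, homogeneity):
    """x: image batch (B, C, H, W); returns a per-image attribution signal."""
    x1 = ae_decode(ae_encode(x))      # first reconstruction
    x2 = ae_decode(ae_encode(x1))     # second reconstruction of the first
    loss1 = F.mse_loss(x1, x, reduction="none").mean(dim=(1, 2, 3))
    loss2 = F.mse_loss(x2, x1, reduction="none").mean(dim=(1, 2, 3))
    ratio = loss2 / (loss1 + 1e-12)   # ratio of the two reconstruction losses
    return ratio / (homogeneity(x) + 1e-12)  # calibrate for image complexity
```

Attribution then scores an image against each candidate model's autoencoder and selects the model whose signal indicates the image lies closest to that autoencoder's reconstruction manifold.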

ICLR Conference 2025 Conference Paper

A Closer Look at Machine Unlearning for Large Language Models

  • Xiaojian Yuan
  • Tianyu Pang
  • Chao Du
  • Kejiang Chen
  • Weiming Zhang 0001
  • Min Lin

Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. Due to the high cost of retraining from scratch, researchers attempt to employ machine unlearning to remove specific content from LLMs while preserving the overall performance. In this paper, we discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches. To address the issue of inadequate evaluation of model outputs after unlearning, we introduce three additional metrics to evaluate token diversity, sentence semantics, and factual correctness. We then categorize unlearning methods into untargeted and targeted, and discuss their issues respectively. Specifically, the behavior that untargeted unlearning attempts to approximate is unpredictable and may involve hallucinations, and existing regularization is insufficient for targeted unlearning. To alleviate these issues, we propose using the objective of maximizing entropy (ME) for untargeted unlearning and incorporate answer preservation (AP) loss as regularization for targeted unlearning. Experimental results across three scenarios, i.e., fictitious unlearning, continual unlearning, and real-world unlearning, demonstrate the effectiveness of our approaches. The code is available at https://github.com/sail-sg/closer-look-LLM-unlearning.
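For a concrete reading of the ME objective, the sketch below minimizes the KL divergence between the model's next-token distribution on forget-set inputs and the uniform distribution, which is equivalent to maximizing entropy; the exact loss weighting and the answer-preservation (AP) regularizer are specified in the paper, not here.

```python
import math
import torch.nn.functional as F

def me_loss(logits):
    """logits: (B, T, V) on forget-set tokens; minimizing this maximizes entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)   # (B, T) token entropy
    # KL(p || uniform) = log V - H(p), so minimizing it drives p toward uniform
    return (math.log(logits.size(-1)) - entropy).mean()
```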

AAAI Conference 2025 Conference Paper

CoSDA: Enhancing the Robustness of Inversion-based Generative Image Watermarking Framework

  • Han Fang
  • Kejiang Chen
  • Zijin Yang
  • Bosen Cui
  • Weiming Zhang
  • Ee-Chien Chang

Generative image watermarking inserts secret watermarks into generated images and plays an important role in tracing the usage of generative models. For watermarking of diffusion models, the inversion-based framework has emerged as an effective approach. Such a framework employs a robust mechanism to embed the watermark into the starting latent before "forward sampling", thereby generating images with the implicit watermark. During watermark detection, inversion techniques reverse the process to obtain the watermarked latent, from which the watermark is extracted. The robustness of this technique hinges primarily on the embedding mechanism and inversion accuracy. Previous methods predominantly focused on enhancing the robustness of the embedding mechanism but overlooked reducing inversion errors. However, our results show that inversion error significantly affects overall robustness. Therefore, in this paper, we delve into the inversion-error aspect and propose CoSDA, a compensation-sampling and drift-alignment-based approach. Inversion error accumulates from two sources: internal error incurred by the algorithm and unavoidable external noise. We observe that the main source of internal error is the mismatch in conditions (e.g., prompt, guidance scale) between the forward and backward sampling processes. We therefore propose compensation-based forward sampling, which compensates for certain mismatched conditions and reduces the inversion error they cause. To address external error caused by unavoidable image distortions (e.g., JPEG compression), we introduce a drift-alignment approach in which a neural network is trained adversarially to restore the original watermarked latent from its distorted counterpart. Experimental results show that CoSDA effectively enhances watermark robustness while maintaining the visual quality of generated images.
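To make the condition-mismatch point concrete, here is a minimal deterministic DDIM inversion step; the `eps_model` noise predictor, its signature, and the scheduler layout are illustrative assumptions. The observation above is that `cond` and `guidance_scale` here must match those used during forward sampling, or the recovered latent drifts.

```python
import torch

def ddim_invert_step(x_t, t, t_next, alphas_cumprod, eps_model, cond, guidance_scale):
    """One deterministic inversion step from timestep t to t_next (t_next > t)."""
    a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
    eps = eps_model(x_t, t, cond, guidance_scale)   # mismatched cond => drift
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
```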

ICML Conference 2025 Conference Paper

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

  • Wei Fan
  • Kejiang Chen
  • Chang Liu 0089
  • Weiming Zhang 0001
  • Nenghai Yu

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

NeurIPS Conference 2025 Conference Paper

LD-RoViS: Training-free Robust Video Steganography for Deterministic Latent Diffusion Model

  • Xiangkun Wang
  • Kejiang Chen
  • Lincong Li
  • Weiming Zhang
  • Nenghai Yu

Existing video steganography methods primarily embed secret information by modifying video content in the spatial or compressed domains. However, such methods are prone to distortion drift and are easily detected by steganalysis. Generative steganography, which avoids direct modification of the cover data, offers a promising alternative. Despite recent advances, most generative steganography studies focus on images and are difficult to extend to videos because of compression-induced distortions and the unique architecture of video generation models. To address these challenges, we propose LD-RoViS, a training-free and robust video steganography framework for the deterministic latent diffusion model. By modulating implicit conditional parameters during the diffusion process, LD-RoViS constructs a dedicated steganographic channel. Additionally, we introduce a novel multi-mask mechanism to mitigate errors caused by video compression and post-processing. The experimental results demonstrate that LD-RoViS can embed approximately 12,000 bits of data into a 5-second video with an extraction accuracy exceeding 99%. Our implementation is available at https://github.com/xiangkun1999/LD-RoViS.

IJCAI Conference 2025 Conference Paper

Multi-Label Text Classification with Label Attention Aware and Correlation Aware Contrastive Learning

  • Zhengzhong Zhu
  • Pei Zhou
  • Zeting Li
  • Kejiang Chen
  • Jiangping Zhu

Multi-label text classification (MLTC) is a challenging task where each document can be associated with multiple interdependent labels. This task is complicated by two key issues: the intricate correlations among labels and the partial overlap between labels and text relevance. Existing methods often fail to capture the semantic dependencies between labels or struggle to handle the ambiguities caused by partial overlaps, resulting in suboptimal representation learning. To address these challenges, we propose the Unified Contextual and Label-Aware Framework (UCLAF), which integrates a Label Attention Aware Network (LAN) and Correlation Aware Contrastive Learning (CACL) in a synergistic design. The Label Attention Aware Network explicitly models label dependencies by embedding labels and texts into a shared semantic space, aligning text representations with label semantics. Meanwhile, Correlation Aware Contrastive Learning refines these representations by dynamically modeling sample-level relationships, leveraging a contrastive loss function that accounts for the proportional overlap of labels between samples. This complementary approach enables UCLAF to jointly address complex label correlations and partial label overlaps. Extensive experiments on benchmark datasets demonstrate that UCLAF significantly outperforms state-of-the-art methods, showcasing its effectiveness in improving both representation learning and classification performance in MLTC tasks. We will release our code after the paper is accepted.
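As one hedged sketch of correlation-aware contrastive learning, the loss below weights each pair's contrastive target by the proportional label overlap (here, Jaccard) between samples; UCLAF's exact weighting and loss composition may differ.

```python
import torch
import torch.nn.functional as F

def overlap_contrastive_loss(z, labels, temperature=0.1):
    """z: (B, D) text embeddings; labels: (B, L) multi-hot float matrix."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    inter = labels @ labels.t()                         # |Y_i ∩ Y_j|
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    overlap = inter / union.clamp(min=1)                # proportional label overlap
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_p = sim - torch.logsumexp(sim.masked_fill(~off_diag, -1e9), 1, keepdim=True)
    return -(overlap * log_p)[off_diag].mean()          # overlap-weighted InfoNCE
```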

AAAI Conference 2025 Conference Paper

Provably Secure Image Robust Steganography via Cross-modal Error Correction

  • Yuang Qi
  • Kejiang Chen
  • Na Zhao
  • Zijin Yang
  • Weiming Zhang

The rapid development of image generation models has facilitated the widespread dissemination of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods face issues such as low quality of generated images and a lack of semantic control over the generation process. To leverage provably secure steganography with more effective and high-performance image generation models, and to ensure that secret messages can still be accurately extracted from stego images even after they are uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models with Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images to aid in restoring lossy images, ultimately enabling the extraction of the secret messages embedded within them. Extensive experiments demonstrate that the proposed method provides advantages in stego quality, embedding capacity, and robustness, while ensuring provable undetectability.

AAAI Conference 2025 Conference Paper

RoPaSS: Robust Watermarking for Partial Screen-Shooting Scenarios

  • Zehua Ma
  • Han Fang
  • Xi Yang
  • Kejiang Chen
  • Weiming Zhang

Screen-shooting robust watermarking is an effective means of preventing screen-content leakage via unauthorized camera shooting, as it can trace the leaked source through watermark extraction and thereby provide an effective deterrent. However, current screen-shooting resilient watermarking schemes rely on the image's contours to synchronize before extracting the watermark, while in practice it is common for only a portion of the image to be captured, limiting the performance of previous schemes. To address this problem, we propose RoPaSS, a robust watermarking scheme for partial screen-shooting scenarios, which constructs symmetric characteristics in the embedded watermark to handle the difficult re-synchronization issue. Specifically, RoPaSS consists of a watermark encoder, a decoder, and three estimators, trained in two stages. In the first stage, RoPaSS integrates a flipping operation into encoder-decoder training to increase the redundancy of watermark messages and guide the generation of symmetric watermarks. In the second stage, the estimators use the watermark symmetry as an additional reference to estimate the restoration parameters that resynchronize the partially captured watermarked image. Experiments demonstrate the excellent performance of RoPaSS in partial screen-shooting traceability, with extraction accuracy above 93% for frontal shooting and above 86% for 30° shooting even when only 50% of the image content is captured.

NeurIPS Conference 2025 Conference Paper

STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model

  • Yuang Qi
  • Na Zhao
  • Qiyi Yao
  • Benlong Wu
  • Weiming Zhang
  • Nenghai Yu
  • Kejiang Chen

Recent provably secure linguistic steganography (PSLS) methods rely on mainstream autoregressive language models (ARMs) to address a historically challenging task: disguising covert communication as "innocuous" natural language communication. However, because ARMs generate tokens sequentially, any change to the stegotext produced by ARM-based PSLS methods causes serious error propagation, making existing methods unusable under active tampering attacks. To address this, we propose a robust, provably secure linguistic steganography with diffusion language models (DLMs). Unlike ARMs, DLMs generate text in a partially parallel manner, allowing us to find robust positions for steganographic embedding that can be combined with error-correcting codes. Furthermore, we introduce error-correction strategies, including pseudo-random error correction and neighborhood-search correction, during steganographic extraction. Theoretical proof and experimental results demonstrate that our method is secure and robust. It can resist token ambiguity in stegotext segmentation and, to some extent, withstand token-level insertion, deletion, and substitution attacks.
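As a generic illustration of the error-correcting-code ingredient (not STEAD's specific pseudo-random or neighborhood-search schemes), a systematic Hamming(7,4) code corrects any single bit flip per codeword:

```python
import numpy as np

G = np.array([[1,0,0,0,1,1,0],   # generator matrix [I | P]
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],   # parity-check matrix [P^T | I]
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

def encode(nibble):               # 4 data bits -> 7-bit codeword
    return (np.array(nibble) @ G) % 2

def decode(word):                 # corrects any single flipped bit
    word = np.array(word) % 2
    syndrome = (H @ word) % 2
    if syndrome.any():            # syndrome equals the column of H at the error
        err = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
        word[err] ^= 1
    return word[:4]               # systematic code: data bits come first
```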

NeurIPS Conference 2025 Conference Paper

StegoZip: Enhancing Linguistic Steganography Payload in Practice with Large Language Models

  • Jun Jiang
  • Zijin Yang
  • Weiming Zhang
  • Nenghai Yu
  • Kejiang Chen

Generative steganography has emerged as an active research area, yet practical systems are constrained by an inherent payload limitation caused by the low entropy of generated stego texts. This payload limitation necessitates lengthy stego texts or frequent transmissions, which increases the risk of suspicion by adversaries. Previous studies have mainly focused on payload enhancement through optimized entropy utilization while overlooking the crucial role of secret-message processing. To address this gap, we propose StegoZip, a framework that leverages large language models to optimize secret-message processing. StegoZip consists of two core components: semantic redundancy pruning and index-based compression coding. The former dynamically prunes the secret message to extract a low-semantic representation, whereas the latter further compresses it into compact binary codes. When integrated with state-of-the-art steganographic methods under lossless decoding, StegoZip achieves 2.5× the payload of the baselines while maintaining comparable processing time in practice. This enhanced payload significantly improves covertness by mitigating the risks associated with frequent transmissions while maintaining provable content security.

NeurIPS Conference 2025 Conference Paper

T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models

  • Jindong Yang
  • Han Fang
  • Weiming Zhang
  • Nenghai Yu
  • Kejiang Chen

Diffusion models have advanced rapidly in recent years, producing high-fidelity images while raising concerns about intellectual-property protection and the misuse of generative AI. Image watermarking for diffusion models, particularly Noise-as-Watermark (NaW) methods, encodes the watermark as a specific standard Gaussian noise vector for image generation, embedding the information seamlessly while maintaining image quality. For detection, the generation process is inverted to recover the initial noise vector containing the watermark before extraction. However, existing NaW methods struggle to balance watermark robustness with generation diversity. Some methods achieve strong robustness by heavily constraining initial noise sampling, which degrades the user experience, while others preserve diversity but prove too fragile for real-world deployment. To address this issue, we propose T2SMark, a two-stage watermarking scheme based on Tail-Truncated Sampling (TTS). Unlike prior methods that simply map bits to positive or negative values, TTS enhances robustness by embedding bits exclusively in the reliable tail regions while randomly sampling the central zone to preserve the latent distribution. Our two-stage framework then ensures sampling diversity by integrating a randomly generated session key into both encryption pipelines. We evaluate T2SMark on diffusion models with both U-Net and DiT backbones. Extensive experiments show that it achieves an optimal balance between robustness and diversity.
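A hedged sketch of tail-truncated sampling via the inverse Gaussian CDF: bit-carrying entries are drawn from a tail (|z| > tau) with the sign encoding the bit, the rest from the central zone, so the overall draw stays close to N(0, 1). The threshold `tau` and the position assignment are illustrative, not the paper's parameters.

```python
import torch

def tts_sample(numel, bits, bit_positions, tau=1.0):
    """bits: 0/1 tensor; bit_positions: long tensor of latent indices."""
    phi = torch.special.ndtr(torch.tensor(tau))        # P(Z <= tau)
    z = torch.empty(numel)
    carrier = torch.zeros(numel, dtype=torch.bool)
    carrier[bit_positions] = True
    # central zone (|z| <= tau) for non-carrying entries, via inverse CDF
    u = torch.rand(int((~carrier).sum()))
    z[~carrier] = torch.special.ndtri((1 - phi) + u * (2 * phi - 1))
    # tails (|z| > tau) for bit-carrying entries; the sign encodes the bit
    u = torch.rand(len(bit_positions))
    tail = torch.special.ndtri(phi + u * (1 - phi))    # upper-tail draw, > tau
    z[bit_positions] = torch.where(bits.bool(), tail, -tail)
    return z
```

Decoding after inversion then reduces to reading the sign of each recovered entry at the bit positions, which is reliable precisely because the embedded values sit in the tails.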

AAAI Conference 2024 Conference Paper

Data-Free Hard-Label Robustness Stealing Attack

  • Xiaojian Yuan
  • Kejiang Chen
  • Wen Huang
  • Jie Zhang
  • Weiming Zhang
  • Nenghai Yu

The popularity of Machine Learning as a Service (MLaaS) has led to increased concerns about Model Stealing Attacks (MSA), which aim to craft a clone model by querying MLaaS. Currently, most research on MSA assumes that MLaaS can provide soft labels and that the attacker has a proxy dataset with a similar distribution. However, this fails to encapsulate the more practical scenario where only hard labels are returned by MLaaS and the data distribution remains elusive. Furthermore, most existing work focuses solely on stealing model accuracy, neglecting model robustness, even though robustness is essential in security-sensitive scenarios, e.g., face-scan payment. Notably, improving model robustness often necessitates expensive techniques such as adversarial training, making robustness an even more lucrative target to steal. In response to these gaps, we introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack, which enables stealing both model accuracy and robustness by simply querying hard labels of the target model, without the help of any natural data. Comprehensive experiments demonstrate the effectiveness of our method. The clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack, which are only 4.71% and 8.40% lower than the target model on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is available at: https://github.com/LetheSec/DFHL-RS-Attack.
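A hedged skeleton of the data-free, hard-label query loop: synthesize queries with a generator, ask the target only for hard labels, and fit the clone to them. The actual DFHL-RS attack adds robustness-oriented components beyond this sketch; all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def steal_step(generator, clone, target, opt, batch=128, z_dim=100):
    z = torch.randn(batch, z_dim)
    x = generator(z).detach()            # synthetic queries, no natural data
    with torch.no_grad():
        hard = target(x).argmax(dim=1)   # hard labels only, as MLaaS returns
    loss = F.cross_entropy(clone(x), hard)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```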

NeurIPS Conference 2024 Conference Paper

DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection

  • Xiao Yu
  • Yuang Qi
  • Kejiang Chen
  • Guoqiang Chen
  • Xi Yang
  • Pengyuan Zhu
  • Xiuwei Shang
  • Weiming Zhang

Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets. Consequently, detecting whether a text is generated by LLMs has become increasingly important. Existing high-quality detection methods usually require access to the interior of the model to extract its intrinsic characteristics. However, without access to the interior of a black-box model, one must resort to surrogate models, which degrades detection quality. To achieve high-quality detection of black-box models, we instead extract deep intrinsic characteristics from the texts they generate. We view the generation process as a coupling of the prompt and the intrinsic characteristics of the generative model. Based on this insight, we propose DPIC, a method that decouples prompt and intrinsic characteristics for LLM-generated text detection. Specifically, given a candidate text, DPIC employs an auxiliary LLM to reconstruct the prompt corresponding to the candidate text, then uses that prompt to regenerate text with the auxiliary LLM, so the candidate text and the regenerated text each align with their prompts. The similarity between the candidate text and the regenerated text is then used as the detection feature, eliminating the prompt from the detection process and allowing the detector to focus on the intrinsic characteristics of the generative model. Compared to the baselines, DPIC achieves average improvements of 6.76% and 2.91% in detecting texts from different domains generated by GPT-4 and Claude 3, respectively.
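A hedged sketch of the decoupling pipeline: `aux_llm` and `embed` are hypothetical callables standing in for the auxiliary LLM and a text-embedding model, and plain cosine similarity is a simplification of whatever similarity feature the detector actually uses.

```python
import torch.nn.functional as F

def dpic_feature(candidate_text, aux_llm, embed):
    # 1) reconstruct a plausible prompt for the candidate text
    prompt = aux_llm(f"Infer the prompt that produced this text:\n{candidate_text}")
    # 2) regenerate text from that prompt with the same auxiliary LLM
    regenerated = aux_llm(prompt)
    # 3) candidate/regeneration similarity becomes the detection feature
    return F.cosine_similarity(embed(candidate_text), embed(regenerated), dim=-1)
```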

AAAI Conference 2023 Conference Paper

Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions

  • Han Fang
  • Yupeng Qiu
  • Kejiang Chen
  • Jiyi Zhang
  • Weiming Zhang
  • Ee-Chien Chang

Deep learning-based digital watermarking frameworks have been widely studied recently. Most existing methods adopt an "encoder-noise layer-decoder" architecture in which the embedding and extraction processes are handled separately by the encoder and the decoder. However, one potential drawback of such a framework is that the encoder and the decoder may not be well coupled, so the encoder may embed redundant features into the host image, hurting the invisibility and robustness of the whole algorithm. To address this limitation, this paper proposes a flow-based robust watermarking framework. The basic component of this framework is an invertible up-down-sampling neural block that realizes embedding and extraction simultaneously. As a consequence, the encoded features stay highly consistent with the features the decoder needs, which effectively avoids embedding redundant features. In addition, to ensure robustness against black-box distortions, an invertible noise layer (INL) is designed to simulate the distortion and serves as a noise layer in the training stage. Benefiting from its reversibility, INL is also applied as a preprocessing step before extraction to eliminate the distortion, further improving the robustness of the algorithm. Extensive experiments demonstrate the superiority of the proposed framework in terms of visual quality and robustness. Compared with the state-of-the-art architecture, the visual quality (measured by PSNR) of the proposed framework improves by 2dB, and the extraction accuracy after JPEG compression (QF=50) improves by more than 4%. Moreover, strong robustness against black-box distortions is achieved, with more than 95% extraction accuracy.
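To illustrate the invertibility that couples embedding and extraction, here is a standard affine coupling layer, the usual primitive of flow-based models: the forward pass transforms half the channels, and the inverse recovers them exactly. The paper's up-down-sampling block is more elaborate; this sketch shows only the mechanism.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Invertible block: forward() transforms, inverse() exactly undoes it."""
    def __init__(self, channels, hidden=64):   # channels must be even
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * log_s.tanh().exp() + t        # transform one half
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        log_s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * (-log_s.tanh()).exp()   # exact algebraic inverse
        return torch.cat([y1, x2], dim=1)
```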

AAAI Conference 2023 Conference Paper

Pseudo Label-Guided Model Inversion Attack via Conditional Generative Adversarial Network

  • Xiaojian Yuan
  • Kejiang Chen
  • Jie Zhang
  • Weiming Zhang
  • Nenghai Yu
  • Yang Zhang

Model inversion (MI) attacks, which can reconstruct training data from public models, have raised increasing privacy concerns. Indeed, an MI attack can be formalized as an optimization problem that searches for private data in a certain space. Recent MI attacks leverage a generative adversarial network (GAN) as an image prior to narrow the search space and can successfully reconstruct even high-dimensional data (e.g., face images). However, these generative MI attacks do not fully exploit the potential capabilities of the target model, still leading to a vague and coupled search space, i.e., different classes of images are coupled in the search space. Besides, the widely used cross-entropy loss in these attacks suffers from gradient vanishing. To address these problems, we propose the Pseudo Label-Guided MI (PLG-MI) attack via a conditional GAN (cGAN). First, a top-n selection strategy provides pseudo-labels for public data, and these pseudo-labels guide the training of the cGAN. In this way, the search space is decoupled for different classes of images. Then a max-margin loss is introduced to improve the search process on the subspace of a target class. Extensive experiments demonstrate that our PLG-MI attack significantly improves the attack success rate and visual quality on various datasets and models, notably 2-3× better than state-of-the-art attacks under large distributional shifts. Our code is available at: https://github.com/LetheSec/PLG-MI-Attack.
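A hedged sketch of a max-margin objective of the kind named above: instead of cross-entropy, it pushes the target-class logit above the strongest non-target logit, avoiding vanishing gradients on confident predictions. The exact margin formulation in PLG-MI may differ.

```python
import torch

def max_margin_loss(logits, target_class):
    """logits: (B, C); minimizing this grows the margin of `target_class`."""
    target = logits[:, target_class]
    others = logits.clone()
    others[:, target_class] = float("-inf")      # mask out the target class
    runner_up = others.max(dim=1).values         # strongest competing logit
    return (runner_up - target).mean()
```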

AAAI Conference 2022 Conference Paper

Tracing Text Provenance via Context-Aware Lexical Substitution

  • Xi Yang
  • Jie Zhang
  • Kejiang Chen
  • Weiming Zhang
  • Zehua Ma
  • Feng Wang
  • Nenghai Yu

Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help claim ownership of text content or identify the malicious users who distribute misleading content such as machine-generated fake news. There have been some attempts to achieve this, mainly based on watermarking techniques. Specifically, traditional text watermarking methods embed watermarks by slightly altering text format such as line spacing and font, which, however, is fragile to cross-media transmissions like OCR. Considering this, natural language watermarking methods represent watermarks by replacing words in original sentences with synonyms from handcrafted lexical resources (e.g., WordNet), but they do not consider the substitution's impact on the overall sentence meaning. Recently, a transformer-based network was proposed to embed watermarks by modifying unobtrusive words (e.g., function words), which also impairs the sentence's logical and semantic coherence. Besides, one well-trained network fails on other, different types of text content. To address the limitations mentioned above, we propose a natural language watermarking scheme based on context-aware lexical substitution (LS). Specifically, we employ BERT to suggest LS candidates by inferring the semantic relatedness between the candidates and the original sentence. Based on this, a selection strategy in terms of synchronicity and substitutability is further designed to test whether a word is exactly suitable for carrying the watermark signal. Extensive experiments demonstrate that, under both objective and subjective metrics, our watermarking scheme well preserves the semantic integrity of original sentences and has better transferability than existing methods. Besides, the proposed LS approach outperforms the state-of-the-art approach on the Stanford Word Substitution Benchmark.
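The candidate-generation step maps naturally onto a masked-LM query; a minimal sketch with Hugging Face's fill-mask pipeline is below (the paper's synchronicity and substitutability tests, which filter these candidates, are omitted):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def ls_candidates(sentence, word, top_k=10):
    """Context-aware substitution candidates for `word` within `sentence`."""
    masked = sentence.replace(word, unmasker.tokenizer.mask_token, 1)
    preds = unmasker(masked, top_k=top_k)
    # drop the original word; remaining tokens are context-fitted substitutes
    return [p["token_str"] for p in preds if p["token_str"] != word]

print(ls_candidates("The movie was surprisingly good.", "good"))
```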