SIDE: Surrogate Conditional Data Extraction from Diffusion Models

Yunhao Chen; Shujie Wang; Difan Zou; Xingjun Ma

doi:10.1609/aaai.v40i1.36972

AAAI 2026

Conference Paper | AAAI Technical Track on Application Domains I | Artificial Intelligence

Abstract

As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior research finds conditional DPMs highly susceptible to data extraction attacks using explicit prompts, unconditional models are often assumed to be safe. We challenge this view by introducing Surrogate condItional Data Extraction (SIDE), a general framework that constructs data-driven surrogate conditions to enable targeted extraction from any DPM. Through extensive experiments on CIFAR-10, CelebA, ImageNet, and LAION-5B, we show that SIDE can successfully extract training data from so-called safe unconditional models, outperforming baseline attacks even on conditional models. Complementing these findings, we present a unified theoretical framework based on informative labels, demonstrating that all forms of conditioning, explicit or surrogate, amplify memorization. Our work redefines the threat landscape for DPMs, establishing precise conditioning as a fundamental vulnerability and setting a new, stronger benchmark for model privacy evaluation.
