
NeurIPS 2025

RUAGO: Effective and Practical Retain-Free Unlearning via Adversarial Attack and OOD Generator

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

Abstract

With increasing regulation of private data usage in AI systems, machine unlearning has emerged as a critical tool for selectively removing sensitive information from trained models while preserving their overall utility. Many existing unlearning methods rely on retain data to mitigate the performance decline caused by forgetting, but such data may not always be available (the retain-free setting) in real-world scenarios. To address retain-free unlearning, we introduce RUAGO, which uses adversarial soft labels to mitigate over-unlearning and a generative model pretrained on out-of-distribution (OOD) data to distill the original model's knowledge. We further introduce a progressive sampling strategy that incrementally increases synthetic-data complexity, coupled with an inversion-based alignment step that keeps the synthetic data close to the original training distribution. Extensive experiments on multiple benchmark datasets and architectures show that our approach consistently outperforms existing retain-free methods and matches or exceeds retain-based approaches, demonstrating its effectiveness and practicality in real-world, data-constrained environments.
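The abstract names two ingredients: adversarial soft labels on the forget set (to avoid over-unlearning) and distillation from the original model on generated data (to preserve utility without retain data). The paper's exact formulation is not given here, so the sketch below is only a hedged illustration of one plausible reading: build a soft target by suppressing the true class in the teacher's prediction and renormalizing, and use a KL term to distill teacher knowledge on synthetic samples. All function names and the specific construction are assumptions, not RUAGO's actual method.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adversarial_soft_label(teacher_logits, true_class):
    """Illustrative guess at an 'adversarial soft label': zero out the
    true class in the teacher's predictive distribution and renormalize.
    The target stays informative (unlike a uniform label), which is one
    way to steer away from the forgotten class without over-unlearning."""
    p = softmax(teacher_logits)
    p[true_class] = 0.0
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); a distillation loss term could penalize the student's
    divergence from the teacher on generator-produced synthetic samples."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Example: forget class 0 for one sample.
logits = np.array([2.0, 1.0, 0.1])
soft = adversarial_soft_label(logits, true_class=0)
# soft[0] is exactly 0 and the remaining mass is renormalized to sum to 1.
```

In a full unlearning loop, the forget-set loss would pull the student toward these soft targets while the KL term on synthetic data anchors it to the teacher everywhere else; the balance between the two is exactly what retain data would otherwise provide.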

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
718037236987219446