Radioactive data: tracing through training

Alexandre Sablayrolles; Matthijs Douze; Cordelia Schmid; Hervé Jégou

Back to ICML

ICML 2020

Radioactive data: tracing through training

Conference Paper Accepted Paper Artificial Intelligence · Machine Learning

Details

Abstract

Data tracing determines whether particular data samples have been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to these samples such that any model trained on them will bear an identifiable mark. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Experiments on large-scale benchmarks (Imagenet), with standard architectures (Resnet-18, VGG-16, Densenet-121) and training procedures, show that we detect radioactive data with high confidence (p<0. 0001) when only 1% of the data used to train a model is radioactive. Our radioactive mark is resilient to strong data augmentations and variations of the model architecture. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue: International Conference on Machine Learning
Archive span: 1993-2025
Indexed papers: 16471
Paper id: 49579753448716856