Author name cluster

Tom Sander

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

TMLR Journal 2025 Journal Article

Lognormal Mutations and their Use in Detecting Surreptitious Fake Images

Olivier Teytaud
Mariia Zameshina
Tom Sander
Pierre Fernandez
Furong Ye
Laurent Najman
Thomas Bäck
Ismail Labiad

In many cases, adversarial attacks against fake detectors employ algorithms specifically crafted for automatic image classifiers. These algorithms perform well, thanks to an excellent ad hoc distribution of initial attacks. However, these attacks are easily detected due to their specific initial distribution. Consequently, we explore alternative black-box attacks inspired by generic black-box optimization tools, particularly focusing on the log-normal algorithm that we successfully extend to attack fake detectors. Moreover, we demonstrate that this attack evades detection by neural networks trained to flag classical adversarial examples. Therefore, we train more general models capable of identifying a broader spectrum of attacks, including classical black-box attacks designed for images, black-box attacks driven by classical optimization, and no-box attacks. By integrating these attack detection capabilities with fake detectors, we develop more robust and effective fake detection systems.


NeurIPS Conference 2025 Conference Paper

Rethinking the Role of Verbatim Memorization in LLM Privacy

Tom Sander
Bargav Jayaraman
Mark Ibrahim
Kamalika Chaudhuri
Chuan Guo

Conventional wisdom in machine learning privacy research states that memorization directly implies a loss of privacy. In contrast, a well-generalized model only remembers distributional patterns and preserves privacy of its training data. In this work, we show that this relationship is much more complex for LLMs trained for chat, and depends heavily on how knowledge is encoded and manipulated. To this end, we fine-tune language models on synthetically generated biographical information including PIIs, and try to extract them in different ways after instruction fine-tuning. We find, counter to conventional wisdom, that better verbatim memorization does not necessarily increase data leakage via chat. We also find that it is easier to extract information via chat from an LLM that is better able to manipulate and process knowledge, even if it is smaller, and that not all attributes are equally extractable. This suggests that the relationship between privacy, memorization and language understanding of LLMs is very intricate, and that examining memorization in isolation can lead to misleading conclusions.


ICLR Conference 2025 Conference Paper

Watermark Anything With Localized Messages

Tom Sander
Pierre Fernandez
Alain Durmus
Teddy Furon
Matthijs Douze

Image watermarking methods are not tailored to handle small watermarked areas. This restricts applications in real-world scenarios where parts of the image may come from different sources or have been edited. We introduce a deep-learning model for localized image watermarking, dubbed the Watermark Anything Model (WAM). The WAM embedder imperceptibly modifies the input image, while the extractor segments the received image into watermarked and non-watermarked areas and recovers one or several hidden messages from the areas found to be watermarked. The models are jointly trained at low resolution and without perceptual constraints, then post-trained for imperceptibility and multiple watermarks. Experiments show that WAM is competitive with state-of-the-art methods in terms of imperceptibility and robustness, especially against inpainting and splicing, even on high-resolution images. Moreover, it offers new capabilities: WAM can locate watermarked areas in spliced images and extract distinct 32-bit messages with less than 1 bit error from multiple small regions -- no larger than 10% of the image surface -- even for small $256\times 256$ images. Training and inference code and model weights are available at https://github.com/facebookresearch/watermark-anything.
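
To make the "less than 1 bit error" figure concrete, the sketch below counts mismatched bits between an embedded and a decoded 32-bit message. It is a generic Hamming-distance helper, not code from the WAM repository; the function names `bit_errors` and `bit_accuracy` and the sample message values are hypothetical, and the abstract's figure is presumably an average over many regions.

```python
def bit_errors(sent: int, decoded: int, n_bits: int = 32) -> int:
    """Count positions where two n_bits-wide messages differ (Hamming distance)."""
    mask = (1 << n_bits) - 1           # keep only the lowest n_bits bits
    return bin((sent ^ decoded) & mask).count("1")


def bit_accuracy(sent: int, decoded: int, n_bits: int = 32) -> float:
    """Fraction of message bits recovered correctly."""
    return 1.0 - bit_errors(sent, decoded, n_bits) / n_bits


if __name__ == "__main__":
    embedded = 0xDEADBEEF              # hypothetical 32-bit payload
    recovered = 0xDEADBEEE             # same payload with the lowest bit flipped
    print(bit_errors(embedded, recovered), bit_accuracy(embedded, recovered))
```

Averaging `bit_errors` over all extracted regions would give a per-message error figure comparable in kind to the one quoted in the abstract.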


ICML Conference 2024 Conference Paper

Differentially Private Representation Learning via Image Captioning

Tom Sander
Yaodong Yu
Maziar Sanjabi
Alain Durmus
Yi Ma
Kamalika Chaudhuri
Chuan Guo

Differentially private (DP) machine learning is considered the gold-standard solution for training a model from sensitive data while still preserving privacy. However, a major barrier to achieving this ideal is its sub-optimal privacy-accuracy trade-off, which is particularly visible in DP representation learning. Specifically, it has been shown that under modest privacy budgets, most models learn representations that are not significantly better than hand-crafted features. In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets. Through a series of engineering tricks, we successfully train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch using a reasonable amount of computation, and obtain unprecedented high-quality image features that can be used in a variety of downstream vision and vision-language tasks. For example, under a privacy budget of $\varepsilon=8$ for the LAION dataset, a linear classifier trained on top of learned DP-Cap features attains $65.8\%$ accuracy on ImageNet-1K, considerably improving the previous SOTA of $56.5\%$. Our work challenges the prevailing sentiment that high-utility DP representation learning cannot be achieved by training from scratch.


NeurIPS Conference 2024 Conference Paper

Watermarking Makes Language Models Radioactive

Tom Sander
Pierre Fernandez
Alain Durmus
Matthijs Douze
Teddy Furon

We investigate the radioactivity of text generated by large language models (LLMs), i.e., whether it is possible to detect that such synthetic input was used to train a subsequent LLM. Current methods like membership inference or active IP protection either work only in settings where the suspected text is known or do not provide reliable statistical guarantees. We discover that, on the contrary, it is possible to reliably determine if a language model was trained on synthetic data if that data is output by a watermarked LLM. Our new methods, specialized for radioactivity, detect with provable confidence weak residuals of the watermark signal in the fine-tuned LLM. We link the radioactivity contamination level to the following properties: the watermark robustness, its proportion in the training set, and the fine-tuning process. For instance, if the suspect model is open-weight, we demonstrate that training on watermarked instructions can be detected with high confidence ($p$-value $< 10^{-5}$) even when as little as $5\%$ of training text is watermarked.
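
As a rough illustration of how a watermark test can yield a $p$-value, the sketch below runs a one-sided binomial test on the number of "green-list" tokens among the scored tokens. This is a generic green-list detection test, used here as an assumption; it is not the specialized radioactivity test the abstract describes. The names `binomial_sf` and `watermark_p_value`, the green-list fraction `gamma`, and the sample counts are all hypothetical.

```python
import math


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P[X >= k] for X ~ Binomial(n, p), assuming 0 < p < 1."""
    total = 0.0
    for i in range(max(k, 0), n + 1):
        # Work in log space so that large n does not overflow.
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1.0 - p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1


def watermark_p_value(green_hits: int, scored_tokens: int, gamma: float = 0.5) -> float:
    """p-value under H0: each scored token is green independently with prob. gamma."""
    return binomial_sf(green_hits, scored_tokens, gamma)


if __name__ == "__main__":
    # Hypothetical counts: 80 green tokens out of 100 scored tokens.
    print(watermark_p_value(80, 100))
```

Under the null hypothesis of no watermark, the green count is binomial, so a small $p$-value is evidence that the scored text (or the model that produced it) carries traces of the watermark.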


ICML Conference 2023 Conference Paper

TAN Without a Burn: Scaling Laws of DP-SGD

Tom Sander
Pierre Stock
Alexandre Sablayrolles

Differentially Private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more computing resources than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible for realistic scenarios. In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Rényi Differential Privacy (RDP) to highlight that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a $100\times$ reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state-of-the-art on ImageNet with a $+9$ points gain in top-1 accuracy for a privacy budget $\varepsilon=8$.
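
The claim that the budget depends only on the total amount of noise can be illustrated with a small sketch. It assumes a small-sampling-rate approximation of the RDP of the subsampled Gaussian mechanism, $\varepsilon(\alpha) \approx 2\alpha S q^2/\sigma^2$ for $S$ steps, sampling rate $q$ and noise multiplier $\sigma$, together with the standard RDP-to-$(\varepsilon,\delta)$ conversion. The approximation, the function names, and the parameter values are assumptions made for illustration; they are not taken from the abstract and are not a substitute for a real privacy accountant.

```python
import math


def approx_rdp(alpha: float, q: float, sigma: float, steps: int) -> float:
    """Assumed small-q approximation of the composed RDP at order alpha."""
    return 2.0 * alpha * steps * q**2 / sigma**2


def approx_epsilon(q: float, sigma: float, steps: int, delta: float = 1e-5) -> float:
    """Convert approximate RDP to (epsilon, delta)-DP, minimizing over integer orders."""
    return min(
        approx_rdp(a, q, sigma, steps) + math.log(1.0 / delta) / (a - 1)
        for a in range(2, 256)
    )


def noise_ratio(q: float, sigma: float, steps: int) -> float:
    """sigma / (q * sqrt(steps)): the only quantity the approximation depends on."""
    return sigma / (q * math.sqrt(steps))


if __name__ == "__main__":
    # Hypothetical settings: a large-batch run and a 4x cheaper run with less noise.
    large = dict(q=0.04, sigma=4.0, steps=10_000)
    cheap = dict(q=0.01, sigma=1.0, steps=10_000)
    for cfg in (large, cheap):
        print(noise_ratio(**cfg), approx_epsilon(**cfg))
```

Both hypothetical configurations share the same `noise_ratio`, so the approximate budget is identical even though the second uses a four times smaller sampling rate. The approximation breaks down when $\sigma$ is small, which matches the abstract's caveat about the budget not being "overcharged".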
