Can We Infer Confidential Properties of Training Data from LLMs?

Pengrun Huang; Chhavi Yadav; Kamalika Chaudhuri; Ruihan Wu

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties, such as patient demographics or disease prevalence, that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word frequency signals. Empirical evaluations across multiple pretrained LLMs show the success of our attacks, revealing a previously unrecognized vulnerability in LLMs.
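To make the idea of a generation-based property inference attack concrete, the following is a minimal sketch and not the authors' implementation. It assumes the target property is the fraction of female patients, that an attacker can sample text from the fine-tuned model through a hypothetical `generate` callable, and that a simple keyword labeler is good enough to tag each sample. The keyword lists, function names, and the choice of property are all illustrative assumptions.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical keyword lists (an assumption for illustration only);
# a realistic labeler would need to be far more careful.
FEMALE = {"she", "her", "female", "woman", "daughter", "mother", "wife"}
MALE = {"he", "his", "him", "male", "man", "son", "father", "husband"}


def label_gender(text: str) -> Optional[str]:
    """Label one generated sample by counting gendered keywords.

    Returns "female", "male", or None when the counts tie (including 0-0).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    female_hits = sum(t in FEMALE for t in tokens)
    male_hits = sum(t in MALE for t in tokens)
    if female_hits == male_hits:
        return None
    return "female" if female_hits > male_hits else "male"


def estimate_ratio(samples: Iterable[str]) -> Optional[float]:
    """Estimate the female fraction among samples that could be labeled."""
    labels = [label_gender(s) for s in samples]
    labeled = [lab for lab in labels if lab is not None]
    if not labeled:
        return None  # no usable signal in any sample
    return labeled.count("female") / len(labeled)


def attack(generate: Callable[[str], str], prompt: str, n: int) -> Optional[float]:
    """Query the (hypothetical) model n times and aggregate the labels."""
    return estimate_ratio(generate(prompt) for _ in range(n))
```

In use, `generate` would wrap sampling from the fine-tuned model, and the returned fraction would serve as the attacker's estimate of the dataset-level property. Samples that carry no signal are dropped rather than guessed, so the estimate is only as reliable as the labeler and the number of labeled samples.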
