Author name cluster

Da Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

EAAI Journal 2025 Journal Article

High-resolution multi-view stereo with multi-scale feature fusion

Dapeng Chen
Qi Jia
Hao Wu
Da Yu
Nanxuan Huang
Jia Liu

Details DOI

NeurIPS Conference 2025 Conference Paper

Scaling Embedding Layers in Language Models

Da Yu
Edith Cohen
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Daogao Liu
Chiyuan Zhang

We propose SCONE (**S**calable, **C**ontextualized, **O**ffloaded, **N**-gram **E**mbedding), a new method for extending input embedding layers to enhance language model performance. To avoid increased decoding costs, SCONE retains the original vocabulary while introducing embeddings for a set of frequent $n$-grams. These embeddings provide contextualized representation for each input token and are learned with a separate model during training. After training, embeddings are precomputed and stored in off-accelerator memory; during inference, querying them has minimal impact on latency due to the low complexity of embedding lookups. SCONE enables two new scaling strategies: increasing the number of $n$-gram embeddings and scaling the model used to learn them, both while maintaining fixed accelerator usage during inference (in terms of FLOPS and memory). We show that scaling both aspects enables a model with 1B accelerator-resident parameters to outperform a 1. 9B-parameter baseline across diverse corpora, while using only about half the FLOPS and accelerator memory during inference.

PDF Details

ICML Conference 2025 Conference Paper

Scaling Laws for Differentially Private Language Models

Ryan McKenna
Yangsibo Huang
Amer Sinha
Borja Balle
Zachary Charles
Christopher A. Choquette-Choo
Badih Ghazi
Georgios Kaissis

Scaling laws have emerged as important components of large language model (LLM) training as they can predict performance gains through scale, and provide guidance on important hyper-parameter choices that would otherwise be expensive. LLMs also rely on large, high-quality training datasets, like those sourced from (sometimes sensitive) user data. Training models on this sensitive user data requires careful privacy protections like differential privacy (DP). However, the dynamics of DP training are significantly different, and consequently their scaling laws are not yet fully understood. In this work, we establish scaling laws that accurately model the intricacies of DP LLM training, providing a complete picture of the compute-privacy-utility and the optimal training configurations in many settings.

Details

NeurIPS Conference 2025 Conference Paper

Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries

Haoxiang Wang
Zinan Lin
Da Yu
Huishuai Zhang

Generating high-fidelity, differentially private (DP) synthetic images offers a promising route to share and analyze sensitive visual data without compromising individual privacy. However, existing DP image synthesis methods struggle to produce high-resolution outputs that faithfully capture the structure of the original data. In this paper, we introduce a novel method, referred to as Synthesis via Private Textual Intermediaries (SPTI), that can generate high-resolution DP images with easy adoptions. The key idea is to shift the challenge of DP image synthesis from the image domain to the text domain by leveraging state-of-the-art DP text generation methods. SPTI first summarizes each private image into a concise textual description using image-to-text models, then applies a modified Private Evolution algorithm to generate DP text, and finally reconstructs images using text-to-image models. Notably, SPTI requires no model training, only inferences with off-the-shelf models. Given a private dataset, SPTI produces synthetic images of substantially higher quality than prior DP approaches. On the LSUN Bedroom dataset, SPTI attains an FID $=$ 26. 71 under $\epsilon=1. 0$, improving over Private Evolution’s FID of 40. 36. Similarly, on MM-CelebA-HQ, SPTI achieves an FID $=$ 33. 27 at $\epsilon=1. 0$, compared to 57. 01 from DP fine-tuning baselines. Overall, our results demonstrate that Synthesis via Private Textual Intermediaries provides a resource-efficient and proprietary-model-compatible framework for generating high-resolution DP synthetic images, greatly expanding access to private visual datasets. Our code release: https: //github. com/MarkGodrick/SPTI

PDF Details

ICML Conference 2024 Conference Paper

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Chulin Xie
Zinan Lin 0001
Arturs Backurs
Sivakanth Gopi
Da Yu
Huseyin A. Inan
Harsha Nori
Haotian Jiang

Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i. e. , differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e. g. , GPT-3. 5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications.

Details

ICML Conference 2024 Conference Paper

Privacy-Preserving Instructions for Aligning Large Language Models

Da Yu
Peter Kairouz
Sewoong Oh
Zheng Xu 0002

Service providers of large language model (LLM) applications collect user instructions in the wild and use them in further aligning LLMs with users’ intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by the typical private optimization. To this end, we propose using synthetic instructions to replace real instructions in data annotation and model fine-tuning. Formal differential privacy is guaranteed by generating those synthetic instructions using privately fine-tuned generators. Crucial in achieving the desired utility is our novel filtering algorithm that matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, our extensive experiments demonstrate the high utility of the final set of synthetic instructions by showing comparable results to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.

Details

TMLR Journal 2024 Journal Article

Selective Pre-training for Private Fine-tuning

Da Yu
Sivakanth Gopi
Janardhan Kulkarni
Zinan Lin
Saurabh Naik
Tomasz Lukasz Religa
Jian Yin
Huishuai Zhang

Text prediction models, when used in applications like email clients or word processors, must protect user data privacy and adhere to model size constraints. These constraints are crucial to meet memory and inference time requirements, as well as to reduce inference costs. Building small, fast, and private domain-specific language models is a thriving area of research. In this work, we show that a careful pre-training on a subset of the public dataset that is guided by the private dataset is crucial to train small language models with differential privacy. On standard benchmarks, small models trained with our new framework achieve state-of-the-art performance. In addition to performance improvements, our results demonstrate that smaller models, through careful pre-training and private fine-tuning, can match the performance of much larger models that do not have access to private data. This underscores the potential of private learning for model compression and enhanced efficiency.

PDF Details

ICLR Conference 2023 Conference Paper

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Jiyan He
Xuechen Li
Da Yu
Huishuai Zhang
Janardhan Kulkarni
Yin Tat Lee
Arturs Backurs
Nenghai Yu

Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.

Details

TMLR Journal 2023 Journal Article

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Da Yu
Gautam Kamath
Janardhan Kulkarni
Tie-Yan Liu
Jian Yin
Huishuai Zhang

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose \emph{output-specific} $(\varepsilon,\delta)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average $\varepsilon$ of the class with the lowest test accuracy is 44.2\% higher than that of the class with the highest accuracy.

PDF Details

ICLR Conference 2022 Conference Paper

Differentially Private Fine-tuning of Language Models

Da Yu
Saurabh Naik
Arturs Backurs
Sivakanth Gopi
Huseyin A. Inan
Gautam Kamath 0001
Janardhan Kulkarni
Yin Tat Lee

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation when privately fine-tuning GPT-2. Our experiments also show that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

Details

ICLR Conference 2021 Conference Paper

Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

Da Yu
Huishuai Zhang
Wei Chen 0034
Tie-Yan Liu

The privacy leakage of the model about the training data can be bounded in the differential privacy mechanism. However, for meaningful privacy parameters, a differentially private model degrades the utility drastically when the model comprises a large number of trainable parameters. In this paper, we propose an algorithm \emph{Gradient Embedding Perturbation (GEP)} towards training differentially private deep models with decent accuracy. Specifically, in each gradient descent step, GEP first projects individual private gradient into a non-sensitive anchor subspace, producing a low-dimensional gradient embedding and a small-norm residual gradient. Then, GEP perturbs the low-dimensional embedding and the residual gradient separately according to the privacy budget. Such a decomposition permits a small perturbation variance, which greatly helps to break the dimensional barrier of private learning. With GEP, we achieve decent accuracy with low computational cost and modest privacy guarantee for deep models. Especially, with privacy bound $\epsilon=8$, we achieve $74.9\%$ test accuracy on CIFAR10 and $95.1\%$ test accuracy on SVHN, significantly improving over existing results.

Details

AAAI Conference 2021 Conference Paper

How Does Data Augmentation Affect Privacy in Machine Learning?

Da Yu
Huishuai Zhang
Wei Chen
Jian Yin
Tie-Yan Liu

It is observed in the literature that data augmentation can significantly mitigate membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks to utilize the information of augmented data. MI attack is widely used to measure the model’s information leakage of the training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i. e. , classifying a set of augmented instances instead of a single data point, and design input permutation invariant features. Empirically, we demonstrate that the proposed approach universally outperforms original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve 70. 1% MI attack success rate on CIFAR10 against a wide residual network while previous best approach only attains 61. 9%. This suggests the privacy risk of models trained with data augmentation could be largely underestimated.

PDF Details

ICML Conference 2021 Conference Paper

Large Scale Private Learning via Low-rank Reparametrization

Da Yu
Huishuai Zhang
Wei Chen 0034
Jian Yin 0001
Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are 1) the huge memory cost of storing individual gradients, 2) the added noise suffering notorious dimensional dependence. Specifically, we reparametrize each weight matrix with two \emph{gradient-carrier} matrices of small dimension and a \emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design \emph{reparametrized gradient perturbation (RGP)} that perturbs the gradients on gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified with deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first able to apply differential privacy on the BERT model and achieve an average accuracy of $83. 9%$ on four downstream tasks with $\epsilon=8$, which is within $5%$ loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

Details

IJCAI Conference 2020 Conference Paper

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

Da Yu
Huishuai Zhang
Wei Chen
Jian Yin
Tie-Yan Liu

Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that can satisfy the privacy requirement and then analyzes the utility of noisy gradient updates as in the non-private case. In contrast, we explore how the privacy noise affects the optimization property. We show that for differentially private convex optimization, the utility guarantee of differentially private (stochastic) gradient descent is determined by an expected curvature rather than the minimum curvature. The expected curvature, which represents the average curvature over the optimization path, is usually much larger than the minimum curvature. By using the expected curvature, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods. Finally, our extensive experiments suggest that gradient perturbation with the advanced composition method indeed outperforms other perturbation approaches by a large margin, matching our theoretical findings.

PDF Details DOI