Author name cluster

Constantinos Daskalakis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

76 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Ambient Diffusion Omni: Training Good Models with Bad Data

Giannis Daras
Adrian Rodriguez-Munoz
Adam Klivans
Antonio Torralba
Constantinos Daskalakis

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from arbitrary images during training. Our framework exploits two properties of natural images: spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We use our framework to achieve state-of-the-art ImageNet FID and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.

NeurIPS Conference 2025 Conference Paper

Ambient Proteins - Training Diffusion Models on Noisy Structures

Giannis Daras
Jeffrey Ouyang-Zhang
Krithika Ravishankar
Constantinos Daskalakis
Adam Klivans
Daniel Diaz

We present Ambient Protein Diffusion, a framework for training protein diffusion models that generates structures with unprecedented diversity and quality. State-of-the-art generative models are trained on computationally derived structures from AlphaFold2 (AF), as experimentally determined structures are relatively scarce. The resulting models are therefore limited by the quality of synthetic datasets. Since the accuracy of AF predictions degrades with increasing protein length and complexity, de novo generation of long, complex proteins remains challenging. Ambient Protein Diffusion overcomes this problem by treating low-confidence AF structures as corrupted data. Rather than simply filtering out low-quality AF structures, our method adjusts the diffusion objective for each structure based on its corruption level, allowing the model to learn from both high- and low-quality structures. Empirically, Ambient Protein Diffusion yields major improvements: on proteins with 700 residues, diversity increases from 45% for the previous state of the art to 85%, and designability improves from 70% to 88%.

STOC Conference 2025 Conference Paper

Breaking the T^(2/3) Barrier for Sequential Calibration

Yuval Dagan
Constantinos Daskalakis
Maxwell Fishelson
Noah Golowich
Robert Kleinberg
Princewill Okoroafor