Arrow Research search

Author name cluster

Subham S. Sahoo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

NeurIPS Conference 2024 Conference Paper

Diffusion Models With Learned Adaptive Noise

  • Subham S. Sahoo
  • Aaron Gokaslan
  • Chris De
  • Volodymyr Kuleshov

Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MuLAN), a learned diffusion process that applies noise at different rates across an image. Our method consists of three components: a multivariate noise schedule, adaptive input-conditional diffusion, and auxiliary variables; these components ensure that the ELBO is no longer invariant to the choice of the noise schedule as in previous works. Empirically, MuLAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet while matching the performance of previous state-of-the-art models with 50% fewer steps. We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/MuLAN
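
To make "noise at different rates across an image" concrete, here is a minimal sketch that is not taken from the paper or its code. It assumes a variance-preserving parameterization in which each pixel has its own value `gamma`, with squared signal scale `sigmoid(-gamma)` and squared noise scale `sigmoid(gamma)`. The names `scales` and `noise_pixels` are hypothetical. The per-pixel values are hand-picked here, whereas the abstract describes them as learned and input-conditional.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def scales(gamma):
    """Return (alpha, sigma) for one pixel.

    Assumption: variance-preserving form, so alpha**2 + sigma**2 == 1.
    A larger gamma means more noise and less signal.
    """
    alpha = math.sqrt(sigmoid(-gamma))  # signal scale
    sigma = math.sqrt(sigmoid(gamma))   # noise scale
    return alpha, sigma


def noise_pixels(pixels, gammas, rng):
    """Noise each pixel with its own gamma instead of one shared scalar."""
    noisy = []
    for x, gamma in zip(pixels, gammas):
        alpha, sigma = scales(gamma)
        noisy.append(alpha * x + sigma * rng.gauss(0.0, 1.0))
    return noisy


# Hypothetical 4-pixel "image": the first two pixels are barely noised,
# the last two are noised heavily.
rng = random.Random(0)
image = [0.2, -0.5, 0.9, 0.1]
per_pixel_gamma = [-6.0, -6.0, 4.0, 4.0]
noisy_image = noise_pixels(image, per_pixel_gamma, rng)
```

A scalar schedule is the special case in which every entry of `per_pixel_gamma` is the same number.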

NeurIPS Conference 2024 Conference Paper

Simple and Effective Masked Diffusion Language Models

  • Subham S. Sahoo
  • Marianne Arriola
  • Yair Schiff
  • Aaron Gokaslan
  • Edgar Marroquin
  • Justin T. Chiu
  • Alexander Rush
  • Volodymyr Kuleshov

While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form (it is a mixture of classical masked language modeling losses) and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/mdlm
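
The phrase "a mixture of classical masked language modeling losses" can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation. It assumes a masking rate `t` drawn uniformly, independent masking of each token with probability `t`, and a `1 / t` weight on the masked-token loss, which is one common choice for a linear masking schedule. `MASK`, `mask_tokens`, `masked_lm_loss`, `mixture_objective`, and the toy `uniform_model` are all hypothetical names.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the mask token


def mask_tokens(tokens, t, rng):
    """Independently replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_lm_loss(tokens, noisy, predict_logprob):
    """Classical masked LM loss: negative log-probability of the true
    token, summed over the masked positions only."""
    return -sum(
        predict_logprob(noisy, i, tok)
        for i, (tok, z) in enumerate(zip(tokens, noisy))
        if z == MASK
    )


def mixture_objective(tokens, predict_logprob, num_samples=64, seed=0):
    """Average of masked LM losses over random masking rates.

    Assumption: the 1 / t weight corresponds to a linear masking schedule.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(1e-3, 1.0)  # keep t away from 0 to avoid dividing by 0
        noisy = mask_tokens(tokens, t, rng)
        total += masked_lm_loss(tokens, noisy, predict_logprob) / t
    return total / num_samples


def uniform_model(noisy, position, token, vocab_size=4):
    """Toy stand-in for a network: every token is equally likely."""
    return -math.log(vocab_size)


estimate = mixture_objective([0, 1, 2, 3], uniform_model)
```

Under these assumptions the expected value for the toy uniform model is the sequence length times `log(vocab_size)`, because a fraction `t` of the tokens is masked on average and the `1 / t` weight cancels it. In practice `predict_logprob` would be an encoder-only network evaluated on the partially masked sequence.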