Author name cluster

Jonathan Ho

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

2 author rows

ICLR Conference 2023 Conference Paper

Discrete Predictor-Corrector Diffusion Models for Image Synthesis

José Lezama
Tim Salimans
Lu Jiang 0004
Huiwen Chang
Jonathan Ho
Irfan Essa

We introduce Discrete Predictor-Corrector diffusion models (DPC), extending predictor-corrector samplers in Gaussian diffusion models to the discrete case. Predictor-corrector samplers are a class of samplers for diffusion models, which improve on ancestral samplers by correcting the sampling distribution of intermediate diffusion states using MCMC methods. In DPC, the Langevin corrector, which does not have a direct counterpart in discrete space, is replaced with a discrete MCMC transition defined by a learned corrector kernel. The corrector kernel is trained to make the correction steps achieve asymptotic convergence, in distribution, to the correct marginal of the intermediate diffusion states. Equipped with DPC, we revisit recent transformer-based non-autoregressive generative models through the lens of discrete diffusion, and find that DPC can alleviate the compounding decoding error due to the parallel sampling of visual tokens. Our experiments show that DPC improves upon existing discrete latent space models for class-conditional image generation on ImageNet, and outperforms continuous diffusion models and GANs, according to standard metrics and user preference studies.

Details

ICLR Conference 2023 Conference Paper

Novel View Synthesis with Diffusion Models

Daniel Watson
William Chan
Ricardo Martin-Brualla
Jonathan Ho
Andrea Tagliasacchi
Mohammad Norouzi 0002

We present 3DiM (pronounced "three-dim"), a diffusion model for 3D novel view synthesis from as few as a single image. The core of 3DiM is an image-to-image diffusion model -- 3DiM takes a single reference view and their poses as inputs, and generates a novel view via diffusion. 3DiM can then generate a full 3D consistent scene following our novel stochastic conditioning sampler: the output frames of the scene are generated autoregressively, and during the reverse diffusion process of each individual frame, we select a random conditioning frame from the set of previous frames at each denoising step. We demonstrate that stochastic conditioning yields much more 3D consistent results compared to the naive sampling process which only conditions on a single previous frame. We compare 3DiMs to prior work on the SRN ShapeNet dataset, demonstrating that 3DiM's generated videos from a single view achieve much higher fidelity while being approximately 3D consistent. We also introduce a new evaluation methodology, 3D consistency scoring, to measure the 3D consistency of a generated object by training a neural field on the model's output views. 3DiMs are geometry free, do not rely on hyper-networks or test-time optimization for novel view synthesis, and allow a single model to easily scale to a large number of scenes.

Details

JMLR Journal 2022 Journal Article

Cascaded Diffusion Models for High Fidelity Image Generation

Jonathan Ho
Chitwan Saharia
William Chan
David J. Fleet
Mohammad Norouzi
Tim Salimans

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

PDF Details

ICLR Conference 2022 Conference Paper

Learning Fast Samplers for Diffusion Models by Differentiating Through Sample Quality

Daniel Watson
William Chan
Jonathan Ho
Mohammad Norouzi 0002

Diffusion models have emerged as an expressive family of generative models rivaling GANs in sample quality and autoregressive models in likelihood scores. Standard diffusion models typically require hundreds of forward passes through the model to generate a single high-fidelity sample. We introduce Differentiable Diffusion Sampler Search (DDSS): a method that optimizes fast samplers for any pre-trained diffusion model by differentiating through sample quality scores. We also present Generalized Gaussian Diffusion Models (GGDM), a family of flexible non-Markovian samplers for diffusion models. We show that optimizing the degrees of freedom of GGDM samplers by maximizing sample quality scores via gradient descent leads to improved sample quality. Our optimization procedure backpropagates through the sampling process using the reparametrization trick and gradient rematerialization. DDSS achieves strong results on unconditional image generation across various datasets (e.g., FID scores on LSUN church 128x128 of 11.6 with only 10 inference steps, and 4.82 with 20 steps, compared to 51.1 and 14.9 with strongest DDPM/DDIM baselines). Our method is compatible with any pre-trained diffusion model without fine-tuning or re-training required.

Details

NeurIPS Conference 2022 Conference Paper

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
Emily L. Denton
Kamyar Ghasemipour
Raphael Gontijo Lopes

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e. g. , T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7. 27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

PDF Details

ICLR Conference 2022 Conference Paper

Progressive Distillation for Fast Sampling of Diffusion Models

Tim Salimans
Jonathan Ho

Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps, compared to models in the literature. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with (near) state-of-the-art samplers taking 1024 or 8192 steps, and are able to distill down to models taking as little as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.

Details

NeurIPS Conference 2022 Conference Paper

Video Diffusion Models

Jonathan Ho
Tim Salimans
Alexey Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to reduce the variance of minibatch gradients and speed up optimization. To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on established benchmarks for video prediction and unconditional video generation. Supplementary material is available at https: //video-diffusion. github. io/.

PDF Details

NeurIPS Conference 2021 Conference Paper

Structured Denoising Diffusion Models in Discrete State-Spaces

Jacob Austin
Daniel D. Johnson
Jonathan Ho
Daniel Tarlow
Rianne van den Berg

Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2021] have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. [2021], by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.

PDF Details

IROS Conference 2021 Conference Paper

Understanding and Segmenting Human Demonstrations into Reusable Compliant Primitives

Elena Galbally Herrero
Jonathan Ho
Oussama Khatib

Hard coded robotic manipulation skills work well in known, predictable and repeatable situations. Human environments, however, are better described as dynamic, chaotic, uncertain or unstructured. Therefore, plans relying on preprogrammed trajectories are bound to fail in these settings. In order to increase robustness to uncertainty and avoid coding new skills from scratch, we can make flexible plans that execute existing autonomous primitives based on the sensed state of the environment. A key challenge of this approach is finding the sequence of primitives required to perform the desired task. This work uses a variation of a Hidden Markov Model (HMM) with an augmented particle filter to find the primitive sequence using only a reduced number of human demonstrations. The algorithm was tested on 40 demonstrations of two different manipulation tasks involving six primitives. It was seeded with a single manually labelled demonstration of each task and was able to automatically label the other 38 demonstration sequences with an average success of 81. 5%. The results show improved convergence and a 9% increase in accuracy over other versions of the algorithm.

Details

NeurIPS Conference 2021 Conference Paper

Variational Diffusion Models

Diederik Kingma
Tim Salimans
Ben Poole
Jonathan Ho

Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to use the model as part of a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum.

PDF Details

NeurIPS Conference 2020 Conference Paper

Denoising Diffusion Probabilistic Models

Jonathan Ho
Ajay Jain
Pieter Abbeel

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9. 46 and a state-of-the-art FID score of 3. 17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.

PDF Details

ICML Conference 2019 Conference Paper

Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables

Friso H. Kingma
Pieter Abbeel
Jonathan Ho

The bits-back argument suggests that latent variable models can be turned into lossless compression schemes. Translating the bits-back argument into efficient and practical lossless compression schemes for general latent variable models, however, is still an open problem. Bits-Back with Asymmetric Numeral Systems (BB-ANS), recently proposed by Townsend et al, . 2019, makes bits-back coding practically feasible for latent variable models with one latent layer, but it is inefficient for hierarchical latent variable models. In this paper we propose Bit-Swap, a new compression scheme that generalizes BB-ANS and achieves strictly better compression rates for hierarchical latent variable models with Markov chain structure. Through experiments we verify that Bit-Swap results in lossless compression rates that are empirically superior to existing techniques.

Details

NeurIPS Conference 2019 Conference Paper

Compression with Flows via Local Bits-Back Coding

Jonathan Ho
Evan Lohn
Pieter Abbeel

Likelihood-based generative models are the backbones of lossless compression due to the guaranteed existence of codes with lengths close to negative log likelihood. However, there is no guaranteed existence of computationally efficient codes that achieve these lengths, and coding algorithms must be hand-tailored to specific types of generative models to ensure computational efficiency. Such coding algorithms are known for autoregressive models and variational autoencoders, but not for general types of flow models. To fill in this gap, we introduce local bits-back coding, a new compression technique for flow models. We present efficient algorithms that instantiate our technique for many popular types of flows, and we demonstrate that our algorithms closely achieve theoretical codelengths for state-of-the-art flow models on high-dimensional data.

PDF Details

ICML Conference 2019 Conference Paper

Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design

Jonathan Ho
Xi Chen 0022
Aravind Srinivas
Yan Duan
Pieter Abbeel

Flow-based generative models are powerful exact likelihood models with efficient sampling and inference. Despite their computational efficiency, flow-based models generally have much worse density modeling performance compared to state-of-the-art autoregressive models. In this paper, we investigate and improve upon three limiting design choices employed by flow-based models in prior work: the use of uniform noise for dequantization, the use of inexpressive affine flows, and the use of purely convolutional conditioning networks in coupling layers. Based on our findings, we propose Flow++, a new flow-based model that is now the state-of-the-art non-autoregressive model for unconditional density estimation on standard image benchmarks. Our work has begun to close the significant performance gap that has so far existed between autoregressive models and flow-based models.

Details

NeurIPS Conference 2016 Conference Paper

Generative Adversarial Imitation Learning

Jonathan Ho
Stefano Ermon

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

PDF Details

ICML Conference 2016 Conference Paper

Model-Free Imitation Learning with Policy Optimization

Jonathan Ho
Jayesh K. Gupta
Stefano Ermon

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.

Details

ICRA Conference 2013 Conference Paper

Tracking deformable objects with point clouds

John Schulman
Alex X. Lee
Jonathan Ho
Pieter Abbeel

We introduce an algorithm for tracking deformable objects from a sequence of point clouds. The proposed tracking algorithm is based on a probabilistic generative model that incorporates observations of the point cloud and the physical properties of the tracked object and its environment. We propose a modified expectation maximization algorithm to perform maximum a posteriori estimation to update the state estimate at each time step. Our modification makes it practical to perform the inference through calls to a physics simulation engine. This is significant because (i) it allows for the use of highly optimized physics simulation engines for the core computations of our tracking algorithm, and (ii) it makes it possible to naturally, and efficiently, account for physical constraints imposed by collisions, grasping actions, and material properties in the observation updates. Even in the presence of the relatively large occlusions that occur during manipulation tasks, our algorithm is able to robustly track a variety of types of deformable objects, including ones that are one-dimensional, such as ropes; two-dimensional, such as cloth; and three-dimensional, such as sponges. Our implementation can track these objects in real time.

Details