Arrow Research

Author name cluster

Joseph G. Makin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

AAAI 2025 · Conference Paper

Exponential-Family Harmoniums with Neural Sufficient Statistics

  • Azwar Abdulsalam
  • Joseph G. Makin

Exponential-family harmoniums (EFHs) generalize the restricted Boltzmann machine beyond Bernoulli random variables to other exponential families. Here we show how to extend the EFH beyond standard exponential families (Poisson, Gaussian, etc.), by allowing the sufficient statistics for the hidden units to be arbitrary functions of the observed data, parameterized by deep neural networks. This rules out the standard sampling scheme, block Gibbs sampling, so we replace it with a form of Langevin dynamics within Gibbs, inspired by a recent method for training Gaussian restricted Boltzmann machines (GRBMs). With Gibbs-Langevin, the GRBM can successfully model small datasets like MNIST and CelebA-32, but struggles with CIFAR-10, and cannot scale to larger images because it lacks convolutions. In contrast, our neural-network EFHs (NN-EFHs) generate high-quality samples from CIFAR-10 and scale well to CelebA-HQ. On these datasets, the NN-EFH achieves FID scores that are 25–50% lower than a standard energy-based model with a similar neural-network architecture and the same number of parameters, and competitive with noise-conditional score networks, which utilize more complex neural networks (U-nets) and require considerably more sampling steps.
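
To make the sampling idea concrete, here is a minimal, hypothetical sketch of Langevin-within-Gibbs for an EFH-like model whose coupling statistics come from a neural network. It assumes Bernoulli hidden units for simplicity (the paper covers general exponential families); all class names, dimensions, and step sizes are illustrative, not the authors' implementation.

```python
import torch

class NNEFH(torch.nn.Module):
    """Toy EFH with Bernoulli hidden units coupled through neural statistics f_theta(x)."""
    def __init__(self, x_dim=784, h_dim=128):
        super().__init__()
        self.f = torch.nn.Sequential(                      # neural sufficient statistics
            torch.nn.Linear(x_dim, 256), torch.nn.SiLU(),
            torch.nn.Linear(256, h_dim),
        )
        self.b_h = torch.nn.Parameter(torch.zeros(h_dim))  # hidden-unit biases

    def energy(self, x, h):
        # joint energy E(x, h) = -h^T (f_theta(x) + b_h)
        return -((self.f(x) + self.b_h) * h).sum(-1)

    def free_energy(self, x):
        # Bernoulli h marginalizes analytically: F(x) = -sum_j softplus(f_j(x) + b_j)
        return -torch.nn.functional.softplus(self.f(x) + self.b_h).sum(-1)

def gibbs_langevin(model, x, n_sweeps=50, n_langevin=10, step=1e-2):
    """Alternate exact Gibbs updates of h with Langevin updates of x."""
    for _ in range(n_sweeps):
        with torch.no_grad():                              # h | x is tractable: Bernoulli
            h = torch.bernoulli(torch.sigmoid(model.f(x) + model.b_h))
        for _ in range(n_langevin):                        # x | h has no closed form: Langevin
            x = x.detach().requires_grad_(True)
            (g,) = torch.autograd.grad(model.energy(x, h).sum(), x)
            x = x - step * g + (2 * step) ** 0.5 * torch.randn_like(x)
    return x.detach()

# e.g.: samples = gibbs_langevin(NNEFH(), torch.randn(64, 784))
```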

TMLR 2025 · Journal Article

Revisiting Contrastive Divergence for Density Estimation and Sample Generation

  • Azwar Abdulsalam
  • Joseph G. Makin

Energy-based models (EBMs) have recently attracted renewed attention as models for complex distributions of data, like natural images. Improved image generation under the maximum-likelihood (MLE) objective has been achieved by combining very complex energy functions, in the form of deep neural networks, with Langevin dynamics for sampling from the model. However, Nijkamp and colleagues have recently shown that such EBMs become good generators without becoming good density estimators: an impractical number of Langevin steps is typically required to exit the burn-in of the Markov chain, so the training merely sculpts the energy landscape near the distribution used to initialize the chain. Careful hyperparameter choices and the use of persistent chains can significantly shorten the required number of Langevin steps, but at the price that new samples can be generated only in the vicinity of the persistent chain and not from noise. Here we introduce a simple method to achieve both convergence of the Markov chain in a practical number of Langevin steps (L = 500) and the ability to generate diverse, high-quality samples from noise. Under the hypothesis that Hinton’s classic contrastive-divergence (CD) training does yield good density estimators, but simply lacks a mechanism for connecting the noise manifold to the learned data manifold, we combine CD with an MLE-like loss. We demonstrate that a simple ConvNet can be trained with this method to be good at generation as well as density estimation for CIFAR-10, Oxford Flowers, and a synthetic dataset in which the learned density can be verified visually.
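
The training objective described in the abstract can be sketched as follows: a contrastive-divergence term whose negative samples are initialized at the data, plus an MLE-like term whose negative samples are initialized from noise, both using Langevin dynamics. This is a hypothetical PyTorch sketch under those assumptions; the step counts, noise scale, and weighting `lam` are illustrative, not the paper's settings.

```python
import torch

def langevin(energy_fn, x, n_steps, step=1e-2):
    """Unadjusted Langevin dynamics on the energy landscape of energy_fn."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(energy_fn(x).sum(), x)
        x = x - step * g + (2 * step) ** 0.5 * torch.randn_like(x)
    return x.detach()

def combined_loss(energy_fn, x_data, n_cd=25, n_mle=500, lam=1.0):
    # CD term: negative samples start at (slightly perturbed) data
    x_cd = langevin(energy_fn, x_data + 0.01 * torch.randn_like(x_data), n_cd)
    # MLE-like term: negative samples start from noise and run longer (cf. L = 500)
    x_noise = langevin(energy_fn, torch.rand_like(x_data), n_mle)
    e_pos = energy_fn(x_data).mean()
    return (e_pos - energy_fn(x_cd).mean()) + lam * (e_pos - energy_fn(x_noise).mean())
```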