Arrow Research search

Author name cluster

Binxu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

NeurIPS 2025 · Conference Paper

An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

  • Binxu Wang
  • Cengiz Pehlevan

We develop an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derive exact solutions for the gradient-flow dynamics of the weights of one- or two-layer linear and linear convolutional denoisers trained on arbitrary data: linear networks converge along principal components, while convolutional networks converge along Fourier modes. Remarkably, these solutions allow us to derive the generated distribution, and its KL divergence from the data distribution, in closed form throughout training. These analytical results expose a pronounced spectral bias: for both the weights and the generated distribution, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and natural image datasets demonstrate that this power-law spectral bias remains robust even with deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, and offer a potential explanation of why early stopping can lead to incorrect details in image generative models.
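A minimal numerical sketch of the spectral bias described in this abstract, under simplifying assumptions of my own rather than the paper's setup (a one-layer linear denoiser, a single fixed noise level, and a diagonal data covariance; all names and constants are illustrative): modes with larger variance reach their optimal denoising weight in fewer gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4096
variances = 2.0 ** -np.arange(d)                  # assumed power-law spectrum
X = rng.normal(size=(n, d)) * np.sqrt(variances)  # zero-mean data, diagonal covariance

sigma = 0.5                    # one fixed noise level, for simplicity
W = np.zeros((d, d))           # linear denoiser: x_hat = W @ x_noisy
lr, steps = 0.05, 3000

# For diagonal covariance, the optimal (Wiener) denoiser is diagonal:
# W*_kk = var_k / (var_k + sigma^2). Gradient descent approaches it at a
# per-mode rate proportional to (var_k + sigma^2), so high-variance modes
# converge first; the full multi-noise-scale analysis in the paper is what
# yields the inverse power law, which this single-scale toy only approximates.
W_opt = variances / (variances + sigma**2)
first_hit = np.full(d, -1)

for t in range(steps):
    X_noisy = X + rng.normal(size=X.shape) * sigma
    err = X_noisy @ W.T - X                 # denoising residual
    W -= lr * (err.T @ X_noisy) / n         # step on the mean squared error
    reached = (np.diag(W) >= 0.9 * W_opt) & (first_hit < 0)
    first_hit[reached] = t

for k in range(d):
    print(f"mode {k}: var={variances[k]:.4f}  steps to 90% of optimum: {first_hit[k]}")
```

Running this prints strictly later hitting times as the mode variance shrinks, which is the qualitative ordering the abstract attributes to the data covariance.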

ICML 2025 · Conference Paper

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

  • Thomas Fel
  • Ekdeep Singh Lubana
  • Jacob S. Prince
  • Matthew Kowal
  • Victor Boutin
  • Isabel Papadimitriou
  • Binxu Wang
  • Martin Wattenberg

Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability, enabling the unsupervised decomposition of model representations into a dictionary of abstract, human-interpretable concepts. However, we reveal a fundamental limitation: SAEs exhibit severe instability, as identical models trained on similar datasets can produce sharply different dictionaries, undermining their reliability as an interpretability tool. To address this issue, we draw inspiration from the Archetypal Analysis framework introduced by Cutler & Breiman (1994) and present Archetypal SAEs (A-SAE), wherein dictionary atoms are constrained to the data’s convex hull. This geometric anchoring significantly enhances the stability and plausibility of the inferred dictionaries, and a mildly relaxed variant (RA-SAE) further matches state-of-the-art reconstruction quality. To rigorously assess the quality of dictionaries learned by SAEs, we introduce two new benchmarks that test (i) plausibility, whether dictionaries recover “true” classification directions, and (ii) identifiability, whether dictionaries disentangle synthetic concept mixtures. Across all evaluations, RA-SAEs consistently yield more structured representations while uncovering novel, semantically meaningful concepts in large-scale vision models.
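A minimal sketch of the geometric constraint this abstract describes, not the paper's implementation: each dictionary atom is parameterized as a convex combination of candidate data points, so the dictionary cannot leave the data's convex hull. The simplex-projection routine and all variable names are my own.

```python
import numpy as np

def project_rows_to_simplex(a):
    """Euclidean projection of each row of `a` onto the probability simplex
    (nonnegative entries summing to 1), following the standard sort-based method."""
    u = np.sort(a, axis=1)[:, ::-1]              # sort each row descending
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, a.shape[1] + 1)
    rho = (u - css / idx > 0).sum(axis=1)        # size of the active support
    theta = css[np.arange(a.shape[0]), rho - 1] / rho
    return np.maximum(a - theta[:, None], 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))   # stand-in for collected model activations
k = 16                            # number of dictionary atoms (concepts)
A = project_rows_to_simplex(rng.random((k, X.shape[0])))

D = A @ X                         # each atom is a convex combination of rows of X
# In training, one would backpropagate the SAE loss into A, take a gradient
# step, and re-project the rows of A onto the simplex, so D stays in conv(X).
```

The design point is that the learned object is A (convex weights over data) rather than a free matrix D, which is what gives the anchoring and stability claimed above.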

TMLR 2024 · Journal Article

The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications

  • Binxu Wang
  • John Vastola

Diffusion models have achieved remarkable results in multiple domains of generative modeling. By learning the gradient of smoothed data distributions, they can iteratively generate samples from complex distributions, e.g., of natural images. The learned score function enables their generalization capabilities, but how the learned score relates to the score of the underlying data manifold remains largely unclear. Here, we aim to elucidate this relationship by comparing the learned scores of neural-network-based models to the scores of two kinds of analytically tractable distributions: Gaussians and Gaussian mixtures. The simplicity of the Gaussian model makes it particularly attractive from a theoretical point of view, and we show that it admits a closed-form solution and predicts many qualitative aspects of sample generation dynamics. We claim that the learned neural score is dominated by its linear (Gaussian) approximation for moderate to high noise scales, and supply both theoretical and empirical arguments to support this claim. Moreover, the Gaussian approximation empirically works for a larger range of noise scales than naive theory suggests it should, and is preferentially learned by networks early in training. At smaller noise scales, we observe that learned scores are better described by a coarse-grained (Gaussian mixture) approximation of training data than by the score of the training distribution, a finding consistent with generalization. Our findings enable us to precisely predict the initial phase of trained models' sampling trajectories through their Gaussian approximations. We show that this allows one to leverage the Gaussian analytical solution to skip the first 15-30% of sampling steps while maintaining high sample quality (with a near state-of-the-art FID score of 1.93 on CIFAR-10 unconditional generation). This forms the foundation of a novel hybrid sampling method, termed "analytical teleportation", which can seamlessly integrate with and accelerate existing samplers, including DPM-Solver-v3 and UniPC. Our findings strengthen the field's theoretical understanding of how diffusion models work and suggest ways to improve the design and training of diffusion models.
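A minimal sketch of the Gaussian closed form this abstract builds on, in a variance-exploding parameterization that I am assuming for concreteness (not taken from the paper text): the score of a Gaussian smoothed at noise level sigma is -(Sigma + sigma^2 I)^{-1}(x - mu), and the probability-flow ODE then integrates exactly in the covariance eigenbasis, so a sample can jump from a high noise level to a lower one in a single step before a numerical sampler takes over.

```python
import numpy as np

def gaussian_score(x, mu, eigvecs, eigvals, sigma):
    """Score of N(mu, Sigma) smoothed with noise level sigma:
    grad log p_sigma(x) = -(Sigma + sigma^2 I)^{-1} (x - mu),
    with Sigma = eigvecs @ diag(eigvals) @ eigvecs.T."""
    c = eigvecs.T @ (x - mu)
    return -eigvecs @ (c / (eigvals + sigma**2))

def teleport(x, mu, eigvecs, eigvals, sigma_hi, sigma_lo):
    """Exact probability-flow ODE solution under the Gaussian model (VE form):
    each eigen-coordinate scales by sqrt((lam + sigma_lo^2) / (lam + sigma_hi^2))."""
    c = eigvecs.T @ (x - mu)
    scale = np.sqrt((eigvals + sigma_lo**2) / (eigvals + sigma_hi**2))
    return mu + eigvecs @ (scale * c)

rng = np.random.default_rng(0)
d = 16
mu = rng.normal(size=d)
L = rng.normal(size=(d, d))
eigvals, eigvecs = np.linalg.eigh(L @ L.T / d)   # spectrum of a toy data covariance

x_hi = mu + 80.0 * rng.normal(size=d)   # approximate sample at sigma_hi = 80
x_mid = teleport(x_hi, mu, eigvecs, eigvals, sigma_hi=80.0, sigma_lo=10.0)
# ...hand x_mid to a numerical sampler for the remaining noise levels, which is
# the hybrid scheme the abstract calls analytical teleportation.
```

The skipped interval (here 80 down to 10) corresponds to the early, Gaussian-dominated phase of the trajectory; the exact split and the integration with DPM-Solver-v3 or UniPC are as described in the paper, not in this toy.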

ICLR 2021 · Conference Paper

A Geometric Analysis of Deep Generative Image Models and Its Applications

  • Binxu Wang
  • Carlos R. Ponce

Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and that the directions creating this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of the eigenspace corresponding to minor transforms that could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g. GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs.
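A minimal sketch of the pullback-metric computation described in this abstract, using a toy smooth map in place of a real GAN generator (the finite-difference Jacobian and all names are my own illustration, not the paper's architecture-agnostic method): the metric H(z) = J(z)^T J(z) is eigendecomposed, and its top eigenvectors are the latent directions producing the largest image change.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 64))

def G(z):
    """Toy stand-in for a generator: a smooth map from latent R^8 to 'image' R^64."""
    return np.tanh(z @ W1) @ W2

def metric_eig(z, eps=1e-4):
    """Eigen-decomposition of the pullback Riemannian metric H(z) = J(z)^T J(z),
    with the Jacobian J of G estimated by central finite differences."""
    J = np.stack([(G(z + eps * e) - G(z - eps * e)) / (2 * eps)
                  for e in np.eye(z.size)], axis=1)   # shape (d_out, d_in)
    H = J.T @ J
    return np.linalg.eigh(H)        # eigenvalues ascending, eigenvectors as columns

z0 = rng.normal(size=8)
eigvals, eigvecs = metric_eig(z0)
print("anisotropy (largest / smallest eigenvalue):", eigvals[-1] / eigvals[0])
top_axis = eigvecs[:, -1]           # latent direction causing the largest output change
```

Moving z0 along top_axis changes the output most per unit latent step; the paper's findings of anisotropy and homogeneity amount to this eigenvalue spectrum being steep and the top eigenvectors being similar across different z0.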