Arrow Research search

Author name cluster

Mingi Kwon

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

NeurIPS Conference 2025 Conference Paper

Balanced Conic Rectified Flow

  • Kim Shin seong
  • Mingi Kwon
  • Jaeseok Jeong
  • Youngjung Uh

Rectified flow is a generative model that learns smooth transport mappings between two distributions through an ordinary differential equation (ODE). The model learns a straight ODE by reflow steps which iteratively update the supervisory flow. It allows for a relatively simple and efficient generation of high-quality images. However, rectified flow still faces several challenges. 1) The reflow process is slow because it requires a large number of generated pairs to model the target distribution. 2) It is well known that the use of suboptimal fake samples in reflow can lead to performance degradation of the learned flow model. This issue is further exacerbated by error accumulation across reflow steps and model collapse in denoising autoencoder models caused by self-consuming training. In this work, we go one step further and empirically demonstrate that the reflow process causes the learned model to drift away from the target distribution, which in turn leads to a growing discrepancy in reconstruction error between fake and real images. We reveal the drift problem and design a new reflow step, namely the conic reflow. It supervises the model by the inversions of real data points through the previously learned model and its interpolation with random initial points. Our conic reflow leads to multiple advantages. 1) It keeps the ODE paths toward real samples, evaluated by reconstruction. 2) We use only a small number of generated samples instead of large generated samples, 600K and 4M, respectively. 3) The learned model generates images with higher quality evaluated by FID, IS, and Recall. 4) The learned flow is more straight than others, evaluated by curvature. We achieve much lower FID in both one-step and full-step generation in CIFAR-10. The conic reflow generalizes to various datasets such as LSUN Bedroom and ImageNet.

ICML Conference 2024 Conference Paper

Attribute Based Interpretable Evaluation Metrics for Generative Models

  • Dongkyun Kim
  • Mingi Kwon
  • Youngjung Uh

When the training dataset comprises a 1: 1 proportion of dogs to cats, a generative model that produces 1: 1 dogs and cats better resembles the training species distribution than another model with 3: 1 dogs and cats. Can we capture this phenomenon using existing metrics? Unfortunately, we cannot, because these metrics do not provide any interpretability beyond “diversity". In this context, we propose a new evaluation protocol that measures the divergence of a set of generated images from the training set regarding the distribution of attribute strengths as follows. Singleattribute Divergence (SaD) reveals the attributes that are generated excessively or insufficiently by measuring the divergence of PDFs of individual attributes. Paired-attribute Divergence (PaD) reveals such pairs of attributes by measuring the divergence of joint PDFs of pairs of attributes. For measuring the attribute strengths of an image, we propose Heterogeneous CLIPScore (HCS) which measures the cosine similarity between image and text vectors with heterogeneous initial points. With SaD and PaD, we reveal the following about existing generative models. ProjectedGAN generates implausible attribute relationships such as baby with beard even though it has competitive scores of existing metrics. Diffusion models struggle to capture diverse colors in the datasets. The larger sampling timesteps of the latent diffusion model generate the more minor objects including earrings and necklace. Stable Diffusion v1. 5 better captures the attributes than v2. 1. Our metrics lay a foundation for explainable evaluations of generative models.

ICLR Conference 2023 Conference Paper

Diffusion Models Already Have A Semantic Latent Space

  • Mingi Kwon
  • Jaeseok Jeong 0002
  • Youngjung Uh

Diffusion models achieve outstanding generative performance in various domains. Despite their great success, they lack semantic latent space which is essential for controlling the generative process. To address the problem, we propose asymmetric reverse process (Asyrp) which discovers the semantic latent space in frozen pretrained diffusion models. Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation: homogeneity, linearity, robustness, and consistency across timesteps. In addition, we measure editing strength and quality deficiency of a generative process at timesteps to provide a principled design of the process for versatility and quality improvements. Our method is applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets (CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES).

NeurIPS Conference 2023 Conference Paper

Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry

  • Yong-Hyun Park
  • Mingi Kwon
  • Jaewoong Choi
  • Junghyo Jo
  • Youngjung Uh

Despite the success of diffusion models (DMs), we still lack a thorough understanding of their latent space. To understand the latent space $\mathbf{x}_t \in \mathcal{X}$, we analyze them from a geometrical perspective. Our approach involves deriving the local latent basis within $\mathcal{X}$ by leveraging the pullback metric associated with their encoding feature maps. Remarkably, our discovered local latent basis enables image editing capabilities by moving $\mathbf{x}_t$, the latent space of DMs, along the basis vector at specific timesteps. We further analyze how the geometric structure of DMs evolves over diffusion timesteps and differs across different text conditions. This confirms the known phenomenon of coarse-to-fine generation, as well as reveals novel insights such as the discrepancy between $\mathbf{x}_t$ across timesteps, the effect of dataset complexity, and the time-varying influence of text prompts. To the best of our knowledge, this paper is the first to present image editing through $\mathbf{x}$-space traversal, editing only once at specific timestep $t$ without any additional training, and providing thorough analyses of the latent structure of DMs. The code to reproduce our experiments can be found at the [link](https: //github. com/enkeejunior1/Diffusion-Pullback).