Factorized Diffusion Autoencoder for Unsupervised Disentangled Representation Learning

Ancong Wu; Wei-Shi Zheng

doi:10.1609/aaai.v38i6.28407

AAAI 2024

Conference Paper; AAAI Technical Track on Computer Vision V; Artificial Intelligence

Abstract

Unsupervised disentangled representation learning aims to recover semantically meaningful factors from real-world data without supervision, which is important for model generalization and interpretability. Current methods mainly rely on assumptions about the independence or informativeness of factors and do not take interpretability into account. Intuitively, visually interpretable concepts align better with human-defined factors. However, exploiting visual interpretability as an inductive bias remains under-explored. Inspired by the observation that most explanatory image factors can be represented as "content + mask", we propose a content-mask factorization network (CMFNet) that decomposes an image into groups of content codes and masks, which are then combined into content masks that represent different visual concepts. To ensure that the representations are informative, the CMFNet is learned jointly with a generator that is conditioned on the content masks and reconstructs the input image. The conditional generator is a diffusion model, chosen for its robust distribution modeling capability. We call our model the Factorized Diffusion Autoencoder (FDAE). To enhance the disentanglement of visual concepts, we propose a content decorrelation loss and a mask entropy loss, which decorrelate content masks in latent space and spatial space, respectively. Experiments on Shapes3d, MPI3D and Cars3d show that our method achieves advanced performance and can generate visually interpretable concept-specific masks. Source code and supplementary materials are available at https://github.com/wuancong/FDAE.
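
The abstract names the two regularizers but does not give their formulas. As a rough illustration only, the sketch below shows one plausible reading of each idea: a spatial penalty that is small when every pixel is claimed by a single concept mask, and a latent penalty that is small when the content codes of different concepts are uncorrelated. The function names (`mask_entropy`, `content_decorrelation`), the softmax over concepts, and the squared cosine similarity are all assumptions made for this example; the definitions used in the paper and in the linked repository may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_entropy(mask_logits):
    """Mean per-pixel entropy of the concept assignment (hypothetical form).

    mask_logits[k][p] is the logit of concept k at pixel p. Each pixel's
    logits are normalized across concepts, so low entropy means the pixel
    is assigned almost entirely to one concept.
    """
    num_pixels = len(mask_logits[0])
    total = 0.0
    for p in range(num_pixels):
        probs = softmax([row[p] for row in mask_logits])
        total -= sum(q * math.log(q) for q in probs if q > 0.0)
    return total / num_pixels


def content_decorrelation(codes):
    """Mean squared cosine similarity between distinct content codes
    (hypothetical form). Assumes at least two non-zero code vectors."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    pairs = [(i, j) for i in range(len(codes)) for j in range(i + 1, len(codes))]
    return sum(cosine(codes[i], codes[j]) ** 2 for i, j in pairs) / len(pairs)
```

Under these assumptions, two masks with identical logits at every pixel give the maximum entropy for two concepts (ln 2), while sharply separated masks give a value close to zero; orthogonal content codes give a decorrelation penalty of zero, and parallel codes give a penalty of one.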
