
NeurIPS 2025

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Direct preference optimization (DPO) is an effective method for aligning generative models with human preferences and has been successfully applied to fine‑tune text‑to‑image diffusion models. Its practical adoption, however, is hindered by a labor‑intensive pipeline that first produces a large set of candidate images and then requires humans to rank them pairwise. We address this bottleneck with self‑supervised direct preference optimization, a new paradigm that removes the need for any pre‑generated images or manual ranking. During training, we create preference pairs on the fly through self‑supervised image transformations, allowing the model to learn from fresh and diverse comparisons at every iteration. This online strategy eliminates costly data collection and annotation while remaining plug‑and‑play for any text‑to‑image diffusion method. Surprisingly, the on‑the‑fly pairs produced by the proposed method not only match but exceed the effectiveness of conventional DPO, which we attribute to the greater diversity of preferences sampled during training. Extensive experiments with Stable Diffusion 1.5 and Stable Diffusion XL confirm that our method delivers substantial gains.
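
The abstract only sketches the mechanism, so the following is a minimal illustrative training step, not the authors' actual recipe. It assumes a diffusers-style UNet and noise scheduler; the choice of degradation transforms, the beta value, and the loss shape (the standard Diffusion-DPO objective) are all assumptions made for the sketch. The idea shown: a clean training image acts as the preferred sample, a degraded copy synthesized on the fly acts as the dispreferred one, and the two are compared under a DPO loss against a frozen reference model.

```python
# Illustrative sketch only (not the paper's code): one self-supervised DPO
# step where the preference pair is built on the fly from a single image.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Hypothetical degradations that synthesize the dispreferred ("losing") sample.
# In practice these would run in pixel space before any VAE encoding.
degrade = T.Compose([
    T.GaussianBlur(kernel_size=9, sigma=(2.0, 4.0)),
    T.ColorJitter(brightness=0.5, saturation=0.5),
])

def self_supervised_dpo_step(model, ref_model, x_win, prompt_emb, scheduler,
                             beta=0.1):
    """One training step: winner = clean image, loser = degraded copy."""
    x_lose = degrade(x_win)  # preference pair created on the fly, no annotation
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (x_win.shape[0],), device=x_win.device)
    noise = torch.randn_like(x_win)
    # Share noise and timestep across both branches so the comparison is paired.
    xt_w = scheduler.add_noise(x_win, noise, t)
    xt_l = scheduler.add_noise(x_lose, noise, t)

    def eps_err(net, xt):
        # Per-sample denoising error; gradients flow only through the policy.
        with torch.set_grad_enabled(net is model):
            pred = net(xt, t, encoder_hidden_states=prompt_emb).sample
        return F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))

    # How much the policy improves over the frozen reference on each branch
    # (lower denoising error = more preferred under Diffusion-DPO).
    diff_w = eps_err(model, xt_w) - eps_err(ref_model, xt_w)
    diff_l = eps_err(model, xt_l) - eps_err(ref_model, xt_l)
    return -F.logsigmoid(-beta * (diff_w - diff_l)).mean()
```

Because the pair is regenerated from fresh images and fresh transformations at every iteration, each step sees a new comparison, which is the source of the preference diversity the abstract credits for outperforming conventional, fixed-dataset DPO.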

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue: Annual Conference on Neural Information Processing Systems
Archive span: 1987-2025
Indexed papers: 30776
Paper ID: 589325712097950292