Preference Adaptive and Sequential Text-to-Image Generation

Ofir Nabati; Guy Tennenholtz; Chih-Wei Hsu; Moonkyung Ryu; Deepak Ramachandran; Yinlam Chow; Xiang Li; Craig Boutilier

Back to ICML

ICML 2025

Preference Adaptive and Sequential Text-to-Image Generation

Conference Paper Accept (poster) Artificial Intelligence · Machine Learning

Details

Abstract

We address the problem of interactive text-to-image (T2I) generation, designing a reinforcement learning (RL) agent which iteratively improves a set of generated images for a user through a sequence of prompt expansions. Using human raters, we create a novel dataset of sequential preferences, which we leverage, together with large-scale open-source (non-sequential) datasets. We construct user-preference and user-choice models using an EM strategy and identify varying user preference types. We then leverage a large multimodal language model (LMM) and a value-based RL approach to suggest an adaptive and diverse slate of prompt expansions to the user. Our Preference Adaptive and Sequential Text-to-image Agent (PASTA) extends T2I models with adaptive multi-turn capabilities, fostering collaborative co-creation and addressing uncertainty or underspecification in a user’s intent. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems.

Authors

Keywords

text-to-image generation
reinforcement learning
diffusion models
multi-modal large language models
human-in-the-loop
sequential decision making

Context

Venue: International Conference on Machine Learning
Archive span: 1993-2025
Indexed papers: 16471
Paper id: 1141045801818147496