Jack Urbanek Papers

TMLR Journal 2024 Journal Article

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Oscar Mañas
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Aishwarya Agrawal
Adriana Romero-Soriano

Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9% in terms of DSG score while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.

PDF Details

AAAI Conference 2020 Conference Paper

Generating Interactive Worlds with Text

Angela Fan
Jack Urbanek
Pratik Ringshia
Emily Dinan
Emma Qian
Siddharth Karamcheti
Shrimai Prabhumoye
Douwe Kiela

Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multiplayer text adventure game environment LIGHT (Urbanek et al. 2019). We introduce neural network based models to compositionally arrange locations, characters, and objects into a coherent whole. In addition to creating worlds based on existing elements, our models can generate new game content. Humans can also leverage our models to interactively aid in worldbuilding. We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.

PDF Details

Possible papers

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Generating Interactive Worlds with Text