NeurIPS 2024

Policy Improvement using Language Feedback Models

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

We introduce Language Feedback Models (LFMs) that identify desirable behaviour (actions that help achieve tasks specified in the instruction) for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized into language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFMs can be modified to provide human-interpretable feedback without performance loss, allowing human verification of desirable behaviour for imitation learning.
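The abstract outlines a loop: verbalize trajectories into language, have a feedback model flag steps that progress toward the instruction, and imitate only those steps. The sketch below is a minimal, hypothetical illustration of that filtering step, not the authors' implementation; `Step`, `filter_desirable_steps`, `imitation_round`, and the toy LFM predicate are all assumed names invented here.

```python
# Hypothetical sketch of LFM-filtered imitation learning.
# All interfaces below are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Step:
    observation: str  # verbalized (language) description of the visual observation
    action: str       # action the policy took at this step

def filter_desirable_steps(
    trajectory: List[Step],
    instruction: str,
    lfm_is_desirable: Callable[[str, str, str], bool],
) -> List[Step]:
    """Keep only the steps the feedback model judges as helping the instruction."""
    return [
        step for step in trajectory
        if lfm_is_desirable(instruction, step.observation, step.action)
    ]

def imitation_round(
    rollouts: List[Tuple[str, List[Step]]],  # (instruction, trajectory) pairs
    lfm_is_desirable: Callable[[str, str, str], bool],
) -> List[Tuple[str, Step]]:
    """Assemble an imitation dataset from LFM-approved steps across rollouts."""
    dataset = []
    for instruction, trajectory in rollouts:
        for step in filter_desirable_steps(trajectory, instruction, lfm_is_desirable):
            dataset.append((instruction, step))
    return dataset  # a base policy would then be fine-tuned on these pairs

if __name__ == "__main__":
    # Toy stand-in for an LFM: flag actions that mention the instruction's goal object.
    toy_lfm = lambda instr, obs, act: instr.split()[-1] in act
    rollouts = [("pick up the mug", [Step("you see a mug on the desk", "grab mug"),
                                     Step("you see a lamp", "toggle lamp")])]
    print(imitation_round(rollouts, toy_lfm))  # keeps only the "grab mug" step
```

In this toy run, only the step whose action mentions the goal object survives the filter, mirroring the paper's idea of imitating LFM-identified desirable behaviour rather than whole trajectories.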

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
747104698146163378