Object-Centric Latent Action Learning

Albina Klepach; Alexander Nikulin; Ilya Zisman; Denis Tarasov; Alexander Derevyagin; Andrei Polubarov; Nikita Lyubaykin; Igor Kiselev; Vladislav Kurenkov

doi:10.1609/aaai.v40i27.39423

AAAI 2026

Conference Paper AAAI Technical Track on Machine Learning IV Artificial Intelligence

Abstract

Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that operates on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle the agent's movement from distracting background dynamics. This allows LAPO to focus on task-relevant interactions, yielding more robust proxy action labels, which in turn enable better imitation learning and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluate our method on eight visually complex tasks across the Distracting Control Suite (DCS) and Distracting MetaWorld (DMW). Our results show that object-centric pretraining mitigates the negative effects of distractors by 50%, as measured by downstream task performance: average return (DCS) and success rate (DMW).
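
To make the idea concrete, the following is a minimal sketch of a latent action objective computed on object slots instead of pixels. It is an illustration under stated assumptions, not the paper's implementation: the names `linear`, `init_weights`, and `latent_action_loss`, the linear models, and all dimensions are hypothetical. The slot vectors are assumed to come from a pretrained object-centric encoder, with task-relevant slots already selected upstream. Any quantization or bottleneck on the latent action is omitted.

```python
import random

def linear(weights, x):
    """Apply a bias-free dense layer: weights is (out, in), x has length in."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def init_weights(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters (hypothetical)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def latent_action_loss(slots_t, slots_next, idm_w, fdm_w):
    """Latent-action objective on slot features (illustrative only).

    Inverse dynamics model (IDM): (slots_t, slots_next) -> latent action z.
    Forward dynamics model (FDM): (slots_t, z) -> predicted slots_next.
    The loss is the mean squared error of the FDM prediction.
    """
    z = linear(idm_w, slots_t + slots_next)   # infer a proxy action label
    pred = linear(fdm_w, slots_t + z)         # predict the next slot features
    loss = sum((p - y) ** 2 for p, y in zip(pred, slots_next)) / len(slots_next)
    return z, loss

rng = random.Random(0)
SLOT_FEATURES = 8   # assumed: flattened features of the selected slots
LATENT_DIM = 2      # assumed: size of the latent action

# Stand-ins for encoder outputs at two consecutive frames.
slots_t = [rng.gauss(0.0, 1.0) for _ in range(SLOT_FEATURES)]
slots_next = [rng.gauss(0.0, 1.0) for _ in range(SLOT_FEATURES)]

idm_w = init_weights(LATENT_DIM, 2 * SLOT_FEATURES, rng)
fdm_w = init_weights(SLOT_FEATURES, SLOT_FEATURES + LATENT_DIM, rng)

z, loss = latent_action_loss(slots_t, slots_next, idm_w, fdm_w)
```

In this sketch the models never see background pixels, only the slot features, which is the intuition behind the claimed robustness to distractors. In practice both models would be trained jointly by minimizing `loss`, and the inferred `z` would serve as the proxy action label for imitation learning.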
