ICRA 2025

Improving Zero-Shot ObjectNav with Generative Communication

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Abstract

We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have a limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and the ground agent with an obfuscated view, both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view with the target, we note a drop in performance (−13% in OSR and −13% in SPL) of a fully cooperative assistance scheme over an unassisted baseline. In contrast, a selective assistance scheme where the ground agent retains its independent exploratory behaviour shows a 10% OSR and 7.65% SPL improvement. To explain navigation performance, we analyze the GC for unique traits, quantifying the presence of hallucination and cooperation. Specifically, we identify the novel linguistic trait of preemptive hallucination in our embodied setting, where the overhead agent assumes in the dialogue that the ground agent has executed an action when it is yet to move, and note its strong correlation with navigation performance. We conduct real-world experiments and present some qualitative examples where we mitigate hallucinations via prompt finetuning to improve ObjectNav performance.
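
The abstract reports results in OSR and SPL without defining them. The sketch below assumes the definitions commonly used in the embodied-navigation literature: OSR as Oracle Success Rate (the fraction of episodes in which the agent came within the success radius of the target at any step) and SPL as Success weighted by Path Length. The `Episode` fields and the function names are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical record layout, not from the paper)."""
    success: bool           # agent stopped within the success radius of the target
    oracle_success: bool    # agent was within the success radius at any step
    shortest_path: float    # shortest-path length from start to target, assumed > 0
    agent_path: float       # length of the path the agent actually travelled


def oracle_success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent ever reached the target region."""
    return sum(ep.oracle_success for ep in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    A successful episode contributes shortest_path / max(agent_path, shortest_path);
    a failed episode contributes 0.
    """
    total = 0.0
    for ep in episodes:
        if ep.success:
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)
```

Under these assumed definitions, SPL penalizes a successful agent for taking a longer route than the shortest path, while OSR ignores both path length and whether the agent stopped at the target.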

Authors

Keywords

  • Correlation
  • Translation
  • Navigation
  • Linguistics
  • Lead
  • Benchmark testing
  • Robotics and automation
  • Selection Strategy
  • Hallucinations
  • Target Object
  • Global View
  • Real-world Experiments
  • Cooperative Strategy
  • Navigation Performance
  • Vocabulary
  • Natural Language
  • Simulation Environment
  • Unmanned Aerial Vehicles
  • Language Model
  • Cooperative Action
  • Multiple Agents
  • Light Reflection
  • Effective Collaboration
  • Navigation Strategies
  • View Of Environment
  • Cooperation Rate
  • Overhead Camera
  • Emergency Communication
  • Pre-emptive Action
  • Human-robot Collaboration

Context

Venue
IEEE International Conference on Robotics and Automation
Archive span
1984-2025
Indexed papers
30179
Paper id
349336280726030741