Arrow Research search

Author name cluster

Eric Rosen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers (12)

ICRA 2025 · Conference Paper

Verifiably Following Complex Robot Instructions with Foundation Models

  • Benedict Quartey
  • Eric Rosen
  • Stefanie Tellex
  • George Konidaris 0001

When instructing robots, users want to flexibly express constraints, refer to arbitrary landmarks, and verify robot behavior, while robots must disambiguate instructions into specifications and ground instruction referents in the real world. To address this problem, we propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow complex, open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of correct-by-construction robot behaviors. We conduct a large-scale evaluation of LIMP on 150 instructions across five real-world environments, demonstrating its versatility and ease of deployment in diverse, unstructured domains. LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and additionally achieves a 79% success rate on complex spatiotemporal instructions, significantly outperforming baselines that only reach 38%. See supplementary materials and demo videos at robotlimp.github.io.
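
The verification idea in this abstract, checking a candidate robot trajectory against a spatiotemporal instruction, can be illustrated with a small sketch. The spec structure below (landmarks to visit in order plus regions to always avoid) and the example regions are a deliberate simplification for illustration, not LIMP's actual symbolic representation.

```python
# Minimal sketch of verifying a trajectory against a spatiotemporal spec.
# The spec structure (ordered visits + avoid set) is a simplification for
# illustration; it is NOT the representation used by LIMP.

def satisfies(trajectory, visit_in_order, always_avoid):
    """trajectory: list of region labels the robot occupies over time."""
    if any(region in always_avoid for region in trajectory):
        return False                                   # safety constraint violated
    next_goal = 0
    for region in trajectory:
        if next_goal < len(visit_in_order) and region == visit_in_order[next_goal]:
            next_goal += 1                             # reached the next required landmark
    return next_goal == len(visit_in_order)            # all landmarks reached, in order

# Example: "go to the kitchen, then the office, but never enter the lab"
plan = ["hall", "kitchen", "hall", "office"]
print(satisfies(plan, ["kitchen", "office"], {"lab"}))   # True
print(satisfies(plan, ["office", "kitchen"], {"lab"}))   # False (wrong order)
```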

ICRA 2024 · Conference Paper

CAPE: Corrective Actions from Precondition Errors using Large Language Models

  • Shreyas Sundara Raman
  • Vanya Cohen
  • Ifrah Idrees
  • Eric Rosen
  • Raymond Mooney
  • Stefanie Tellex
  • David Paulius

Extracting knowledge and reasoning from large language models (LLMs) offers a path to designing intelligent robots. Common approaches that leverage LLMs for planning are unable to recover when actions fail and resort to retrying failed actions without resolving the underlying cause. We propose a novel approach (CAPE) that generates corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans through few-shot reasoning on action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while maintaining semantic correctness and minimizing re-prompting. In VirtualHome, CAPE improves a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan, whilst achieving competitive executability. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves correctness by 76.49% with higher executability compared to SayCan. Our approach enables embodied agents to follow natural language commands and robustly recover from failures.
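
The corrective loop this abstract describes can be sketched in a few lines: plan with an LLM, execute step by step, and when a step fails on an unmet precondition, feed that error back to obtain a corrective action rather than blindly retrying. The llm_plan, llm_correct, and execute_step functions below are stand-in stubs, not the prompts, skills, or models used by CAPE.

```python
# Sketch: when a step fails because a precondition does not hold, the failure
# is fed back to the LLM for a corrective action instead of blindly retrying.

def llm_plan(task):
    return ["walk to cabinet", "grab cup"]                       # stub plan

def llm_correct(failed_step, unmet_precondition):
    # A real system would few-shot prompt an LLM with the error message.
    return f"resolve: {unmet_precondition}"                      # stub correction

def execute_step(step, state):
    # Returns (success, unmet_precondition); a toy stand-in for a simulator.
    if step == "grab cup" and not state.get("cabinet_open"):
        return False, "cabinet must be open"
    if step.startswith("resolve"):
        state["cabinet_open"] = True
    return True, None

state = {}
for step in llm_plan("get a cup"):
    ok, error = execute_step(step, state)
    if not ok:
        fix = llm_correct(step, error)       # corrective action derived from the error
        execute_step(fix, state)
        execute_step(step, state)            # retry the original step
```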

ICRA 2024 · Conference Paper

Composable Interaction Primitives: A Structured Policy Class for Efficiently Learning Sustained-Contact Manipulation Skills

  • Ben Abbatematteo
  • Eric Rosen
  • Skye Thompson
  • Tuluhan Akbulut
  • Sreehari Rammohan
  • George Konidaris 0001

We propose a new policy class, Composable Interaction Primitives (CIPs), specialized for learning sustained-contact manipulation skills like opening a drawer, pulling a lever, turning a wheel, or shifting gears. CIPs have two primary design goals: to minimize what must be learned by exploiting structure present in the world and the robot, and to support sequential composition by construction, so that learned skills can be used by a task-level planner. Using an ablation experiment in four simulated manipulation tasks, we show that the structure included in CIPs substantially improves the efficiency of motor skill learning. We then show that CIPs can be used for plan execution in a zero-shot fashion by sequencing learned skills. We validate our approach on real robot hardware by learning and sequencing two manipulation skills.
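
A rough sketch of the kind of structure described here: each skill bundles an initiation condition, a scripted contact phase, a learned sustained-contact policy, and a termination condition, and a sequencer chains skills by checking initiation before handing over. The field names and the toy drawer example are hypothetical, not the paper's actual CIP interface.

```python
# Minimal sketch of sequencing structured manipulation skills; the fields are
# a simplified stand-in for the structure CIPs impose, not the paper's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    can_start: Callable[[dict], bool]       # initiation condition
    make_contact: Callable[[dict], dict]    # scripted grasp/contact phase
    interact: Callable[[dict], dict]        # learned sustained-contact policy
    is_done: Callable[[dict], bool]         # termination condition

def run_sequence(skills, state):
    for skill in skills:
        if not skill.can_start(state):
            raise RuntimeError(f"{skill.name}: initiation condition not met")
        state = skill.make_contact(state)
        while not skill.is_done(state):
            state = skill.interact(state)
        print(f"finished {skill.name}")
    return state

open_drawer = Skill(
    name="open_drawer",
    can_start=lambda s: s["at_drawer"],
    make_contact=lambda s: {**s, "grasped_handle": True},
    interact=lambda s: {**s, "drawer_pos": s["drawer_pos"] + 0.05},
    is_done=lambda s: s["drawer_pos"] >= 0.3,
)
run_sequence([open_drawer], {"at_drawer": True, "drawer_pos": 0.0})
```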

ICRA 2024 · Conference Paper

Robot Task Planning Under Local Observability

  • Max Merlin
  • Shane Parr
  • Neev Parikh
  • Sergio Orozco
  • Vedant Gupta
  • Eric Rosen
  • George Konidaris 0001

Real-world robot task planning is intractable in part due to partial observability. A common approach to reducing complexity is introducing additional structure into the decision process, such as mixed-observability, factored states, or temporally-extended actions. We propose the locally observable Markov decision process (LOMDP), a novel formulation that models task-level planning where uncertainty pertains to object-level attributes and where a robot has subroutines for seeking and accurately observing objects. This models sensors that are range-limited and line-of-sight: objects occluded or outside sensor range are unobserved, but the attributes of objects that fall within sensor view can be resolved via repeated observation. Our model results in a three-stage planning process: first, the robot plans using only observed objects; if that fails, it generates a target object that, if observed, could result in a feasible plan; finally, it attempts to locate and observe the target, replanning after each newly observed object. By combining LOMDPs with off-the-shelf Markov planners, we outperform state-of-the-art solvers for both object-oriented POMDP and MDP analogues with the same task specification. We then apply the formulation to successfully solve a task on a mobile robot.
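
The three-stage process described in this abstract lends itself to a compact sketch: plan with the currently observed objects; if that fails, choose a target object whose observation could make the task feasible; then seek it and replan. The plan_with, choose_target, and seek functions below are placeholders, not the paper's planner or heuristics.

```python
# Sketch of the three-stage loop: plan with what is seen, pick a target object
# if planning fails, then seek it and replan. All three helpers are stubs.

def plan_with(observed, goal):
    return ["pick", "place"] if goal in observed else None        # stub planner

def choose_target(all_objects, observed, goal):
    unseen = [o for o in all_objects if o not in observed]
    return unseen[0] if unseen else None                          # stub heuristic

def seek(target, observed):
    observed.add(target)                                          # stub: assume found
    return observed

def lomdp_plan(all_objects, observed, goal):
    while True:
        plan = plan_with(observed, goal)                # stage 1: plan with observed objects
        if plan is not None:
            return plan
        target = choose_target(all_objects, observed, goal)   # stage 2: pick a target to find
        if target is None:
            return None
        observed = seek(target, observed)               # stage 3: seek, observe, replan

print(lomdp_plan({"mug", "book"}, set(), "mug"))   # ['pick', 'place']
```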

ICRA 2024 · Conference Paper

Skill Transfer for Temporal Task Specification

  • Jason Xinyu Liu
  • Ankit Shah
  • Eric Rosen
  • Mingxi Jia
  • George Konidaris 0001
  • Stefanie Tellex

Deploying robots in real-world environments, such as households and manufacturing lines, requires generalization across novel task specifications without violating safety constraints. Linear temporal logic (LTL) is a widely used task specification language with a compositional grammar that naturally induces commonalities among tasks while preserving safety guarantees. However, most prior work on reinforcement learning with LTL specifications treats every new task independently, thus requiring large amounts of training data to generalize. We propose LTL-Transfer, a zero-shot transfer algorithm that composes task-agnostic skills learned during training to safely satisfy a wide variety of novel LTL task specifications. Experiments in Minecraft-inspired domains show that after training on only 50 tasks, LTL-Transfer can solve over 90% of 100 challenging unseen tasks and 100% of 300 commonly used novel tasks without violating any safety constraints. We deployed LTL-Transfer at the task-planning level of a quadruped mobile manipulator to demonstrate its zero-shot transfer ability for fetch-and-deliver and navigation tasks.
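
A minimal sketch of the zero-shot composition idea: compile the novel task to a small automaton over propositions, and chain previously learned skills, each annotated with the proposition it achieves, along an accepting path. The two-state automaton and skill annotations below are illustrative stand-ins, not the LTL machinery or Minecraft-domain skills used by LTL-Transfer.

```python
# Sketch of composing learned skills along a task automaton. The automaton
# encodes "eventually wood, then eventually bench"; both it and the skill
# annotations are invented for illustration.

skills = {"reach_wood": "wood", "reach_bench": "bench"}   # skill -> proposition it achieves

# state 0 --wood--> state 1 --bench--> state 2 (accepting)
transitions = {(0, "wood"): 1, (1, "bench"): 2}
accepting = {2}

def compose(start=0):
    state, sequence = start, []
    while state not in accepting:
        # find a learned skill whose achieved proposition advances the automaton
        usable = [(name, prop) for name, prop in skills.items() if (state, prop) in transitions]
        if not usable:
            return None                      # no transfer possible for this task
        name, prop = usable[0]
        sequence.append(name)
        state = transitions[(state, prop)]
    return sequence

print(compose())   # ['reach_wood', 'reach_bench']
```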

IROS 2023 · Conference Paper

Language-Conditioned Observation Models for Visual Object Search

  • Thao Nguyen
  • Vladislav Hrosinkov
  • Eric Rosen
  • Stefanie Tellex

Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model are determined by a single deep neural network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and faster, more efficient object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
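
A small sketch of what a language-conditioned observation model buys inside a Bayesian filter over object locations: the detection statistics used in the belief update come from the language description rather than a fixed noise model. Here detector_noise stands in for the deep network, and the probabilities and cell names are invented for illustration.

```python
# Sketch of a language-conditioned observation model inside a Bayesian filter
# over object locations. detector_noise() stands in for the network that maps
# a description to detection statistics; the numbers are made up.

def detector_noise(description):
    # A real LCOM predicts these from the description with a neural network.
    return {"true_positive": 0.8, "false_positive": 0.1}

def update_belief(belief, viewed_cell, detected, description):
    noise = detector_noise(description)
    new_belief = {}
    for cell, prob in belief.items():
        if detected:
            likelihood = noise["true_positive"] if cell == viewed_cell else noise["false_positive"]
        else:
            likelihood = (1 - noise["true_positive"]) if cell == viewed_cell else (1 - noise["false_positive"])
        new_belief[cell] = likelihood * prob
    total = sum(new_belief.values())
    return {cell: p / total for cell, p in new_belief.items()}

belief = {"table": 1 / 3, "shelf": 1 / 3, "counter": 1 / 3}
belief = update_belief(belief, "table", detected=True, description="the white cup on the table")
print(belief)   # probability mass shifts toward "table"
```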

IROS 2021 · Conference Paper

Bootstrapping Motor Skill Learning with Motion Planning

  • Ben Abbatematteo
  • Eric Rosen
  • Stefanie Tellex
  • George Konidaris 0001

Learning a robot motor skill from scratch is impractically slow; so much so that in practice, learning must typically be bootstrapped using human demonstration. However, relying on human demonstration necessarily degrades the autonomy of robots that must learn a wide variety of skills over their operational lifetimes. We propose using kinematic motion planning as a completely autonomous, sample-efficient way to bootstrap motor skill learning for object manipulation. We demonstrate the use of motion planners to bootstrap motor skills in two complex object manipulation scenarios with different policy representations: opening a drawer with a dynamic movement primitive representation, and closing a microwave door with a deep neural network policy. We also show how our method can bootstrap a motor skill for the challenging dynamic task of learning to hit a ball off a tee, where a kinematic plan based on treating the scene as static is insufficient to solve the task, but sufficient to bootstrap a more dynamic policy. In all three cases, our method is competitive with human-demonstrated initialization, and significantly outperforms starting with a random policy. This approach enables robots to efficiently and autonomously learn motor policies for dynamic tasks without human demonstration.
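
The bootstrapping recipe can be sketched concisely: roll out a kinematic planner, fit an initial policy to the planned trajectory by regression, and hand that policy to an RL algorithm for refinement. The straight-line "planner" and linear policy below are illustrative stand-ins, not the DMP or neural-network policies used in the paper.

```python
# Sketch of bootstrapping a policy from a kinematic plan instead of a human
# demonstration: plan, behavior-clone the plan, then fine-tune with RL.
import numpy as np

def kinematic_plan(start, goal, steps=20):
    # Placeholder for a motion planner: interpolate joint positions.
    return np.linspace(start, goal, steps)

plan = kinematic_plan(np.array([0.0, 0.0]), np.array([0.5, 1.0]))
states, actions = plan[:-1], plan[1:] - plan[:-1]        # (state, action) pairs

# Behavior-clone a linear policy a = W [s, 1] onto the planned trajectory.
X = np.hstack([states, np.ones((len(states), 1))])
W, *_ = np.linalg.lstsq(X, actions, rcond=None)

def policy(state):
    return np.append(state, 1.0) @ W

print(policy(np.array([0.25, 0.5])))   # roughly the plan's per-step displacement
# An RL algorithm would now refine this initialization on the real task.
```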

IROS 2020 · Conference Paper

Building Plannable Representations with Mixed Reality

  • Eric Rosen
  • Nishanth Kumar
  • Nakul Gopalan
  • Daniel Ullman 0002
  • George Konidaris 0001
  • Stefanie Tellex

We propose Action-Oriented Semantic Maps (AOSMs), a representation that enables a robot to acquire object manipulation behaviors and semantic information about the environment from a human teacher with a Mixed Reality Head-Mounted Display (MR-HMD). AOSMs capture both: a) high-level object manipulation actions in an object class's local frame, and b) semantic representations of objects in the robot's global map that are grounded for navigation. Humans can use an MR-HMD to teach the agent the information necessary for planning object manipulation and navigation actions by interacting with virtual 3D meshes overlaid on the physical workspace. We demonstrate that our system enables users to quickly and accurately teach a robot the knowledge required to autonomously plan and execute three household tasks: picking up a bottle and throwing it in the trash, closing a sink faucet, and flipping a light switch off.
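
A rough sketch of an AOSM-like structure, with manipulation actions stored per object class in the object's local frame and per-instance poses grounded in the global map for navigation. The field names and example entries are hypothetical, not the system's actual data format.

```python
# Illustrative AOSM-like structure: class-level actions plus map-grounded
# instances; entries and field names are invented for this sketch.

action_oriented_semantic_map = {
    "classes": {
        # taught once per object class, expressed in the object's local frame
        "light_switch": {"action": "flip_off", "approach_offset": [0.0, -0.15, 0.0]},
        "faucet":       {"action": "push_down", "approach_offset": [0.0, -0.10, 0.05]},
    },
    "instances": {
        # grounded in the robot's global map for navigation
        "light_switch_1": {"class": "light_switch", "map_pose": [2.1, 0.4, 1.2]},
        "faucet_1":       {"class": "faucet",       "map_pose": [5.0, 1.8, 0.9]},
    },
}

def plan_task(instance_name, aosm):
    inst = aosm["instances"][instance_name]
    skill = aosm["classes"][inst["class"]]
    return {"navigate_to": inst["map_pose"], "then": skill["action"]}

print(plan_task("light_switch_1", action_oriented_semantic_map))
```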

IROS 2020 · Conference Paper

Mixed Reality as a Bidirectional Communication Interface for Human-Robot Interaction

  • Eric Rosen
  • David Whitney
  • Michael Fishman 0001
  • Daniel Ullman 0002
  • Stefanie Tellex

We present a decision-theoretic model and robot system that interprets multimodal human communication to disambiguate item references by asking questions via a mixed reality (MR) interface. Existing approaches have either chosen to use physical behaviors, like pointing and eye gaze, or virtual behaviors, like mixed reality. However, there is a gap in research on how MR compares to physical actions for reducing robot uncertainty. We test the hypothesis that virtual deictic gestures are better for human-robot interaction (HRI) than physical behaviors. To test this hypothesis, we propose the Physio-Virtual Deixis Partially Observable Markov Decision Process (PVD-POMDP), which interprets multimodal observations (speech, eye gaze, and pointing gestures) from the human and decides when and how to ask questions (either via physical or virtual deictic gestures) in order to recover from failure states and cope with sensor noise. We conducted a between-subjects user study with 80 participants distributed across three conditions of robot communication: no feedback control, physical feedback, and MR feedback. We measured the performance of each condition with objective measures (accuracy, time) and evaluated user experience with subjective measures (usability, trust, workload). We found the MR feedback condition was 10% more accurate and 160% faster than the physical condition. We also found that the feedback conditions significantly outperformed the no feedback condition in all subjective metrics.

ICRA 2019 · Conference Paper

End-User Robot Programming Using Mixed Reality

  • Samir Yitzhak Gadre
  • Eric Rosen
  • Gary Chien
  • Elizabeth Phillips
  • Stefanie Tellex
  • George Konidaris 0001

Mixed Reality (MR) is a promising interface for robot programming because it can project an immersive 3D visualization of a robot's intended movement onto the real world. MR can also support hand gestures, which provide an intuitive way for users to construct and modify robot motions. We present a Mixed Reality Head-Mounted Display (MR-HMD) interface that enables end-users to easily create and edit robot motions using waypoints. We describe a user study where 20 participants were asked to program a robot arm using 2D and MR interfaces to perform two pick-and-place tasks. In the primitive task, participants created typical pick-and-place programs. In the adapted task, participants adapted their primitive programs to address a more complex pick-and-place scenario, which included obstacles and conditional reasoning. Compared to the 2D interface, more users were able to complete both tasks with the MR-HMD interface, did so in significantly less time, and reported experiencing lower cognitive workload, higher usability, and higher naturalness.

IROS 2018 · Conference Paper

ROS Reality: A Virtual Reality Framework Using Consumer-Grade Hardware for ROS-Enabled Robots

  • David Whitney
  • Eric Rosen
  • Daniel Ullman 0002
  • Elizabeth Phillips
  • Stefanie Tellex

Virtual reality (VR) systems let users intuitively interact with 3D environments and have been used extensively for robotic teleoperation tasks. While more immersive than their 2D counterparts, early VR systems were expensive and required specialized hardware. Fortunately, there has been a recent proliferation of consumer-grade VR systems at affordable price points. These systems are inexpensive, relatively portable, and can be integrated into existing robotic frameworks. Our group has designed a VR teleoperation package for the Robot Operating System (ROS), ROS Reality, that can be easily integrated into such frameworks. ROS Reality is an open-source, over-the-Internet teleoperation interface between any ROS-enabled robot and any Unity-compatible VR headset. We completed a pilot study to test the efficacy of our system, with expert human users controlling a Baxter robot via ROS Reality to complete 24 dexterous manipulation tasks, compared to the same users controlling the robot via direct kinesthetic handling. This study provides insight into the feasibility of robotic teleoperation tasks in VR with current consumer-grade resources and exposes issues that need to be addressed in these VR systems. In addition, this paper presents a description of ROS Reality, its components, and architecture. We hope this system will be adopted by other research groups to allow for easy integration of VR teleoperated robots into future experiments.

ICRA 2017 · Conference Paper

Reducing errors in object-fetching interactions through social feedback

  • David Whitney
  • Eric Rosen
  • James MacGlashan
  • Lawson L. S. Wong
  • Stefanie Tellex

Fetching items is an important problem for a social robot. It requires a robot to interpret a person's language and gesture and use these noisy observations to infer what item to deliver. If the robot could ask questions, it would help the robot be faster and more accurate in its task. Existing approaches either do not ask questions, or rely on fixed question-asking policies. To address this problem, we propose a model that makes assumptions about cooperation between agents to perform richer signal extraction from observations. This work defines a mathematical framework for an item-fetching domain that allows a robot to increase the speed and accuracy of its ability to interpret a person's requests by reasoning about its own uncertainty as well as processing implicit information (implicatures). We formalize the item-delivery domain as a Partially Observable Markov Decision Process (POMDP), and approximately solve this POMDP in real time. Our model improves speed and accuracy of fetching tasks by asking relevant clarifying questions only when necessary. To measure our model's improvements, we conducted a real-world user study with 16 participants. Our method achieved greater accuracy and a faster interaction time compared to state-of-the-art baselines. Our model is 2.17 seconds faster (25% faster) than a state-of-the-art baseline, while being 2.1% more accurate.
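
The ask-only-when-necessary behavior can be sketched with a simple belief filter: update a distribution over candidate items from noisy language and gesture likelihoods, and ask a clarifying question only while the most likely item falls below a confidence threshold. The likelihood numbers and the ask_about stub below are invented for illustration and are not the paper's observation model or POMDP solver.

```python
# Sketch: keep a belief over which item the person wants, update it with noisy
# observation likelihoods, and ask a clarifying question only when uncertain.

def bayes_update(belief, likelihood):
    posterior = {item: belief[item] * likelihood.get(item, 1e-6) for item in belief}
    total = sum(posterior.values())
    return {item: p / total for item, p in posterior.items()}

def ask_about(item):
    print(f'Robot: "Do you mean the {item}?"')
    return {item: 0.9}          # stub: person confirms; other items get ~0

belief = {"red bowl": 1 / 3, "blue bowl": 1 / 3, "red cup": 1 / 3}
belief = bayes_update(belief, {"red bowl": 0.8, "blue bowl": 0.1, "red cup": 0.7})  # "the red one" + point

while max(belief.values()) < 0.9:                      # still uncertain: ask before acting
    candidate = max(belief, key=belief.get)
    belief = bayes_update(belief, ask_about(candidate))

print("fetch:", max(belief, key=belief.get))
```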