Arrow Research search

Author name cluster

Diego Romeres

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers


ICRA Conference 2025 Conference Paper

FDPP: Fine-Tune Diffusion Policy with Human Preference

  • Yuxin Chen
  • Devesh K. Jha
  • Masayoshi Tomizuka
  • Diego Romeres

Imitation learning from human demonstrations enables robots to perform complex manipulation tasks and has recently witnessed huge success. However, these techniques often struggle to adapt behavior to new preferences or changes in the environment. To address these limitations, we propose Fine-tuning Diffusion Policy with Human Preference (FDPP). FDPP learns a reward function through preference-based learning. This reward is then used to fine-tune the pre-trained policy with reinforcement learning (RL), aligning the pre-trained policy with new human preferences while still solving the original task. Our experiments across various robotic tasks and preferences demonstrate that FDPP effectively customizes policy behavior without compromising performance. Additionally, we show that incorporating Kullback-Leibler (KL) regularization during fine-tuning prevents overfitting and helps maintain the competencies of the initial policy.
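
The two ingredients the abstract names, preference-based reward learning and KL-regularized fine-tuning, can be sketched in scalar form. This is a minimal illustration under assumed conventions (a Bradley-Terry-style preference model and a per-sample KL penalty), not the paper's implementation; all function names are hypothetical.

```python
import numpy as np

def preference_loss(r_a, r_b, prefer_a):
    """Bradley-Terry-style loss: the probability that trajectory A is
    preferred over B is sigmoid(R(A) - R(B)); minimizing this loss
    fits the learned reward to human pairwise preferences."""
    p_a = 1.0 / (1.0 + np.exp(-(r_a - r_b)))
    return -np.log(p_a) if prefer_a else -np.log(1.0 - p_a)

def kl_regularized_objective(reward, logp_new, logp_old, beta=0.1):
    """RL fine-tuning objective with a KL penalty that keeps the
    fine-tuned policy close to the pre-trained one (prevents the
    overfitting the abstract mentions)."""
    kl = logp_new - logp_old  # single-sample KL estimate
    return reward - beta * kl
```

A reward model whose rankings match the human's preferences incurs low loss; the `beta` weight trades off preference alignment against staying near the pre-trained policy.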

ICRA Conference 2025 Conference Paper

PACE: Proactive Assistance in Human-Robot Collaboration Through Action-Completion Estimation

  • Davide De Lazzari
  • Matteo Terreran
  • Giulio Giacomuzzo
  • Siddarth Jain
  • Pietro Falco
  • Ruggero Carli
  • Diego Romeres

This paper introduces the Proactive Assistance through action-Completion Estimation (PACE) framework, designed to enhance human-robot collaboration through real-time monitoring of human progress. PACE incorporates a novel method that combines Dynamic Time Warping (DTW) with correlation analysis to track human task progression from hand movements. PACE trains a reinforcement learning policy from limited demonstrations that synchronizes robotic actions with human activities, minimizing idle time and enhancing collaboration efficiency. We validate the framework through user studies involving 12 participants, showing significant improvements in interaction fluency, reduced waiting times, and positive user feedback compared to traditional methods.
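
The DTW-based progress tracking can be made concrete with a toy sketch: match the observed partial hand trajectory against prefixes of a reference demonstration and report the best-matching prefix as a completion fraction. This is an illustrative assumption about how DTW could yield a progress estimate, not PACE's actual method (which also uses correlation analysis); the helper names are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def progress_estimate(observed, reference):
    """Estimate task completion as the reference prefix that best
    matches the observed partial trajectory (smallest DTW distance)."""
    best = min(range(1, len(reference) + 1),
               key=lambda k: dtw_distance(observed, reference[:k]))
    return best / len(reference)
```

A robot can poll such an estimate online and trigger its next assistive action once the human's progress crosses a threshold.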

IROS Conference 2025 Conference Paper

RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation

  • Shivam Vats
  • Devesh K. Jha
  • Maxim Likhachev
  • Oliver Kroemer
  • Diego Romeres

Model-based planners and controllers are commonly used to solve complex manipulation problems as they can efficiently optimize diverse objectives and generalize to long horizon tasks. However, they often fail during deployment due to noisy actuation, partial observability and imperfect models. To enable a robot to recover from such failures, we propose to use hierarchical reinforcement learning to learn a recovery policy. The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task using the nominal model-based controllers. Our approach, called RecoveryChaining, uses a hybrid action space in which the model-based controllers are provided as additional nominal options, allowing the recovery policy to decide how to recover, when to switch to a nominal controller, and which controller to switch to, even with sparse rewards. We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines. We successfully transfer recovery policies learned in simulation to a physical robot to demonstrate the feasibility of sim-to-real transfer with our method.

AAAI Conference 2025 Conference Paper

User Preference Meets Pareto-Optimality in Multi-Objective Bayesian Optimization

  • Joshua Hang Sai Ip
  • Ankush Chakrabarty
  • Ali Mesbah
  • Diego Romeres

Incorporating user preferences into multi-objective Bayesian optimization (MOBO) allows for personalization of the optimization procedure. Preferences are often abstracted in the form of an unknown utility function, estimated through pairwise comparisons of potential outcomes. However, utility-driven MOBO methods can yield solutions that are dominated by nearby solutions, as non-dominance is not enforced. Additionally, classical MOBO commonly relies on estimating the entire Pareto front to identify the Pareto-optimal solutions, which can be expensive and ignore user preferences. Here, we present a new method, termed preference-utility-balanced MOBO (PUB-MOBO), that allows users to disambiguate between near-Pareto candidate solutions. PUB-MOBO combines utility-based MOBO with local multi-gradient descent to refine user-preferred solutions to be near-Pareto-optimal. To this end, we propose a novel preference-dominated utility function that concurrently preserves user preferences and dominance amongst candidate solutions. A key advantage of PUB-MOBO is that the local search is restricted to a (small) region of the Pareto front directed by user preferences, alleviating the need to estimate the entire Pareto front. PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. PUB-MOBO consistently outperforms state-of-the-art competitors in terms of proximity to the Pareto front and utility regret across all the problems.
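
The non-dominance notion at the core of the abstract is standard and can be stated in a few lines (minimization convention assumed; helper names are illustrative, not from the paper):

```python
def dominates(x, y):
    """True if objective vector x Pareto-dominates y (minimization):
    x is no worse in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def non_dominated(points):
    """Filter a candidate set down to its non-dominated subset."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

Utility-driven MOBO can return a point that `dominates` would flag as beaten by a neighbor; PUB-MOBO's local refinement pushes such candidates toward the non-dominated set.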

IROS Conference 2024 Conference Paper

Autonomous Robotic Assembly: From Part Singulation to Precise Assembly

  • Kei Ota
  • Devesh K. Jha
  • Siddarth Jain
  • Bill Yerazunis
  • Radu Corcodel
  • Yash Shukla
  • Antonia Bronars
  • Diego Romeres

Imagine a robot that can assemble a functional product from the individual parts presented in any configuration to the robot. Designing such a robotic system is a complex problem which presents several open challenges. To bypass these challenges, the current generation of assembly systems is built with a lot of system integration effort to provide the structure and precision necessary for assembly. These systems are mostly responsible for part singulation, part kitting, and part detection, which is accomplished by intelligent system design. In this paper, we present autonomous assembly of a gear box with minimum requirements on structure. The assembly parts are randomly placed in a two-dimensional work environment for the robot. The proposed system makes use of several different manipulation skills such as sliding for grasping, in-hand manipulation, and insertion to assemble the gear box. All these tasks are run in a closed-loop fashion using vision, tactile, and Force-Torque (F/T) sensors. We perform extensive hardware experiments to show the robustness of the proposed methods as well as the overall system. See supplementary video at https://www.youtube.com/watch?v=cZ9M1DQ23OI.

IROS Conference 2024 Conference Paper

DECAF: a Discrete-Event based Collaborative Human-Robot Framework for Furniture Assembly

  • Giulio Giacomuzzo
  • Matteo Terreran
  • Siddarth Jain
  • Diego Romeres

This paper proposes a task planning framework for collaborative Human-Robot scenarios, specifically focused on assembling complex systems such as furniture. The human is characterized as an uncontrollable agent, implying for example that the agent is not bound by a pre-established sequence of actions and instead acts according to its own preferences. Meanwhile, the task planner reactively computes the optimal actions for the collaborative robot to efficiently complete the entire assembly task in the least time possible. We formalize the problem as a Discrete Event Markov Decision Problem (DE-MDP), a comprehensive framework that incorporates a variety of asynchronous behaviors, human changes of mind, and failure recovery as stochastic events. Although the problem could theoretically be addressed by constructing a graph of all possible actions, such an approach would be constrained by computational limitations. The proposed formulation offers an alternative solution utilizing Reinforcement Learning to derive an optimal policy for the robot. Experiments were conducted both in simulation and on a real system with human subjects assembling a chair in collaboration with a 7-DoF manipulator.

ICRA Conference 2024 Conference Paper

Interactive Planning Using Large Language Models for Partially Observable Robotic Tasks

  • Lingfeng Sun
  • Devesh K. Jha
  • Chiori Hori
  • Siddarth Jain
  • Radu Corcodel
  • Xinghao Zhu
  • Masayoshi Tomizuka
  • Diego Romeres

Designing robotic agents to perform open vocabulary tasks has been a long-standing goal in robotics and AI. Recently, Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. However, planning for these tasks in the presence of uncertainties is challenging as it requires "chain-of-thought" reasoning, aggregating information from the environment, updating state estimates, and generating actions based on the updated state estimates. In this paper, we present an interactive planning technique for partially observable tasks using LLMs. In the proposed method, an LLM is used to collect missing information from the environment using a robot and to infer the state of the underlying problem from collected observations, while guiding the robot to perform the required actions. We also fine-tune a Llama 2 model via self-instruct and compare its performance against a pre-trained LLM such as GPT-4. Results are demonstrated on several tasks in simulation as well as real-world environments.

ICRA Conference 2024 Conference Paper

Multi-level Reasoning for Robotic Assembly: From Sequence Inference to Contact Selection

  • Xinghao Zhu
  • Devesh K. Jha
  • Diego Romeres
  • Lingfeng Sun
  • Masayoshi Tomizuka
  • Anoop Cherian

Automating the assembly of objects from their parts is a complex problem with innumerable applications in manufacturing, maintenance, and recycling. Unlike existing research, which is limited to target segmentation, pose regression, or using fixed target blueprints, our work presents a holistic multi-level framework for part assembly planning consisting of part assembly sequence inference, part motion planning, and robot contact optimization. We present the Part Assembly Sequence Transformer (PAST) – a sequence-to-sequence neural network – to infer assembly sequences recursively from a target blueprint. We then use a motion planner and optimization to generate part movements and contacts. To train PAST, we introduce D4PAS: a large-scale Dataset for Part Assembly Sequences consisting of physically valid sequences for industrial objects. Experimental results show that our approach generalizes better than prior methods while needing significantly less computational time for inference. Further details on our experiments and results are available in the video.

IROS Conference 2024 Conference Paper

Open Human-Robot Collaboration using Decentralized Inverse Reinforcement Learning

  • Prasanth Sengadu Suresh
  • Siddarth Jain
  • Prashant Doshi
  • Diego Romeres

The growing interest in human-robot collaboration (HRC), where humans and robots cooperate towards shared goals, has seen significant advancements over the past decade. While previous research has addressed various challenges, several key issues remain unresolved. Many domains within HRC involve activities that do not necessarily require human presence throughout the entire task. Existing literature typically models HRC as a closed system, where all agents are present for the entire duration of the task. In contrast, an open model offers flexibility by allowing an agent to enter and exit the collaboration as needed, enabling them to concurrently manage other tasks. In this paper, we introduce a novel multiagent framework called oDec-MDP, designed specifically to model open HRC scenarios where agents can join or leave tasks flexibly during execution. We generalize a recent multiagent inverse reinforcement learning method, Dec-AIRL, to learn from open systems modeled using the oDec-MDP. Our method is validated through experiments conducted in both a simplified toy firefighting domain and a realistic dyadic human-robot collaborative assembly. Results show that our framework and learning method improve upon their closed-system counterparts.

IJCAI Conference 2024 Conference Paper

Reinforcement Learning for Athletic Intelligence: Lessons from the 1st “AI Olympics with RealAIGym” Competition

  • Felix Wiebe
  • Niccolò Turcato
  • Alberto Dalla Libera
  • Chi Zhang
  • Theo Vincent
  • Shubham Vyas
  • Giulio Giacomuzzo
  • Ruggero Carli

As artificial intelligence gains new capabilities, it becomes important to evaluate it on real-world tasks. In particular, the fields of robotics and reinforcement learning (RL) lack standardized benchmarking tasks on real hardware. To facilitate reproducibility and stimulate algorithmic advancements, we held an AI Olympics competition at the IJCAI 2023 conference based on the double pendulum system of the RealAIGym project, in which participants were asked to develop a controller for the swing-up and stabilization task. This paper presents the methods and results of the top participating teams and provides insights into the real-world performance of RL algorithms with respect to a baseline time-varying LQR controller.

IROS Conference 2023 Conference Paper

Constrained Dynamic Movement Primitives for Collision Avoidance in Novel Environments

  • Seiji Shaw
  • Devesh K. Jha
  • Arvind U. Raghunathan
  • Radu Corcodel
  • Diego Romeres
  • George Konidaris 0001
  • Daniel Nikovski

Dynamic movement primitives are widely used for learning skills that can be demonstrated to a robot by a skilled human or controller. While their generalization capabilities and simple formulation make them very appealing to use, they possess no strong guarantees to satisfy operational safety constraints for a task. We present constrained dynamic movement primitives (CDMPs), which can allow for positional constraint satisfaction in the robot workspace. Our method solves a non-linear optimization to perturb an existing DMP's forcing weights to admit a Zeroing Barrier Function (ZBF), which certifies positional workspace constraint satisfaction. We demonstrate our approach under different positional constraints on the end-effector movement on multiple physical robots, such as obstacle avoidance and workspace limitations.
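
As a rough illustration of the certification idea, a zeroing barrier function h is positive outside an obstacle, and the condition h_dot >= -alpha*h keeps trajectories in the safe set. The sketch below assumes a circular obstacle and a simple finite-difference check along a discretized trajectory; the paper's actual method perturbs DMP forcing weights inside a nonlinear optimization, which is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def h_circle(x, center, radius):
    """Barrier value: positive outside a circular obstacle, zero on
    its boundary, negative inside."""
    return np.linalg.norm(x - center) - radius

def satisfies_zbf(traj, dt, center, radius, alpha=5.0):
    """Discretized check of the zeroing-barrier-function condition
    h_dot >= -alpha * h along a trajectory; a trajectory passing this
    check never enters the obstacle region."""
    h = np.array([h_circle(x, center, radius) for x in traj])
    h_dot = np.diff(h) / dt
    return bool(np.all(h_dot >= -alpha * h[:-1]))
```

A trajectory that skirts the obstacle satisfies the inequality everywhere; one headed into the obstacle violates it as soon as h decreases toward zero.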

ICRA Conference 2023 Conference Paper

Learning Generalizable Pivoting Skills

  • Xiang Zhang 0020
  • Siddarth Jain
  • Baichuan Huang
  • Masayoshi Tomizuka
  • Diego Romeres

Pivoting an object with a robotic system is challenging because of the external forces that act on the system, arising mainly from contact interaction. The complexity increases when the same skill is required to generalize across different objects. This paper proposes a framework for learning robust and generalizable pivoting skills, which consists of three steps. First, we learn a pivoting policy on a “unitary” object using Reinforcement Learning (RL). Then, we obtain the object's feature space by supervised learning to encode the kinematic properties of arbitrary objects. Finally, to adapt the unitary policy to multiple objects, we learn data-driven projections based on the object features to adjust the state and action space of the new pivoting task. The proposed approach is entirely trained in simulation. It requires only one depth image of the object and can zero-shot transfer to real-world objects. We demonstrate robustness under sim-to-real transfer and generalization to multiple objects.

ICRA Conference 2023 Conference Paper

Simultaneous Tactile Estimation and Control of Extrinsic Contact

  • Sangwoon Kim
  • Devesh K. Jha
  • Diego Romeres
  • Parag Patre
  • Alberto Rodriguez 0003

We propose a method that simultaneously estimates and controls extrinsic contact with tactile feedback. The method enables challenging manipulation tasks that require controlling light forces and accurate motions in contact, such as balancing an unknown object on a thin rod standing upright. A factor graph-based framework fuses a sequence of tactile and kinematic measurements to estimate and control the interaction between gripper-object-environment, including the location and wrench at the extrinsic contact between the grasped object and the environment and the grasp wrench transferred from the gripper to the object. The same framework simultaneously plans the gripper motions that make it possible to estimate the state while satisfying regularizing control objectives to prevent slip, such as minimizing the grasp wrench and minimizing frictional force at the extrinsic contact. We show results with sub-millimeter contact localization error and good slip prevention even on slippery environments, for multiple contact formations (point, line, patch contact) and transitions between them. See supplementary video and results at https://sites.google.com/view/sim-tact.

IROS Conference 2022 Conference Paper

Active Exploration for Robotic Manipulation

  • Tim Schneider
  • Boris Belousov
  • Georgia Chalvatzaki
  • Diego Romeres
  • Devesh K. Jha
  • Jan Peters 0001

Robotic manipulation stands as a largely unsolved problem despite significant advances in robotics and machine learning in recent years. One of the key challenges in manipulation is the exploration of the dynamics of the environment when there is continuous contact between the objects being manipulated. This paper proposes a model-based active exploration approach that enables efficient learning in sparse-reward robotic manipulation tasks. The proposed method estimates an information gain objective using an ensemble of probabilistic models and deploys model predictive control (MPC) to plan actions online that maximize the expected reward while also performing directed exploration. We evaluate our proposed algorithm in simulation and on a real robot, trained from scratch with our method, on a challenging ball pushing task on tilted tables, where the target ball position is not known to the agent a priori. Our real-world robot experiment serves as a fundamental application of active exploration in model-based reinforcement learning of complex robotic manipulation tasks. Project page https://sites.google.com/view/aerm.
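
A common proxy for the information-gain term in ensemble-based exploration is prediction disagreement: where the ensemble members' next-state predictions differ, the model is uncertain and exploring is valuable. The sketch below uses ensemble variance as that proxy and adds it to the reward as an MPC scoring bonus; this is an assumed simplification of the paper's objective, and the names are illustrative.

```python
import numpy as np

def ensemble_disagreement(predictions):
    """Information-gain proxy: variance of next-state predictions
    across an ensemble of dynamics models, summed over state dims."""
    preds = np.asarray(predictions)  # shape (n_models, state_dim)
    return float(np.var(preds, axis=0).sum())

def score_action(reward, predictions, bonus_weight=1.0):
    """MPC utility for one candidate action: expected reward plus an
    exploration bonus for visiting states where the models disagree."""
    return reward + bonus_weight * ensemble_disagreement(predictions)
```

An MPC planner would evaluate `score_action` over sampled action sequences and execute the first action of the best-scoring one.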

ICRA Conference 2022 Conference Paper

PyROBOCOP: Python-based Robotic Control & Optimization Package for Manipulation

  • Arvind U. Raghunathan
  • Devesh K. Jha
  • Diego Romeres

PyROBOCOP is a Python-based package for control, optimization and estimation of robotic systems described by nonlinear Differential Algebraic Equations (DAEs). In particular, the package can handle systems with contacts that are described by complementarity constraints and provides a general framework for specifying obstacle avoidance constraints. The package performs direct transcription of the DAEs into a set of nonlinear equations by performing orthogonal collocation on finite elements. PyROBOCOP provides automatic reformulation of the complementarity constraints that are tractable to NLP solvers to perform optimization of robotic systems. The package is interfaced with ADOL-C [1] for obtaining sparse derivatives by automatic differentiation and IPOPT [2] for performing optimization. We evaluate PyROBOCOP on several manipulation problems for control and estimation.
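
Complementarity constraints (a >= 0, b >= 0, a*b = 0, e.g. contact force vs. separation distance) are commonly rewritten so that NLP solvers such as IPOPT can handle them. One standard device is an NCP function like Fischer-Burmeister, shown below purely as an illustration; whether PyROBOCOP uses this particular reformulation is an assumption, not a claim from the abstract.

```python
import math

def fischer_burmeister(a, b):
    """NCP function: phi(a, b) = 0 exactly when a >= 0, b >= 0 and
    a * b = 0, so the complementarity condition becomes the single
    equation phi(a, b) = 0 inside a nonlinear program."""
    return a + b - math.sqrt(a * a + b * b)
```

Replacing each complementarity pair with `fischer_burmeister(a, b) == 0` turns a disjunctive condition into smooth(able) equality constraints.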

ICRA Conference 2022 Conference Paper

Robust Pivoting: Exploiting Frictional Stability Using Bilevel Optimization

  • Yuki Shirai
  • Devesh K. Jha
  • Arvind U. Raghunathan
  • Diego Romeres

Generalizable manipulation requires that robots be able to interact with novel objects and environments. This requirement makes manipulation extremely challenging, as a robot has to reason about complex frictional interaction with uncertainty in the physical properties of the object. In this paper, we study robust optimization for control of pivoting manipulation in the presence of uncertainties. We present insights about how friction can be exploited to compensate for inaccuracies in the estimates of the physical properties during manipulation. In particular, we derive analytical expressions for the stability margin provided by friction during pivoting manipulation. This margin is then used in a bilevel trajectory optimization algorithm to design a controller that maximizes the stability margin, providing robustness against uncertainty in the physical properties of the object. We demonstrate our proposed method using a 6-DoF manipulator to manipulate several different objects.

ICRA Conference 2021 Conference Paper

Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry

  • Siyuan Dong
  • Devesh K. Jha
  • Diego Romeres
  • Sangwoon Kim
  • Daniel Nikovski
  • Alberto Rodriguez 0003

Object insertion is a classic contact-rich manipulation task. The task remains challenging, especially when considering general objects of unknown geometry, which significantly limits the ability to understand the contact configuration between the object and the environment. We study the problem of aligning the object and environment with a tactile-based feedback insertion policy. The insertion process is modeled as an episodic policy that iterates between insertion attempts followed by pose corrections. We explore different mechanisms to learn such a policy based on Reinforcement Learning. The key contribution of this paper is to demonstrate that it is possible to learn a tactile insertion policy that generalizes across different object geometries, together with an ablation study of the key design choices for the learning agent: 1) the type of learning scheme: supervised vs. reinforcement learning; 2) the type of learning schedule: unguided vs. curriculum learning; 3) the type of sensing modality: force/torque (F/T) vs. tactile; and 4) the type of tactile representation: tactile RGB vs. tactile flow. We show that the optimal configuration of the learning agent (RL + curriculum + tactile flow) exposed to 4 training objects yields a closed-loop insertion policy that inserts 4 novel objects with over 85.0% success rate and within 3-4 consecutive attempts. Comparisons between F/T and tactile sensing show that while an F/T-based policy learns more efficiently, a tactile-based policy provides better generalization. See supplementary video and results at https://sites.google.com/view/tactileinsertion.

ICRA Conference 2021 Conference Paper

Trajectory Optimization for Manipulation of Deformable Objects: Assembly of Belt Drive Units

  • Shiyu Jin
  • Diego Romeres
  • Arvind Ragunathan
  • Devesh K. Jha
  • Masayoshi Tomizuka

This paper presents a novel trajectory optimization formulation to solve the robotic assembly of the belt drive unit. Robotic manipulations involving contacts and deformable objects are challenging in both dynamic modeling and trajectory planning. For modeling, variations in the belt tension and contact forces between the belt and the pulley could dramatically change the system dynamics. For trajectory planning, it is computationally expensive to plan trajectories for such hybrid dynamical systems as it usually requires planning for discrete modes separately. In this work, we formulate the belt drive unit assembly task as a trajectory optimization problem with complementarity constraints to avoid explicitly imposing contact mode sequences. The problem is solved as a mathematical program with complementarity constraints (MPCC) to obtain feasible and efficient assembly trajectories. We validate the proposed method both in simulations with a physics engine and in real-world experiments with a robotic manipulator.

ICRA Conference 2020 Conference Paper

Local Policy Optimization for Trajectory-Centric Reinforcement Learning

  • Patrik Kolaric
  • Devesh K. Jha
  • Arvind U. Raghunathan
  • Frank L. Lewis
  • Mouhacine Benosman
  • Diego Romeres
  • Daniel Nikovski

The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for non-linear systems can be a very challenging problem both algorithmically and numerically. However, a lot of robotic manipulation tasks are trajectory-centric, and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, an open-loop trajectory optimization process mostly results in very poor performance when used on the real system. Motivated by these problems, we formulate trajectory optimization and local policy synthesis as a single optimization problem, which is then solved simultaneously as an instance of nonlinear programming. We provide analysis as well as the achieved performance of the proposed technique under some simplifying assumptions.

ICRA Conference 2019 Conference Paper

Semiparametrical Gaussian Processes Learning of Forward Dynamical Models for Navigating in a Circular Maze

  • Diego Romeres
  • Devesh K. Jha
  • Alberto Dalla Libera
  • Bill Yerazunis
  • Daniel Nikovski

This paper presents a model learning problem: learning to navigate a ball to a goal state in a circular maze environment with two degrees of freedom. The motion of the ball in the maze environment is influenced by several non-linear effects, such as dry friction and contacts, which are difficult to model physically. We propose a semiparametric model to estimate the motion dynamics of the ball, based on Gaussian Process Regression equipped with basis functions obtained from physics first principles. The accuracy of this semiparametric model is shown not only in estimation but also in n-step-ahead prediction, and it is compared with standard model learning algorithms. The learned model is then used in a trajectory optimization algorithm to compute ball trajectories. We propose the system presented in the paper as a benchmark problem for reinforcement and robot learning, given its interesting and challenging dynamics and its relative ease of reproducibility.
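
The semiparametric idea can be sketched minimally: fit a physics-based basis model by least squares, then model only the residual, what physics cannot explain, with a GP. The kernel choice, noise level, and 1-D setting below are illustrative assumptions, not the paper's actual model, and all names are hypothetical.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def semiparametric_gp_predict(X, y, X_star, basis, noise=1e-2):
    """Predictive mean of a GP whose mean function is a physics-based
    basis model fit by least squares; the GP corrects the residual."""
    Phi = basis(X)                        # physics features, (n, p)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ w                   # what the physics model misses
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_star = rbf_kernel(X_star, X)
    return basis(X_star) @ w + K_star @ np.linalg.solve(K, resid)
```

When the data follow the physics model exactly, the residual vanishes and the prediction reduces to the parametric part; otherwise the GP correction absorbs effects like dry friction that the basis functions miss.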

ICRA Conference 2019 Conference Paper

Sim-to-Real Transfer Learning using Robustified Controllers in Robotic Tasks involving Complex Dynamics

  • Jeroen van Baar
  • Alan Sullivan
  • Radu Corcodel
  • Devesh K. Jha
  • Diego Romeres
  • Daniel Nikovski

Learning robot tasks or controllers using deep reinforcement learning has been proven effective in simulations. Learning in simulation has several advantages. For example, one can fully control the simulated environment, including halting motions while performing computations. Another advantage when robots are involved is that the amount of time a robot is occupied learning a task, rather than being productive, can be reduced by transferring the learned task to the real robot. Transfer learning requires some amount of fine-tuning on the real robot. For tasks which involve complex (non-linear) dynamics, the fine-tuning itself may take a substantial amount of time. In order to reduce the amount of fine-tuning, we propose to learn robustified controllers in simulation. Robustified controllers are learned by exploiting the ability to change simulation parameters (both appearance and dynamics) for successive training episodes. An additional benefit of this approach is that it alleviates the precise determination of physics parameters for the simulator, which is a non-trivial task. We demonstrate our proposed approach on a real setup in which a robot aims to solve a maze game, which involves complex dynamics due to static friction and potentially large accelerations. We show that the amount of fine-tuning in transfer learning for a robustified controller is substantially reduced compared to a non-robustified controller.