Author name cluster

Jonathan Scholz

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers

2 author rows

ICRA Conference 2025 Conference Paper

DemoStart: Demonstration-Led Auto-Curriculum Applied to Sim-to-Real with Multi-Fingered Robots

Maria Bauzá 0001
Jose Enrique Chen
Valentin Dalibard
Nimrod Gileadi
Roland Hafner
Murilo F. Martins
Joss Moore
Rugile Pevceviciute

We present DemoStart, a novel auto-curriculum reinforcement learning method capable of learning complex manipulation behaviors on an arm equipped with a three-fingered robotic hand, from only a sparse reward and a handful of demonstrations in simulation. Learning from simulation drastically reduces the development cycle of behavior generation, and domain randomization techniques are leveraged to achieve successful zero-shot sim-to-real transfer. Transferred policies are learned directly from raw pixels from multiple cameras and robot proprioception. Our approach outperforms policies learned from demonstrations on the real robot and requires 100 times fewer demonstrations, collected in simulation. More details and videos are available at sites.google.com/view/demostart.

ICLR Conference 2024 Conference Paper

Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

Ben Eisner
Yi Yang 0007
Todor Davchev
Mel Vecerík
Jonathan Scholz
David Held

Many robot manipulation tasks can be framed as geometric reasoning tasks, where an agent must be able to precisely manipulate an object into a position that satisfies the task from a set of initial conditions. Often, task success is defined based on the relationship between two objects, for instance, hanging a mug on a rack. In such cases, the solution should be equivariant to the initial position of the objects as well as the agent, and invariant to the pose of the camera. This poses a challenge for learning systems that attempt to solve this task by learning directly from high-dimensional demonstrations: the agent must learn to be both equivariant and precise, which can be challenging without any inductive biases about the problem. In this work, we propose a method for precise relative pose prediction which is provably SE(3)-equivariant, can be learned from only a few demonstrations, and can generalize across variations in a class of objects. We accomplish this by factoring the problem into learning an SE(3)-invariant task-specific representation of the scene and then interpreting this representation with novel geometric reasoning layers which are provably SE(3)-equivariant. We demonstrate that our method can yield substantially more precise placement predictions in simulated placement tasks than previous methods trained with the same amount of data, and can accurately represent relative placement relationships in data collected from real-world demonstrations. Supplementary information and videos can be found at https://sites.google.com/view/reldist-iclr-2023.

TMLR Journal 2024 Journal Article

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Konstantinos Bousmalis
Giulia Vezzani
Dushyant Rao
Coline Manon Devin
Alex X. Lee
Maria Bauza Villalonga
Todor Davchev
Yuxiang Zhou

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100–1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent’s capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

ICRA Conference 2024 Conference Paper

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Mel Vecerík
Carl Doersch
Yi Yang 0007
Todor Davchev
Yusuf Aytar
Guangyao Zhou
Raia Hadsell
Lourdes Agapito

For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering, or else lack the data-efficiency to do so in an amount of time that enables practical use. In this work we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach utilizes Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration. We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.

ICLR Conference 2023 Conference Paper

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

Mohit Sharma 0001
Claudio Fantacci
Yuxiang Zhou
Skanda Koppula
Nicolas Heess
Jonathan Scholz
Yusuf Aytar

Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation and causes representational drift towards the fine-tuned task, thus leading to a loss of the versatility of the original model. We introduce a method for lossless adaptation to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter-efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation, thus preserving the original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in three manipulation task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings. Please see real-world videos at https://sites.google.com/view/robo-adapters.

ICRA Conference 2022 Conference Paper

Few-Shot Keypoint Detection as Task Adaptation via Latent Embeddings

Mel Vecerík
Jackie Kay
Raia Hadsell
Lourdes Agapito
Jonathan Scholz

Dense object tracking, the ability to localize specific object points with pixel-level accuracy, is an important computer vision task with numerous downstream applications in robotics. Existing approaches either compute dense keypoint embeddings in a single forward pass, meaning the model is trained to track everything at once, or allocate their full capacity to a sparse predefined set of points, trading generality for accuracy. In this paper we explore a middle ground based on the observation that the number of relevant points at a given time is typically small, e.g., grasp points on a target object. Our main contribution is a novel architecture, inspired by few-shot task adaptation, which allows a sparse-style network to condition on a keypoint embedding that indicates which point to track. Our central finding is that this approach provides the generality of dense-embedding models, while offering accuracy significantly closer to sparse-keypoint approaches. We present results illustrating this capacity vs. accuracy trade-off, and demonstrate the ability to zero-shot transfer to new object instances (within-class) using a real-robot pick-and-place task.

ICRA Conference 2022 Conference Paper

Offline Meta-Reinforcement Learning for Industrial Insertion

Tony Z. Zhao
Jianlan Luo
Oleg Sushkov
Rugile Pevceviciute
Nicolas Heess
Jonathan Scholz
Stefan Schaal
Sergey Levine

Reinforcement learning (RL) can in principle let robots automatically adapt to new tasks, but current RL methods require a large number of trials to accomplish this. In this paper, we tackle rapid adaptation to new tasks through the framework of meta-learning, which utilizes past tasks to learn to adapt, with a specific focus on industrial insertion tasks. Fast adaptation is crucial because a prohibitively large number of on-robot trials can damage hardware. Additionally, effective adaptation is feasible because experience from different insertion applications can largely be shared across them. In this setting, we address two specific challenges when applying meta-learning. First, conventional meta-RL algorithms require lengthy online meta-training. We show that this can be replaced with appropriately chosen offline data, resulting in an offline meta-RL method that only requires demonstrations and trials from each of the prior tasks, without the need to run costly meta-RL procedures online. Second, meta-RL methods can fail to generalize to new tasks that are too different from those seen at meta-training time, which poses a particular challenge in industrial applications, where high success rates are critical. We address this by combining contextual meta-learning with direct online finetuning: if the new task is similar to those seen in the prior data, then the contextual meta-learner adapts immediately, and if it is too different, it gradually adapts through finetuning. We show that our approach is able to quickly adapt to a variety of different insertion tasks, with a success rate of 100% using only a fraction of the samples needed for learning the tasks from scratch. Experiment videos and details are available at https://sites.google.com/view/offline-metarl-insertion.

ICLR Conference 2022 Conference Paper

Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation

Todor Davchev
Oleg Sushkov
Jean-Baptiste Regli
Stefan Schaal
Yusuf Aytar
Markus Wulfmeier
Jonathan Scholz

Complex sequential tasks in continuous-control settings often require agents to successfully traverse a set of "narrow passages" in their state space. Solving such tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning (RL) due to the associated long-horizon nature of the problem and the lack of sufficient positive signal during learning. Various tools have been applied to address this challenge. When available, large sets of demonstrations can guide agent exploration. Hindsight relabelling, on the other hand, does not require additional sources of information. However, existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical. In this work, we extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations. We evaluate the approach on four complex, single and dual arm, robotics manipulation tasks against strong suitable baselines. The method requires far fewer demonstrations to solve all tasks and achieves a significantly higher overall performance as task complexity increases. Finally, we investigate the robustness of the proposed solution with respect to the quality of input representations and the number of demonstrations.

ICRA Conference 2019 Conference Paper

A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning

Mel Vecerík
Oleg Sushkov
David Barker
Thomas Rothörl
Todd Hester
Jonathan Scholz

Insertion is a challenging haptic and visual control problem with significant practical value for manufacturing. Existing approaches in the model-based robotics community can be highly effective when task geometry is known, but are complex and cumbersome to implement, and must be tailored to each individual problem by a qualified engineer. Within the learning community there is a long history of insertion research, but existing approaches are either too sample-inefficient to run on real robots, or assume access to high-level object features, e.g., socket pose. In this paper we show that relatively minor modifications to an off-the-shelf Deep-RL algorithm (DDPG), combined with a small number of human demonstrations, allow the robot to quickly learn to solve these tasks efficiently and robustly. Our approach requires no modeling or simulation, no parameterized search or alignment behaviors, no vision system aside from raw images, and no reward shaping. We evaluate our approach on a narrow-clearance peg-insertion task and a deformable clip-insertion task, both of which include variability in the socket position. Our results show that these tasks can be solved reliably on the real robot in less than 10 minutes of interaction time, and that the resulting policies are robust to variance in the socket position and orientation.

IROS Conference 2019 Conference Paper

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Kevin Sebastian Luck
Mel Vecerík
Simon Stepputtis
Heni Ben Amor
Jonathan Scholz

Model-free reinforcement learning algorithms such as Deep Deterministic Policy Gradient (DDPG) often require additional exploration strategies, especially if the actor is of deterministic nature. This work evaluates the use of model-based trajectory optimization methods used for exploration in Deep Deterministic Policy Gradient when trained on a latent image embedding. In addition, an extension of DDPG is derived using a value function as critic, making use of a learned deep dynamics model to compute the policy gradient. This approach leads to a symbiotic relationship between the deep reinforcement learning algorithm and the latent trajectory optimizer. The trajectory optimizer benefits from the critic learned by the RL algorithm and the latter from the enhanced exploration generated by the planner. The developed methods are evaluated on two continuous control tasks, one in simulation and one in the real world. In particular, a Baxter robot is trained to perform an insertion task, while only receiving sparse rewards and images as observations from the environment.

IROS Conference 2016 Conference Paper

Navigation Among Movable Obstacles with learned dynamic constraints

Jonathan Scholz
Nehchal Jindal
Martin Levihn
Charles Isbell
Henrik I. Christensen

In this paper we present the first planner for the problem of Navigation Among Movable Obstacles (NAMO) on a real robot that can handle environments with under-specified object dynamics. This result makes use of recent progress from two threads of the Reinforcement Learning literature. The first is a hierarchical Markov-Decision Process formulation of the NAMO problem designed to handle dynamics uncertainty. The second is a physics-based Reinforcement Learning framework which offers a way to ground this uncertainty in a compact model space that can be efficiently updated from data received by the robot online. Our results demonstrate the ability of a robot to adapt to unexpected object behavior in a real office scenario.

ICRA Conference 2015 Conference Paper

Learning non-holonomic object models for mobile manipulation

Jonathan Scholz
Martin Levihn
Charles Isbell
Henrik I. Christensen
Mike Stilman

For a mobile manipulator to interact with large everyday objects, such as office tables, it is often important to have dynamic models of these objects. However, as it is infeasible to provide the robot with models for every possible object it may encounter, it is desirable that the robot can identify common object models autonomously. Existing methods for addressing this challenge are limited by being either purely kinematic, or inefficient due to a lack of physical structure. In this paper, we present a physics-based method for estimating the dynamics of common non-holonomic objects using a mobile manipulator, and demonstrate its efficiency compared to existing approaches.

ICML Conference 2014 Conference Paper

A Physics-Based Model Prior for Object-Oriented MDPs

Jonathan Scholz
Martin Levihn
Charles Isbell
David Wingate

One of the key challenges in using reinforcement learning in robotics is the need for models that capture natural world structure. There are methods that formalize multi-object dynamics using relational representations, but these methods are not sufficiently compact for real-world robotics. We present a physics-based approach that exploits modern simulation tools to efficiently parameterize physical dynamics. Our results show that this representation can result in much faster learning, by virtue of its strong but appropriate inductive bias in physical environments.

ICRA Conference 2013 Conference Paper

Planning with movable obstacles in continuous environments with uncertain dynamics

Martin Levihn
Jonathan Scholz
Mike Stilman

In this paper we present a decision theoretic planner for the problem of Navigation Among Movable Obstacles (NAMO) operating under conditions faced by real robotic systems. While planners for the NAMO domain exist, they typically assume a deterministic environment or rely on discretization of the configuration and action spaces, preventing their use in practice. In contrast, we propose a planner that operates in real-world conditions such as uncertainty about the parameters of workspace objects and continuous configuration and action (control) spaces. To achieve robust NAMO planning despite these conditions, we introduce a novel integration of Monte Carlo simulation with an abstract MDP construction. We present theoretical and empirical arguments for time complexity linear in the number of obstacles as well as a detailed implementation and examples from a dynamic simulation environment.

NeurIPS Conference 2013 Conference Paper

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Shane Griffith
Kaushik Subramanian
Jonathan Scholz
Charles Isbell
Andrea Thomaz

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. State-of-the-art methods have approached this problem by mapping human information to reward and value signals to indicate preferences and then iterating over them to compute the necessary control policy. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct labels on the policy. We compare Advise to state-of-the-art approaches and highlight scenarios where it outperforms them and, importantly, is robust to infrequent and inconsistent human feedback.

RLDM Conference 2013 Conference Abstract

Policy Shaping: Integrating Human Feedback with Reinforcement Learning

Shane Griffith
Kaushik Subramanian
Jonathan Scholz
Andrea Thomaz

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate and more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches using a series of experiments. These experiments use two classic arcade games, together with feedback from a simulated human teacher, which allows us to systematically test performance under a variety of cases of infrequent and inconsistent feedback. We show that Advise has similar performance to the state of the art, but is more robust to a noisy signal from the human and fares well with an inaccurate estimate of its single input parameter. With these advancements this paper may help to make learning from human feedback an increasingly viable option for intelligent systems.

RLDM Conference 2013 Conference Abstract

Policy shaping: Integrating human feedback with reinforcement learning

Shane Griffith
Kaushik Subramanian
Jonathan Scholz
Charles Isbell
Andrea Thomaz

A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate and more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches using a series of experiments. These experiments use two classic arcade games, together with feedback from a simulated human teacher, which allows us to systematically test performance under a variety of cases of infrequent and inconsistent feedback. We show that Advise has similar performance to the state of the art, but is more robust to a noisy signal from the human and fares well with an inaccurate estimate of its single input parameter. With these advancements this paper may help to make learning from human feedback an increasingly viable option for intelligent systems. Sunday, October 27, 2013

RLDM Conference 2013 Conference Abstract

What Does Physics Bias: A Comparison of Model Priors for Robot Manipulation

Jonathan Scholz
Martin Levihn

We explore robot object manipulation as a Bayesian model-based reinforcement learning problem under a collection of different model priors. Our main contribution is to highlight the limitations of classical non-parametric regression approaches in the context of online learning, and to introduce an alternative approach based on monolithic physical inference. The primary motivation for this line of research is to incorporate physical system identification into the RL model, where it can be integrated with modern approaches to Bayesian structure learning. Overall, our results support the idea that modern physical simulation tools provide a model space with an appropriate inductive bias for manipulation problems in natural environments.

ICRA Conference 2011 Conference Paper

Cart pushing with a mobile manipulation system: Towards navigation with moveable objects

Jonathan Scholz
Sachin Chitta
Bhaskara Marthi
Maxim Likhachev

Robust navigation in cluttered environments has been well addressed for mobile robotic platforms, but the problem of navigating with a moveable object like a cart has not been widely examined. In this work, we present a planning and control approach to navigation of a humanoid robot while pushing a cart. We show how immediate information about the environment can be integrated into this approach to achieve safer navigation in the presence of dynamic obstacles. We demonstrate the robustness of our approach through long-running experiments with the PR2 mobile manipulation robot in a typical indoor office environment, where the robot faced narrow and high-traffic passageways with very limited clearance.
