Arrow Research search

Author name cluster

Freek Stulp

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

EWRL Workshop 2025 Workshop Paper

An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

  • Antonin Raffin
  • Olivier Sigaud
  • Jens Kober
  • Alin Albu-Schaeffer
  • João Silvério
  • Freek Stulp

In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.
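The core of such an open-loop policy is small enough to sketch in a few lines. The following is an illustrative reconstruction, not the paper's code; all parameter values (amplitudes, frequencies, phase offsets) are hypothetical:

```python
import numpy as np

def open_loop_oscillator(t, amplitude, frequency, phase, offset):
    """Desired joint positions at time t from simple sinusoidal oscillators.

    Joint i follows q_i(t) = a_i * sin(2*pi*f_i*t + phi_i) + o_i, so the
    whole policy has only a few tunable parameters per joint and ignores
    all sensor feedback (hence "open-loop").
    """
    return amplitude * np.sin(2.0 * np.pi * frequency * t + phase) + offset

# Hypothetical parameters for a 4-joint walker (not taken from the paper).
amp = np.array([0.3, 0.3, 0.5, 0.5])      # amplitudes [rad]
freq = np.array([1.5, 1.5, 1.5, 1.5])     # shared gait frequency [Hz]
phi = np.array([0.0, np.pi, np.pi, 0.0])  # phase offsets between legs [rad]
off = np.zeros(4)                         # joint offsets [rad]

# One second of target joint positions at 100 Hz.
targets = np.array([open_loop_oscillator(t, amp, freq, phi, off)
                    for t in np.arange(0.0, 1.0, 0.01)])
print(targets.shape)  # (100, 4)
```

These target positions would then be tracked by a standard low-level position or PD controller on the robot.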

ICRA Conference 2025 Conference Paper

RACCOON: Grounding Embodied Question-Answering with State Summaries from Existing Robot Modules

  • Samuel Bustamante-Gomez
  • Markus Knauer
  • Jeremias Thun
  • Stefan Schneyer
  • Alin Albu-Schäffer
  • Bernhard M. Weber
  • Freek Stulp

Explainability is vital for establishing user trust, in robotics as elsewhere. Recently, foundation models (e.g., vision-language models, VLMs) have fostered a wave of embodied agents that answer arbitrary queries about their environment and their interactions with it. However, naively prompting VLMs to answer queries based on camera images does not take into account existing robot architectures which represent the robot's tasks, skills, and beliefs about the state of the world. To overcome this limitation, we propose RACCOON, a framework that combines foundation models' responses with a robot's internal knowledge. Inspired by Retrieval-Augmented Generation (RAG), RACCOON selects relevant context, retrieves information from the robot's state, and utilizes it to refine prompts for an LLM to answer questions accurately. This bridges the gap between the model's adaptability and the robot's domain expertise.


ICRA Conference 2024 Conference Paper

A probabilistic approach for learning and adapting shared control skills with the human in the loop

  • Gabriel Quere
  • Freek Stulp
  • David Filliat
  • João Silvério

Assistive robots promise to be of great help to wheelchair users with motor impairments, for example for activities of daily living. Using shared control to provide task-specific assistance – for instance with the Shared Control Templates (SCT) framework – facilitates user control, even with low-dimensional input signals. However, designing SCTs is a laborious task requiring robotic expertise. To facilitate their design, we propose a method to learn one of their core components – active constraints – from demonstrated end-effector trajectories. We use a probabilistic model, Kernelized Movement Primitives, which additionally allows adaptation from user commands to improve the shared control skills, during both design and execution. We demonstrate that the SCTs so acquired can be successfully used to pick up an object, as well as adjusted for new environmental constraints, with our assistive robot EDAN.

RLC Conference 2024 Conference Paper

An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

  • Antonin Raffin
  • Olivier Sigaud
  • Jens Kober
  • Alin Albu-Schaeffer
  • João Silvério
  • Freek Stulp

In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.

RLJ Journal 2024 Journal Article

An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks

  • Antonin Raffin
  • Olivier Sigaud
  • Jens Kober
  • Alin Albu-Schaeffer
  • João Silvério
  • Freek Stulp

In search of a simple baseline for Deep Reinforcement Learning in locomotion tasks, we propose a model-free open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, where RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.

ICRA Conference 2024 Conference Paper

Fitting Parameters of Linear Dynamical Systems to Regularize Forcing Terms in Dynamical Movement Primitives

  • Freek Stulp
  • Adrià Colomé
  • Carme Torras

Due to their flexibility and ease of use, Dynamical Movement Primitives (DMPs) are widely used in robotics applications and research. DMPs combine linear dynamical systems to achieve robustness to perturbations and adaptation to moving targets with non-linear function approximators to fit a wide range of demonstrated trajectories. We propose a novel DMP formulation with a generalized logistic function as a delayed goal system. This formulation inherently has low initial jerk, and generates the bell-shaped velocity profiles that are typical of human movement. As the novel formulation is more expressive, it is able to fit a wide range of human demonstrations well, even without a non-linear forcing term. We exploit this increased expressiveness by automating the fitting of the dynamical system parameters through optimization. Our experimental evaluation demonstrates that this optimization regularizes the forcing term, and improves the interpolation accuracy of parametric DMPs.
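One plausible form of such a delayed goal system, sketched here for illustration (the paper's exact parameterization may differ; this uses the plain logistic special case of the generalized logistic, i.e. shape parameter nu = 1), moves the goal smoothly from start to target:

```python
import numpy as np

def logistic_goal(t, y0, g, t_infl, growth):
    """Delayed goal trajectory based on a logistic function.

    The goal moves smoothly from the start position y0 to the target g,
    with inflection at time t_infl and slope controlled by growth. The
    resulting velocity profile is bell-shaped, and initial jerk is low.
    """
    s = 1.0 / (1.0 + np.exp(-growth * (t - t_infl)))
    return y0 + (g - y0) * s

ts = np.linspace(0.0, 1.0, 200)
traj = logistic_goal(ts, y0=0.0, g=1.0, t_infl=0.5, growth=15.0)
vel = np.gradient(traj, ts)

# The velocity peaks near the middle of the movement (bell shape).
print(ts[int(np.argmax(vel))])
```

In a full DMP this delayed goal would replace the constant attractor point of the transformation system; the non-linear forcing term then only needs to model the residual between this trajectory and the demonstration.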

ICRA Conference 2024 Conference Paper

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

  • Abby O'Neill
  • Abdul Rehman
  • Abhiram Maddukuri
  • Abhishek Gupta 0004
  • Abhishek Padalkar
  • Abraham Lee
  • Acorn Pooley
  • Agrim Gupta

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.

ICRA Conference 2024 Conference Paper

Unknown Object Grasping for Assistive Robotics

  • Elle Miller
  • Maximilian Durner
  • Matthias Humt
  • Gabriel Quere
  • Wout Boerdijk
  • Ashok M. Sundaram
  • Freek Stulp
  • Jörn Vogel

We propose a novel pipeline for unknown object grasping in shared robotic autonomy scenarios. State-of-the-art methods for fully autonomous scenarios are typically learning-based approaches, optimised for a specific end-effector, that generate grasp poses directly from sensor input. In the domain of assistive robotics, we seek instead to utilise the user’s cognitive abilities for enhanced satisfaction, grasping performance, and alignment with their high-level task-specific goals. Given a pair of stereo images, we perform unknown object instance segmentation and generate a 3D reconstruction of the object of interest. In shared control, the user then guides the robot end-effector across a virtual hemisphere centered around the object to their desired approach direction. A physics-based grasp planner finds the most stable local grasp on the reconstruction, and finally the user is guided by shared control to this grasp. In experiments on the DLR EDAN platform, we report a grasp success rate of 87% for 10 unknown objects, and demonstrate the method’s capability to grasp objects in structured clutter and from shelves.

ICRA Conference 2023 Conference Paper

Guiding Reinforcement Learning with Shared Control Templates

  • Abhishek Padalkar
  • Gabriel Quere
  • Franz Steinmetz
  • Antonin Raffin
  • Matthias Nieuwenhuisen
  • João Silvério
  • Freek Stulp

Purposeful interaction with objects usually requires certain constraints to be respected, e.g., keeping a bottle upright to avoid spilling. In reinforcement learning, such constraints are typically encoded in the reward function. As a consequence, constraints can only be learned by violating them. This often precludes learning on the physical robot, as it may take many trials to learn the constraints, and the necessity to violate them during the trial-and-error learning may be unsafe. We have serendipitously discovered that constraint representations for shared control - in particular Shared Control Templates (SCTs) - are ideally suited for safely guiding RL. Representing constraints explicitly, rather than implicitly in the reward function, also simplifies the design of the reward function. The main advantage of the approach is safer, faster learning without constraint violations (even with sparse reward functions). We demonstrate this in a pouring task in simulation and on a real robot, where learning the task requires only 65 episodes in 16 minutes.

ICRA Conference 2023 Conference Paper

ROSMC: A High-Level Mission Operation Framework for Heterogeneous Robotic Teams

  • Ryo Sakagami
  • Sebastian G. Brunner
  • Andreas Dömel
  • Armin Wedler
  • Freek Stulp

Heterogeneous teams of multiple mobile robots will be important for future scientific explorations of extraterrestrial surfaces or hazardous areas. Mission operation in such harsh, unknown environments poses diverse challenges. Robots need to cooperate autonomously due to the large network latency to the ground station, while operators need to adapt the ongoing mission flexibly based on new discoveries obtained during execution. Furthermore, shared situational awareness between operators and roboticists is essential for dealing with execution failures promptly. To overcome these challenges, this paper proposes the high-level mission operation framework ROSMC. The concept of mission synchronization to robots enables continuous mission adaptations and future planning by operators while robots execute the mission autonomously. The ROS-based GUIs enable operators to intuitively create and monitor the mission for robots as well as to communicate with roboticists smoothly. The proposed framework was evaluated by a pilot study with a simulator and demonstrated at a Moon-analogue field on Mt. Etna in Sicily, Italy, involving 3 robots and around 70 researchers for 4 weeks.

ICRA Conference 2022 Conference Paper

CATs: Task Planning for Shared Control of Assistive Robots with Variable Autonomy

  • Samuel Bustamante-Gomez
  • Gabriel Quere
  • Daniel Leidner
  • Jörn Vogel
  • Freek Stulp

From robotic space assistance to healthcare robotics, there is increasing interest in robots that offer adaptable levels of autonomy. In this paper, we propose an action representation and planning framework that is able to generate plans that can be executed with both shared control and supervised autonomy, even switching between them during task execution. The action representation - Constraint Action Templates (CATs) - combines the advantages of Action Templates [1] and Shared Control Templates [2]. We demonstrate that CATs enable our planning framework to generate goal-directed plans for variations of a typical task of daily living, and that users can execute them on the wheelchair-robot EDAN in shared control or in autonomous mode.

IROS Conference 2022 Conference Paper

Multi-Phase Multi-Modal Haptic Teleoperation

  • Maximilian Mühlbauer 0001
  • Franz Steinmetz
  • Freek Stulp
  • Thomas Hulin
  • Alin Albu-Schäffer

Virtual Fixtures facilitate teleoperation, for instance by guiding the human operator. Developing these Virtual Fixtures in tasks with tight tolerances remains challenging. Fixtures with a high stiffness allow for more precise guidance, whereas a lower stiffness is required to allow for corrections. We observed that many assembly operations can be split into different phases - approaching, positioning, in-contact manipulation - each with different accuracy requirements. Therefore, we propose to use multi-modal fixtures, satisfying the different requirements of these phases: i.e., a position-based Trajectory Fixture for approaching and a more accurate Visual Servoing Fixture for the positioning phase. A state estimation and arbitration component ensures smooth transitions between the fixtures to provide optimal support for the operator and to achieve global availability paired with local precision at the same time. It also allows a high stiffness to be used throughout, thus achieving good guidance for all phases. The approach is validated in an application from a space scenario, consisting of the assembly of a CubeSat subsystem. The empirical results from a pilot study on this task show that our approach is faster and requires less interaction force from the operator than the baseline method.

KR Conference 2021 Short Paper

Flexible Robotic Assembly Based on Ontological Representation of Tasks, Skills, and Resources

  • Philipp Matthias Schäfer
  • Franz Steinmetz
  • Stefan Schneyer
  • Timo Bachmann
  • Thomas Eiband
  • Florian Samuel Lay
  • Abhishek Padalkar
  • Christoph Sürig

Technology has sufficiently matured to enable, in principle, flexible and autonomous robotic assembly systems. However, in practice, it requires making all the relevant (implicit) knowledge that system engineers and workers have – about products to be assembled, tasks to be performed, as well as robots and their skills – available to the system explicitly. Only then can the planning and execution components of a robotic assembly pipeline communicate with each other in the same language and solve tasks autonomously without human intervention. This is why we have developed the Factory of the Future (FoF) ontology. At its core, this ontology models the tasks that are necessary to assemble a product and the robotic skills that can be employed to complete said tasks. The FoF ontology is based on existing standards. We started with theoretical considerations and iteratively adapted it based on practical experience gained from incorporating more and more components required for automated planning and assembly. Furthermore, we propose tools to extend the ontology for specific scenarios with knowledge about parts, robots, tools, and skills from various sources. The resulting scenario ontology serves us as world model for the robotic systems and other components of the assembly process. A central runtime interface to this world model provides fast and easy access to the knowledge during execution. In this work, we also show the integration of a graphical user front-end, an assembly planner, a workspace reconfigurator, and more components of the assembly pipeline that all communicate with the help of the FoF ontology. Overall, our integration of the FoF ontology with the other components of a robotic assembly pipeline shows that using an ontology is a practical method to establish a common language and understanding between the involved components.

ICRA Conference 2021 Conference Paper

Friction Estimation for Tendon-Driven Robotic Hands

  • Friedrich Lange
  • Martin Pfanne
  • Franz Steinmetz
  • Sebastian Wolf 0001
  • Freek Stulp

In tendon-driven robotic hands, tendons are usually routed along several pulleys. The resulting friction is often substantial, and must therefore be modelled and estimated, for instance for accurate control and contact detection. Common approaches for friction estimation consider special dedicated setups, where the parameters of a static or dynamic friction model at a single contact point are determined. In this paper, we rather combine such individual friction models into an overall friction model for the entire finger. Furthermore, we propose a method for estimating the parameters of this overall model in situ, i.e., from trajectories executed on the assembled hand, avoiding the need for dedicated setups. An important component of the proposed model is a varying bias for treating friction at low velocities, allowing a simpler static friction model to be used. We demonstrate that our approach enables contacts to be detected more accurately on the DLR David hand, without additional sensors.

IROS Conference 2021 Conference Paper

Learning and Interactive Design of Shared Control Templates

  • Gabriel Quere
  • Samuel Bustamante-Gomez
  • Annette Hagengruber
  • Jörn Vogel
  • Franz Steinmetz
  • Freek Stulp

Controlling a robotic arm to achieve manipulation tasks is challenging for humans, especially if only low-dimensional input signals can be provided, as is often the case for users with motor impairments. Using shared control to provide task-specific guidance and constraints facilitates control – for instance with the Shared Control Templates (SCT) framework – and enables even complex activities of daily living to be performed successfully. However, designing SCTs is a laborious task requiring robotic expertise. To make such design easier and faster, we propose a method for semi-automatically designing SCTs on the basis of demonstrations. Furthermore, we propose two similarity metrics, and demonstrate how these can be used to transfer knowledge from one SCT to another. We demonstrate that the SCTs so acquired can be successfully used in shared control for everyday tasks such as opening a drawer or a cupboard on our assistive robot EDAN.

ICRA Conference 2020 Conference Paper

Probabilistic Effect Prediction through Semantic Augmentation and Physical Simulation

  • Adrian Simon Bauer
  • Peter Schmaus
  • Freek Stulp
  • Daniel Leidner

Nowadays, robots are mechanically able to perform highly demanding tasks, where AI-based planning methods are used to schedule a sequence of actions that result in the desired effect. However, it is not always possible to know the exact outcome of an action in advance, as failure situations may occur at any time. To enhance failure tolerance, we propose to predict the effects of robot actions by augmenting collected experience with semantic knowledge and leveraging realistic physics simulations. That is, we consider semantic similarity of actions in order to predict outcome probabilities for previously unknown tasks. Furthermore, physical simulation is used to gather simulated experience that makes the approach robust even in extreme cases. We show how this concept is used to predict action success probabilities and how this information can be exploited throughout future planning trials. The concept is evaluated in a series of real world experiments conducted with the humanoid robot Rollin' Justin.

ICRA Conference 2020 Conference Paper

Robust, Locally Guided Peg-in-Hole using Impedance-Controlled Robots

  • Korbinian Nottensteiner
  • Freek Stulp
  • Alin Albu-Schäffer

We present an approach for the autonomous, robust execution of peg-in-hole assembly tasks. We build on a sampling-based state estimation framework, in which samples are weighted according to their consistency with the position and joint torque measurements. The key idea is to reuse these samples in a motion generation step, where they are assigned a second task-specific weight. The algorithm thereby guides the peg towards the goal along the configuration space. An advantage of the approach is that the user only needs to provide: the geometry of the objects as mesh data, a rough estimate of the object poses in the workspace, and a desired goal state. Another advantage is that the local, online nature of our algorithm leads to robust behavior under uncertainty. The approach is validated on our robotic setup, under varying uncertainties, for the classical peg-in-hole problem with two different geometries.

ICRA Conference 2020 Conference Paper

Shared Control Templates for Assistive Robotics

  • Gabriel Quere
  • Annette Hagengruber
  • Maged Iskandar
  • Samuel Bustamante-Gomez
  • Daniel Leidner
  • Freek Stulp
  • Jörn Vogel

Light-weight robotic manipulators can be used to restore the manipulation capability of people with a motor disability. However, manipulating the environment poses a complex task, especially when the control interface is of low bandwidth, as may be the case for users with impairments. Therefore, we propose a constraint-based shared control scheme to define skills which provide support during task execution. This is achieved by representing a skill as a sequence of states, with specific user command mappings and different sets of constraints being applied in each state. New skills are defined by combining different types of constraints and conditions for state transitions, in a human-readable format. We demonstrate its versatility in a pilot experiment with three activities of daily living. Results show that even complex, high-dimensional tasks can be performed with a low-dimensional interface using our shared control approach.

IROS Conference 2019 Conference Paper

Teleoperating Robots from the International Space Station: Microgravity Effects on Performance with Force Feedback

  • Bernhard M. Weber
  • Ribin Balachandran
  • Cornelia Riecke
  • Freek Stulp
  • Martin Stelzer

Sending humans to Mars’ surface to build habitats is, for now, prohibitively dangerous and costly. An alternative is to have humans in orbiters, teleoperating robots on Mars to construct habitats, deferring human arrival until these habitats are finished. This paper describes the Kontur-2 experiments, in which the feasibility of this scenario was tested with the International Space Station as an orbiter, a cosmonaut operating a force-feedback joystick as an input device for teleoperation, and Earth as the planet where the teleoperated robot is located. In particular, we focus on human teleoperation performance, which is known to deteriorate under conditions of spaceflight. We investigate whether the provision of force feedback at the joystick is as beneficial as under terrestrial conditions. Our results show that, to support humans operating in weightlessness, haptic assistance needs to be adjusted to the altered environmental condition.

IROS Conference 2018 Conference Paper

Optimizing Contextual Ergonomics Models in Human-Robot Interaction

  • Antonio Gonzales Marin
  • Mohammad S. Shourijeh
  • Pavel E. Galibarov
  • Michael Damsgaard
  • Lars Fritzsch
  • Freek Stulp

Current ergonomic assessment procedures require observation and manual annotation of postures by an expert, after which ergonomic scores are inferred from these annotations. Our aim is to automate this procedure and to enable robots to optimize their behavior with respect to such scores. A particular challenge is that ergonomic scoring requires accurate biomechanical simulations which are computationally too expensive to use in robot control loops or optimization. To address this, we learn Contextual Ergonomics Models, which are Gaussian Process Latent Variable Models that have been trained with full musculoskeletal simulations for specific task contexts. Contextual Ergonomics Models enable search in a low-dimensional latent space, whilst the cost function can be defined in terms of the full high-dimensional musculoskeletal model, which can be quickly reconstructed from the latent space. We demonstrate how optimizing Contextual Ergonomics Models leads to significantly reduced muscle activation in an experiment with eight subjects performing a drilling task.

IJCAI Conference 2017 Conference Paper

Tensor Based Knowledge Transfer Across Skill Categories for Robot Control

  • Chenyang Zhao
  • Timothy M. Hospedales
  • Freek Stulp
  • Olivier Sigaud

Advances in hardware and learning for control are enabling robots to perform increasingly dextrous and dynamic control tasks. These skills typically require a prohibitive amount of exploration for reinforcement learning, and so are commonly achieved by imitation learning from manual demonstration. The costly, non-scalable nature of manual demonstration has motivated work into skill generalisation, e.g., through contextual policies and options. Despite good results, existing work along these lines is limited to generalising across variants of one skill such as throwing an object to different locations. In this paper we go significantly further and investigate generalisation across qualitatively different classes of control skills. In particular, we introduce a class of neural network controllers that can realise four distinct skill classes: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, we are able to extract transferable latent skills that enable dramatic acceleration of learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dextrous control tasks like ball-in-cup from scratch with pure reinforcement learning.

IROS Conference 2015 Conference Paper

Co-manipulation with multiple probabilistic virtual guides

  • Gennaro Raiola
  • Xavier Lamy
  • Freek Stulp

In co-manipulation, humans and robots solve manipulation tasks together. Virtual guides are important tools for co-manipulation, as they constrain the movement of the robot to avoid undesirable effects, such as collisions with the environment. Defining virtual guides is often a laborious task requiring expert knowledge. This restricts the usefulness of virtual guides in environments where new tasks may need to be solved, or where multiple tasks need to be solved sequentially, but in an unknown order. To this end, we propose a framework for multiple probabilistic virtual guides, and demonstrate a concrete implementation of such guides using kinesthetic teaching and Gaussian mixture models. Our approach enables non-expert users to design virtual guides through demonstration. Also, they may demonstrate novel guides, even if already known guides are active. Finally, users are able to intuitively select the appropriate guide from a set of guides through physical interaction with the robot. We evaluate our approach in a pick-and-place task, where users are to place objects at one of several positions in a cupboard.
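As an illustrative sketch of the guide-selection idea (not the paper's implementation, and with entirely hypothetical guides), each virtual guide can be modeled as a Gaussian mixture over end-effector positions, and the guide under which the user's recent motion is most likely is activated:

```python
import numpy as np

def gmm_loglik(points, weights, means, covs):
    """Log-likelihood of a set of points under one guide's Gaussian mixture."""
    total = 0.0
    for p in points:
        dens = 0.0
        for w, m, c in zip(weights, means, covs):
            d = p - m
            norm = 1.0 / np.sqrt((2.0 * np.pi) ** len(m) * np.linalg.det(c))
            dens += w * norm * np.exp(-0.5 * d @ np.linalg.inv(c) @ d)
        total += np.log(dens + 1e-300)
    return total

def select_guide(motion, guides):
    """Index of the guide under which the observed motion is most likely."""
    return int(np.argmax([gmm_loglik(motion, *g) for g in guides]))

# Two hypothetical guides: one along the x-axis, one along the y-axis.
guide_x = ([0.5, 0.5],
           [np.array([0.3, 0.0]), np.array([0.7, 0.0])],
           [0.01 * np.eye(2)] * 2)
guide_y = ([0.5, 0.5],
           [np.array([0.0, 0.3]), np.array([0.0, 0.7])],
           [0.01 * np.eye(2)] * 2)

# A short user motion roughly along the x-axis selects the first guide.
motion = [np.array([0.2, 0.01]), np.array([0.5, -0.01]), np.array([0.8, 0.0])]
print(select_guide(motion, [guide_x, guide_y]))  # 0
```

In practice the mixture parameters would be fitted to kinesthetic demonstrations (e.g., with expectation-maximization) rather than specified by hand as above.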

IROS Conference 2015 Conference Paper

Facilitating intention prediction for humans by optimizing robot motions

  • Freek Stulp
  • Jonathan Grizou
  • Baptiste Busch
  • Manuel Lopes 0001

Members of a team are able to coordinate their actions by anticipating the intentions of others. Achieving such implicit coordination between humans and robots requires humans to be able to quickly and robustly predict the robot's intentions, i. e. the robot should demonstrate a behavior that is legible. Whereas previous work has sought to explicitly optimize the legibility of behavior, we investigate legibility as a property that arises automatically from general requirements on the efficiency and robustness of joint human-robot task completion. We do so by optimizing fast and successful completion of joint human-robot tasks through policy improvement with stochastic optimization. Two experiments with human subjects show that robots are able to adapt their behavior so that humans become better at predicting the robot's intentions early on, which leads to faster and more robust overall task completion.

IROS Conference 2014 Conference Paper

Simultaneous on-line Discovery and Improvement of Robotic Skill options

  • Freek Stulp
  • Laura Herlant
  • Antoine Hoarau
  • Gennaro Raiola

The regularity of everyday tasks enables us to reuse existing solutions for task variations. For instance, most door-handles require the same basic skill (reach, grasp, turn, pull), but small adaptations of the basic skill are required to adapt to the variations that exist (e.g., levers vs. knobs). We introduce the algorithm “Simultaneous On-line Discovery and Improvement of Robotic Skills” (SODIRS) that is able to autonomously discover and optimize skill options for such task variations. We formalize the problem in a reinforcement learning context, and use the PIBB algorithm [2] to continually optimize skills with respect to a cost function. SODIRS discovers new subskills, or “skill options”, by clustering the costs of trials, and determining whether perceptual features are able to predict which cluster a trial will belong to. This enables SODIRS to build a decision tree, in which the leaves contain skill options for task variations. We demonstrate SODIRS' performance in simulation, as well as on a Meka humanoid robot performing the ball-in-cup task.

IROS Conference 2012 Conference Paper

Adaptive exploration for continual reinforcement learning

  • Freek Stulp

Most experiments on policy search for robotics focus on isolated tasks, where the experiment is split into two distinct phases: (1) the learning phase, where the robot learns the task through exploration; (2) the exploitation phase, where exploration is turned off, and the robot demonstrates its performance on the task it has learned. In this paper, we present an algorithm that enables robots to continually and autonomously alternate between these phases. We do so by combining the `Policy Improvement with Path Integrals' direct reinforcement learning algorithm with the covariance matrix adaptation rule from the `Cross-Entropy Method' optimization algorithm. This integration is possible because both algorithms iteratively update parameters with probability-weighted averaging. A practical advantage of the novel algorithm, called PI2-CMA, is that it relieves the user of having to manually tune the degree of exploration. We evaluate PI2-CMA's ability to continually and autonomously tune exploration on two tasks.
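The shared update rule, probability-weighted averaging, can be sketched as follows. This is a loose illustrative simplification, not the paper's implementation; the toy cost function, weighting temperature, and covariance floor are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def probability_weighted_update(theta, sigma, cost_fn, n_samples=20, h=10.0):
    """One iteration of probability-weighted averaging with covariance
    adaptation, loosely in the spirit of PI2-CMA (simplified sketch).

    Perturbed parameter vectors are sampled, costs are mapped to weights
    with a softmax over normalized negative costs, and the same weighted
    samples update both the mean parameters and the exploration covariance.
    """
    samples = rng.multivariate_normal(theta, sigma, size=n_samples)
    costs = np.array([cost_fn(s) for s in samples])
    z = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-12)
    w = np.exp(-h * z)            # low cost -> high weight
    w /= w.sum()
    theta_new = w @ samples       # probability-weighted mean
    diff = samples - theta        # deviations from the current mean
    # Weighted covariance, plus a small floor so exploration never dies out.
    sigma_new = (w[:, None] * diff).T @ diff + 1e-3 * np.eye(theta.size)
    return theta_new, sigma_new

# Toy quadratic cost: the exploration magnitude adapts as theta converges.
theta, sigma = np.array([2.0, -1.5]), 0.5 * np.eye(2)
for _ in range(30):
    theta, sigma = probability_weighted_update(theta, sigma,
                                               lambda x: float(x @ x))
print(np.round(theta, 2))
```

The key point of the abstract is visible here: the same weighted samples drive both the mean update (the PI2 part) and the covariance update (the CMA part), so no separate exploration schedule has to be hand-tuned.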

IROS Conference 2012 Conference Paper

Comparing motion generation and motion recall for everyday mobile manipulation tasks

  • Carmen Lopera
  • Hilario Tome
  • Adolfo Rodríguez Tsouroukdissian
  • Freek Stulp

When first posed with the problem 15 × 15, we may generate the answer by applying a set of rules, e.g. breaking the problem down into (10 + 5) × 15 and solving these simpler multiplications first [2]. But after having solved this problem several times, we simply recall that the answer to 15 × 15 is 225. This distinction between generation and recall can also be applied to motor planning [2], as described in the next two sections.
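The generation-vs-recall distinction maps naturally onto memoization; the following hypothetical illustration (the `recall_table` and `multiply` names are ours, not the paper's) mirrors the 15 × 15 example:

```python
# "Generation" applies rules to derive an answer; "recall" looks it up
# once the problem has been solved before.

recall_table = {}  # memory of previously solved problems

def multiply(a, b):
    if (a, b) in recall_table:          # recall: direct lookup
        return recall_table[(a, b)]
    # Generation: break the problem into simpler sub-problems,
    # e.g. 15 * 15 = (10 + 5) * 15 = 10*15 + 5*15.
    tens, ones = divmod(a, 10)
    answer = tens * 10 * b + ones * b
    recall_table[(a, b)] = answer       # store for future recall
    return answer
```

The first call generates the answer from rules; every later call with the same arguments recalls it directly, which is the trade-off the abstract carries over to motion generation versus motion recall.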

IROS Conference 2011 Conference Paper

Learning motion primitive goals for robust manipulation

  • Freek Stulp
  • Evangelos A. Theodorou
  • Mrinal Kalakrishnan
  • Peter Pastor
  • Ludovic Righetti
  • Stefan Schaal

Applying model-free reinforcement learning to manipulation remains challenging for several reasons. First, manipulation involves physical contact, which causes discontinuous cost functions. Second, in manipulation, the end-point of the movement must be chosen carefully, as it represents a grasp which must be adapted to the pose and shape of the object. Finally, there is uncertainty in the object pose, and even the most carefully planned movement may fail if the object is not at the expected position. To address these challenges we 1) present a simplified, computationally more efficient version of our model-free reinforcement learning algorithm PI2; 2) extend PI2 so that it simultaneously learns shape parameters and goal parameters of motion primitives; 3) use shape and goal learning to acquire motion primitives that are robust to object pose uncertainty. We evaluate these contributions on a manipulation platform consisting of a 7-DOF arm with a 4-DOF hand.

ICRA Conference 2011 Conference Paper

Learning to grasp under uncertainty

  • Freek Stulp
  • Evangelos A. Theodorou
  • Jonas Buchli
  • Stefan Schaal

We present an approach that enables robots to learn motion primitives that are robust to state estimation uncertainties. During reaching and preshaping, the robot learns to use fine manipulation strategies to maneuver the object into a pose at which closing the hand to perform the grasp is more likely to succeed. In contrast, common assumptions in grasp planning and motion planning for reaching are that these tasks can be performed independently, and that the robot has perfect knowledge of the pose of the objects in the environment. We implement our approach using Dynamic Movement Primitives and the probabilistic model-free reinforcement learning algorithm Policy Improvement with Path Integrals (PI2). The cost function that PI2 optimizes is a simple boolean that penalizes failed grasps. The key to acquiring robust motion primitives is to sample the actual pose of the object from a distribution that represents the state estimation uncertainty. During learning, the robot will thus optimize the chance of grasping an object from this distribution, rather than at one specific pose. In our empirical evaluation, we demonstrate how the motion primitives become more robust when grasping simple cylindrical objects, as well as more complex, non-convex objects. We also investigate how well the learned motion primitives generalize towards new object positions and other state estimation uncertainty distributions.
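The key idea — evaluate each trial at a pose sampled from the state-estimation uncertainty, so the optimizer rewards primitives that succeed over the whole distribution — can be sketched as a Monte-Carlo cost estimate. The `simulate_grasp` rollout function here is a hypothetical stand-in for executing the primitive:

```python
import numpy as np

def expected_grasp_cost(primitive_params, pose_mean, pose_cov,
                        simulate_grasp, n_samples=50, rng=None):
    """Monte-Carlo estimate of the boolean grasp cost under pose uncertainty.

    Rather than evaluating a rollout at one assumed object pose, each trial
    samples the actual pose from the state-estimation uncertainty
    distribution. `simulate_grasp` is a hypothetical rollout returning
    True on a successful grasp.
    """
    rng = rng or np.random.default_rng()
    failures = 0
    for _ in range(n_samples):
        pose = rng.multivariate_normal(pose_mean, pose_cov)
        if not simulate_grasp(primitive_params, pose):
            failures += 1  # boolean cost: 1 for each failed grasp
    return failures / n_samples
```

A primitive optimized against this cost is pushed toward strategies that tolerate the whole pose distribution, rather than one nominal pose.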

IROS Conference 2011 Conference Paper

Movement segmentation using a primitive library

  • Franziska Meier
  • Evangelos A. Theodorou
  • Freek Stulp
  • Stefan Schaal

Segmenting complex movements into a sequence of primitives remains a difficult problem with many applications in the robotics and vision communities. In this work, we show how the movement segmentation problem can be reduced to a sequential movement recognition problem. To this end, we reformulate the original Dynamic Movement Primitive (DMP) formulation as a linear dynamical system with control inputs. Based on this new formulation, we develop an Expectation-Maximization algorithm to estimate the duration and goal position of a partially observed trajectory. With the help of this algorithm and the assumption that a library of movement primitives is present, we present a movement segmentation framework. We illustrate the usefulness of the new DMP formulation on the two applications of online movement recognition and movement segmentation.
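The DMP transformation system the reformulation starts from can be written as a linear dynamical system driven by the forcing term as a control input. The following sketch integrates that system with standard textbook constants (assumed here, not taken from the paper); the EM machinery for estimating duration and goal from partial trajectories is omitted:

```python
import numpy as np

def integrate_dmp(y0, g, forcing, dt=0.01, alpha=25.0, beta=6.25, tau=1.0):
    """Euler integration of the DMP transformation system, viewed as a
    linear dynamical system with control input f_t (the forcing term):

        tau * z' = alpha * (beta * (g - y) - z) + f
        tau * y' = z

    y0: start position, g: goal, forcing: sequence of control inputs f_t.
    """
    y, z = y0, 0.0
    traj = [y]
    for f in forcing:
        z += dt * (alpha * (beta * (g - y) - z) + f) / tau
        y += dt * z / tau
        traj.append(y)
    return np.array(traj)
```

With zero forcing the system is a critically damped spring toward the goal `g`; the learned forcing term shapes the trajectory on the way there, and it is this linear-in-state structure that makes the EM estimation of goal and duration tractable.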

IROS Conference 2009 Conference Paper

Action-related place-based mobile manipulation

  • Freek Stulp
  • Andreas Fedrizzi
  • Michael Beetz

In mobile manipulation, the position to which the robot navigates has a large influence on the ease with which a subsequent manipulation action can be performed. Whether a manipulation action succeeds depends on many factors, such as the robot's hardware configuration, the controllers the robot uses to achieve navigation and manipulation, the task context, and uncertainties in state estimation. In this paper, we present 'ARPLACE', an action-related place that takes these factors, and the context in which the actions are performed, into account. Through experience-based learning, the robot first learns a so-called generalized success model, which discerns between positions from which manipulation succeeds or fails. On-line, this model is used to compute an ARPLACE, a probability distribution that maps positions to a predicted probability of successful manipulation, and takes the uncertainty in the robot and object's position into account. In an empirical evaluation, we demonstrate that using ARPLACEs for least-commitment navigation improves the success rate of subsequent manipulation tasks substantially.
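Computing such a map from a learned success model can be sketched as follows; `success_model` is a hypothetical stand-in for the learned generalized success model, and position uncertainty is marginalized out by sampling (one plausible realization, not necessarily the paper's exact computation):

```python
import numpy as np

def arplace_map(grid, success_model, pose_std, n_samples=100, rng=None):
    """Sketch of an ARPLACE: for each candidate base position, estimate the
    probability of successful manipulation, marginalizing over position
    uncertainty by sampling. `success_model(p)` returns 1.0 for positions
    from which manipulation is predicted to succeed, else 0.0.
    """
    rng = rng or np.random.default_rng()
    probs = np.empty(len(grid))
    for i, pos in enumerate(grid):
        noisy = pos + rng.normal(scale=pose_std, size=(n_samples, len(pos)))
        probs[i] = np.mean([success_model(p) for p in noisy])
    return probs
```

A planner can then commit only as late as necessary, navigating toward high-probability regions of the map rather than to a single fixed target position.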

ICRA Conference 2007 Conference Paper

Seamless Execution of Action Sequences

  • Freek Stulp
  • Wolfram Koska
  • Alexis Maldonado
  • Michael Beetz

One of the most notable and recognizable features of robot motion is the abrupt transitions between actions in action sequences. In contrast, humans and animals perform sequences of actions efficiently, and with seamless transitions between subsequent actions. This smoothness is not a goal in itself, but a side-effect of the evolutionary optimization of other performance measures. In this paper, we argue that such jagged motion is an inevitable consequence of the way human designers and planners reason about abstract actions. We then present subgoal refinement, a procedure that optimizes action sequences. Subgoal refinement determines action parameters that are not relevant to why the action was selected, and optimizes these parameters with respect to expected execution performance. This performance is computed using action models, which are learned from observed experience. We integrate subgoal refinement in an existing planning system, and demonstrate how requiring optimal performance causes smooth motion in three robotic domains.
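At its core, subgoal refinement treats the intermediate state between two chained actions as a free parameter and picks the value that minimizes the total predicted cost. A minimal sketch, where the two cost models stand in for the action models learned from observed experience:

```python
def refine_subgoal(candidates, cost_model_a, cost_model_b):
    """Sketch of subgoal refinement: the state where action A ends and
    action B begins is not fixed by the plan, so choose the candidate
    subgoal minimizing the summed predicted execution cost of both actions.
    """
    return min(candidates, key=lambda s: cost_model_a(s) + cost_model_b(s))
```

Because the chosen subgoal balances both actions' costs rather than ending each action at an arbitrary default state, the transition between them becomes smooth as a side effect of the optimization.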

ICRA Conference 2006 Conference Paper

Implicit Coordination in Robotic Teams using Learned Prediction Models

  • Freek Stulp
  • Michael Isik
  • Michael Beetz

Many application tasks require the cooperation of two or more robots. Humans are good at cooperation in shared workspaces, because they anticipate and adapt to the intentions and actions of others. In contrast, multi-agent and multi-robot systems rely on communication to exchange their intentions. This causes problems in domains where perfect communication is not guaranteed, such as rescue robotics, autonomous vehicles participating in traffic, or robotic soccer. In this paper, we introduce a computational model for implicit coordination, and apply it to a typical coordination task from robotic soccer: regaining ball possession. The computational model specifies that performance prediction models are necessary for coordination, so we learn them off-line from observed experience. By taking the perspective of the teammates, these models are then used to predict utilities of others, and optimize a shared performance model for joint actions. In several experiments conducted with our robotic soccer team, we evaluate the performance of implicit coordination.

IJCAI Conference 2005 Conference Paper

Optimized Execution of Action Chains Using Learned Performance Models of Abstract Actions

  • Freek Stulp
  • Michael

Many plan-based autonomous robot controllers generate chains of abstract actions in order to achieve complex, dynamically changing, and possibly interacting goals. The execution of these action chains often results in robot behavior that shows abrupt transitions between subsequent actions, causing suboptimal performance. The resulting motion patterns are so characteristic for robots that people imitating robotic behavior will do so by making abrupt movements between actions. In this paper we propose a novel computational model for the execution of abstract action chains. In this computational model a robot first learns situation-specific performance models of abstract actions. It then uses these models to automatically specialize the abstract actions for their execution in a given action chain. This specialization results in refined chains that are optimized for performance. As a side effect this behavior optimization also appears to produce action chains with seamless transitions between actions.