Arrow Research search

Author name cluster

Gerhard Neumann

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

112 papers
2 author rows

Possible papers

112

TMLR Journal 2026 Journal Article

Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning

  • Philipp Dahlinger
  • Niklas Freymuth
  • Tai Hoang
  • Tobias Würth
  • Michael Volpp
  • Luise Kärger
  • Gerhard Neumann

Simulating object deformations is a critical challenge across many scientific domains, including robotics, manufacturing, and structural mechanics. Learned Graph Network Simulators (GNSs) offer a promising alternative to traditional mesh-based physics simulators. Their speed and inherent differentiability make them particularly well suited for applications that require fast and accurate simulations, such as robotic manipulation or manufacturing optimization. However, existing learned simulators typically rely on single-step observations, which limits their ability to exploit temporal context. Without this information, these models fail to infer, e.g., material properties. Further, they rely on auto-regressive rollouts, which quickly accumulate error for long trajectories. We instead frame mesh-based simulation as a trajectory-level meta-learning problem. Using Conditional Neural Processes, our method enables rapid adaptation to new simulation scenarios from limited initial data while capturing their latent simulation properties. We utilize movement primitives to directly predict fast, stable and accurate simulations from a single model call. The resulting approach, Movement-primitive Meta-MeshGraphNet (M3GN), provides higher simulation accuracy at a fraction of the runtime cost compared to state-of-the-art GNSs across several tasks.

NeurIPS Conference 2025 Conference Paper

AMBER: Adaptive Mesh Generation by Iterative Mesh Resolution Prediction

  • Niklas Freymuth
  • Tobias Würth
  • Nicolas Schreiber
  • Balázs Gyenes
  • Andreas Boltres
  • Johannes Mitsch
  • Aleksandar Taranovic
  • Tai Hoang

The cost and accuracy of simulating complex physical systems using the Finite Element Method (FEM) scales with the resolution of the underlying mesh. Adaptive meshes improve computational efficiency by refining resolution in critical regions, but typically require task-specific heuristics or cumbersome manual design by a human expert. We propose Adaptive Meshing By Expert Reconstruction (AMBER), a supervised learning approach to mesh adaptation. Starting from a coarse mesh, AMBER iteratively predicts the sizing field, i. e. , a function mapping from the geometry to the local element size of the target mesh, and uses this prediction to produce a new intermediate mesh using an out-of-the-box mesh generator. This process is enabled through a hierarchical graph neural network, and relies on data augmentation by automatically projecting expert labels onto AMBER-generated data during training. We evaluate AMBER on 2D and 3D datasets, including classical physics problems, mechanical components, and real-world industrial designs with human expert meshes. AMBER generalizes to unseen geometries and consistently outperforms multiple recent baselines, including ones using Graph and Convolutional Neural Networks, and Reinforcement Learning-based approaches.

NeurIPS Conference 2025 Conference Paper

Diffusion-Based Hierarchical Graph Neural Networks for Simulating Nonlinear Solid Mechanics

  • Tobias Würth
  • Niklas Freymuth
  • Gerhard Neumann
  • Luise Kärger

Graph-based learned simulators have emerged as a promising approach for simulating physical systems on unstructured meshes, offering speed and generalization across diverse geometries. However, they often struggle with capturing global phenomena, such as bending or long-range correlations usually occurring in solid mechanics, and suffer from error accumulation over long rollouts due to their reliance on local message passing and direct next-step prediction. We address these limitations by introducing the Rolling Diffusion-Batched Inference Network (ROBIN), a novel learned simulator that integrates two key innovations: (i) Rolling Diffusion-Batched Inference (ROBI), a parallelized inference scheme that amortizes the cost of diffusion-based refinement across physical time steps by overlapping denoising steps across a temporal window. (ii) A Hierarchical Graph Neural Network built on algebraic multigrid coarsening, enabling multiscale message passing across different mesh resolutions. This architecture, implemented via Algebraic-hierarchical Message Passing Networks, captures both fine-scale local dynamics and global structural effects critical for phenomena like beam bending or multi-body contact. We validate ROBIN on challenging 2D and 3D solid mechanics benchmarks involving geometric, material, and contact nonlinearities. ROBIN achieves state-of-the-art accuracy on all tasks, substantially outperforming existing next-step learned simulators while reducing inference time by up to an order of magnitude compared to standard diffusion simulators.

EWRL Workshop 2025 Workshop Paper

DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

  • Onur Celik
  • Zechu Li
  • Denis Blessing
  • Ge Li
  • Daniel Palenicek
  • Jan Peters
  • Georgia Chalvatzaki
  • Gerhard Neumann

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges—primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.

ICML Conference 2025 Conference Paper

DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

  • Onur Celik
  • Zechu Li
  • Denis Blessing
  • Ge Li
  • Daniel Palenicek
  • Jan Peters 0001
  • Georgia Chalvatzaki
  • Gerhard Neumann

Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges—primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.

ICLR Conference 2025 Conference Paper

Efficient Off-Policy Learning for High-Dimensional Action Spaces

  • Fabian Otto
  • Philipp Becker
  • Ngo Anh Vien
  • Gerhard Neumann

Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation, which can be problematic in high-dimensional action spaces due to the curse of dimensionality. This reliance results in data inefficiency as maintaining a state-action-value function in such spaces is challenging. We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning. This approach, which we refer to as Vlearn, effectively circumvents the limitations of existing methods by eliminating the necessity for an explicit state-action-value function. To this end, we leverage a weighted importance sampling loss for learning deep value functions from off-policy data. While this is common for linear methods, it has not been combined with deep value function networks. This transfer to deep methods is not straightforward and requires novel design choices such as robust policy updates, twin value function networks to avoid an optimization bias, and importance weight clipping. We also present a novel analysis of the variance of our estimate compared to commonly used importance sampling estimators such as V-trace. Our approach improves sample complexity as well as final performance and ensures consistent and robust performance across various benchmark tasks. Eliminating the state-action-value function in Vlearn facilitates a streamlined learning process, yielding high-return agents.

ICLR Conference 2025 Conference Paper

End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler

  • Denis Blessing
  • Xiaogang Jia
  • Gerhard Neumann

Diffusion models optimized via variational inference (VI) have emerged as a promising tool for generating samples from unnormalized target densities. These models create samples by simulating a stochastic differential equation, starting from a simple, tractable prior, typically a Gaussian distribution. However, when the support of this prior differs greatly from that of the target distribution, diffusion models often struggle to explore effectively or suffer from large discretization errors. Moreover, learning the prior distribution can lead to mode-collapse, exacerbated by the mode-seeking nature of reverse Kullback-Leibler divergence commonly used in VI. To address these challenges, we propose end-to-end learnable Gaussian mixture priors (GMPs). GMPs offer improved control over exploration, adaptability to target support, and increased expressiveness to counteract mode collapse. We further leverage the structure of mixture models by proposing a strategy to iteratively refine the model through the addition of mixture components during training. Our experimental results demonstrate significant performance improvements across a diverse range of real-world and synthetic benchmark problems when using GMPs without requiring additional target evaluations.

ICLR Conference 2025 Conference Paper

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

  • Tai Hoang
  • Huy Le
  • Philipp Becker
  • Ngo Anh Vien
  • Gerhard Neumann

Manipulating objects with varying geometries and deformable objects is a major challenge in robotics. Tasks such as insertion with different objects or cloth hanging require precise control and effective modelling of complex dynamics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs, such as actuators and objects, accompanied by different edge types describing their interactions. This graph representation serves as a unified structure for both rigid and deformable objects tasks, and can be extended further to tasks comprising multiple actuators. To evaluate this setup, we present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects, as well as rope and cloth manipulation with multiple end-effectors. These tasks present a large search space, as both the initial and target configurations are uniformly sampled in 3D space. To address this issue, we propose a novel graph-based policy model, dubbed Heterogeneous Equivariant Policy (HEPi), utilizing $SE(3)$ equivariant message passing networks as the main backbone to exploit the geometric symmetry. In addition, by modeling explicit heterogeneity, HEPi can outperform Transformer-based and non-heterogeneous equivariant policies in terms of average returns, sample efficiency, and generalization to unseen objects. Our project page is available at https://thobotics.github.io/hepi.

EWRL Workshop 2025 Workshop Paper

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

  • Tai Hoang
  • Huy Le
  • Philipp Becker
  • Vien Anh Ngo
  • Gerhard Neumann

Manipulating objects with varying geometries and deformable objects is a major challenge in robotics. Tasks such as insertion with different objects or cloth hanging require precise control and effective modelling of complex dynamics. In this work, we frame this problem through the lens of a heterogeneous graph that comprises smaller sub-graphs, such as actuators and objects, accompanied by different edge types describing their interactions. This graph representation serves as a unified structure for both rigid and deformable objects tasks, and can be extended further to tasks comprising multiple actuators. To evaluate this setup, we present a novel and challenging reinforcement learning benchmark, including rigid insertion of diverse objects, as well as rope and cloth manipulation with multiple end-effectors. These tasks present a large search space, as both the initial and target configurations are uniformly sampled in 3D space. To address this issue, we propose a novel graph-based policy model, dubbed Heterogeneous Equivariant Policy (HEPi), utilizing $SE(3)$ equivariant message passing networks as the main backbone to exploit the geometric symmetry. In addition, by modeling explicit heterogeneity, HEPi can outperform Transformer-based and non-heterogeneous equivariant policies in terms of average returns, sample efficiency, and generalization to unseen objects.

NeurIPS Conference 2025 Conference Paper

MaNGO — Adaptable Graph Network Simulators via Meta-Learning

  • Philipp Dahlinger
  • Tai Hoang
  • Denis Blessing
  • Niklas Freymuth
  • Gerhard Neumann

Accurately simulating physics is crucial across scientific domains, with applications spanning from robotics to materials science. While traditional mesh-based simulations are precise, they are often computationally expensive and require knowledge of physical parameters, such as material properties. In contrast, data-driven approaches like Graph Network Simulators (GNSs) offer faster inference but suffer from two key limitations: Firstly, they must be retrained from scratch for even minor variations in physical parameters, and secondly they require labor-intensive data collection for each new parameter setting. This is inefficient, as simulations with varying parameters often share a common underlying latent structure. In this work, we address these challenges by learning this shared structure through meta-learning, enabling fast adaptation to new physical parameters without retraining. To this end, we propose a novel architecture that generates a latent representation by encoding graph trajectories using conditional neural processes (CNPs). To mitigate error accumulation over time, we combine CNPs with a novel neural operator architecture. We validate our approach, Meta Neural Graph Operator (MaNGO), on several dynamics prediction tasks with varying material properties, demonstrating superior performance over existing GNS methods. Notably, MaNGO achieves accuracy on unseen material properties close to that of an oracle model.

NeurIPS Conference 2025 Conference Paper

PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning

  • Xiaogang Jia
  • Qian Wang
  • Anrui Wang
  • Han Wang
  • Balázs Gyenes
  • Emiliyan Gospodinov
  • Xinkai Jiang
  • Ge Li

Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, which RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Yet due to their structure in a regular grid, we enable the use of established computer vision techniques directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https: //point-map. github. io/Point-Map/

NeurIPS Conference 2025 Conference Paper

Scaffolding Dexterous Manipulation with Vision-Language Models

  • Vincent de Bakker
  • Joey Hejna
  • Tyler Lum
  • Onur Celik
  • Aleksandar Taranovic
  • Denis Blessing
  • Gerhard Neumann
  • Jeannette Bohg

Dexterous robotic hands are essential for performing complex manipulation tasks, yet remain difficult to train due to the challenges of demonstration collection and high-dimensional control. While reinforcement learning (RL) can alleviate the data bottleneck by generating experience in simulation, it typically relies on carefully designed, task-specific reward functions, which hinder scalability and generalization. Thus, contemporary works in dexterous manipulation have often bootstrapped from reference trajectories. These trajectories specify target hand poses that guide the exploration of RL policies and object poses that enable dense, task-agnostic rewards. However, sourcing suitable trajectories---particularly for dexterous hands---remains a significant challenge. Yet, the precise details in explicit reference trajectories are often unnecessary, as RL ultimately refines the motion. Our key insight is that modern vision-language models (VLMs) already encode the commonsense spatial and semantic knowledge needed to specify tasks and guide exploration effectively. Given a task description (e. g. , “open the cabinet”) and a visual scene, our method uses an off-the-shelf VLM to first identify task-relevant keypoints (e. g. , handles, buttons) and then synthesize 3D trajectories for hand motion and object motion. Subsequently, we train a low-level residual RL policy in simulation to track these coarse trajectories or ``scaffolds'' with high fidelity. Across a number of simulated tasks involving articulated objects and semantic understanding, we demonstrate that our method is able to learn robust dexterous manipulation policies. Moreover, we showcase that our method transfers to real-world robotic hands without any human demonstrations or handcrafted rewards.

ICLR Conference 2025 Conference Paper

Sequential Controlled Langevin Diffusions

  • Junhua Chen
  • Lorenz Richter
  • Julius Berner
  • Denis Blessing
  • Gerhard Neumann
  • Anima Anandkumar

An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive annealed densities via prescribed Markov chains and resampling steps, and (2) recently developed diffusion-based sampling methods, where a learned dynamical transport is used. Despite the common goal, both approaches have different, often complementary, advantages and drawbacks. The resampling steps in SMC allow focusing on promising regions of the space, often leading to robust performance. While the algorithm enjoys asymptotic guarantees, the lack of flexible, learnable transitions can lead to slow convergence. On the other hand, diffusion-based samplers are learned and can potentially better adapt themselves to the target at hand, yet often suffer from training instabilities. In this work, we present a principled framework for combining SMC with diffusion-based samplers by viewing both methods in continuous time and considering measures on path space. This culminates in the new Sequential Controlled Langevin Diffusion (SCLD) sampling method, which is able to utilize the benefits of both methods and reaches improved performance on multiple benchmark problems, in many cases using only 10% of the training budget of previous diffusion-based samplers.

ICLR Conference 2025 Conference Paper

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

  • Ge Li
  • Dong Tian
  • Hongyi Zhou
  • Xinkai Jiang
  • Rudolf Lioutikov
  • Gerhard Neumann

This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the ERL framework. In ERL, policies predict entire action trajectories over multiple time steps instead of single actions at every time step. These trajectories are typically parameterized by trajectory generators such as Movement Primitives (MP), allowing for smooth and efficient exploration over long horizons while capturing high-level temporal correlations. However, ERL methods are often constrained to on-policy frameworks due to the difficulty of evaluating state-action values for entire action sequences, limiting their sample efficiency and preventing the use of more efficient off-policy architectures. TOP-ERL addresses this shortcoming by segmenting long action sequences and estimating the state-action values for each segment using a transformer-based critic architecture alongside an n-step return estimation. These contributions result in efficient and stable training that is reflected in the empirical results conducted on sophisticated robot learning environments. TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on the model performance.

TMLR Journal 2025 Journal Article

Towards Measuring Predictability: To which extent data-driven approaches can extract deterministic relations from data exemplified with time series prediction and classification

  • Saleh GHOLAM ZADEH
  • Vaisakh Shaj
  • Patrick Jahnke
  • Gerhard Neumann
  • Tim Breitenbach

Minimizing loss functions is one important ingredient for machine learning to fit parameters such that the machine learning models extract relations hidden in the data. The smaller the loss function value on various splittings of a dataset, the better the machine learning model is assumed to perform. However, datasets are usually generated by dynamics consisting of deterministic components, where relations are clearly defined and consequently learnable, as well as stochastic parts where outcomes are random and thus not predictable. Depending on the amplitude of the deterministic and stochastic processes, the best achievable loss function value varies and is usually not known in real data science scenarios. In this research, a statistical framework is developed that provides measures to address the predictability of a target given the available input data and, after training a machine learning model, how much of the deterministic relations have been missed by the model. Consequently, the presented framework allows to differentiate model errors into unpredictable parts regarding the given input and a systematic miss of deterministic relations. The work extends the definition of model success or failure as well as the convergence of a training process. Moreover, it is demonstrated how such measures can enrich the procedure of model training. The framework is showcased with time series data on different synthetic and real-world datasets. The code is available at https://github.com/Saleh-Gholam-Zadeh/predictability_measure.

NeurIPS Conference 2025 Conference Paper

Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference

  • Denis Blessing
  • Julius Berner
  • Lorenz Richter
  • Carles Domingo i Enrich
  • Yuanqi Du
  • Arash Vahdat
  • Gerhard Neumann

Solving stochastic optimal control problems with quadratic control costs can be viewed as approximating a target path space measure, e. g. via gradient-based optimization. In practice, however, this optimization is challenging in particular if the target measure differs substantially from the prior. In this work, we therefore approach the problem by iteratively solving constrained problems incorporating trust regions that aim for approaching the target measure gradually in a systematic way. It turns out that this trust region based strategy can be understood as a geometric annealing from the prior to the target measure, where, however, the incorporated trust regions lead to a principled and educated way of choosing the time steps in the annealing path. We demonstrate in multiple optimal control applications that our novel method can improve performance significantly, including tasks in diffusion-based sampling and fine-tuning of diffusion models.

ICLR Conference 2025 Conference Paper

Underdamped Diffusion Bridges with Applications to Sampling

  • Denis Blessing
  • Julius Berner
  • Lorenz Richter
  • Gerhard Neumann

We provide a general framework for learning diffusion bridges that transport prior to target distributions. It includes existing diffusion models for generative modeling, but also underdamped versions with degenerate diffusion matrices, where the noise only acts in certain dimensions. Extending previous findings, our framework allows to rigorously show that score-matching in the underdamped case is indeed equivalent to maximizing a lower bound on the likelihood. Motivated by superior convergence properties and compatibility with sophisticated numerical integration schemes of underdamped stochastic processes, we propose *underdamped diffusion bridges*, where a general density evolution is learned rather than prescribed by a fixed noising process. We apply our method to the challenging task of sampling from unnormalized densities without access to samples from the target distribution. Across a diverse range of sampling problems, our approach demonstrates state-of-the-art performance, notably outperforming alternative methods, while requiring significantly fewer discretization steps and almost no hyperparameter tuning.

NeurIPS Conference 2024 Conference Paper

A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics

  • Puze Liu
  • Jonas Günster
  • Niklas Funk
  • Simon Gröger
  • Dong Chen
  • Haitham Bou-Ammar
  • Julius Jankowski
  • Ante Marić

Machine learning methods have a groundbreaking impact in many application domains, but their application on real robotic platforms is still limited. Despite the many challenges associated with combining machine learning technology with robotics, robot learning remains one of the most promising directions for enhancing the capabilities of robots. When deploying learning-based approaches on real robots, extra effort is required to address the challenges posed by various real-world factors. To investigate the key factors influencing real-world deployment and to encourage original solutions from different researchers, we organized the Robot Air Hockey Challenge at the NeurIPS 2023 conference. We selected the air hockey task as a benchmark, encompassing low-level robotics problems and high-level tactics. Different from other machine learning-centric benchmarks, participants need to tackle practical challenges in robotics, such as the sim-to-real gap, low-level control issues, safety problems, real-time requirements, and the limited availability of real-world data. Furthermore, we focus on a dynamic environment, removing the typical assumption of quasi-static motions of other real-world benchmarks. The competition's results show that solutions combining learning-based approaches with prior knowledge outperform those relying solely on data when real-world deployment is challenging. Our ablation study reveals which real-world factors may be overlooked when building a learning-based solution. The successful real-world air hockey deployment of best-performing agents sets the foundation for future competitions and follow-up research directions.

ICML Conference 2024 Conference Paper

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

  • Onur Celik
  • Aleksandar Taranovic
  • Gerhard Neumann

Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose Diverse Skill Learning (Di-SkilL), an RL method for learning diverse skills using Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associate context distribution to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curricula learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment’s unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.

ICML Conference 2024 Conference Paper

Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

  • Denis Blessing
  • Xiaogang Jia
  • Johannes Esslinger
  • Francisco Vargas 0001
  • Gerhard Neumann

Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments.

RLC Conference 2024 Conference Paper

Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

  • Philipp Becker
  • Sebastian Mossburger
  • Fabian Otto
  • Gerhard Neumann

Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. While contrastive approaches can ignore those, they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently, based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality and allowing the representation to better focus on relevant aspects. We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and allow for solving tasks that are out of reach for more naive representation learning approaches and other recent baselines.

RLJ Journal 2024 Journal Article

Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

  • Philipp Becker
  • Sebastian Mossburger
  • Fabian Otto
  • Gerhard Neumann

Learning self-supervised representations using reconstruction or contrastive losses improves performance and sample complexity of image-based and multimodal reinforcement learning (RL). Here, different self-supervised loss functions have distinct advantages and limitations depending on the information density of the underlying sensor modality. Reconstruction provides strong learning signals but is susceptible to distractions and spurious information. While contrastive approaches can ignore those, they may fail to capture all relevant details and can lead to representation collapse. For multimodal RL, this suggests that different modalities should be treated differently, based on the amount of distractions in the signal. We propose Contrastive Reconstructive Aggregated representation Learning (CoRAL), a unified framework enabling us to choose the most appropriate self-supervised loss for each sensor modality and allowing the representation to better focus on relevant aspects. We evaluate CoRAL's benefits on a wide range of tasks with images containing distractions or occlusions, a new locomotion suite, and a challenging manipulation suite with visually realistic distractions. Our results show that learning a multimodal representation by combining contrastive and reconstruction-based losses can significantly improve performance and allow for solving tasks that are out of reach for more naive representation learning approaches and other recent baselines.

TMLR Journal 2024 Journal Article

Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

  • Andreas Boltres
  • Niklas Freymuth
  • Patrick Jahnke
  • Holger Karl
  • Gerhard Neumann

Finding efficient routes for data packets is an essential task in computer networking. The optimal routes depend greatly on the current network topology, state and traffic demand, and they can change within milliseconds. Reinforcement Learning can help to learn network representations that provide routing decisions for possibly novel situations. So far, this has commonly been done using fluid network models. We investigate their suitability for millisecond-scale adaptations with a range of traffic mixes and find that packet-level network models are necessary to capture true dynamics, in particular in the presence of TCP traffic. To this end, we present PackeRL, the first packet-level Reinforcement Learning environment for routing in generic network topologies. Our experiments confirm that learning-based strategies that have been trained in fluid environments do not generalize well to this more realistic, but more challenging setup. Hence, we also introduce two new algorithms for learning sub-second Routing Optimization. We present M-Slim, a dynamic shortest-path algorithm that excels at high traffic volumes but is computationally hard to scale to large network topologies, and FieldLines, a novel next-hop policy design that re-optimizes routing for any network topology within milliseconds without requiring any re-training. Both algorithms outperform current learning-based approaches as well as commonly used static baseline protocols, particularly in high-traffic volume scenarios. All findings are backed by extensive experiments in realistic network conditions in our fast and versatile training and evaluation framework.

ICRA Conference 2024 Conference Paper

Lens Capsule Tearing in Cataract Surgery using Reinforcement Learning

  • Rebekka Charlotte Peter
  • Steffen Peikert
  • Ludwig Haide
  • Doan Xuan Viet Pham
  • Tahar Chettaoui
  • Eleonora Tagliabue
  • Paul Maria Scheikl
  • Johannes Fauser

Cataract is the leading cause of blindness worldwide with an increasing number of patients due to changing demographics, making automation an important part in future surgical treatment. In this work, we focus on a substep of cataract surgery, the Continuous Curvilinear Capsulorhexis (CCC). With a high complexity, this task is an ideal candidate for Reinforcement Learning (RL) in simulation. First, we present an interactive and physically realistic simulation based on the Finite Element Method (FEM) that mimics the tearing behavior of soft tissue during CCC. Then, we train and evaluate RL models in simulation, demonstrating that the trained policies can complete the CCC in 85% of cases. We also show that applying domain randomization techniques make the policy more robust against changes in geometrical and biomechanical boundary conditions.

IROS Conference 2024 Conference Paper

MuTT: A Multimodal Trajectory Transformer for Robot Skills

  • Claudius Kienle
  • Benjamin Alt
  • Onur Celik
  • Philipp Becker
  • Darko Katic
  • Rainer Jäkel
  • Gerhard Neumann

High-level robot skills represent an increasingly popular paradigm in robot programming. However, configuring the skills’ parameters for a specific task remains a manual and time-consuming endeavor. Existing approaches for learning or optimizing these parameters often require numerous real-world executions or do not work in dynamic environments. To address these challenges, we propose Multimodal Trajectory Transformer (MuTT), a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills by integrating vision, trajectory, and robot skill parameters. Notably, we pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. Furthermore, we illustrate MuTT’s efficacy as a predictor when combined with a model-based robot skill optimizer. This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions during optimization. Designed for compatibility with any representation of robot skills, MuTT demonstrates its versatility across three comprehensive experiments, showcasing superior performance across two different skill representations.

ICLR Conference 2024 Conference Paper

Neural Contractive Dynamical Systems

  • Hadi Beik-Mohammadi
  • Søren Hauberg
  • Georgios Arvanitidis
  • Nadia Figueroa
  • Gerhard Neumann
  • Leonel Rozo

Stability guarantees are crucial when ensuring that a fully autonomous robot does not take undesirable or potentially harmful actions. Unfortunately, global stability guarantees are hard to provide in dynamical systems learned from data, especially when the learned dynamics are governed by neural networks. We propose a novel methodology to learn \emph{neural contractive dynamical systems}, where our neural architecture ensures contraction, and hence, global stability. To efficiently scale the method to high-dimensional dynamical systems, we develop a variant of the variational autoencoder that learns dynamics in a low-dimensional latent representation space while retaining contractive stability after decoding. We further extend our approach to learning contractive systems on the Lie group of rotations to account for full-pose end-effector dynamic motions. The result is the first highly flexible learning architecture that provides contractive stability guarantees with capability to perform obstacle avoidance. Empirically, we demonstrate that our approach encodes the desired dynamics more accurately than the current state-of-the-art, which provides less strong stability guarantees.

ICLR Conference 2024 Conference Paper

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

  • Ge Li
  • Hongyi Zhou
  • Dominik Roth
  • Serge Thilges
  • Fabian Otto
  • Rudolf Lioutikov
  • Gerhard Neumann

Current advancements in reinforcement learning (RL) have predominantly focused on learning step-based policies that generate actions for each perceived state. While these methods efficiently leverage step information from environmental interaction, they often ignore the temporal correlation between actions, resulting in inefficient exploration and unsmooth trajectories that are challenging to implement on real hardware. Episodic RL (ERL) seeks to overcome these challenges by exploring in parameters space that capture the correlation of actions. However, these approaches typically compromise data efficiency, as they treat trajectories as opaque black boxes. In this work, we introduce a novel ERL algorithm, Temporally-Correlated Episodic RL (TCE), which effectively utilizes step information in episodic policy updates, opening the 'black box' in existing ERL methods while retaining the smooth and consistent exploration in parameter space. TCE synergistically combines the advantages of step-based and episodic RL, achieving comparable performance to recent ERL methods while maintaining data efficiency akin to state-of-the-art (SoTA) step-based RL.

JMLR Journal 2024 Journal Article

Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning

  • Maximilian Hüttenrauch
  • Gerhard Neumann

Black-box optimization is a versatile approach to solve complex problems where the objective function is not explicitly known and no higher order information is available. Due to its general nature, it finds widespread applications in function optimization as well as machine learning, especially episodic reinforcement learning tasks. While traditional black-box optimizers like CMA-ES may falter in noisy scenarios due to their reliance on ranking-based transformations, a promising alternative emerges in the form of the Model-based Relative Entropy Stochastic Search (MORE) algorithm. MORE can be derived from natural policy gradients and compatible function approximation and directly optimizes the expected fitness without resorting to rankings. However, in its original formulation, MORE often cannot achieve state of the art performance. In this paper, we improve MORE by decoupling the update of the search distribution's mean and covariance and an improved entropy scheduling technique based on an evolution path resulting in faster convergence, and a simplified model learning approach in comparison to the original paper. We show that our algorithm performs comparable to state-of-the-art black-box optimizers on standard benchmark functions. Further, it clearly outperforms ranking-based methods and other policy-gradient based black-box algorithms as well as state of the art deep reinforcement learning algorithms when used for episodic reinforcement learning tasks. [abs] [ pdf ][ bib ] &copy JMLR 2024. ( edit, beta )

ICLR Conference 2024 Conference Paper

Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations

  • Xiaogang Jia
  • Denis Blessing
  • Xinkai Jiang
  • Moritz Reuss
  • Atalay Donat
  • Rudolf Lioutikov
  • Gerhard Neumann

Imitation learning with human data has demonstrated remarkable success in teaching robots in a wide range of skills. However, the inherent diversity in human behavior leads to the emergence of multi-modal data distributions, thereby presenting a formidable challenge for existing imitation learning algorithms. Quantifying a model's capacity to capture and replicate this diversity effectively is still an open problem. In this work, we introduce simulation benchmark environments and the corresponding *Datasets with Diverse human Demonstrations for Imitation Learning (D3IL)*, designed explicitly to evaluate a model's ability to learn multi-modal behavior. Our environments are designed to involve multiple sub-tasks that need to be solved, consider manipulation of multiple objects which increases the diversity of the behavior and can only be solved by policies that rely on closed loop sensory feedback. Other available datasets are missing at least one of these challenging properties. To address the challenge of diversity quantification, we introduce tractable metrics that provide valuable insights into a model's ability to acquire and reproduce diverse behaviors. These metrics offer a practical means to assess the robustness and versatility of imitation learning algorithms. Furthermore, we conduct a thorough evaluation of state-of-the-art methods on the proposed task suite. This evaluation serves as a benchmark for assessing their capability to learn diverse behaviors. Our findings shed light on the effectiveness of these methods in tackling the intricate problem of capturing and generalizing multi-modal human behaviors, offering a valuable reference for the design of future imitation learning algorithms.

NeurIPS Conference 2024 Conference Paper

Variational Distillation of Diffusion Policies into Mixture of Experts

  • Hongyi Zhou
  • Denis Blessing
  • Ge Li
  • Onur Celik
  • Xiaogang Jia
  • Gerhard Neumann
  • Rudolf Lioutikov

This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to replicate the inherent diversity in human behavior, making them the preferred models in behavior learning such as Learning from Human Demonstrations (LfD). However, diffusion models come with some drawbacks, including the intractability of likelihoods and long inference times due to their iterative sampling process. The inference times, in particular, pose a significant challenge to real-time applications such as robot control. In contrast, MoEs effectively address the aforementioned issues while retaining the ability to represent complex distributions but are notoriously difficult to train. VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models. Specifically, VDD leverages a decompositional upper bound of the variational objective that allows the training of each expert separately, resulting in a robust optimization scheme for MoEs. VDD demonstrates across nine complex behavior learning tasks, that it is able to: i) accurately distill complex distributions learned by the diffusion model, ii) outperform existing state-of-the-art distillation methods, and iii) surpass conventional methods for training MoE. The code and videos are available at https: //intuitive-robots. github. io/vdd-website.

TMLR Journal 2023 Journal Article

A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

  • Oleg Arenz
  • Philipp Dahlinger
  • Zihan Ye
  • Michael Volpp
  • Gerhard Neumann

Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions with up to a few hundred dimensions. The two currently most effective methods for GMM-based variational inference, VIPS and iBayes-GMM, both employ independent natural gradient updates for the individual components and their weights. We show for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We argue that for both approaches, the quality of the learned approximations can heavily suffer from the respective design choices: By updating the individual components using samples from the mixture model, iBayes-GMM often fails to produce meaningful updates to low-weight components, and by using a zero-order method for estimating the natural gradient, VIPS scales badly to higher-dimensional problems. Furthermore, we show that information-geometric trust-regions (used by VIPS) are effective even when using first-order natural gradient estimates, and often outperform the improved Bayesian learning rule (iBLR) update used by iBayes-GMM. We systematically evaluate the effects of design choices and show that a hybrid approach significantly outperforms both prior works. Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports $432$ different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner.

ICLR Conference 2023 Conference Paper

Accurate Bayesian Meta-Learning by Accurate Task Posterior Inference

  • Michael Volpp
  • Philipp Dahlinger
  • Philipp Becker
  • Christian G. Daniel
  • Gerhard Neumann

Bayesian meta-learning (BML) enables fitting expressive generative models to small datasets by incorporating inductive priors learned from a set of related tasks. The Neural Process (NP) is a prominent deep neural network-based BML architecture, which has shown remarkable results in recent years. In its standard formulation, the NP encodes epistemic uncertainty in an amortized, factorized, Gaussian variational (VI) approximation to the BML task posterior (TP), using reparametrized gradients. Prior work studies a range of architectural modifications to boost performance, such as attentive computation paths or improved context aggregation schemes, while the influence of the VI scheme remains under-explored. We aim to bridge this gap by introducing GMM-NP, a novel BML model, which builds on recent work that enables highly accurate, full-covariance Gaussian mixture (GMM) TP approximations by combining VI with natural gradients and trust regions. We show that GMM-NP yields tighter evidence lower bounds, which increases the efficiency of marginal likelihood optimization, leading to improved epistemic uncertainty estimation and accuracy. GMM-NP does not require complex architectural modifications, resulting in a powerful, yet conceptually simple BML model, which outperforms the state of the art on a range of challenging experiments, highlighting its applicability to settings where data is scarce.

ICLR Conference 2023 Conference Paper

Adversarial Imitation Learning with Preferences

  • Aleksandar Taranovic
  • Andras G. Kupcsik
  • Niklas Freymuth
  • Gerhard Neumann

Designing an accurate and explainable reward function for many Reinforcement Learning tasks is a cumbersome and tedious process. Instead, learning policies directly from the feedback of human teachers naturally integrates human domain knowledge into the policy optimization process. However, different feedback modalities, such as demonstrations and preferences, provide distinct benefits and disadvantages. For example, demonstrations convey a lot of information about the task but are often hard or costly to obtain from real experts while preferences typically contain less information but are in most cases cheap to generate. However, existing methods centered around human feedback mostly focus on a single teaching modality, causing them to miss out on important training data while making them less intuitive to use. In this paper we propose a novel method for policy learning that incorporates two different feedback types, namely \emph{demonstrations} and \emph{preferences}. To this end, we make use of the connection between discriminator training and density ratio estimation to incorporate preferences into the popular Adversarial Imitation Learning paradigm. This insight allows us to express loss functions over both demonstrations and preferences in a unified framework. Besides expert demonstrations, we are also able to learn from imperfect ones and combine them with preferences to achieve improved task performance. We experimentally validate the effectiveness of combining both preferences and demonstrations on common benchmarks and also show that our method can efficiently learn challenging robot manipulation tasks.

NeurIPS Conference 2023 Conference Paper

Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift

  • Florian Seligmann
  • Philipp Becker
  • Michael Volpp
  • Gerhard Neumann

Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or underconfident, providing further insight into the behavior of the methods. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.

ICRA Conference 2023 Conference Paper

Curriculum-Based Imitation of Versatile Skills

  • Maximilian Xiling Li
  • Onur Celik
  • Philipp Becker
  • Denis Blessing
  • Rudolf Lioutikov
  • Gerhard Neumann

Learning skills by imitation is a promising concept for the intuitive teaching of robots. A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations. Yet, human demonstrations are often multi-modal, i. e. , the same task is solved in multiple ways which is a major challenge for most imitation learning methods that are based on such a maximum likelihood (ML) objective. The ML objective forces the model to cover all data, it prevents specialization in the context space and can cause mode-averaging in the behavior space, leading to suboptimal or potentially catastrophic behavior. Here, we alleviate those issues by introducing a curriculum using a weight for each data point, allowing the model to specialize on data it can represent while incentivizing it to cover as much data as possible by an entropy bonus. We extend our algorithm to a Mixture of (linear) Experts (MoE) such that the single components can specialize on local context regions, while the MoE covers all data points. We evaluate our approach in complex simulated and real robot control tasks and show it learns from versatile human demonstrations and significantly outperforms current SOTA methods. 1 1 A reference implementation can be found at https://github.com/intuitive-robots/ML-Cur

ICLR Conference 2023 Conference Paper

Grounding Graph Network Simulators using Physical Sensor Observations

  • Jonas Linkerhägner
  • Niklas Freymuth
  • Paul Maria Scheikl
  • Franziska Mathis-Ullrich
  • Gerhard Neumann

Physical simulations that accurately model reality are crucial for many engineering disciplines such as mechanical engineering and robotic motion planning. In recent years, learned Graph Network Simulators produced accurate mesh-based simulations while requiring only a fraction of the computational cost of traditional simulators. Yet, the resulting predictors are confined to learning from data generated by existing mesh-based simulators and thus cannot include real world sensory information such as point cloud data. As these predictors have to simulate complex physical systems from only an initial state, they exhibit a high error accumulation for long-term predictions. In this work, we integrate sensory information to ground Graph Network Simulators on real world observations. In particular, we predict the mesh state of deformable objects by utilizing point cloud data. The resulting model allows for accurate predictions over longer time horizons, even under uncertainties in the simulation, such as unknown material properties. Since point clouds are usually not available for every time step, especially in online settings, we employ an imputation-based model. The model can make use of such additional information only when provided, and resorts to a standard Graph Network Simulator, otherwise. We experimentally validate our approach on a suite of prediction tasks for mesh-based interactions between soft and rigid bodies. Our method results in utilization of additional point cloud information to accurately predict stable simulations where existing Graph Network Simulators fail.

NeurIPS Conference 2023 Conference Paper

Information Maximizing Curriculum: A Curriculum-Based Approach for Learning Versatile Skills

  • Denis Blessing
  • Onur Celik
  • Xiaogang Jia
  • Moritz Reuss
  • Maximilian Li
  • Rudolf Lioutikov
  • Gerhard Neumann

Imitation learning uses data for training policies to solve complex tasks. However, when the training data is collected from human demonstrators, it often leadsto multimodal distributions because of the variability in human actions. Mostimitation learning methods rely on a maximum likelihood (ML) objective to learna parameterized policy, but this can result in suboptimal or unsafe behavior dueto the mode-averaging property of the ML objective. In this work, we proposeInformation Maximizing Curriculum, a curriculum-based approach that assignsa weight to each data point and encourages the model to specialize in the data itcan represent, effectively mitigating the mode-averaging problem by allowing themodel to ignore data from modes it cannot represent. To cover all modes and thus, enable versatile behavior, we extend our approach to a mixture of experts (MoE)policy, where each mixture component selects its own subset of the training datafor learning. A novel, maximum entropy-based objective is proposed to achievefull coverage of the dataset, thereby enabling the policy to encompass all modeswithin the data distribution. We demonstrate the effectiveness of our approach oncomplex simulated control tasks using versatile human demonstrations, achievingsuperior performance compared to state-of-the-art methods.

JMLR Journal 2023 Journal Article

LapGym - An Open Source Framework for Reinforcement Learning in Robot-Assisted Laparoscopic Surgery

  • Paul Maria Scheikl
  • Balázs Gyenes
  • Rayan Younis
  • Christoph Haas
  • Gerhard Neumann
  • Martin Wagner
  • Franziska Mathis-Ullrich

Recent advances in reinforcement learning (RL) have increased the promise of introducing cognitive assistance and automation to robot-assisted laparoscopic surgery (RALS). However, progress in algorithms and methods depends on the availability of standardized learning environments that represent skills relevant to RALS. We present LapGym, a framework for building RL environments for RALS that models the challenges posed by surgical tasks, and sofaenv, a diverse suite of 12 environments. Motivated by surgical training, these environments are organized into 4 tracks: Spatial Reasoning, Deformable Object Manipulation & Grasping, Dissection, and Thread Manipulation. Each environment is highly parametrizable for increasing difficulty, resulting in a high performance ceiling for new algorithms. We use Proximal Policy Optimization (PPO) to establish a baseline for model-free RL algorithms, investigating the effect of several environment parameters on task difficulty. Finally, we show that many environments and parameter configurations reflect well-known, open problems in RL research, allowing researchers to continue exploring these fundamental problems in a surgical context. We aim to provide a challenging, standard environment suite for further development of RL for RALS, ultimately helping to realize the full potential of cognitive surgical robotics. LapGym is publicly accessible through GitHub (https://github.com/ScheiklP/lap_gym). [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2023. ( edit, beta )

NeurIPS Conference 2023 Conference Paper

Multi Time Scale World Models

  • Vaisakh Shaj Kumar
  • Saleh GHOLAM ZADEH
  • Ozan Demir
  • Luiz Douat
  • Gerhard Neumann

Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems. Code is available at this repository: https: //github. com/ALRhub/MTS3.

NeurIPS Conference 2023 Conference Paper

Swarm Reinforcement Learning for Adaptive Mesh Refinement

  • Niklas Freymuth
  • Philipp Dahlinger
  • Tobias Würth
  • Simon Reisch
  • Luise Kärger
  • Gerhard Neumann

The Finite Element Method, an important technique in engineering, is aided by Adaptive Mesh Refinement (AMR), which dynamically refines mesh regions to allow for a favorable trade-off between computational speed and simulation accuracy. Classical methods for AMR depend on task-specific heuristics or expensive error estimators, hindering their use for complex simulations. Recent learned AMR methods tackle these problems, but so far scale only to simple toy examples. We formulate AMR as a novel Adaptive Swarm Markov Decision Process in which a mesh is modeled as a system of simple collaborating agents that may split into multiple new agents. This framework allows for a spatial reward formulation that simplifies the credit assignment problem, which we combine with Message Passing Networks to propagate information between neighboring mesh elements. We experimentally validate the effectiveness of our approach, Adaptive Swarm Mesh Refinement (ASMR), showing that it learns reliable, scalable, and efficient refinement strategies on a set of challenging problems. Our approach significantly speeds up computation, achieving up to 30-fold improvement compared to uniform refinements in complex simulations. Additionally, we outperform learned baselines and achieve a refinement quality that is on par with a traditional error-based AMR strategy without expensive oracle information about the error signal.

ICLR Conference 2022 Conference Paper

Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

  • Vaisakh Shaj
  • Dieter Büchler
  • Rohit Sonker
  • Philipp Becker
  • Gerhard Neumann

Recurrent State-space models (RSSMs) are highly expressive models for learning patterns in time series data and for system identification. However, these models are often based on the assumption that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications often exhibit tasks with similar, but not identical dynamics, that can be modelled as having a common latent structure. We introduce the Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related state-space models with a low-dimensional set of latent factors. We present a simple and effective way of performing learning and inference over this Gaussian graphical model that avoids approximations like variational inference. We show that HiP-RSSMs outperforms RSSMs and competing multi-task models on several challenging robotic benchmarks both on real systems and simulations.

ICRA Conference 2022 Conference Paper

Hierarchical Policy Learning for Mechanical Search

  • Oussama Zenkri
  • Ngo Anh Vien
  • Gerhard Neumann

Retrieving objects from clutters is a complex task, which requires multiple interactions with the environment until the target object can be extracted. These interactions involve executing action primitives like grasping or pushing as well as setting priorities for the objects to manipulate and the actions to execute. Mechanical Search (MS) [1] is a framework for object retrieval, which uses a heuristic algorithm for pushing and rule-based algorithms for high-level planning. While rule-based policies profit from human intuition in how they work, they usually perform sub-optimally in many cases. Deep reinforcement learning (RL) has shown great performance in complex tasks such as taking decisions through evaluating pixels, which makes it suitable for training policies in the context of object-retrieval. In this work, we first formulate the MS problem in a principled formulation as a hierarchical POMDP. Based on this formulation, we propose a hierarchical policy learning approach for the MS problem. For demonstration, we present two main parameterized sub-policies: a push policy and an action selection policy. When integrated into the hierarchical POMDP's policy, our proposed sub-policies increase the success rate of retrieving the target object from less than 32% to nearly 80%, while reducing the computation time for push actions from multiple seconds to less than 10 milliseconds.

IROS Conference 2022 Conference Paper

MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network

  • Fabian Duffhauss
  • Tobias Demmler
  • Gerhard Neumann

Estimating 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with our novel multi-view 6D pose estimation method called MV6D which accurately predicts the 6D poses of all objects in a cluttered scene based on RGB-D images from multiple perspectives. We base our approach on the PVN3D network that uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, our MV6D can learn the fusion of multiple perspectives in an end-to-end manner and does not require multiple prediction stages or subsequent fine tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives and the ground truth for instance semantic segmentation and 6D pose estimation. MV 6D significantly outperforms the state-of-the-art in multi-view 6D pose estimation even in cases where the camera poses are known inaccurately. Furthermore, we show that our approach is robust towards dynamic camera setups and that its accuracy increases incrementally with an increasing number of perspectives.

TMLR Journal 2022 Journal Article

On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

  • Philipp Becker
  • Gerhard Neumann

Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side-effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach building on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial while matching it in the deterministic standard benchmarks.

ICRA Conference 2022 Conference Paper

Push-to-See: Learning Non-Prehensile Manipulation to Enhance Instance Segmentation via Deep Q-Learning

  • Baris Serhan
  • Harit Pandya
  • Ayse Küçükyilmaz
  • Gerhard Neumann

Efficient robotic manipulation of objects for sorting and searching often rely upon how well the objects are perceived and the available grasp poses. The challenge arises when the objects are irregular, have similar visual features (e. g. , textureless objects) and the scene is densely cluttered. In such cases, non-prehensile manipulation (e. g. , pushing) can facilitate grasping or searching by improving object perception and singulating the objects from the clutter via physical interaction. The current robotics literature in interactive segmentation focuses solely on isolated cases, where the central aim is on searching or singulating a single target object, or segmenting sparsely cluttered scenes, mainly through matching visual futures in successive scenes before and after the robotic interaction. On the other hand, in this paper, we introduce the first interactive segmentation model in the literature that can autonomously enhance the instance segmentation of such challenging scenes as a whole via optimising a Q-value function that predicts appropriate pushing actions for singulation. We achieved this by training a deep reinforcement learning model with reward signals generated by a Mask-RCNN trained solely on depth images. We evaluated our model in experiments by comparing its success on segmentation quality with a heuristic baseline, as well as the state-of-the-art Visual Pushing and Grasping (VPG) model [1]. Our model significantly outperformed both baselines in all benchmark scenarios. Furthermore, decreasing the segmentation error inherently enabled the autonomous singulation of the scene as a whole. Our evaluation experiments also serve as a benchmark for interactive segmentation research.

IROS Conference 2022 Conference Paper

Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination

  • Abdalkarim Mohtasib
  • Gerhard Neumann
  • Heriberto Cuayáhuitl

Learning robotic tasks in the real world is still highly challenging and effective practical solutions remain to be found. Traditional methods used in this area are imitation learning and reinforcement learning, but they both have limitations when applied to real robots. Combining reinforcement learning with pre-collected demonstrations is a promising approach that can help in learning control policies to solve robotic tasks. In this paper, we propose an algorithm that uses novel techniques to leverage offline expert data using offline and online training to obtain faster convergence and improved performance. The proposed algorithm (AWET) weights the critic losses with a novel agent advantage weight to improve over the expert data. In addition, AWET makes use of an automatic early termination technique to stop and discard policy rollouts that are not similar to expert trajectories-to prevent drifting far from the expert data. In an ablation study, AWET showed improved and promising performance when compared to state-of-the-art baselines on four standard robotic tasks.

ICLR Conference 2021 Conference Paper

Bayesian Context Aggregation for Neural Processes

  • Michael Volpp
  • Fabian Flürenbrock
  • Lukas Großberger
  • Christian G. Daniel
  • Gerhard Neumann

Formulating scalable probabilistic regression models with reliable uncertainty estimates has been a long-standing challenge in machine learning research. Recently, casting probabilistic regression as a multi-task learning problem in terms of conditional latent variable (CLV) models such as the Neural Process (NP) has shown promising results. In this paper, we focus on context aggregation, a central component of such architectures, which fuses information from multiple context data points. So far, this aggregation operation has been treated separately from the inference of a latent representation of the target function in CLV models. Our key contribution is to combine these steps into one holistic mechanism by phrasing context aggregation as a Bayesian inference problem. The resulting Bayesian Aggregation (BA) mechanism enables principled handling of task ambiguity, which is key for efficiently processing context information. We demonstrate on a range of challenging experiments that BA consistently improves upon the performance of traditional mean aggregation while remaining computationally efficient and fully compatible with existing NP-based models.

IROS Conference 2021 Conference Paper

Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning

  • Paul Maria Scheikl
  • Balázs Gyenes
  • Tornike Davitashvili
  • Rayan Younis
  • André Schulze
  • Beat Peter Müller-Stich
  • Gerhard Neumann
  • Martin Wagner 0001

Cognitive cooperative assistance in robot-assisted surgery holds the potential to increase quality of care in minimally invasive interventions. Automation of surgical tasks promises to reduce the mental exertion and fatigue of surgeons. In this work, multi-agent reinforcement learning is demonstrated to be robust to the distribution shift introduced by pairing a learned policy with a human team member. Multi-agent policies are trained directly from images in simulation to control multiple instruments in a sub task of the minimally invasive removal of the gallbladder. These agents are evaluated individually and in cooperation with humans to demonstrate their suitability as autonomous assistants. Compared to human teams, the hybrid teams with artificial agents perform better considering completion time (44. 4% to 71. 2% shorter) as well as number of collisions (44. 7% to 98. 0% fewer). Path lengths, however, increase under control of an artificial agent (11. 4% to 33. 5% longer). A multi-agent formulation of the learning problem was favored over a single-agent formulation on this surgical sub task, due to the sequential learning of the two instruments. This approach may be extended to other tasks that are difficult to formulate within the standard reinforcement learning framework. Multi-agent reinforcement learning may shift the paradigm of cognitive robotic surgery towards seamless cooperation between surgeons and assistive technologies.

ICLR Conference 2021 Conference Paper

Differentiable Trust Region Layers for Deep Reinforcement Learning

  • Fabian Otto
  • Philipp Becker
  • Ngo Anh Vien
  • Hanna Ziesche
  • Gerhard Neumann

Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, often lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. The code is available at https://git.io/Jthb0.

IROS Conference 2021 Conference Paper

Residual Feedback Learning for Contact-Rich Manipulation Tasks with Uncertainty

  • Alireza Ranjbar
  • Ngo Anh Vien
  • Hanna Ziesche
  • Joschka Boedecker
  • Gerhard Neumann

While classic control theory offers state of the art solutions in many problem scenarios, it is often desired to improve beyond the structure of such solutions and surpass their limitations. To this end, residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL) by learning an additive "residual" to the output of a given controller. However, the applicability of such an approach highly depends on the structure of the controller. Often, internal feedback signals of the controller limit an RL algorithm to adequately change the policy and, hence, learn the task. We propose a new formulation that addresses these limitations by also modifying the feedback signals to the controller with an RL policy and show superior performance of our approach on a contact-rich peg-insertion task under position and orientation uncertainty. In addition, we use a recent Cartesian impedance control architecture as the control framework which can be available to us as a black-box while assuming no knowledge about its input/output structure, and show the difficulties of standard RPL. Furthermore, we introduce an adaptive curriculum for the given task to gradually increase the task difficulty in terms of position and orientation uncertainty. A video showing the results can be found at https://youtu.be/SAZm_Krze7U.

ICRA Conference 2020 Conference Paper

Enhancing Grasp Pose Computation in Gripper Workspace Spheres

  • Mohamed Sorour
  • Khaled Elgeneidy
  • Marc Hanheide
  • M. Abdalmjed
  • A. Srinivasan
  • Gerhard Neumann

In this paper, enhancement to the novel grasp planning algorithm based on gripper workspace spheres is presented. Our development requires a registered point cloud of the target from different views, assuming no prior knowledge of the object, nor any of its properties. This work features a new set of metrics for grasp pose candidates evaluation, as well as exploring the impact of high object sampling on grasp success rates. In addition to gripper position sampling, we now perform orientation sampling about the x, y, and z-axes, hence the grasping algorithm no longer require object orientation estimation. Successful experiments have been conducted on a simple jaw gripper (Franka Panda gripper) as well as a complex, high Degree of Freedom (DoF) hand (Allegro hand) as a proof of its versatility. Higher grasp success rates of 76% and 85. 5% respectively has been reported by real world experiments.

ICLR Conference 2020 Conference Paper

Expected Information Maximization: Using the I-Projection for Mixture Density Estimation

  • Philipp Becker
  • Oleg Arenz
  • Gerhard Neumann

Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution to the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(nformation)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixtures models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure, using a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.

JMLR Journal 2020 Journal Article

Trust-Region Variational Inference with Gaussian Mixture Models

  • Oleg Arenz
  • Mingjun Zhong
  • Gerhard Neumann

Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2020. ( edit, beta )

JMLR Journal 2019 Journal Article

Deep Reinforcement Learning for Swarm Systems

  • Maximilian Hüttenrauch
  • Adrian Šošić
  • Gerhard Neumann

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, the observation vector for decentralized decision making is represented by a concatenation of the (local) information an agent gathers about other agents. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions, where we treat the agents as samples and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and neural networks trained end-to-end. We evaluate the representation on two well-known problems from the swarm literature in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of complex collective strategies. [abs] [ pdf ][ bib ] &copy JMLR 2019. ( edit, beta )

IROS Conference 2019 Conference Paper

Grasping Unknown Objects Based on Gripper Workspace Spheres

  • Mohamed Sorour
  • Khaled Elgeneidy
  • Aravinda Srinivasan
  • Marc Hanheide
  • Gerhard Neumann

In this paper, we present a novel grasp planning algorithm for unknown objects given a registered point cloud of the target from different views. The proposed methodology requires no prior knowledge of the object, nor offline learning. In our approach, the gripper kinematic model is used to generate a point cloud of each finger workspace, which is then filled with spheres. At run-time, first the object is segmented, its major axis is computed, in a plane perpendicular to which, the main grasping action is constrained. The object is then uniformly sampled and scanned for various gripper poses that assure at least one object point is located in the workspace of each finger. In addition, collision checks with the object or the table are performed using computationally inexpensive gripper shape approximation. Our methodology is both time efficient (consumes less than 1. 5 seconds in average) and versatile. Successful experiments have been conducted on a simple jaw gripper (Franka Panda gripper) as well as a complex, high Degree of Freedom (DoF) hand (Allegro hand).

IROS Conference 2019 Conference Paper

Improving Local Trajectory Optimisation using Probabilistic Movement Primitives

  • R. B. Ashith Shyam
  • Peter Lightbody
  • Gautham P. Das
  • Pengcheng Liu 0005
  • Sebastián Gómez-González
  • Gerhard Neumann

Local trajectory optimisation techniques are a powerful tool for motion planning. However, they often get stuck in local optima depending on the quality of the initial solution and consequently, often do not find a valid (i. e. collision free) trajectory. Moreover, they often require fine tuning of a cost function to obtain the desired motions. In this paper, we address both problems by combining local trajectory optimisation with learning from demonstrations. The human expert demonstrates how to reach different target end-effector locations in different ways. From these demonstrations, we estimate a trajectory distribution, represented by a Probabilistic Movement Primitive (ProMP). For a new target location, we sample different trajectories from the ProMP and use these trajectories as initial solutions for the local optimisation. As the ProMP generates versatile initial solutions for the optimisation, the chance of finding poor local minima is significantly reduced. Moreover, the learned trajectory distribution is used to specify the smoothness costs for the optimisation, resulting in solutions of similar shape as the demonstrations. We demonstrate the effectiveness of our approach in several complex obstacle avoidance scenarios.

ICML Conference 2019 Conference Paper

Projections for Approximate Policy Iteration Algorithms

  • Riad Akrour
  • Joni Pajarinen
  • Jan Peters 0001
  • Gerhard Neumann

Approximate policy iteration is a class of reinforcement learning (RL) algorithms where the policy is encoded using a function approximator and which has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring increase of the policy return during policy update often requires to constrain the change in action distribution. Several approximations exist in the literature to solve this constrained policy update problem. In this paper, we propose to improve over such solutions by introducing a set of projections that transform the constrained problem into an unconstrained one which is then solved by standard gradient descent. Using these projections, we empirically demonstrate that our approach can improve the policy update solution and the control over exploration of existing approximate policy iteration algorithms.

ICML Conference 2019 Conference Paper

Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces

  • Philipp Becker
  • Harit Pandya
  • Gregor H. W. Gebhardt
  • Cheng Zhao 0002
  • C. James Taylor
  • Gerhard Neumann

In order to integrate uncertainty estimates into deep time-series modelling, Kalman Filters (KFs) (Kalman et al. , 1960) have been integrated with deep learning models, however, such approaches typically rely on approximate inference tech- niques such as variational inference which makes learning more complex and often less scalable due to approximation errors. We propose a new deep approach to Kalman filtering which can be learned directly in an end-to-end manner using backpropagation without additional approximations. Our approach uses a high-dimensional factorized latent state representation for which the Kalman updates simplify to scalar operations and thus avoids hard to backpropagate, computationally heavy and potentially unstable matrix inversions. Moreover, we use locally linear dynamic models to efficiently propagate the latent state to the next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any time-series data, similar to a LSTM (Hochreiter & Schmidhuber, 1997) but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent Units (GRUs) (Cho et al. , 2014) while also showing a slightly improved prediction performance and outperforms various recent generative models on an image imputation task.

EWRL Workshop 2018 Workshop Paper

Constraint-Space Projection Direct Policy Search

  • Riad Akrour
  • Jan Peters
  • Gerhard Neumann

Direct policy search usually frames the search distribution update as a constrained maximization of the expected return. The constraint bounds the information loss of the search distribution and is an ad hoc solution to the exploration-exploitation dilemma. In this paper we propose an alternative to the method of Lagrange multipliers to solve the constrained problem. We propose a projection that maps a parametric representation of the search distribution to a search distribution complying with the update constraints. This projection transforms the constrained optimization problem to an unconstrained one which is then solved using standard gradient ascent. We show on a toy optimization problem that the proposed approach finds better solutions and is more robust to small sample counts than two other state-of-the-art approaches that rely on the method of Lagrange multipliers. In a second phase we extend our approach to step-based reinforcement learning and show that one can seamlessly use the tools introduced in this paper to add hard entropy constraints to existing reinforcement learning algorithms.

IROS Conference 2018 Conference Paper

Contact Detection and Size Estimation Using a Modular Soft Gripper with Embedded Flex Sensors

  • Khaled Elgeneidy
  • Gerhard Neumann
  • Simon Pearson
  • Michael R. Jackson
  • Niels Lohse

Grippers made from soft elastomers are able to passively and gently adapt to their targets allowing deformable objects to be grasped safely without causing bruise or damage. However, it is difficult to regulate the contact forces due to the lack of contact feedback for such grippers. In this paper, a modular soft gripper is presented utilizing interchangeable soft pneumatic actuators with embedded flex sensors as fingers of the gripper. The fingers can be assembled in different configurations using 3D printed connectors. The paper investigates the potential of utilizing the simple sensory feedback from the flex and pressure sensors to make additional meaningful inferences regarding the contact state and grasped object size. We study the effect of the grasped object size and contact type on the combined feedback from the embedded flex sensors of opposing fingers. Our results show that a simple linear relationship exists between the grasped object size and the final flex sensor reading at fixed input conditions, despite the variation in object weight and contact type. Additionally, by simply monitoring the time series response from the flex sensor, contact can be detected by comparing the response to the known free-bending response at the same input conditions. Furthermore, by utilizing the measured internal pressure supplied to the soft fingers, it is possible to distinguish between power and pinch grasps, as the contact type affects the rate of change in the flex sensor readings against the internal pressure.

ICML Conference 2018 Conference Paper

Efficient Gradient-Free Variational Inference using Policy Search

  • Oleg Arenz
  • Mingjun Zhong
  • Gerhard Neumann

Inference from complex distributions is a common problem in machine learning needed for many Bayesian methods. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multi-variate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems given by learning the individual mixture components and the coefficients. The number of mixture components is adapted online in order to allow for arbitrary exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.

IROS Conference 2018 Conference Paper

Energy-Efficient Design and Control of a Vibro-Driven Robot

  • Pengcheng Liu 0005
  • Gerhard Neumann
  • Qinbing Fu
  • Simon Pearson
  • Hongnian Yu

Vibro-driven robotic (VDR) systems use stick-slip motions for locomotion. Due to the underactuated nature of the system, efficient design and control are still open problems. We present a new energy preserving design based on a spring-augmented pendulum. We indirectly control the friction-induced stick-slip motions by exploiting the passive dynamics in order to achieve an improvement in overall travelling distance and energy efficiency. Both collocated and non-collocated constraint conditions are elaborately analysed and considered to obtain a desired trajectory generation profile. For tracking control, we develop a partial feedback controller for the driving pendulum which counteracts the dynamic contributions from the platform. Comparative simulation studies show the effectiveness and intriguing performance of the proposed approach, while its feasibility is experimentally verified through a physical robot. Our robot is to the best of our knowledge the first nonlinear-motion prototype in literature towards the VDR systems.

ICRA Conference 2018 Conference Paper

Learning Coupled Forward-Inverse Models with Combined Prediction Errors

  • Dorothea Koert
  • Guilherme Maeda
  • Gerhard Neumann
  • Jan Peters 0001

Challenging tasks in unstructured environments require robots to learn complex models. Given a large amount of information, learning multiple simple models can offer an efficient alternative to a monolithic complex network. Training multiple models-that is, learning their parameters and their responsibilities-has been shown to be prohibitively hard as optimization is prone to local minima. To efficiently learn multiple models for different contexts, we thus develop a new algorithm based on expectation maximization (EM). In contrast to comparable concepts, this algorithm trains multiple modules of paired forward-inverse models by using the prediction errors of both forward and inverse models simultaneously. In particular, we show that our method yields a substantial improvement over only considering the errors of the forward models on tasks where the inverse space contains multiple solutions.

ICRA Conference 2018 Conference Paper

Learning Robust Policies for Object Manipulation with Robot Swarms

  • Gregor H. W. Gebhardt
  • Kevin Daun
  • Marius Schnaubelt
  • Gerhard Neumann

Swarm robotics investigates how a large population of robots with simple actuation and limited sensors can collectively solve complex tasks. One particular interesting application with robot swarms is autonomous object assembly. Such tasks have been solved successfully with robot swarms that are controlled by a human operator using a light source. In this paper, we present a method to solve such assembly tasks autonomously based on policy search methods. We split the assembly process in two subtasks: generating a high-level assembly plan and learning a low-level object movement policy. The assembly policy plans the trajectories for each object and the object movement policy controls the trajectory execution. Learning the object movement policy is challenging as it depends on the complex state of the swarm which consists of an individual state for each agent. To approach this problem, we introduce a representation of the swarm which is based on Hilbert space embeddings of distributions. This representation is invariant to the number of agents in the swarm as well as to the allocation of an agent to its position in the swarm. These invariances make the learned policy robust to changes in the swarm and also reduce the search space for the policy search method significantly. We show that the resulting system is able to solve assembly tasks with varying object shapes in multiple simulation scenarios and evaluate the robustness of our representation to changes in the swarm size. Furthermore, we demonstrate that the policies learned in simulation are robust enough to be transferred to real robots.

JMLR Journal 2018 Journal Article

Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

  • Riad Akrour
  • Abbas Abdolmaleki
  • Hany Abdulsamad
  • Jan Peters
  • Gerhard Neumann

Many of the recent trajectory optimization algorithms alternate between linear approximation of the system dynamics around the mean trajectory and conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent \qfunc learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics. We experimentally demonstrate on highly non-linear control tasks the improvement in performance of our algorithm in comparison to approaches linearizing the system dynamics. In order to show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme to derive a lower bound of the change in policy return between successive iterations. [abs] [ pdf ][ bib ] &copy JMLR 2018. ( edit, beta )

IROS Conference 2018 Conference Paper

Regularizing Reinforcement Learning with State Abstraction

  • Riad Akrour
  • Filipe Veiga
  • Jan Peters 0001
  • Gerhard Neumann

State abstraction in a discrete reinforcement learning setting clusters states sharing a similar optimal action to yield an easier to solve decision process. In this paper, we generalize the concept of state abstraction to continuous action reinforcement learning by defining an abstract state as a state cluster over which a near-optimal policy of simple shape exists. We propose a hierarchical reinforcement learning algorithm that is able to simultaneously find the state space clustering and the optimal sub-policies in each cluster. The main advantage of the proposed framework is to provide a straightforward way of regularizing reinforcement learning by controlling the behavioral complexity of the learned policy. We apply our algorithm on several benchmark tasks and a robot tactile manipulation task and show that we can match state-of-the-art deep reinforcement learning performance by combining a small number of linear policies.

ICRA Conference 2018 Conference Paper

Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

  • Robert Pinsler
  • Riad Akrour
  • Takayuki Osa
  • Jan Peters 0001
  • Gerhard Neumann

While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.

ICRA Conference 2017 Conference Paper

A learning-based shared control architecture for interactive task execution

  • Firas Abi-Farraj
  • Takayuki Osa
  • Nicolò Pedemonte
  • Jan Peters 0001
  • Gerhard Neumann
  • Paolo Robuffo Giordano

Shared control is a key technology for various robotic applications in which a robotic system and a human operator are meant to collaborate efficiently. In order to achieve efficient task execution in shared control, it is essential to predict the desired behavior for a given situation or context in order to simplify the control task for the human operator. This prediction is obtained by exploiting Learning from Demonstration (LfD), which is a popular approach for transferring human skills to robots. We encode the demonstrated behavior as trajectory distributions and generalize the learned distributions to new situations. The goal of this paper is to present a shared control framework that uses learned expert distributions to gain more autonomy. Our approach controls the balance between the controller's autonomy and the human preference based on the distributions of the demonstrated trajectories. Moreover, the learned distributions are autonomously refined from collaborative task executions, resulting in a master-slave system with increasing autonomy that requires less user input with an increasing number of task executions. We experimentally validated that our shared control approach enables efficient task executions. Moreover, the conducted experiments demonstrated that the developed system improves its performances through interactive task executions with our shared control.

JMLR Journal 2017 Journal Article

A Survey of Preference-Based Reinforcement Learning Methods

  • Christian Wirth
  • Riad Akrour
  • Gerhard Neumann
  • Johannes Fürnkranz

Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chosen reward function. However, designing such a reward function often requires a lot of task- specific prior knowledge. The designer needs to consider different objectives that do not only influence the learned behavior but also the learning progress. To alleviate these issues, preference-based reinforcement learning algorithms (PbRL) have been proposed that can directly learn from an expert's preferences instead of a hand-designed numeric reward. PbRL has gained traction in recent years due to its ability to resolve the reward shaping problem, its ability to learn from non numeric rewards and the possibility to reduce the dependence on expert knowledge. We provide a unified framework for PbRL that describes the task formally and points out the different design principles that affect the evaluation task for the human as well as the computational complexity. The design principles include the type of feedback that is assumed, the representation that is learned to capture the preferences, the optimization problem that has to be solved as well as how the exploration/exploitation problem is tackled. Furthermore, we point out shortcomings of current algorithms, propose open research questions and briefly survey practical tasks that have been solved using PbRL. [abs] [ pdf ][ bib ] &copy JMLR 2017. ( edit, beta )

IJCAI Conference 2017 Conference Paper

Contextual Covariance Matrix Adaptation Evolutionary Strategies

  • Abbas Abdolmaleki
  • Bob Price
  • Nuno Lau
  • Luis Paulo Reis
  • Gerhard Neumann

Many stochastic search algorithms are designed to optimize a fixed objective function to learn a task, i. e. , if the objective function changes slightly, for example, due to a change in the situation or context of the task, relearning is required to adapt to the new context. For instance, if we want to learn a kicking movement for a soccer robot, we have to relearn the movement for different ball locations. Such relearning is undesired as it is highly inefficient and many applications require a fast adaptation to a new context/situation. Therefore, we investigate contextual stochastic search algorithms that can learn multiple, similar tasks simultaneously. Current contextual stochastic search methods are based on policy search algorithms and suffer from premature convergence and the need for parameter tuning. In this paper, we extend the well known CMA-ES algorithm to the contextual setting and illustrate its performance on several contextual tasks. Our new algorithm, called contextual CMA-ES, leverages from contextual learning while it preserves all the features of standard CMA-ES such as stability, avoidance of premature convergence, step size control and a minimal amount of parameter tuning.

ICRA Conference 2017 Conference Paper

Empowered skills

  • Alexander Gabriel
  • Riad Akrour
  • Jan Peters 0001
  • Gerhard Neumann

Robot Reinforcement Learning (RL) algorithms return a policy that maximizes a global cumulative reward signal but typically do not create diverse behaviors. Hence, the policy will typically only capture a single solution of a task. However, many motor tasks have a large variety of solutions and the knowledge about these solutions can have several advantages. For example, in an adversarial setting such as robot table tennis, the lack of diversity renders the behavior predictable and hence easy to counter for the opponent. In an interactive setting such as learning from human feedback, an emphasis on diversity gives the human more opportunity for guiding the robot and to avoid the latter to be stuck in local optima of the task. In order to increase diversity of the learned behaviors, we leverage prior work on intrinsic motivation and empowerment. We derive a new intrinsic motivation signal by enriching the description of a task with an outcome space, representing interesting aspects of a sensorimotor stream. For example, in table tennis, the outcome space could be given by the return position and return ball speed. The intrinsic motivation is now given by the diversity of future outcomes, a concept also known as empowerment. We derive a new policy search algorithm that maximizes a trade-off between the extrinsic reward and this intrinsic motivation criterion. Experiments on a planar reaching task and simulated robot table tennis demonstrate that our algorithm can learn a diverse set of behaviors within the area of interest of the tasks.

IROS Conference 2017 Conference Paper

Hybrid control trajectory optimization under uncertainty

  • Joni Pajarinen
  • Ville Kyrki
  • Michael C. Koval
  • Siddhartha S. Srinivasa
  • Jan Peters 0001
  • Gerhard Neumann

Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i. e. hybrid controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and, subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem where the robot has to switch discrete gears and in a box pushing application where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.

ICRA Conference 2017 Conference Paper

Layered direct policy search for learning hierarchical skills

  • Felix End
  • Riad Akrour
  • Jan Peters 0001
  • Gerhard Neumann

Solutions to real world robotic tasks often require complex behaviors in high dimensional continuous state and action spaces. Reinforcement Learning (RL) is aimed at learning such behaviors but often fails for lack of scalability. To address this issue, Hierarchical RL (HRL) algorithms leverage hierarchical policies to exploit the structure of a task. However, many HRL algorithms rely on task specific knowledge such as a set of predefined sub-policies or sub-goals. In this paper we propose a new HRL algorithm based on information theoretic principles to autonomously uncover a diverse set of sub-policies and their activation policies. Moreover, the learning process mirrors the policys structure and is thus also hierarchical, consisting of a set of independent optimization problems. The hierarchical structure of the learning process allows us to control the learning rate of the sub-policies and the gating individually and add specific information theoretic constraints to each layer to ensure the diversification of the sub-policies. We evaluate our algorithm on two high dimensional continuous tasks and experimentally demonstrate its ability to autonomously discover a rich set of sub-policies.

ICML Conference 2017 Conference Paper

Local Bayesian Optimization of Motor Skills

  • Riad Akrour
  • Dmitry Sorokin
  • Jan Peters 0001
  • Gerhard Neumann

Bayesian optimization is renowned for its sample efficiency but its application to higher dimensional tasks is impeded by its focus on global optimization. To scale to higher dimensional problems, we leverage the sample efficiency of Bayesian optimization in a local context. The optimization of the acquisition function is restricted to the vicinity of a Gaussian search distribution which is moved towards high value areas of the objective. The proposed information-theoretic update of the search distribution results in a Bayesian interpretation of local stochastic search: the search distribution encodes prior knowledge on the optimum’s location and is weighted at each iteration by the likelihood of this location’s optimality. We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as a continuous robotic task in which an informative prior is obtained by imitation learning.

JMLR Journal 2017 Journal Article

Non-parametric Policy Search with Limited Information Loss

  • Herke van Hoof
  • Gerhard Neumann
  • Jan Peters

Learning complex control policies from non-linear and redundant sensory input is an important challenge for reinforcement learning algorithms. Non-parametric methods that approximate values functions or transition models can address this problem, by adapting to the complexity of the data set. Yet, many current non-parametric approaches rely on unstable greedy maximization of approximate value functions, which might lead to poor convergence or oscillations in the policy update. A more robust policy update can be obtained by limiting the information loss between successive state-action distributions. In this paper, we develop a policy search algorithm with policy updates that are both robust and non-parametric. Our method can learn non- parametric control policies for infinite horizon continuous Markov decision processes with non-linear and redundant sensory representations. We investigate how we can use approximations of the kernel function to reduce the time requirements of the demanding non-parametric computations. In our experiments, we show the strong performance of the proposed method, and how it can be approximated efficiently. Finally, we show that our algorithm can learn a real-robot under-powered swing-up task directly from image data. [abs] [ pdf ][ bib ] &copy JMLR 2017. ( edit, beta )

AAAI Conference 2017 Conference Paper

Policy Search with High-Dimensional Context Variables

  • Voot Tangkaratt
  • Herke van Hoof
  • Simone Parisi
  • Gerhard Neumann
  • Jan Peters
  • Masashi Sugiyama

Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.

ICAPS Conference 2017 Conference Paper

State-Regularized Policy Search for Linearized Dynamical Systems

  • Hany Abdulsamad
  • Oleg Arenz
  • Jan Peters 0001
  • Gerhard Neumann

Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback-controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply for highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and could even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policies can still differ significantly, rendering the used local approximate models invalid. To alleviate this issue we propose enforcing a relative entropy constraint not only on the policy update, but also on the update of the state distribution, around which the dynamics and cost are being approximated. We present a derivation of the closed-form policy update and show that our approach outperforms related methods on two nonlinear and highly dynamic simulated systems.

AAAI Conference 2017 Conference Paper

The Kernel Kalman Rule Ñ Efficient Nonparametric Inference with Recursive Least Squares

  • Gregor Gebhardt
  • Andras Kupcsik
  • Gerhard Neumann

Nonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes’ rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.

NeurIPS Conference 2016 Conference Paper

Catching heuristics are optimal control policies

  • Boris Belousov
  • Gerhard Neumann
  • Constantin Rothkopf
  • Jan Peters

Two seemingly contradictory theories attempt to explain how humans move to intercept an airborne ball. One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.

JMLR Journal 2016 Journal Article

Hierarchical Relative Entropy Policy Search

  • Christian Daniel
  • Gerhard Neumann
  • Oliver Kroemer
  • Jan Peters

Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored for real world settings and complete methods, derived from first principles, are needed. Real world settings are challenging due to large and continuous state-action spaces that are prohibitive for exhaustive sampling methods. We define the problem of learning sub-policies in continuous state action spaces as finding a hierarchical policy that is composed of a high-level gating policy to select the low-level sub-policies for execution by the agent. In order to efficiently share experience with all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables which allows for distribution of the update information between the sub-policies. We present three different variants of our algorithm, designed to be suitable for a wide variety of real world robot learning tasks and evaluate our algorithms in two real robot learning scenarios as well as several simulations and comparisons. [abs] [ pdf ][ bib ] &copy JMLR 2016. ( edit, beta )

ICRA Conference 2016 Conference Paper

Learning soft task priorities for control of redundant robots

  • Valerio Modugno
  • Gerhard Neumann
  • Elmar Rueckert
  • Giuseppe Oriolo
  • Jan Peters 0001
  • Serena Ivaldi

One of the key problems in planning and control of redundant robots is the fast generation of controls when multiple tasks and constraints need to be satisfied. In the literature, this problem is classically solved by multi-task prioritized approaches, where the priority of each task is determined by a weight function, describing the task strict/soft priority. In this paper, we propose to leverage machine learning techniques to learn the temporal profiles of the task priorities, represented as parametrized weight functions: we automatically determine their parameters through a stochastic optimization procedure. We show the effectiveness of the proposed method on a simulated 7 DOF Kuka LWR and both a simulated and a real Kinova Jaco arm. We compare the performance of our approach to a state-of-the-art method based on soft task prioritization, where the task weights are typically hand-tuned.

AAAI Conference 2016 Conference Paper

Model-Free Preference-Based Reinforcement Learning

  • Christian Wirth
  • Johannes Fürnkranz
  • Gerhard Neumann

Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action spaces require a known or estimated model, which is often not available and hard to learn. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. Our new algorithm is based on Relative Entropy Policy Search (REPS), enabling us to utilize stochastic policies and to directly control the greediness of the policy update. REPS decreases exploration of the policy slowly by limiting the relative entropy of the policy update, which ensures that the algorithm is provided with a versatile set of trajectories, and consequently with informative preferences. The preferencebased estimation is computed using a sample-based Bayesian method, which can also estimate the uncertainty of the utility. Additionally, we also compare to a linear solvable approximation, based on inverse RL. We show that both approaches perform favourably to the current state-of-the-art. The overall result is an algorithm that can learn non-parametric continuous action policies from a small number of preferences.

ICML Conference 2016 Conference Paper

Model-Free Trajectory Optimization for Reinforcement Learning

  • Riad Akrour
  • Gerhard Neumann
  • Hany Abdulsamad
  • Abbas Abdolmaleki

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

ICRA Conference 2016 Conference Paper

Movement primitives with multiple phase parameters

  • Marco Ewerton
  • Guilherme Maeda
  • Gerhard Neumann
  • Viktor Kisner
  • Gerrit Kollegger
  • Josef Wiemeyer
  • Jan Peters 0001

Movement primitives are concise movement representations that can be learned from human demonstrations, support generalization to novel situations and modulate the speed of execution of movements. The speed modulation mechanisms proposed so far are limited though, allowing only for uniform speed modulation or coupling changes in speed to local measurements of forces, torques or other quantities. Those approaches are not enough when dealing with general velocity constraints. We present a movement primitive formulation that can be used to non-uniformly adapt the speed of execution of a movement in order to satisfy a given constraint, while maintaining similarity in shape to the original trajectory. We present results using a 4-DoF robot arm in a minigolf setup.

IROS Conference 2016 Conference Paper

Non-parametric contextual stochastic search

  • Abbas Abdolmaleki
  • Nuno Lau
  • Luís Paulo Reis
  • Gerhard Neumann

Stochastic search algorithms are black-box optimizer of an objective function. They have recently gained a lot of attention in operations research, machine learning and policy search of robot motor skills due to their ease of use and their generality. Yet, many stochastic search algorithms require relearning if the task or objective function changes slightly to adapt the solution to the new situation or the new context. In this paper, we consider the contextual stochastic search setup. Here, we want to find multiple good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation of a task or context. Contextual algorithms have been investigated in the field of policy search, however, the search distribution typically uses a parametric model that is linear in the some hand-defined context features. Finding good context features is a challenging task, and hence, non-parametric methods are often preferred over their parametric counter-parts. In this paper, we propose a non-parametric contextual stochastic search algorithm that can learn a non-parametric search distribution for multiple tasks simultaneously. In difference to existing methods, our method can also learn a context dependent covariance matrix that guides the exploration of the search process. We illustrate its performance on several non-linear contextual tasks.

IROS Conference 2016 Conference Paper

Optimal control and inverse optimal control by distribution matching

  • Oleg Arenz
  • Hany Abdulsamad
  • Gerhard Neumann

Optimal control is a powerful approach to achieve optimal behavior. However, it typically requires a manual specification of a cost function which often contains several objectives, such as reaching goal positions at different time steps or energy efficiency. Manually trading-off these objectives is often difficult and requires a high engineering effort. In this paper, we present a new approach to specify optimal behavior. We directly specify the desired behavior by a distribution over future states or features of the states. For example, the experimenter could choose to reach certain mean positions with given accuracy/variance at specified time steps. Our approach also unifies optimal control and inverse optimal control in one framework. Given a desired state distribution, we estimate a cost function such that the optimal controller matches the desired distribution. If the desired distribution is estimated from expert demonstrations, our approach performs inverse optimal control. We evaluate our approach on several optimal and inverse optimal control tasks on non-linear systems using incremental linearizations similar to differential dynamic programming approaches.

RLDM Conference 2015 Conference Abstract

(Non-Parametric) Bayesian Linear Value Function Approximation

  • Andras Kupcsik
  • Gerhard Neumann

Least Squares Temporal Difference (LSTD) is a popular approach to evaluate value functions for a given policy. The goal of LSTD and other related methods is to find a linear approximation of the (action-)value function in feature space that is consistent with the Bellman equation. While standard tem- poral difference learning accounts for stochasticity in dynamics and in the policy, the (action-)values are mostly considered as deterministic variables. However, in many scenarios, we also want to obtain variances of the values, where the variance might depend on the stochasticity of the model, the stochasticity of the reward function and the uncertainty of the parameters of the value function due to a limited set of sampled transitions. In this paper, we are proposing a novel Bayesian approach to approximate action-value func- tions for policy evaluation in Reinforcement Learning (RL) tasks. First, we show how projectors can be used to estimate the conditional expectation in the Bellman equation using sample features, and how we can find the regularized optimal solution to minimize the mean squared Bellman error. Subsequently, we turn to the Bayesian treatment of this approach and show that the solution exists in closed form. We also present a kernelized version of our approach, the Value Function Process, leveraging the idea of Gaussian Process regression. Both approaches can be used to obtain the variances of the values by marginalizing the parameters of the value. We show that existing LSTD variants are a special case of our new formulation and present preliminary simulation results in a state chain navigation task that show the superior performance of our approach.

ICRA Conference 2015 Conference Paper

Extracting low-dimensional control variables for movement primitives

  • Elmar Rueckert
  • Jan Mundo
  • Alexandros Paraschos
  • Jan Peters 0001
  • Gerhard Neumann

Movement primitives (MPs) provide a powerful framework for data driven movement generation that has been successfully applied for learning from demonstrations and robot reinforcement learning. In robotics we often want to solve a multitude of different, but related tasks. As the parameters of the primitives are typically high dimensional, a common practice for the generalization of movement primitives to new tasks is to adapt only a small set of control variables, also called meta parameters, of the primitive. Yet, for most MP representations, the encoding of these control variables is pre-coded in the representation and can not be adapted to the considered tasks. In this paper, we want to learn the encoding of task-specific control variables also from data instead of relying on fixed meta-parameter representations. We use hierarchical Bayesian models (HBMs) to estimate a low dimensional latent variable model for probabilistic movement primitives (ProMPs), which is a recent movement primitive representation. We show on two real robot datasets that ProMPs based on HBMs outperform standard ProMPs in terms of generalization and learning from a small amount of data and also allows for an intuitive analysis of the movement. We also extend our HBM by a mixture model, such that we can model different movement types in the same dataset.

IROS Conference 2015 Conference Paper

Learning motor skills from partially observed movements executed at different speeds

  • Marco Ewerton
  • Guilherme Maeda
  • Jan Peters 0001
  • Gerhard Neumann

Learning motor skills from multiple demonstrations presents a number of challenges. One of those challenges is the occurrence of occlusions and lack of sensor coverage, which may corrupt part of the recorded data. Another issue is the variability in speed of execution of the demonstrations, which may require a way of finding the correspondence between the time steps of the different demonstrations. In this paper, an approach to learn motor skills is proposed that accounts both for spatial and temporal variability of movements. This approach, based on an Expectation-Maximization algorithm to learn Probabilistic Movement Primitives, also allows for learning motor skills from partially observed demonstrations, which may result from occlusion or lack of sensor coverage. An application of the algorithm proposed in this work lies in the field of Human-Robot Interaction when the robot has to react to human movements executed at different speeds. Experiments in which a robotic arm receives a cup handed over by a human illustrate this application. The capabilities of the algorithm in learning and predicting movements are also evaluated in experiments using a data set of letters and a data set of golf putting movements.

ICRA Conference 2015 Conference Paper

Learning multiple collaborative tasks with a mixture of Interaction Primitives

  • Marco Ewerton
  • Gerhard Neumann
  • Rudolf Lioutikov
  • Heni Ben Amor
  • Jan Peters 0001
  • Guilherme Maeda

Robots that interact with humans must learn to not only adapt to different human partners but also to new interactions. Such a form of learning can be achieved by demonstrations and imitation. A recently introduced method to learn interactions from demonstrations is the framework of Interaction Primitives. While this framework is limited to represent and generalize a single interaction pattern, in practice, interactions between a human and a robot can consist of many different patterns. To overcome this limitation this paper proposes a Mixture of Interaction Primitives to learn multiple interaction patterns from unlabeled demonstrations. Specifically the proposed method uses Gaussian Mixture Models of Interaction Primitives to model nonlinear correlations between the movements of the different agents. We validate our algorithm with two experiments involving interactive tasks between a human and a lightweight robotic arm. In the first, we compare our proposed method with conventional Interaction Primitives in a toy problem scenario where the robot and the human are not linearly correlated. In the second, we present a proof-of-concept experiment where the robot assists a human in assembling a box.

NeurIPS Conference 2015 Conference Paper

Model-Based Relative Entropy Stochastic Search

  • Abbas Abdolmaleki
  • Rudolf Lioutikov
  • Jan Peters
  • Nuno Lau
  • Luis Pualo Reis
  • Gerhard Neumann

Stochastic search algorithms are general black-box optimizers. Due to their ease of use and their generality, they have recently also gained a lot of attention in operations research, machine learning and policy search. Yet, these algorithms require a lot of evaluations of the objective, scale poorly with the problem dimension, are affected by highly noisy objective functions and may converge prematurely. To alleviate these problems, we introduce a new surrogate-based stochastic search approach. We learn simple, quadratic surrogate models of the objective function. As the quality of such a quadratic approximation is limited, we do not greedily exploit the learned models. The algorithm can be misled by an inaccurate optimum introduced by the surrogate. Instead, we use information theoretic constraints to bound the `distance' between the new and old data distribution while maximizing the objective function. Additionally the new method is able to sustain the exploration of the search distribution to avoid premature convergence. We compare our method with state of art black-box optimization methods on standard uni-modal and multi-modal optimization functions, on simulated planar robot tasks and a complex robot ball throwing task. The proposed method considerably outperforms the existing approaches.

EWRL Workshop 2015 Workshop Paper

Model-Free Preference-based Reinforcement Learning

  • Christian Wirth
  • Gerhard Neumann

Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from an human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches for PBRL with non-parametric policies require a known or estimated model. In this paper, we integrate preference-based estimation of the reward function into a model-free reinforcement learning (RL) algorithm, resulting in a model-free PBRL algorithm. In a model-free setup, we need to use a random exploration strategy which is implemented by a stochastic policy. We show that, by controlling the greediness of the policy update, a comparable performance to commonly used directed exploration strategies, which require a model, can be achieved. Our new algorithm is based on the Relative Entropy Policy Search algorithm and can learn non-parametric continuous action policies from a small number of preferences.

IROS Conference 2015 Conference Paper

Model-free Probabilistic Movement Primitives for physical interaction

  • Alexandros Paraschos
  • Elmar Rueckert
  • Jan Peters 0001
  • Gerhard Neumann

Physical interaction in robotics is a complex problem that requires not only accurate reproduction of the kinematic trajectories but also of the forces and torques exhibited during the movement. We base our approach on Movement Primitives (MP), as MPs provide a framework for modelling complex movements and introduce useful operations on the movements, such as generalization to novel situations, time scaling, and others. Usually, MPs are trained with imitation learning, where an expert demonstrates the trajectories. However, MPs used in physical interaction either require additional learning approaches, e. g. , reinforcement learning, or are based on handcrafted solutions. Our goal is to learn and generate movements for physical interaction that are learned with imitation learning, from a small set of demonstrated trajectories. The Probabilistic Movement Primitives (ProMPs) framework is a recent MP approach that introduces beneficial properties, such as combination and blending of MPs, and represents the correlations present in the movement. The ProMPs provides a variable stiffness controller that reproduces the movement but it requires a dynamics model of the system. Learning such a model is not a trivial task, and, therefore, we introduce the model-free ProMPs, that are learning jointly the movement and the necessary actions from a few demonstrations. We derive a variable stiffness controller analytically. We further extent the ProMPs to include force and torque signals, necessary for physical interaction. We evaluate our approach in simulated and real robot tasks.

EWRL Workshop 2015 Workshop Paper

Non-Parametric Policy Learning for High-Dimensional State Representations

  • Herke van Hoof
  • Jan Peters
  • Gerhard Neumann

Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Non-parametric methods can help to address this problem, but many current approaches rely on unstable greedy maximization. In this paper, we develop a kernel-based reinforcement learning algorithm that performs robust policy updates. We show that our method outperforms related approaches, and is able to learn an underpowered swing-up task task directly from high-dimensional image data.

ICAPS Conference 2015 Conference Paper

Policy Evaluation with Temporal Differences: A Survey and Comparison (Extended Abstract)

  • Christoph Dann
  • Gerhard Neumann
  • Jan Peters 0001

Value functions are an essential tool for solving sequential decision making problems such as Markov decision processes (MDPs). Computing the value function for a given policy (policy evaluation) is not only important for determining the quality of the policy but also a key step in prominent policy-iteration-type algorithms. In common settings where a model of the Markov decision process is not available or too complex to handle directly, an approximation of the value function is usually estimated from samples of the process. Linearly parameterized estimates are often preferred due to their simplicity and strong stability guarantees. Since the late 1980s, research on policy evaluation in these scenarios has been dominated by temporal-difference (TD) methods because of their data-efficiency. However, several core issues have only been tackled recently, including stability guarantees for off-policy estimation where the samples are not generated by the policy to evaluate. Together with improving sample efficiency and probabilistic treatment of uncertainty in the value estimates, these efforts have lead to numerous new temporal-difference algorithms. These methods are scattered over the literature and usually only compared to most similar approaches. The article therefore aims at presenting the state of the art of policy evaluation with temporal differences and linearly parameterized value functions in discounted MDPs as well as a more comprehensive comparison of these approaches. We put the algorithms in a unified framework of function optimization, with focus on surrogate cost functions and optimization strategies, to identify similarities and differences between the methods. In addition, important extensions of the base methods such as off-policy estimation and eligibility traces for better bias-variance trade-off, as well as regularization in high dimensional feature spaces, are discussed.

ICRA Conference 2015 Conference Paper

Towards learning hierarchical skills for multi-phase manipulation tasks

  • Oliver Kroemer
  • Christian G. Daniel
  • Gerhard Neumann
  • Herke van Hoof
  • Jan Peters 0001

Most manipulation tasks can be decomposed into a sequence of phases, where the robot's actions have different effects in each phase. The robot can perform actions to transition between phases and, thus, alter the effects of its actions, e. g. grasp an object in order to then lift it. The robot can thus reach a phase that affords the desired manipulation. In this paper, we present an approach for exploiting the phase structure of tasks in order to learn manipulation skills more efficiently. Starting with human demonstrations, the robot learns a probabilistic model of the phases and the phase transitions. The robot then employs model-based reinforcement learning to create a library of motor primitives for transitioning between phases. The learned motor primitives generalize to new situations and tasks. Given this library, the robot uses a value function approach to learn a high-level policy for sequencing the motor primitives. The proposed method was successfully evaluated on a real robot performing a bimanual grasping task.

ICRA Conference 2014 Conference Paper

Interaction primitives for human-robot cooperation tasks

  • Heni Ben Amor
  • Gerhard Neumann
  • Sanket Kamthe
  • Oliver Kroemer
  • Jan Peters 0001

To engage in cooperative activities with human partners, robots have to possess basic interactive abilities and skills. However, programming such interactive skills is a challenging task, as each interaction partner can have different timing or an alternative way of executing movements. In this paper, we propose to learn interaction skills by observing how two humans engage in a similar task. To this end, we introduce a new representation called Interaction Primitives. Interaction primitives build on the framework of dynamic motor primitives (DMPs) by maintaining a distribution over the parameters of the DMP. With this distribution, we can learn the inherent correlations of cooperative activities which allow us to infer the behavior of the partner and to participate in the cooperation. We will provide algorithms for synchronizing and adapting the behavior of humans and robots during joint physical activities.

IROS Conference 2014 Conference Paper

Latent space policy search for robotics

  • Kevin Sebastian Luck
  • Gerhard Neumann
  • Erik Berger
  • Jan Peters 0001
  • Heni Ben Amor

Learning motor skills for robots is a hard task. In particular, a high number of degrees-of-freedom in the robot can pose serious challenges to existing reinforcement learning methods, since it leads to a high-dimensional search space. However, complex robots are often intrinsically redundant systems and, therefore, can be controlled using a latent manifold of much smaller dimensionality. In this paper, we present a novel policy search method that performs efficient reinforcement learning by uncovering the low-dimensional latent space of actuator redundancies. In contrast to previous attempts at combining reinforcement learning and dimensionality reduction, our approach does not perform dimensionality reduction as a preprocessing step but naturally combines it with policy search. Our evaluations show that the new approach outperforms existing algorithms for learning motor skills with high-dimensional robots.

ICRA Conference 2014 Conference Paper

Learning to predict phases of manipulation tasks as hidden states

  • Oliver Kroemer
  • Herke van Hoof
  • Gerhard Neumann
  • Jan Peters 0001

Phase transitions in manipulation tasks often occur when contacts between objects are made or broken. A switch of the phase can result in the robot's actions suddenly influencing different aspects of its environment. Therefore, the boundaries between phases often correspond to constraints or subgoals of the manipulation task. In this paper, we investigate how the phases of manipulation tasks can be learned from data. The task is modeled as an autoregressive hidden Markov model, wherein the hidden phase transitions depend on the observed states. The model is learned from data using the expectation-maximization algorithm. We demonstrate the proposed method on both a pushing task and a pepper mill turning task. The proposed approach was compared to a standard autoregressive hidden Markov model. The experiments show that the learned models can accurately predict the transitions in phases during the manipulation tasks.

JMLR Journal 2014 Journal Article

Policy Evaluation with Temporal Differences: A Survey and Comparison

  • Christoph Dann
  • Gerhard Neumann
  • Jan Peters

Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches. This paper aims at making these new developments accessible in a concise overview, with foci on underlying cost functions, the off-policy scenario as well as on regularization in high dimensional feature spaces. By presenting the first extensive, systematic comparative evaluations comparing TD, LSTD, LSPE, FPKF, the residual- gradient algorithm, Bellman residual minimization, GTD, GTD2 and TDC, we shed light on the strengths and weaknesses of the methods. Moreover, we present alternative versions of LSTD and LSPE with drastically improved off-policy performance. [abs] [ pdf ][ bib ] &copy JMLR 2014. ( edit, beta )

ICRA Conference 2014 Conference Paper

Sample-based informationl-theoretic stochastic optimal control

  • Rudolf Lioutikov
  • Alexandros Paraschos
  • Jan Peters 0001
  • Gerhard Neumann

Many Stochastic Optimal Control (SOC) approaches rely on samples to either obtain an estimate of the value function or a linearisation of the underlying system model. However, these approaches typically neglect the fact that the accuracy of the policy update depends on the closeness of the resulting trajectory distribution to these samples. The greedy operator does not consider such closeness constraint to the samples. Hence, the greedy operator can lead to oscillations or even instabilities in the policy updates. Such undesired behaviour is likely to result in an inferior performance of the estimated policy. We reuse inspiration from the reinforcement learning community and relax the greedy operator used in SOC with an information theoretic bound that limits the `distance' of two subsequent trajectory distributions in a policy update. The introduced bound ensures a smooth and stable policy update. Our method is also well suited for model-based reinforcement learning, where we estimate the system dynamics model from data. As this model is likely to be inaccurate, it might be dangerous to exploit the model greedily. Instead, our bound ensures that we generate new data in the vicinity of the current data, such that we can improve our estimate of the system dynamics model. We show that our approach outperforms several state of the art approaches on challenging simulated robot control tasks.

AAAI Conference 2013 Conference Paper

Data-Efficient Generalization of Robot Skills with Contextual Policy Search

  • Andras Kupcsik
  • Marc Deisenroth
  • Jan Peters
  • Gerhard Neumann

In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: A lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which require an excessive number of interactions of the robot with its environment. More data-efficient policy search approaches are model based but, thus far, without the capability of learning hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions. Using these predictions, we use information-theoretic insights to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.

EWRL Workshop 2013 Workshop Paper

Hierarchical Learning of Motor Skills with Information-Theoretic Policy Search

  • Gerhard Neumann
  • Christian Daniel
  • Andras Kupsic
  • Marc Deisenroth
  • Jan Peters

The key idea behind information-theoretic policy search is to bound the ‘distance’ between the new and old trajectory distribution, where the relative entropy is used as ‘distance measure’. The relative entropy bound exhibits many beneficial properties, such as a smooth and fast learning process and a closed-form solution for the resulting policy. In this paper we will summarize our work on information theoretic policy search for motor skill learning where we put particular focus on extending the original algorithm to learn several options for a motor task, select an option for the current situation, adapt the option to the situation and sequence options to solve an overall task. Finally, we illustrate the performance of our algorithm with experiments on real robots.

ICRA Conference 2013 Conference Paper

Learning sequential motor tasks

  • Christian G. Daniel
  • Gerhard Neumann
  • Oliver Kroemer
  • Jan Peters 0001

Many real robot applications require the sequential use of multiple distinct motor primitives. This requirement implies the need to learn the individual primitives as well as a strategy to select the primitives sequentially. Such hierarchical learning problems are commonly either treated as one complex monolithic problem which is hard to learn, or as separate tasks learned in isolation. However, there exists a strong link between the robots strategy and its motor primitives. Consequently, a consistent framework is needed that can learn jointly on the level of the individual primitives and the robots strategy. We present a hierarchical learning method which improves individual motor primitives and, simultaneously, learns how to combine these motor primitives sequentially to solve complex motor tasks. We evaluate our method on the game of robot hockey, which is both difficult to learn in terms of the required motor primitives as well as its strategic elements.

NeurIPS Conference 2013 Conference Paper

Probabilistic Movement Primitives

  • Alexandros Paraschos
  • Christian Daniel
  • Jan Peters
  • Gerhard Neumann

Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based MPs, due to their compact representation of the inherently continuous and high dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, a MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

IROS Conference 2012 Conference Paper

Generalization of human grasping for multi-fingered robot hands

  • Heni Ben Amor
  • Oliver Kroemer
  • Ulrich Hillenbrand
  • Gerhard Neumann
  • Jan Peters 0001

Multi-fingered robot grasping is a challenging problem that is difficult to tackle using hand-coded programs. In this paper we present an imitation learning approach for learning and generalizing grasping skills based on human demonstrations. To this end, we split the task of synthesizing a grasping motion into three parts: (1) learning efficient grasp representations from human demonstrations, (2) warping contact points onto new objects, and (3) optimizing and executing the reach-and-grasp movements. We learn low-dimensional latent grasp spaces for different grasp types, which form the basis for a novel extension to dynamic motor primitives. These latent-space dynamic motor primitives are used to synthesize entire reach-and-grasp movements. We evaluated our method on a real humanoid robot. The results of the experiment demonstrate the robustness and versatility of our approach.

IROS Conference 2012 Conference Paper

Learning concurrent motor skills in versatile solution spaces

  • Christian G. Daniel
  • Gerhard Neumann
  • Jan Peters 0001

Future robots need to autonomously acquire motor skills in order to reduce their reliance on human programming. Many motor skill learning methods concentrate on learning a single solution for a given task. However, discarding information about additional solutions during learning unnecessarily limits autonomy. Such favoring of single solutions often requires re-learning of motor skills when the task, the environment or the robot's body changes in a way that renders the learned solution infeasible. Future robots need to be able to adapt to such changes and, ideally, have a large repertoire of movements to cope with such problems. In contrast to current methods, our approach simultaneously learns multiple distinct solutions for the same task, such that a partial degeneration of this solution space does not prevent the successful completion of the task. In this paper, we present a complete framework that is capable of learning different solution strategies for a real robot Tetherball task.

NeurIPS Conference 2008 Conference Paper

Fitted Q-iteration by Advantage Weighted Regression

  • Gerhard Neumann
  • Jan Peters

Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e. g. , in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces.