Trajectory Optimization for Unknown Constrained Systems using Reinforcement Learning

Kei Ota; Devesh K. Jha; Tomoaki Oiki; Mamoru Miura; Takashi Nammoto; Daniel Nikovski; Toshisada Mariyama


IROS 2019


Conference Paper · Accepted Paper · Artificial Intelligence · Robotics


Abstract

In this paper, we propose a reinforcement learning-based algorithm for trajectory optimization of constrained dynamical systems. This problem is motivated by the fact that, for most robotic systems, the dynamics may not always be known, and generating smooth, dynamically feasible trajectories can be difficult for such systems. Sampling-based motion planning algorithms may produce trajectories that are prone to undesirable control jumps. However, they can usually provide a good reference trajectory, which a model-free reinforcement learning algorithm can then exploit to limit its search domain and quickly find a dynamically smooth trajectory. We use this idea to train a reinforcement learning agent to learn a dynamically smooth trajectory in a curriculum learning setting. Furthermore, for generalization, we parameterize the policies with goal locations, so that the agent can be trained for multiple goals simultaneously. To validate the proposed idea, we show results both in simulated environments and in real experiments with a 6-DoF manipulator arm operated in position-controlled mode. We compare the proposed approach against a PID controller used to track a designed trajectory in configuration space. Our experiments show that the RL agent trained with a reference path outperformed a model-free PID controller of the type commonly used for trajectory tracking on many robotic platforms.
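The abstract does not specify implementation details, so the following Python sketch only illustrates, under our own assumptions, the ingredients it names. It covers a goal-conditioned policy input, a position command restricted to a band around the planner's reference trajectory (one possible reading of "limiting the search domain"), a per-joint PID baseline tracking a configuration-space reference, and a crude measure of control jumps. All function and class names, the residual-plus-clipping action parameterization, the `max_deviation` bound, and the PID gains are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def goal_conditioned_observation(joint_angles: Vector, joint_velocities: Vector, goal: Vector) -> Vector:
    """Append the goal to the robot state so a single policy can be trained for many goals."""
    return list(joint_angles) + list(joint_velocities) + list(goal)


def reference_limited_command(reference_next: Vector, residual: Vector, max_deviation: float) -> Vector:
    """ASSUMPTION: the policy outputs a residual that is clipped to +/- max_deviation
    and added to the next reference waypoint, keeping exploration near the planned path."""
    return [r + max(-max_deviation, min(max_deviation, d)) for r, d in zip(reference_next, residual)]


def max_control_jump(commands: List[Vector]) -> float:
    """Largest per-joint change between consecutive commands (a crude smoothness measure)."""
    jumps = [abs(b - a) for prev, curr in zip(commands, commands[1:]) for a, b in zip(prev, curr)]
    return max(jumps, default=0.0)


@dataclass
class JointPID:
    """Independent PID loop per joint, tracking a configuration-space reference (hypothetical gains)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: Vector = field(default_factory=list)
    prev_error: Vector = field(default_factory=list)

    def step(self, reference: Vector, measured: Vector) -> Vector:
        error = [r - m for r, m in zip(reference, measured)]
        if not self.integral:
            # First call: initialise state; prev_error = error avoids a derivative kick.
            self.integral = [0.0] * len(error)
            self.prev_error = list(error)
        self.integral = [i + e * self.dt for i, e in zip(self.integral, error)]
        derivative = [(e - p) / self.dt for e, p in zip(error, self.prev_error)]
        self.prev_error = error
        return [self.kp * e + self.ki * i + self.kd * d
                for e, i, d in zip(error, self.integral, derivative)]


if __name__ == "__main__":
    # A large residual on joint 0 is clipped to the allowed band around the reference.
    print(reference_limited_command([0.1, 0.2], [0.5, -0.01], max_deviation=0.05))
    pid = JointPID(kp=1.0, ki=1.0, kd=0.0, dt=0.1)
    print(pid.step([1.0], [0.0]))
    print(max_control_jump([[0.0, 0.0], [0.1, 0.0], [0.1, 0.3]]))
```

Clipping the residual, rather than letting the policy choose absolute joint positions, is one simple way to keep the learned commands close to a sampling-based reference while still allowing the agent to smooth out its control jumps.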


Keywords

Target tracking
Smoothing methods
Limiting
Trajectory tracking
Heuristic algorithms
Reinforcement learning
Planning
Trajectory optimization
Manipulator dynamics
Intelligent robots
Dynamical
Simulation Environment
Path Planning
Proportional-integral-derivative
Configuration Space
Reference Trajectory
Curriculum Learning
Reinforcement Learning Agent
Model-free Reinforcement Learning
Reference Path
Past Experiences
Angular Velocity
Baseline Methods
Joint Angles
Presence Of States
Actor Network
Goal State
Target State
Reward Function
State Constraints
Presence Of Constraints
Critic Network
Replay Buffer
Q-function
Planning Algorithm
Presence Of Obstacles
Trajectory Tracking Control
Goal Position
Control Constraints

Context

Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems