Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Haotian Xu; Shengjie Wang; Zhaolei Wang; Yunzhe Zhang; Qing Zhuo; Yang Gao 0029; Tao Zhang

IROS 2023

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Details

Abstract

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. The safety of learning-based controllers is essential to ensuring their effectiveness. Current methods apply fully consistent constraints throughout training, which makes exploration inefficient in the early stage. In this paper, we propose an algorithm named Constrained Policy Optimization with Extra Safety Budget (ESB-CPO) to strike a balance between exploration efficiency and constraint satisfaction. In the early stage, our method loosens the practical constraints on unsafe transitions (adding an extra safety budget) with the aid of a new metric we propose. As training progresses, the constraints in our optimization problem become tighter. Meanwhile, theoretical analysis and practical experiments demonstrate that our method gradually meets the cost limit in the final training stage. When evaluated on the Safety-Gym and Bullet-Safety-Gym benchmarks, our method shows advantages over baseline algorithms in terms of safety and optimality. Notably, our method achieves a remarkable performance improvement over the baselines under the same cost limit.
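The idea of a constraint that starts loose and tightens toward the cost limit can be illustrated with a minimal sketch. This is not the ESB-CPO update rule: the abstract does not give the paper's metric or schedule, so the linear decay, the function name `effective_cost_limit`, and the parameters `extra_budget0` and `total_epochs` are all assumptions made purely for illustration.

```python
def effective_cost_limit(cost_limit, extra_budget0, epoch, total_epochs):
    """Return a relaxed cost limit for the given epoch.

    Hypothetical illustration only: the extra budget decays linearly from
    extra_budget0 at epoch 0 to zero at total_epochs. The paper derives its
    extra budget from its own metric, which is not reproduced here.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if extra_budget0 < 0:
        raise ValueError("extra_budget0 must be non-negative")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    # Loosest constraint at the start, exactly cost_limit at the end.
    return cost_limit + extra_budget0 * (1.0 - progress)


def is_within_budget(episode_cost, cost_limit, extra_budget0, epoch, total_epochs):
    """Check an episode's cumulative cost against the relaxed limit."""
    limit = effective_cost_limit(cost_limit, extra_budget0, epoch, total_epochs)
    return episode_cost <= limit
```

Under these assumed values, an episode with cumulative cost 30 and a cost limit of 25 would be accepted at epoch 0 with an initial extra budget of 10, but rejected at the final epoch, when the relaxed limit has returned to the original cost limit.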

Keywords

Training
Measurement
Costs
Estimation
Reinforcement learning
Stability analysis
Safety
Optimal Policy
Constrained Optimization
Constrained Policy Optimization
Safety Budget
Optimization Problem
Robotic Tasks
Constraint Satisfaction
Consistency Constraint
Value Function
Lagrange Multiplier
Stable Values
Inequality Constraints
Lyapunov Function
Reward Function
Markov Decision Process
Sum Of Costs
Interior Point Method
Unconstrained Problem
Safety Policies
Practical Algorithm
Trust Region
Early Epoch
Trust Region Method
Safety Constraints
Policy Update

Context

Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span: 1988-2025
Indexed papers: 26578
Paper id: 101994880313615462