ICLR 2021 Conference Paper
Balancing Constraints and Rewards with Meta-Gradient D4PG
- Dan A. Calian
- Daniel J. Mankowitz
- Tom Zahavy
- Zhongwen Xu
- Junhyuk Oh
- Nir Levine
- Timothy A. Mann
Deploying Reinforcement Learning (RL) agents in real-world applications often requires satisfying complex system constraints. Constraint thresholds are frequently set incorrectly, either because the system is complex or because the thresholds cannot be verified offline (e.g., no simulator or reasonable offline evaluation procedure exists). As a result, the task may be impossible to solve without violating the constraints. In many real-world cases, however, constraint violations are undesirable but not catastrophic, which motivates soft-constrained RL approaches. We present two soft-constrained RL approaches that use meta-gradients to find a good trade-off between maximizing expected return and minimizing constraint violations. We demonstrate their effectiveness by showing that they consistently outperform the baselines across four different MuJoCo domains.
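The abstract does not specify how the meta-gradients are used, so the sketch below illustrates only the underlying trade-off between return and constraint violation, using a generic Lagrangian relaxation with projected gradient descent-ascent as a stand-in. It is not the authors' method. The toy return `J(theta)`, cost `C(theta)`, the function name `lagrangian_sketch`, and all step sizes are hypothetical choices made for illustration.

```python
def lagrangian_sketch(threshold=1.0, alpha=0.05, beta=0.05, steps=2000):
    """Projected gradient descent-ascent on a hypothetical 1-D constrained problem.

    Toy problem (an assumption, not from the paper):
      return  J(theta) = -(theta - 3)^2   (unconstrained optimum at theta = 3)
      cost    C(theta) = theta            (constraint: C(theta) <= threshold)
    Lagrangian: L(theta, lam) = J(theta) - lam * (C(theta) - threshold)
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to theta.
        # dJ/dtheta = -2 (theta - 3); d(lam * C)/dtheta = lam.
        grad_theta = -2.0 * (theta - 3.0) - lam
        theta += alpha * grad_theta
        # Dual step: increase the multiplier while the constraint is violated,
        # then project it back onto lam >= 0.
        violation = theta - threshold
        lam = max(0.0, lam + beta * violation)
    return theta, lam
```

With `threshold=1.0` the constraint is active, so `theta` settles at the boundary (about 1.0) and the multiplier at about 4.0, the price the return pays per unit of cost. With `threshold=5.0` the constraint is never violated, so `lam` stays at 0 and `theta` reaches the unconstrained optimum of 3.0. A larger `lam` pushes harder against violations at the expense of return; the meta-gradient approaches described above are presented as a way to find a good balance when the threshold itself may be poorly chosen.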