
ICML 2025

Return Capping: Sample Efficient CVaR Policy Gradient Optimisation

Conference Paper Accept (poster) Artificial Intelligence · Machine Learning

Abstract

When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem that caps the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation of the problem results in consistently improved performance compared to baselines. We have made all our code available here: https://github.com/HarryMJMead/cvar-return-capping.
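The contrast described in the abstract can be sketched in a few lines. This is hypothetical illustrative code, not the authors' implementation: the function names, the use of the empirical alpha-quantile as the cap, and the `alpha` parameter are all assumptions made for illustration.

```python
import numpy as np

def discarded_returns(returns, alpha):
    """Standard CVaR-PG style selection (sketch): keep only the worst
    alpha-fraction of trajectory returns; the rest contribute nothing
    to the gradient estimate, which wastes samples."""
    var = np.quantile(returns, alpha)  # empirical VaR_alpha
    return returns[returns <= var]     # only tail trajectories survive

def capped_returns(returns, alpha):
    """Return-capping reformulation (sketch): cap every trajectory's
    return at the empirical alpha-quantile, so all trajectories are
    retained and contribute to training."""
    var = np.quantile(returns, alpha)  # cap set at the empirical VaR_alpha
    return np.minimum(returns, var)    # above-cap returns are clipped, not dropped
```

With `alpha = 0.4` and returns `[0, 1, 2, 3, 4]`, the discarding variant keeps only `[0, 1]`, while the capping variant keeps all five samples as `[0, 1, 1.6, 1.6, 1.6]` (the cap here is the interpolated 0.4-quantile, 1.6) — the sample-efficiency point the abstract makes.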

Authors

Keywords

  • Reinforcement Learning
  • Machine Learning
  • CVaR
  • Risk-Averse

Context

Venue
International Conference on Machine Learning
Archive span
1993-2025
Indexed papers
16471
Paper id
557601764037608451