Quantile Credit Assignment

Thomas Mesnard; Wenqi Chen; Alaa Saade; Yunhao Tang; Mark Rowland; Theophane Weber; Clare Lyle; Audrunas Gruslys; Michal Valko; Will Dabney; Georg Ostrovski; Eric Moulines; Rémi Munos

ICML 2023

Conference paper (accepted) · Artificial Intelligence · Machine Learning

Abstract

In reinforcement learning, the credit assignment problem is to distinguish luck from skill, that is, to separate the inherent randomness of the environment from the controllable effects of the agent's actions. This paper proposes two novel algorithms, Quantile Credit Assignment (QCA) and Hindsight QCA (HQCA), which incorporate distributional value estimation to perform credit assignment. QCA uses a network that predicts the quantiles of the return distribution, whereas HQCA additionally incorporates information about the future. Both QCA and HQCA have the appealing interpretation of leveraging an estimate of the quantile level of the return (interpreted as the level of "luck") to derive a "luck-dependent" baseline for policy gradient methods. We show theoretically that this approach gives an unbiased policy gradient estimate that can yield significant variance reductions over a standard value-estimate baseline. QCA and HQCA significantly outperform prior state-of-the-art methods on a range of extremely difficult credit assignment problems.
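To make the "luck-dependent baseline" idea concrete, here is a minimal toy sketch under our own assumptions; it is one plausible reading of the abstract, not the authors' QCA algorithm. Assumed setup (all hypothetical): a one-state, two-action bandit with Gaussian returns; exact Gaussian quantiles (`QUANTILES`) stand in for a learned quantile network; the quantile level of an observed return is estimated by inverting the taken action's quantile function; and the baseline is the policy-weighted average of every action's quantile at that same level. Because the quantile functions here are exact, that level is Uniform(0, 1) whatever action was taken, so the baseline does not depend on the action and the gradient estimate stays unbiased. The helper names `policy_gradient_samples` and `compare` are illustrative only.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical toy problem (not from the paper): a one-state, two-action
# bandit whose return is G = MEANS[a] + SIGMAS[a] * noise, noise ~ N(0, 1).
MEANS = np.array([1.0, 0.0])
SIGMAS = np.array([2.0, 2.0])
N_QUANTILES = 51
TAUS = (np.arange(N_QUANTILES) + 0.5) / N_QUANTILES  # quantile midpoints

# Assumed stand-in for a learned quantile network: exact Gaussian quantiles
# of the return for each action, shape (num_actions, N_QUANTILES).
_Z = np.array([NormalDist().inv_cdf(t) for t in TAUS])
QUANTILES = MEANS[:, None] + SIGMAS[:, None] * _Z[None, :]


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def policy_gradient_samples(logits, n, use_qca, rng):
    """Per-sample REINFORCE gradients (G - b) * grad log pi(a) for a softmax policy."""
    pi = softmax(logits)
    num_actions = len(pi)
    actions = rng.choice(num_actions, size=n, p=pi)
    returns = MEANS[actions] + SIGMAS[actions] * rng.standard_normal(n)

    if use_qca:
        # 1) Estimate the quantile level ("luck") of each observed return
        #    under the return distribution of the action actually taken.
        tau = np.empty(n)
        for k in range(num_actions):
            mask = actions == k
            tau[mask] = np.interp(returns[mask], QUANTILES[k], TAUS)
        # 2) Assumed luck-dependent baseline: policy-weighted average over
        #    actions of each action's return quantile at that same level.
        per_action = np.stack([np.interp(tau, TAUS, row) for row in QUANTILES])
        baseline = pi @ per_action
    else:
        # Standard baseline: expected return V = sum_a pi(a) * MEANS[a].
        baseline = np.full(n, pi @ MEANS)

    # Gradient of log softmax(logits)[a] w.r.t. logits is onehot(a) - pi.
    scores = np.eye(num_actions)[actions] - pi
    return (returns - baseline)[:, None] * scores


def compare(n=20_000, seed=0):
    """Mean and total variance of both estimators against the true gradient."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(2)
    pi = softmax(logits)
    out = {"true_grad": pi * (MEANS - pi @ MEANS)}
    for name, use_qca in (("value", False), ("qca", True)):
        g = policy_gradient_samples(logits, n, use_qca, rng)
        out[f"{name}_mean"] = g.mean(axis=0)
        out[f"{name}_var"] = g.var(axis=0).sum()
    return out


if __name__ == "__main__":
    for key, val in compare().items():
        print(key, val)
```

Both estimators should average to the true gradient, but in this toy the quantile-level baseline cancels almost all of the shared return noise, so its variance is far smaller. In the actual method the quantiles are learned, so the exact independence assumed here would hold only approximately.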
