TMLR 2025
Meta-learning Population-based Methods for Reinforcement Learning
Abstract
Reinforcement learning (RL) algorithms are highly sensitive to their hyperparameter settings. Recently, numerous methods have been proposed to optimize these hyperparameters dynamically. One prominent approach is Population-Based Bandits (PB2), which uses time-varying Gaussian processes (GPs) to dynamically optimize hyperparameters with a population of parallel agents. Despite its strong overall performance, PB2 suffers from a slow start because the GP initially lacks sufficient data. To mitigate this issue, we propose four different methods that utilize meta-data from various environments. These approaches are novel in that they adapt meta-learning methods to accommodate the time-varying setting. Among them, MultiTaskPB2, which applies meta-learning to the surrogate model, stands out as the most promising: it outperforms PB2 and other baselines in both anytime and final performance across two RL environment families.
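The slow-start problem and the warm-start idea described above can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a single scalar hyperparameter, a grid of candidate values, a simplified PB2-style time-varying kernel (an RBF over the hyperparameter value multiplied by a decay term over time steps), and a UCB acquisition. Observations from a previously seen environment (`meta_obs`) are simply added to the GP's training set, so the surrogate is informed before any data from the new task arrives. All names and constants here are illustrative.

```python
import math

def kernel(p, q, ls=0.3, eps=0.1):
    """Time-varying kernel sketch: RBF over the hyperparameter value,
    discounted by (1 - eps)^(|t1 - t2| / 2) so older observations count less."""
    (x1, t1), (x2, t2) = p, q
    return math.exp(-(x1 - x2) ** 2 / (2 * ls ** 2)) * (1 - eps) ** (abs(t1 - t2) / 2)

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][j] * v[j] for j in range(r + 1, n))) / M[r][r]
    return v

def gp_posterior(obs, point, noise=1e-3):
    """Posterior mean and variance at `point`, given obs = [((x, t), reward), ...]."""
    X = [p for p, _ in obs]
    y = [r for _, r in obs]
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    kv = [kernel(point, xi) for xi in X]
    alpha = solve(K, y)                      # K^-1 y
    mu = sum(a * k_i for a, k_i in zip(alpha, kv))
    var = kernel(point, point) - sum(k_i * w for k_i, w in zip(kv, solve(K, kv)))
    return mu, max(var, 0.0)

def suggest(obs, candidates, t, beta=2.0):
    """Pick the candidate hyperparameter maximising the UCB acquisition at time t."""
    def ucb(x):
        mu, var = gp_posterior(obs, (x, t))
        return mu + beta * math.sqrt(var)
    return max(candidates, key=ucb)

# Warm start: (hyperparameter, time) -> reward pairs from a previous environment.
meta_obs = [((0.1, 0.0), 0.2), ((0.5, 0.0), 0.9), ((0.9, 0.0), 0.3)]
grid = [i / 10 for i in range(11)]
next_hp = suggest(meta_obs, grid, t=1.0)  # informed suggestion before any new-task data
```

Without `meta_obs`, the GP posterior is flat and the first suggestions are effectively random, which is exactly the slow start the paper targets; with the transferred observations the acquisition immediately concentrates near previously good regions.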
Context
- Venue
- Transactions on Machine Learning Research