
TMLR 2025

Meta-learning Population-based Methods for Reinforcement Learning

Journal Article · Artificial Intelligence · Machine Learning

Abstract

Reinforcement learning (RL) algorithms are highly sensitive to their hyperparameter settings. Recently, numerous methods have been proposed to optimize these hyperparameters dynamically. One prominent approach is Population-Based Bandits (PB2), which uses time-varying Gaussian processes (GPs) to dynamically optimize hyperparameters with a population of parallel agents. Despite its strong overall performance, PB2 suffers from slow starts because the GP initially lacks sufficient information. To mitigate this issue, we propose four methods that exploit meta-data from other environments; these approaches are novel in that they adapt meta-learning methods to the time-varying setting. Among them, MultiTaskPB2, which meta-learns the surrogate model, proves the most promising: it outperforms PB2 and other baselines in both anytime and final performance across two families of RL environments.
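The "time-varying GP" mentioned above refers to a surrogate whose kernel discounts older observations, so that hyperparameter evaluations from earlier in training count for less. A minimal sketch of that idea, assuming an RBF spatial kernel and the multiplicative forgetting-factor form common in time-varying GP-bandits (all names and values here are illustrative, not the paper's implementation):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Squared-exponential similarity between hyperparameter configurations.
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * d2 / lengthscale**2)

def time_varying_kernel(X1, t1, X2, t2, lengthscale=1.0, epsilon=0.2):
    # Spatial similarity discounted by temporal distance:
    #   k((x, t), (x', t')) = k_space(x, x') * (1 - eps)^(|t - t'| / 2)
    # eps in (0, 1) is a forgetting rate: larger eps forgets faster.
    k_space = rbf_kernel(X1, X2, lengthscale)
    k_time = (1.0 - epsilon) ** (np.abs(t1[:, None] - t2[None, :]) / 2.0)
    return k_space * k_time

# Hypothetical data: the same 2-d hyperparameter configuration
# evaluated at four successive training steps.
X = np.tile(np.array([[0.1, 0.2]]), (4, 1))
t = np.arange(4.0)
K = time_varying_kernel(X, t, X, t)
# Identical configs at the same step have correlation 1; correlation
# decays monotonically as the time gap between observations grows.
```

The decay term is what gives PB2 its flexibility under non-stationary rewards, and also what our meta-learned variants must respect: meta-data from other environments enters the surrogate without overriding this temporal discounting.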

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Transactions on Machine Learning Research
Archive span
2022-2026
Indexed papers
3849
Paper id
1008536083017199303