Continual Optimistic Initialization for Value-Based Reinforcement Learning

Sheelabhadra Dey; James Ault; Guni Sharon

AAMAS 2024

Conference Paper Full Research Papers Autonomous Agents and Multiagent Systems

Abstract

Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms. It enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration, enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore “infinitely-often”. To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the shortcoming of optimistic initialization (sensitivity to environment noise). We provide a rigorous theoretical comparison of COIN with existing popular exploration strategies and prove that it provides a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a purpose-built toy domain and present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies in four out of six benchmark tasks while performing on par with the best baseline on the other two tasks.
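
To make the role of optimistic initialization concrete, the following is a minimal sketch on a deterministic three-armed bandit. It is not the COIN algorithm, whose details are not given in the abstract; the function name `greedy_bandit`, the reward values, the learning rate, and the initial value of 5.0 are all assumptions chosen for illustration. A purely greedy learner that starts from optimistic values tries every arm before settling, while the same learner started from zero never leaves the first arm.

```python
def greedy_bandit(rewards, q_init, steps=50, lr=0.5):
    """Purely greedy value learning on a deterministic bandit (illustrative only)."""
    q = [float(q_init)] * len(rewards)   # value estimate per arm
    visits = [0] * len(rewards)          # kept only to inspect behaviour afterwards
    for _ in range(steps):
        # Greedy choice; max() returns the lowest index on ties.
        a = max(range(len(q)), key=lambda i: q[i])
        visits[a] += 1
        # Move the estimate toward the observed (deterministic) reward.
        q[a] += lr * (rewards[a] - q[a])
    return q, visits


# Hypothetical rewards: the last arm is the best one.
rewards = [0.2, 0.5, 1.0]

# Optimistic start: every estimate exceeds any achievable reward.
q_opt, visits_opt = greedy_bandit(rewards, q_init=5.0)

# Zero start: the first arm looks good as soon as it pays anything.
q_zero, visits_zero = greedy_bandit(rewards, q_init=0.0)

best_opt = max(range(len(q_opt)), key=lambda i: q_opt[i])
best_zero = max(range(len(q_zero)), key=lambda i: q_zero[i])
print("optimistic visits:", visits_opt, "preferred arm:", best_opt)
print("zero-init visits: ", visits_zero, "preferred arm:", best_zero)
```

In this sketch the optimism is spent after a handful of updates, so exploration stops for good once the estimates have decayed. With noisy or drifting rewards, an unlucky early sample could then lock the learner onto a suboptimal arm, which is the limitation the abstract attributes to one-time optimistic initialization.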
