Arrow Research search
Back to AAMAS

AAMAS 2024

Continual Optimistic Initialization for Value-Based Reinforcement Learning

Conference Paper Full Research Papers Autonomous Agents and Multiagent Systems

Abstract

Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms. It enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore “infinitely-often”. To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the shortcoming of optimistic initialization (sensitivity to environment noise). We provide a rigorous theoretical comparison of COIN versus existing popular exploration strategies and prove it provides a unique set of attributes (coverage, infinite-often, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a designed toy domain as well as present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies in four out of six benchmark tasks while performing on par with the best baseline on the other two tasks.

Authors

Keywords

  • Reinforcement Learning
  • Exploration Strategies
  • Optimistic Initialization

Context

Venue
International Conference on Autonomous Agents and Multiagent Systems
Archive span
2002-2025
Indexed papers
7403
Paper id
821609950179110019