Optimistic Planning by Regularized Dynamic Programming

Antoine Moulin; Gergely Neu

EWRL 2023

Workshop Paper EWRL16 Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid the contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and we show that this algorithm achieves near-optimal statistical guarantees.
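The abstract describes the method only at a high level. As a rough, non-authoritative illustration of the general idea (approximate value iteration whose policy update is regularized, here via an online-mirror-descent step), the sketch below runs on a tabular MDP with an estimated transition tensor. It is not the authors' algorithm: the KL/softmax policy update, the clipping to [0, 1/(1-gamma)], the assumption that rewards lie in [0, 1], and the `bonus` array standing in for an optimism term are all assumptions, and the names `regularized_value_iteration`, `P_hat`, `bonus`, and `eta` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def regularized_value_iteration(P_hat, r, bonus, gamma, eta, n_iters):
    """Hypothetical sketch of optimistic, regularized value iteration.

    P_hat: (S, A, S) estimated transition probabilities (assumption).
    r:     (S, A) rewards, assumed to lie in [0, 1].
    bonus: (S, A) hypothetical optimism bonuses added to the rewards.
    """
    S, A, _ = P_hat.shape
    v_max = 1.0 / (1.0 - gamma)   # largest possible discounted return
    Q = np.zeros((S, A))
    Q_sum = np.zeros((S, A))      # running sum of Q-estimates
    for _ in range(n_iters):
        # Mirror-descent (KL-regularized) policy update:
        # pi_k(a|s) is proportional to exp(eta * sum_{j<k} Q_j(s, a)).
        pi = softmax(eta * Q_sum, axis=1)
        V = (pi * Q).sum(axis=1)                # state values under pi, shape (S,)
        Q = r + bonus + gamma * (P_hat @ V)     # optimistic Bellman backup, shape (S, A)
        Q = np.clip(Q, 0.0, v_max)              # keep estimates in the valid range
        Q_sum += Q
    pi = softmax(eta * Q_sum, axis=1)
    return pi, Q


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP where action 1 always pays reward 1.
    P_hat = np.full((2, 2, 2), 0.5)
    r = np.array([[0.0, 1.0], [0.0, 1.0]])
    pi, Q = regularized_value_iteration(P_hat, r, np.zeros((2, 2)),
                                        gamma=0.9, eta=1.0, n_iters=200)
    print(pi)
    print(Q)
```

In this sketch the regularization lives entirely in the policy update: instead of acting greedily on the latest `Q`, the policy is a softmax of the accumulated estimates, so it changes gradually between iterations. The learning rate `eta` sets how close this comes to a greedy update.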

Keywords

Approximate Dynamic Programming
Discounted Markov Decision Processes
Online Mirror Descent
Optimistic Planning

Context

Venue: European Workshop on Reinforcement Learning