
NeurIPS 2025

Deployment Efficient Reward-Free Exploration with Linear Function Approximation

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

We study deployment-efficient reward-free exploration with linear function approximation, where the goal is to explore a linear Markov decision process (MDP) without observing any reward signal, while minimizing the number of distinct policies deployed during learning. By "deployment efficient" we mean algorithms that require only a few deployed exploration policies, which is crucial in real-world applications where each deployment is costly or disruptive. We design a novel reinforcement learning algorithm that achieves near-optimal deployment efficiency for linear MDPs in the reward-free setting: it uses at most $H$ exploration policies (where $H$ is the horizon length) while maintaining sample complexity polynomial in the feature dimension $d$ and the horizon $H$. Unlike previous approaches with similar deployment-efficiency guarantees, our algorithm's sample complexity does not depend on the reachability or explorability coefficients of the underlying MDP; these coefficients can be arbitrarily small and would otherwise lead to unbounded sample complexity, so removing this dependence directly resolves an open problem from prior work. Our technical contributions include a data-dependent method for truncating state-action pairs in linear MDPs, efficient offline policy evaluation and optimization algorithms for the resulting truncated MDPs, and a careful integration of these components that implements reward-free exploration with linear function approximation without sacrificing deployment efficiency.
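
Background (our recap of standard definitions in this literature, not text from the paper): in a linear MDP with a known feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$, transition kernels and rewards are linear in the features, i.e. $P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle$ and $r_h(s, a) = \langle \phi(s, a), \theta_h \rangle$. In the reward-free protocol, the learner explores without observing any reward; afterwards, for every reward function $r$ revealed, it must return a policy $\hat{\pi}$ with $V^{*}_{r} - V^{\hat{\pi}}_{r} \le \varepsilon$.

To make the deployment-counting protocol concrete, below is a minimal Python sketch of an $H$-deployment exploration loop. It is a schematic illustration under our own assumptions: the deploy interface, the placeholder per-step policies, and the covariance bookkeeping are hypothetical stand-ins, not the paper's algorithm.

    import numpy as np

    H, d, n = 10, 5, 500  # horizon, feature dimension, episodes per deployment

    def deploy(policy, n_episodes):
        """Hypothetical stand-in for running `policy` in the environment.
        Returns the features phi(s_h, a_h) observed at every step of every
        episode, shape (n_episodes, H, d); a real system would roll the
        policy out, here we simulate feature draws for illustration."""
        rng = np.random.default_rng()
        return rng.normal(size=(n_episodes, H, d))

    # At most H deployments: the h-th deployed policy targets coverage of
    # the states reachable at step h, and would be computed offline from
    # the data logged so far (e.g., on a truncated MDP, as the abstract
    # describes). Here it is a placeholder action rule.
    logged = []
    for h in range(H):
        policy = lambda state, step: 0
        logged.append(deploy(policy, n))

    # The logged features are reused offline for any reward revealed later,
    # e.g., via per-step regularized covariance matrices for least-squares
    # value estimation.
    cov = [np.eye(d) + sum(batch[:, h].T @ batch[:, h] for batch in logged)
           for h in range(H)]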

Keywords

No keywords are indexed for this paper.

Context

Venue: Annual Conference on Neural Information Processing Systems
Archive span: 1987-2025
Indexed papers: 30,776
Paper ID: 1058583462065910237