RLC 2025

Adaptive Submodular Policy Optimization

Conference Paper · RLC accepted paper · Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

We propose KL-regularized policy optimization for adaptive submodular maximization, a framework for decision making under uncertainty with submodular rewards. Policy optimization of adaptive submodular functions justifies a surprisingly simple and efficient policy gradient update, in which the optimized action affects only its immediate reward and not future ones. It also allows us to learn adaptive submodular policies with large action spaces, such as those represented by large language models (LLMs). We prove that our policies improve monotonically as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency on both synthetic problems and LLMs.
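
To make the role of the regularization concrete, the following is a minimal single-step sketch and not the paper's algorithm. It relies on the standard closed form for maximizing expected gain minus a KL penalty to a reference policy, which is the reference policy reweighted by exp(gain / lambda). The names `kl_regularized_policy`, `expected_gain`, `gains`, `prior`, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math


def kl_regularized_policy(gains, prior, lam):
    """Maximizer of E_pi[gain] - lam * KL(pi || prior) over one step.

    The closed form is pi(a) proportional to prior(a) * exp(gain(a) / lam).
    """
    # Shift by the largest gain so the exponentials cannot overflow.
    top = max(gains)
    weights = [p * math.exp((g - top) / lam) for g, p in zip(gains, prior)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_gain(policy, gains):
    """Expected immediate gain of a stochastic policy."""
    return sum(p * g for p, g in zip(policy, gains))


# Hypothetical marginal gains of three actions and a uniform reference policy.
gains = [0.2, 0.5, 0.9]
prior = [1 / 3, 1 / 3, 1 / 3]

# Shrink the regularization strength and record the expected gain each time.
lams = [10.0, 1.0, 0.1, 0.01]
values = [expected_gain(kl_regularized_policy(gains, prior, lam), gains)
          for lam in lams]

for lam, value in zip(lams, values):
    print(f"lambda={lam:<5} expected gain={value:.4f}")
```

In this toy setting the expected gain does not decrease as lambda shrinks, and the policy concentrates on the action with the largest marginal gain, which is the greedy choice. The multi-step adaptive setting and the policy gradient update described in the abstract are not modeled here.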

Authors

Context

Venue
Reinforcement Learning Conference
Archive span
2024-2025
Indexed papers
228
Paper id
840691398057830870