Adaptive Submodular Policy Optimization

Branislav Kveton; Anup Rao; Viet Dac Lai; Nikos Vlassis; David Arbour

RLC 2025

Conference paper · RLC accepted paper · Artificial Intelligence · Machine Learning · Reinforcement Learning

Abstract

We propose KL-regularized policy optimization for adaptive submodular maximization, a framework for decision making under uncertainty with submodular rewards. Policy optimization of adaptive submodular functions justifies a surprisingly simple and efficient policy gradient update, in which the optimized action affects only its immediate reward and not future ones. It also allows us to learn adaptive submodular policies with large action spaces, such as those represented by large language models (LLMs). We prove that our policies improve monotonically as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency on both synthetic problems and LLMs.
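
The two technical claims above, a per-step update driven only by an action's immediate reward and convergence to the greedy policy as the regularization vanishes, can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a hypothetical coverage function (ACTIONS), a uniform reference policy, and exact gradients instead of sampled ones. Under these assumptions the per-step objective E_pi[gain] - lam * KL(pi || uniform) has the closed-form maximizer pi(a) proportional to exp(gain(a) / lam), which concentrates on the greedy action as lam approaches 0.

```python
import math

# Hypothetical toy problem (not from the paper): each action covers a set of
# items, and the submodular reward f(S) counts the items covered by S.
ACTIONS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def marginal_gain(action, covered):
    """Immediate reward of `action`: the number of newly covered items."""
    return len(ACTIONS[action] - covered)


def softmax(logits):
    m = max(logits.values())  # shift by the max for numerical stability
    w = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(w.values())
    return {a: x / z for a, x in w.items()}


def closed_form_policy(covered, lam):
    """Maximizer of E_pi[gain] - lam * KL(pi || uniform).

    pi(a) is proportional to prior(a) * exp(gain(a) / lam); with a uniform
    prior the prior cancels, leaving a softmax of gain / lam.
    """
    return softmax({a: marginal_gain(a, covered) / lam for a in ACTIONS})


def policy_gradient_step(theta, covered, lam, lr):
    """One exact gradient-ascent step on E_pi[gain] - lam * KL(pi || uniform)
    for a softmax policy with logits `theta`.

    For softmax logits the gradient is pi(a) * (g(a) - g_bar), where
    g(a) = gain(a) - lam * log(pi(a) / prior(a)) and g_bar = E_pi[g].
    Only the immediate marginal gain of each action enters the update;
    rewards of future steps are ignored (a simplification assumed here).
    """
    pi = softmax(theta)
    prior = 1.0 / len(ACTIONS)
    g = {a: marginal_gain(a, covered) - lam * math.log(pi[a] / prior) for a in ACTIONS}
    g_bar = sum(pi[a] * g[a] for a in ACTIONS)
    return {a: theta[a] + lr * pi[a] * (g[a] - g_bar) for a in ACTIONS}


def greedy_action(covered):
    """The action with the largest marginal gain (first one on ties)."""
    return max(ACTIONS, key=lambda a: marginal_gain(a, covered))


if __name__ == "__main__":
    covered = set()
    theta = {a: 0.0 for a in ACTIONS}
    for _ in range(2000):
        theta = policy_gradient_step(theta, covered, lam=1.0, lr=0.5)
    print("learned:", {a: round(p, 3) for a, p in softmax(theta).items()})
    print("closed form:", {a: round(p, 3) for a, p in closed_form_policy(covered, 1.0).items()})
    print("lam=0.01:", {a: round(p, 3) for a, p in closed_form_policy(covered, 0.01).items()})
    print("greedy:", greedy_action(covered))
```

In this sketch, gradient ascent with policy_gradient_step from zero logits should recover closed_form_policy for the same lam, and shrinking lam sharpens the policy toward greedy_action. All names (ACTIONS, marginal_gain, closed_form_policy, policy_gradient_step, greedy_action) are illustrative and do not come from the paper.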

Context

Venue: Reinforcement Learning Conference