An explore-then-commit algorithm for submodular maximization under full-bandit feedback

Guanyu Nie; Mridul Agarwal; Abhishek Kumar Umrawal; Vaneet Aggarwal; Christopher John Quinn

Back to UAI

UAI 2022

An explore-then-commit algorithm for submodular maximization under full-bandit feedback

Conference Paper Accepted Paper Artificial Intelligence · Machine Learning · Uncertainty in Artificial Intelligence

Details

Abstract

We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.

Authors

Keywords

Multi-armed bandit
submodular
bandit feedback

Context

Venue: Conference on Uncertainty in Artificial Intelligence
Archive span: 1985-2025
Indexed papers: 3717
Paper id: 995618631080976756