AAMAS Conference 2025 Conference Paper
Regret Guarantees for a UCB-based Algorithm for Volatile Combinatorial Bandits
- Abhishek Kumar
- Andra Siva Sai Teja
- Ganesh Ghalme
- Sujit Gujar
- Y. Narahari
We study the combinatorial multi-armed bandit (MAB) problem with an additional constraint that an arbitrary subset of arms is unavailable at any given time instant. We refer to this setting as a volatile combinatorial MAB setting. The bandit algorithm must pull a subset of arms from the set of available arms to minimize the regret. Under some mild smoothness conditions, we show that the proposed CV-UCB algorithm—a straightforward extension of well-known C-UCB algorithm—achieves a 𝑂(log(𝑇)) instancedependent regret guarantee under a semi-bandit feedback setting. We further show that under some mild restrictions on the range of reward functions, CV-UCB incurs 𝑂( √︁ 𝑇 log(𝑇)) regret, which we call weak instance-independent regret. We further show that the instance-independent regret of 𝑂( 3 √︁ 𝑇2 log(𝑇)) for CV-UCB algorithm, completing the hierarchy of regret guarantees obtained by gradually relaxing the dependence on the instance parameters.