
AAAI 2024

Multiobjective Lipschitz Bandits under Lexicographic Ordering

Conference Paper: AAAI Technical Track on Machine Learning VI

Abstract

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to maximize m objectives hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is O((KT)^(2/3)) under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single-objective multi-armed bandits is Omega(K log T). Moreover, this bound becomes vacuous when the number of arms K is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of O(T^((d_z^i+1)/(d_z^i+2))) for the i-th objective, where d_z^i is the zooming dimension for the i-th objective, with i in {1, 2, ..., m}. This bound matches the lower bound of the single-objective Lipschitz bandit problem in terms of T, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.
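To make the notion of lexicographic ordering concrete: an arm is lexicographically optimal if it maximizes the first objective, and among near-ties on that objective it maximizes the second, and so on. The sketch below illustrates this selection rule over known mean rewards; it is not the paper's algorithm (which learns these means online over a continuous arm set), and the tie tolerance `tol` is a hypothetical parameter introduced here for illustration.

```python
import numpy as np

def lexicographic_best(means, tol=1e-9):
    """Return the index of a lexicographically optimal arm.

    means: (K, m) array; means[k, i] is the mean reward of arm k on
    objective i, with objective 0 having the highest priority.
    Illustrative only -- assumes the means are known, unlike the
    bandit setting where they must be estimated from samples.
    """
    candidates = np.arange(means.shape[0])
    for i in range(means.shape[1]):
        vals = means[candidates, i]
        # keep arms that are near-optimal on objective i; ties are
        # broken by the next, lower-priority objective
        candidates = candidates[vals >= vals.max() - tol]
        if len(candidates) == 1:
            break
    return int(candidates[0])

means = np.array([[1.0, 0.2],
                  [1.0, 0.9],   # ties arm 0 on objective 0, wins on objective 1
                  [0.5, 1.0]])  # best on objective 1 but dominated on objective 0
print(lexicographic_best(means))  # prints 1
```

The priority-based regret mentioned in the abstract measures, for each objective i, the loss against this lexicographic optimum rather than against the unconstrained maximum of objective i alone.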

Authors

Keywords

  • ML: Online Learning & Bandits
  • ML: Reinforcement Learning

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
646484487410059586