
STOC 2017

Kernel-based methods for bandit convex optimization

Conference Paper Session 1B Algorithms and Complexity · Theoretical Computer Science

Abstract

We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^9.5 √T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^o(1) factor in the regret. These results improve upon the Õ(n^11 √T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^poly(n) √T-regret and log(T)^poly(n)-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n^1.5 √T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), which is achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
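The paper's algorithm is substantially more involved, but the combination it names — exponential weights with an increasing learning rate, plus a kernel that spreads each bandit observation over the action set — can be sketched in a toy 1-D setting. Everything below (the grid size, the Gaussian kernel bandwidth, the learning-rate schedule, the function names) is an illustrative assumption, not the construction from the paper:

```python
import numpy as np

def kernel_exp_weights(loss_fn, T=1000, m=64, seed=0):
    """Toy sketch: exponential weights with an increasing learning rate
    and Gaussian-kernel smoothing of bandit losses on a discretized
    1-D action set [0, 1]. Illustrative only, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, m)           # discretized action set
    log_w = np.zeros(m)                       # log-weights of the points

    # Gaussian kernel: each observed loss is spread over nearby points,
    # a crude stand-in for the paper's kernel-smoothing idea.
    K = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * 0.05 ** 2))
    K /= K.sum(axis=1, keepdims=True)

    total_loss = 0.0
    for t in range(1, T + 1):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        i = rng.choice(m, p=p)                # play one point (bandit feedback)
        loss = loss_fn(grid[i])
        total_loss += loss

        # Increasing learning rate: eta grows like sqrt(t), unlike the
        # classical decreasing schedule for exponential weights.
        eta = 0.5 * np.sqrt(np.log(m) * t / T)

        # Crude kernel-smoothed importance-weighted loss estimate.
        est = K[i] * loss / max(p[i], 1e-12)
        log_w -= eta * est
    return total_loss / T

avg = kernel_exp_weights(lambda x: (x - 0.3) ** 2)
```

For a fixed convex loss such as (x − 0.3)², the average loss should drift toward the minimum at x = 0.3; the point of the sketch is only the shape of the update loop, not its regret guarantee.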

Authors

Keywords

  • convex optimization
  • multi-armed bandit
  • online learning

Context

Venue
ACM Symposium on Theory of Computing
Archive span
1969-2025
Indexed papers
4364
Paper id
107391574121304650