STOC 2017
Kernel-based methods for bandit convex optimization
Abstract
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^9.5 √T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) T^o(1) factor in the regret. These results improve upon the Õ(n^11 √T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^poly(n) √T-regret and log(T)^poly(n)-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n^1.5 √T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), and it is achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
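One of the three ideas named above, exponential weights with an increasing learning rate, can be illustrated on a finite action set. The sketch below is not the paper's kernelized algorithm; it is a minimal, standard exponential-weights loop in which the learning-rate schedule `eta_schedule` (a hypothetical parameter introduced here for illustration) is allowed to grow over time:

```python
import numpy as np

def exponential_weights(losses, eta_schedule):
    """Exponential-weights updates over a finite action set.

    losses: (T, K) array of per-round losses in [0, 1].
    eta_schedule: function t -> learning rate at round t (may be increasing).
    Returns the (T, K) sequence of probability distributions played.
    """
    T, K = losses.shape
    cum_loss = np.zeros(K)          # cumulative loss of each action
    plays = []
    for t in range(T):
        eta = eta_schedule(t)
        # Subtract the minimum cumulative loss for numerical stability;
        # this shift cancels out after normalization.
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        p = w / w.sum()
        plays.append(p)
        cum_loss += losses[t]
    return np.array(plays)

# Toy run with an increasing schedule eta_t = sqrt(log K) * (1 + t/T).
rng = np.random.default_rng(0)
T, K = 200, 5
losses = rng.uniform(size=(T, K))
losses[:, 2] *= 0.5                 # make action 2 the best on average
plays = exponential_weights(losses, lambda t: np.sqrt(np.log(K)) * (1 + t / T))
print(plays[-1].argmax())           # final distribution concentrates on action 2
```

A larger learning rate late in the run makes the distribution commit more aggressively to the empirically best action; the paper's annealing schedule plays an analogous role in the continuous, kernelized setting.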
Context
- Venue: ACM Symposium on Theory of Computing