JMLR 2009
Online Learning with Sample Path Constraints
Abstract
We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem. [abs] [ pdf ][ bib ] © JMLR 2009. ( edit, beta )
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- Journal of Machine Learning Research
- Archive span
- 2000-2026
- Indexed papers
- 4180
- Paper id
- 251902190864419540