Accelerated Gradient Temporal Difference Learning

Yangchen Pan; Adam White; Martha White

AAAI 2017

Conference Paper Machine Learning Methods Artificial Intelligence

Abstract

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods. Least-squares methods make the best use of available data by directly computing the TD solution, and thus do not require tuning a typically highly sensitive learning rate parameter, but they require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least-squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods at a fraction of the computation and storage, (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain.