
RLDM Conference 2013 Conference Abstract

CAPI: Generalized Classification-based Approximate Policy Iteration

Amir-massoud Farahmand
Doina Precup
André Barreto
Mohammad Ghavamzadeh

Efficient methods for tackling large reinforcement learning problems usually exploit regularities, or intrinsic structures, of the problem at hand. Most current methods benefit from the regularities of either the value function or the policy, but not both. In this paper, we introduce a general classification-based approximate policy iteration (CAPI) framework, which can benefit from both types of regularities. This framework has two main components: a generic user-specified value function estimator and a weighted classifier that learns a policy based on the estimated value function. The result is a flexible and sample-efficient class of algorithms. We also use a particular instantiation of CAPI to design an adaptive treatment strategy for HIV-infected patients. Comparison with a state-of-the-art purely value-based reinforcement learning algorithm, Tree-based Fitted Q-Iteration, shows that benefiting from the regularity of both policy and value function can lead to better performance.
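
The two components named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy chain environment, the truncated-rollout value estimator, the threshold policy class, and the function names are all hypothetical. The classifier here weights each state by its estimated action gap, which is one plausible reading of "weighted classifier".

```python
import random

N_STATES, GAMMA, HORIZON = 6, 0.9, 30  # hypothetical toy problem sizes


def step(s, a):
    """Toy chain (assumption): action 1 moves right, action 0 moves left.
    Reward 1 is given whenever the next state is the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def estimate_q(policy, s, a):
    """Stand-in for the user-specified value estimator: the truncated
    discounted return of taking `a` in `s`, then following `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        s, r = step(s, a)
        total += discount * r
        discount *= GAMMA
        a = policy(s)
    return total


def fit_weighted_classifier(states, q):
    """Pick, from a small class of threshold policies, the one with the
    lowest weighted loss. A state contributes max_a Q(s, a) - Q(s, pi(s)),
    so mistakes cost more where the action gap is larger."""
    best = None
    for t in range(N_STATES + 1):
        for right_above in (True, False):
            # Bind t and right_above as defaults to avoid late binding.
            pi = lambda s, t=t, ra=right_above: int((s >= t) == ra)
            loss = sum(max(q[s]) - q[s][pi(s)] for s in states)
            if best is None or loss < best[0]:
                best = (loss, pi)
    return best[1]


def capi_sketch(n_iters=3, seed=0):
    """Alternate value estimation and weighted classification."""
    rng = random.Random(seed)
    policy = lambda s: 0  # start from the "always left" policy
    for _ in range(n_iters):
        states = [rng.randrange(N_STATES) for _ in range(20)]
        q = {s: [estimate_q(policy, s, a) for a in (0, 1)] for s in set(states)}
        policy = fit_weighted_classifier(states, q)
    return policy


if __name__ == "__main__":
    learned = capi_sketch()
    print([learned(s) for s in range(N_STATES)])
```

In this sketch the policy class is deliberately tiny, so the regularity of the policy (a single threshold) does the work that a richer classifier would do in a real problem; either component could be swapped for another estimator or classifier without changing the loop.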

