RLDM 2013
CAPI: Generalized Classification-based Approximate Policy Iteration
Abstract
Efficient methods for tackling large reinforcement learning problems usually exploit regularities, or intrinsic structures, of the problem at hand. Most current methods benefit from the regularities of either the value function or the policy, but not both. In this paper, we introduce a general classification-based approximate policy iteration (CAPI) framework, which can benefit from both types of regularities. This framework has two main components: a generic user-specified value function estimator and a weighted classifier that learns a policy based on the estimated value function. The result is a flexible and sample-efficient class of algorithms. We also use a particular instantiation of CAPI to design an adaptive treatment strategy for HIV-infected patients. Comparison with a state-of-the-art purely value-based reinforcement learning algorithm, Tree-based Fitted Q-Iteration, shows that benefiting from the regularity of both the policy and the value function can lead to better performance.
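The abstract's two-component loop (a value function estimator feeding a weighted classifier for policy improvement) can be illustrated with a minimal sketch. This is not the paper's algorithm: the chain MDP, the Monte Carlo rollout estimator, and the decision-stump classifier below are all simplifying assumptions chosen to keep the example self-contained; the paper's framework allows any value estimator and any weighted classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
N, GAMMA, H = 10, 0.9, 30  # chain length, discount, rollout horizon (illustrative values)

def step(s, a):
    """Toy chain MDP: action 1 moves right, 0 moves left; reward 1 at the right end."""
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N - 1)

def rollout_q(s, a, policy):
    """One truncated-rollout Monte Carlo estimate of Q^pi(s, a) (stands in for
    the generic user-specified value function estimator)."""
    s2, r = step(s, a)
    ret, disc = r, GAMMA
    for _ in range(H):
        s2, r = step(s2, policy(s2))
        ret += disc * r
        disc *= GAMMA
    return ret

def fit_stump(states, labels, weights):
    """Weighted classifier (here a decision stump 'act 1 iff s >= t') fit by
    minimizing the weighted misclassification error."""
    best_t, best_err = 0, float("inf")
    for t in range(N + 1):
        pred = (states >= t).astype(int)
        err = np.sum(weights * (pred != labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

policy = lambda s: int(rng.integers(2))  # start from a uniformly random policy
for _ in range(5):  # CAPI-style iterations
    states = np.arange(N)
    # Policy evaluation: estimate Q for each state-action pair.
    q = np.array([[np.mean([rollout_q(s, a, policy) for _ in range(10)])
                   for a in (0, 1)] for s in states])
    labels = q.argmax(axis=1)            # greedy-action targets for the classifier
    weights = np.abs(q[:, 1] - q[:, 0])  # action-gap weights: errors where the gap
                                         # is large are penalized more heavily
    t = fit_stump(states, labels, weights)
    policy = lambda s, t=t: int(s >= t)  # policy improvement via the classifier

greedy = [policy(s) for s in range(N)]
print(greedy)  # the learned policy moves right in every state
```

Because the classifier is weighted by the estimated action gap, misclassifying states where the two actions differ little is cheap, while states with a large gap dominate the fit; this is how the sketch reflects the interplay of value and policy regularities described in the abstract.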
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: Multidisciplinary Conference on Reinforcement Learning and Decision Making
- Archive span: 2013-2025
- Indexed papers: 1004
- Paper id: 903653690957081092