AAAI 2017

Sequential Classification-Based Optimization for Direct Policy Search

Conference Paper, Machine Learning Methods, Artificial Intelligence

Abstract

Direct policy search, which uses an optimization algorithm to search for the policy parameters that maximize total reward, often yields high-quality policies in complex reinforcement learning problems. Classification-based optimization is a recently developed framework for derivative-free optimization that has been shown to be effective and efficient for non-convex optimization problems with many local optima, and it may provide a powerful optimization tool for direct policy search. However, this framework requires sampling a batch of solutions for every update of the search model, whereas in reinforcement learning the environment often offers only sequential policy evaluation. Classification-based optimization may therefore be inefficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the sample batch from reused historical solutions. Experiments on a helicopter hovering task and on control tasks in OpenAI Gym show that the new algorithm significantly improves on the performance of several state-of-the-art derivative-free optimization approaches.
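
The idea of forming the batch from historical solutions can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function name `sequential_cbo`, the parameters `pool_size`, `n_pos` and `explore_prob`, the axis-aligned box used as the "classifier", and the replace-the-worst pool update are all assumptions chosen for illustration. The sketch evaluates exactly one new solution per model update and reuses the stored pool as the training batch.

```python
import random


def sequential_cbo(f, lower, upper, budget=300, pool_size=20, n_pos=2,
                   explore_prob=0.05, seed=0):
    """Minimize f over the box [lower, upper], one evaluation per update.

    Illustrative sketch only; all names and defaults are assumptions.
    Returns (best_value, best_solution).
    """
    rng = random.Random(seed)
    dim = len(lower)

    def uniform(lo, hi):
        return [rng.uniform(lo[i], hi[i]) for i in range(dim)]

    # Initial pool of (value, solution) pairs, evaluated one at a time.
    pool = []
    for _ in range(pool_size):
        x = uniform(lower, upper)
        pool.append((f(x), x))
    pool.sort(key=lambda p: p[0])

    for _ in range(budget - pool_size):
        # The "batch" is the historical pool: best n_pos are positive,
        # the rest are negative.
        positives, negatives = pool[:n_pos], pool[n_pos:]
        if rng.random() < explore_prob:
            x_new = uniform(lower, upper)  # occasional global exploration
        else:
            # Box around one positive solution, shrunk along random
            # dimensions until it excludes the negative solutions.
            _, x_pos = rng.choice(positives)
            lo, hi = list(lower), list(upper)
            remaining = [x for _, x in negatives]
            attempts = 0
            while remaining and attempts < 1000:  # guard against ties
                attempts += 1
                i = rng.randrange(dim)
                x_neg = rng.choice(remaining)
                if x_neg[i] < x_pos[i]:
                    lo[i] = max(lo[i], rng.uniform(x_neg[i], x_pos[i]))
                elif x_neg[i] > x_pos[i]:
                    hi[i] = min(hi[i], rng.uniform(x_pos[i], x_neg[i]))
                else:
                    continue
                remaining = [
                    x for x in remaining
                    if all(lo[j] <= x[j] <= hi[j] for j in range(dim))
                ]
            x_new = uniform(lo, hi)

        # One sequential evaluation, then update the pool in place:
        # add the new solution and drop the current worst.
        pool.append((f(x_new), x_new))
        pool.sort(key=lambda p: p[0])
        pool.pop()

    return pool[0]
```

For example, `sequential_cbo(lambda x: sum(v * v for v in x), [-1.0] * 3, [1.0] * 3)` searches for the minimum of a three-dimensional sphere function. In a policy search setting, `f` would instead run one episode with the policy parameters `x` and return the negated total reward.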

Authors

Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28718
Paper id: 485982661659533495