AAAI 2017
Sequential Classification-Based Optimization for Direct Policy Search
Abstract
Direct policy search often results in high-quality policies in complex reinforcement learning problems, which employs some optimization algorithms to search the parameters of the policy for maximizing the its total reward. Classificationbased optimization is a recently developed framework for derivative-free optimization, which has shown to be effective and efficient for non-convex optimization problems with many local optima, and may provide a power optimization tool for direct policy search. However, this framework requires to sample a batch of solutions for every update of the search model, while in reinforcement learning, the environment often offers only sequential policy evaluation. Thus the classification-based optimization may not efficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt the classification-based optimization for sequential sampled solutions by forming the sample batch via reusing historical solutions. Experiments on a helicopter hovering task and controlling tasks in OpenAI Gym show that the new algorithm significantly improve the performance from several state-of-the-art derivative-free optimization approaches.
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- AAAI Conference on Artificial Intelligence
- Archive span
- 1980-2026
- Indexed papers
- 28718
- Paper id
- 485982661659533495