Sequential Classification-Based Optimization for Direct Policy Search

Yi-Qi Hu; Hong Qian; Yang Yu

AAAI 2017

Conference Paper Machine Learning Methods Artificial Intelligence

Abstract

Direct policy search, which employs an optimization algorithm to search for the policy parameters that maximize total reward, often yields high-quality policies in complex reinforcement learning problems. Classification-based optimization is a recently developed framework for derivative-free optimization that has been shown to be effective and efficient for non-convex optimization problems with many local optima, and it may provide a powerful optimization tool for direct policy search. However, this framework requires sampling a batch of solutions for every update of the search model, whereas in reinforcement learning the environment often offers only sequential policy evaluation. Classification-based optimization may therefore be inefficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the sample batch through the reuse of historical solutions. Experiments on a helicopter hovering task and control tasks in OpenAI Gym show that the new algorithm significantly improves performance over several state-of-the-art derivative-free optimization approaches.
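
The idea of forming the batch from historical solutions can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`learn_box`, `sequential_cbo`), the axis-aligned box used as the classifier, the pool size, and the replace-the-worst rule are all assumptions made for illustration. Each iteration evaluates exactly one new solution, and the previously evaluated solutions in the pool stand in for the batch when the classifier is relearned.

```python
import random


def learn_box(positive, negatives, low, high, rng):
    """Shrink an axis-aligned box around `positive` until it excludes the negatives.

    Hypothetical classifier for illustration: points inside the box are
    predicted "good", points outside are predicted "bad".
    """
    dim = len(positive)
    lower, upper = [low] * dim, [high] * dim
    for neg in negatives:
        attempts = 0  # cap guards against a negative identical to the positive
        while attempts < 100 and all(
            lower[k] <= neg[k] <= upper[k] for k in range(dim)
        ):
            attempts += 1
            k = rng.randrange(dim)
            if neg[k] < positive[k]:
                lower[k] = rng.uniform(neg[k], positive[k])
            elif neg[k] > positive[k]:
                upper[k] = rng.uniform(positive[k], neg[k])
    return lower, upper


def sequential_cbo(f, dim, low, high, budget,
                   pool_size=20, n_positive=2, explore=0.05, seed=0):
    """Minimize `f` over [low, high]^dim, evaluating one solution per step."""
    rng = random.Random(seed)

    def uniform_point():
        return [rng.uniform(low, high) for _ in range(dim)]

    # Initial pool, evaluated one solution at a time.
    pool = []
    for _ in range(pool_size):
        x = uniform_point()
        pool.append((f(x), x))
    pool.sort(key=lambda item: item[0])

    for _ in range(budget - pool_size):
        # Reuse history: best pool members are positives, the rest negatives.
        positives = [x for _, x in pool[:n_positive]]
        negatives = [x for _, x in pool[n_positive:]]
        if rng.random() < explore:
            x = uniform_point()  # occasional global exploration
        else:
            anchor = rng.choice(positives)
            lower, upper = learn_box(anchor, negatives, low, high, rng)
            x = [rng.uniform(lower[k], upper[k]) for k in range(dim)]
        # A single sequential evaluation, then drop the worst pool member.
        pool.append((f(x), x))
        pool.sort(key=lambda item: item[0])
        pool.pop()

    return pool[0]  # (best value, best solution)


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best_value, best_x = sequential_cbo(sphere, dim=5, low=-1.0, high=1.0, budget=300)
    print(best_value, best_x)
```

In direct policy search, `f` would be the negated total reward of a policy rollout and the solution vector would hold the policy parameters; the sphere function above is only a stand-in objective.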
