Author name cluster

Kenneth Stanley

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2018 Conference Paper

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

Edoardo Conti
Vashisht Madhavan
Felipe Petroski Such
Joel Lehman
Kenneth Stanley
Jeff Clune

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g., hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e., contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
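
The abstract does not spell out how novelty is scored or how it is combined with reward. As a rough illustration only, the sketch below assumes the common novelty search formulation (mean distance from an agent's behavior descriptor to its k nearest neighbors in an archive) and a simple weighted blend of reward and novelty. The function names, the behavior descriptors, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def novelty(behavior, archive, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest archive entries.

    Assumption: behaviors are fixed-length numeric tuples (for example a
    final (x, y) position). An empty archive yields a novelty of 0.0.
    """
    if not archive:
        return 0.0
    distances = sorted(math.dist(behavior, other) for other in archive)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def blended_score(reward, novelty_value, weight=0.5):
    """Hypothetical blend: weight=1.0 uses reward only, weight=0.0 novelty only."""
    return weight * reward + (1.0 - weight) * novelty_value


if __name__ == "__main__":
    archive = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
    n = novelty((0.0, 0.0), archive, k=2)
    print("novelty:", n)
    print("blended:", blended_score(reward=10.0, novelty_value=n))
```

A fixed weight corresponds loosely to rewarding both quality and novelty at once; adapting the weight over time would be one way to shift between exploring and exploiting, though the details here are assumptions rather than the authors' procedure.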


AAAI Conference 2015 Conference Paper

Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation

Paul Szerlip
Gregory Morse
Justin Pugh
Kenneth Stanley

Unlike unsupervised approaches such as autoencoders that learn to reconstruct their inputs, this paper introduces an alternative approach to unsupervised feature learning called divergent discriminative feature accumulation (DDFA) that instead continually accumulates features that make novel discriminations among the training set. Thus DDFA features are inherently discriminative from the start even though they are trained without knowledge of the ultimate classification problem. Interestingly, DDFA also continues to add new features indefinitely (so it does not depend on a hidden layer size), is not based on minimizing error, and is inherently divergent instead of convergent, thereby providing a unique direction of research for unsupervised feature learning. In this paper the quality of its learned features is demonstrated on the MNIST dataset, where its performance confirms that indeed DDFA is a viable technique for learning useful features.
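
To make the idea of accumulating features that "make novel discriminations among the training set" more concrete, here is a minimal sketch under strong simplifying assumptions. It describes each candidate feature by its vector of outputs over the training examples and keeps a candidate only if that vector is far enough from every feature already kept. The sigmoid feature form, the `min_distance` threshold, and all names are hypothetical; the abstract does not describe the actual search procedure, and this filter is not it.

```python
import math


def response_vector(weights, examples):
    """Sigmoid output of one linear feature on every training example."""
    outputs = []
    for example in examples:
        activation = sum(w * x for w, x in zip(weights, example))
        outputs.append(1.0 / (1.0 + math.exp(-activation)))
    return outputs


def accumulate_features(candidates, examples, min_distance):
    """Keep candidates whose response pattern differs from all kept features.

    No labels and no error signal are used: a feature is accepted purely
    because it separates the training examples in a new way.
    """
    kept = []  # list of (weights, response_vector) pairs
    for weights in candidates:
        response = response_vector(weights, examples)
        if all(math.dist(response, prior) >= min_distance for _, prior in kept):
            kept.append((weights, response))
    return [weights for weights, _ in kept]


if __name__ == "__main__":
    examples = [(1.0, 0.0), (0.0, 1.0)]
    candidates = [(5.0, -5.0), (5.0, -5.0), (-5.0, 5.0)]
    print(accumulate_features(candidates, examples, min_distance=0.5))
```

Because the kept list simply grows, the number of features is not fixed in advance, which mirrors the abstract's point that the method does not depend on a hidden layer size.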


AAMAS Conference 2010 Conference Paper

Evolving Policy Geometry for Scalable Multiagent Learning

David D'Ambrosio
Joel Lehman
Sebastian Risi
Kenneth Stanley

A major challenge for traditional approaches to multiagent learning is to train teams that easily scale to include additional agents. The problem is that such approaches typically encode each agent's policy separately. Such separation means that computational complexity explodes as the number of agents in the team increases, and also leads to the problem of reinvention: Skills that should be shared among agents must be rediscovered separately for each agent. To address this problem, this paper presents an alternative evolutionary approach to multiagent learning called multiagent HyperNEAT that encodes the team as a pattern of related policies rather than as a set of individual agents. To capture this pattern, a policy geometry is introduced to describe the relationship between each agent's policy and its canonical geometric position within the team. Because policy geometry can encode variations of a shared skill across all of the policies it represents, the problem of reinvention is avoided. Furthermore, because the policy geometry of a particular team can be sampled at any resolution, it acts as a heuristic for generating policies for teams of any size, producing a powerful new capability for multiagent learning. In this paper, multiagent HyperNEAT is tested in predator-prey and room-clearing domains. In both domains the results are effective teams that can be successfully scaled to larger team sizes without any further training.
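
The claim that a policy geometry "can be sampled at any resolution" can be illustrated with a toy sketch. It assumes each agent has a canonical position on a line from -1 to 1 and that a single function maps a position to that agent's policy parameters. The `policy_geometry` function below is a hand-written, hypothetical stand-in for the evolved pattern, and the parameter names `turn_bias` and `speed` are invented for illustration.

```python
def team_positions(team_size):
    """Evenly spaced canonical positions in [-1, 1], one per agent."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if team_size == 1:
        return [0.0]
    step = 2.0 / (team_size - 1)
    return [-1.0 + i * step for i in range(team_size)]


def policy_geometry(position):
    """Hypothetical shared pattern mapping a position to policy parameters.

    Agents at the edges turn outward more and move more slowly; the agent
    in the middle goes straight at full speed.
    """
    return {"turn_bias": position, "speed": 1.0 - 0.5 * abs(position)}


def build_team(team_size):
    """Sample the same geometry at a resolution matching the team size."""
    return [policy_geometry(p) for p in team_positions(team_size)]


if __name__ == "__main__":
    for size in (3, 5):
        print(size, build_team(size))
```

Here the same function yields a 3-agent or a 5-agent team without any retraining: only the sampling resolution changes, which is the scaling property the abstract describes.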
