Arrow Research search

Author name cluster

Kenneth O. Stanley

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers (12)

ICLR Conference 2024 Conference Paper

OMNI: Open-endedness via Models of human Notions of Interestingness

  • Jenny Zhang
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune

Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also $\textit{interesting}$ (e.g., worthwhile and novel). We propose solving this problem by $\textit{Open-endedness via Models of human Notions of Interestingness}$ (OMNI). The insight is that we can utilize foundation models (FMs) as a model of interestingness (MoI), because they $\textit{already}$ internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that FM-based MoIs improve open-ended learning by focusing on tasks that are both learnable $\textit{and interesting}$, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.
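
As a rough illustration of the loop the abstract describes, the sketch below filters candidate tasks by a learning-progress estimate and then asks a foundation model to act as the model of interestingness (MoI). Every function, prompt, and data structure here is a hypothetical stand-in, not the paper's implementation:

```python
import random

def learning_progress(task, history):
    """Crude learning-progress proxy: recent success rate minus earlier
    success rate for this task (bookkeeping is illustrative only)."""
    scores = history.get(task, [])
    if len(scores) < 4:
        return 0.0
    mid = len(scores) // 2
    return sum(scores[mid:]) / (len(scores) - mid) - sum(scores[:mid]) / mid

def interestingness(task, learned_tasks, ask_fm):
    """Ask a foundation model, acting as the MoI, whether a candidate task is
    worth learning next.  `ask_fm` stands in for any prompted chat-model call
    that returns free-form text."""
    prompt = (
        "The agent has already learned these tasks: " + ", ".join(learned_tasks)
        + f". Would learning '{task}' be interestingly new, rather than a minor "
        "variation of something already learned? Answer yes or no."
    )
    return 1.0 if ask_fm(prompt).strip().lower().startswith("yes") else 0.0

def next_tasks(candidates, learned_tasks, history, ask_fm, k=5):
    """Rank candidates by learning progress, then keep only those the MoI
    flags as interesting."""
    ranked = sorted(candidates, key=lambda t: learning_progress(t, history), reverse=True)
    return [t for t in ranked if interestingness(t, learned_tasks, ask_fm) > 0][:k]

if __name__ == "__main__":
    fake_fm = lambda prompt: random.choice(["Yes.", "No."])  # stand-in for a real FM call
    history = {"chop tree": [0, 0, 1, 1], "chop tree slightly faster": [1, 1, 1, 1]}
    print(next_tasks(["chop tree slightly faster", "build a bridge"],
                     ["chop tree"], history, fake_fm))
```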

ICLR Conference 2024 Conference Paper

Quality-Diversity through AI Feedback

  • Herbie Bradley
  • Andrew Dai 0001
  • Hannah Benita Teufel
  • Jenny Zhang
  • Koen Oostermeijer
  • Marco Bellagente
  • Jeff Clune
  • Kenneth O. Stanley

In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through \emph{AI feedback}, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
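
A minimal MAP-Elites-style sketch of the idea, assuming hypothetical `generate`, `score_quality`, and `score_bin` callables that stand in for prompted LM calls; it is not the paper's implementation:

```python
import random

def qdaif_step(archive, generate, score_quality, score_bin, n_bins=10):
    """One MAP-Elites-style iteration in the spirit of QDAIF (sketch).

    `generate(parent_text)` asks an LM for a variation of a parent text,
    `score_quality(text)` asks an LM to rate quality in [0, 1], and
    `score_bin(text)` asks an LM to place the text into one of `n_bins`
    diversity niches (e.g. tone from gloomy to cheerful).  All three are
    hypothetical stand-ins for prompted model calls."""
    parent = random.choice([text for _, text in archive.values()]) if archive else None
    child = generate(parent)
    niche = min(max(score_bin(child), 0), n_bins - 1)
    quality = score_quality(child)
    if niche not in archive or quality > archive[niche][0]:
        archive[niche] = (quality, child)      # keep the best text seen per niche
    return archive

# Toy usage with stand-in "models": random mutation, length as quality,
# text length modulo 10 as the diversity bin.
archive = {}
for _ in range(50):
    qdaif_step(archive,
               generate=lambda p: (p or "a tale") + random.choice([" of joy", " of woe"]),
               score_quality=lambda t: min(len(t) / 40, 1.0),
               score_bin=lambda t: len(t) % 10)
print(len(archive), "niches filled")
```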

AAAI Conference 2021 Conference Paper

Deep Innovation Protection: Confronting the Credit Assignment Problem in Training Heterogeneous Neural Architectures

  • Sebastian Risi
  • Kenneth O. Stanley

Deep reinforcement learning approaches have shown impressive results in a variety of different domains; however, more complex heterogeneous architectures such as world models require the different neural components to be trained separately instead of end-to-end. While a simple genetic algorithm recently showed end-to-end training is possible, it failed to solve a more complex 3D task. This paper presents a method called Deep Innovation Protection (DIP) that addresses the credit assignment problem in training complex heterogeneous neural network models end-to-end for such environments. The main idea behind the approach is to employ multiobjective optimization to temporally reduce the selection pressure on specific components in a multi-component network, allowing other components to adapt. We investigate the emergent representations of these evolved networks, which learn to predict properties important for the survival of the agent, without the need for a specific forward-prediction loss.
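
The core mechanism described is multiobjective selection that shields recently changed components from being culled before the rest of the network adapts. The sketch below shows one plausible reading of that idea: Pareto selection over task reward and an innovation-protection "age". The field names and the exact objectives are assumptions for illustration, not the paper's code:

```python
def dominates(a, b):
    """Pareto dominance over (reward, age): maximize task reward, minimize the
    number of generations since an individual's upstream components were last
    mutated (so recently changed individuals are protected)."""
    (reward_a, age_a), (reward_b, age_b) = a, b
    no_worse = reward_a >= reward_b and age_a <= age_b
    strictly_better = reward_a > reward_b or age_a < age_b
    return no_worse and strictly_better

def protected_selection(population, k):
    """Keep the k individuals on the earliest Pareto fronts.  Each individual
    is a dict with 'reward' and 'age'; the field names are illustrative only."""
    remaining = list(population)
    survivors = []
    while remaining and len(survivors) < k:
        front = [p for p in remaining
                 if not any(dominates((q["reward"], q["age"]), (p["reward"], p["age"]))
                            for q in remaining)]
        survivors.extend(front[:k - len(survivors)])
        front_ids = {id(p) for p in front}
        remaining = [p for p in remaining if id(p) not in front_ids]
    return survivors

pop = [{"reward": 10, "age": 5}, {"reward": 8, "age": 0}, {"reward": 3, "age": 4}]
print(protected_selection(pop, 2))   # keeps the high-reward and the freshly mutated one
```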

ICML Conference 2020 Conference Paper

Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

  • Rui Wang 0052
  • Joel Lehman
  • Aditya Rawal
  • Jiale Zhi
  • Yulun Li
  • Jeff Clune
  • Kenneth O. Stanley

Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
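
One way to picture the domain-general novelty measure mentioned as innovation (1): characterize each environment by how it ranks the current population of agents, and score a candidate environment's novelty as its distance to archived environments. The sketch below is a simplified rendering under that assumption, not the paper's exact measure:

```python
import numpy as np

def env_signature(agent_scores):
    """Characterize an environment by how it *ranks* the current agents:
    the normalized rank of each agent's score on it (performance-based,
    domain-general characterization, simplified)."""
    ranks = np.argsort(np.argsort(agent_scores))
    return ranks / max(len(agent_scores) - 1, 1)

def novelty(candidate_scores, archive_of_scores, k=5):
    """Novelty of a candidate environment: mean distance from its signature to
    the signatures of its k nearest neighbours among archived environments."""
    sig = env_signature(candidate_scores)
    dists = sorted(float(np.linalg.norm(sig - env_signature(s))) for s in archive_of_scores)
    return float(np.mean(dists[:k])) if dists else float("inf")

# Three agents scored on two archived environments and two candidates:
archive = [[1.0, 5.0, 9.0], [2.0, 6.0, 7.0]]         # same agent ordering throughout
print(novelty([9.0, 5.0, 1.0], archive))              # reversed ranking -> high novelty
print(novelty([1.5, 5.5, 8.0], archive))              # similar ranking -> low novelty
```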

ICML Conference 2020 Conference Paper

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

  • Felipe Petroski Such
  • Aditya Rawal
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune

This paper investigates the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula in order to help AI agents rapidly learn. We show that such algorithms are possible via Generative Teaching Networks (GTNs), a general approach that is, in theory, applicable to supervised, unsupervised, and reinforcement learning, although our experiments only focus on the supervised case. GTNs are deep neural networks that generate data and/or training environments that a learner (e.g., a freshly initialized neural network) trains on for a few SGD steps before being tested on a target task. We then differentiate \emph{through the entire learning process} via meta-gradients to update the GTN parameters to improve performance on the target task. This paper introduces GTNs, discusses their potential, and showcases that they can substantially accelerate learning. We also demonstrate a practical and exciting application of GTNs: accelerating the evaluation of candidate architectures for neural architecture search (NAS). GTN-NAS improves the NAS state of the art, finding higher-performing architectures when controlling for the search proposal mechanism. GTN-NAS is also competitive with the overall state-of-the-art approaches, which achieve top performance while using orders of magnitude less computation than typical NAS methods. Speculating forward, GTNs may represent a first step toward the ambitious goal of algorithms that generate their own training data and, in doing so, open a variety of interesting new research questions and directions.
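
A toy sketch of the outer/inner loop the abstract describes, in which synthetic data is optimized so that a freshly trained learner performs well on a real task. For brevity the "generator" is just a directly learned synthetic dataset and central finite differences replace the unrolled meta-gradients; both are simplifications for illustration, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
X_real = rng.normal(size=(64, 3))
y_real = X_real @ np.array([1.0, -2.0, 0.5])          # the "real" target task

def inner_train(syn_X, syn_y, steps=5, lr=0.1):
    """Train a fresh linear learner for a few gradient steps on synthetic data."""
    w = np.zeros(3)
    for _ in range(steps):
        grad = 2 * syn_X.T @ (syn_X @ w - syn_y) / len(syn_y)
        w -= lr * grad
    return w

def meta_loss(params):
    """Real-task loss of a learner trained only on the generator's output.
    Here the 'generator' simply parameterizes 8 synthetic examples directly,
    a simplification of a full generator network."""
    syn_X, syn_y = params[:24].reshape(8, 3), params[24:]
    w = inner_train(syn_X, syn_y)
    return float(np.mean((X_real @ w - y_real) ** 2))

# Outer loop: improve the synthetic data so the learner it produces does well
# on the real task.  Finite differences stand in for the meta-gradients that
# the paper obtains by backpropagating through the unrolled inner loop.
params, eps, lr_outer = rng.normal(size=32), 1e-3, 0.05
for _ in range(150):
    grad = np.zeros_like(params)
    for i in range(len(params)):
        bump = np.zeros_like(params)
        bump[i] = eps
        grad[i] = (meta_loss(params + bump) - meta_loss(params - bump)) / (2 * eps)
    params -= lr_outer * grad
print("meta-loss after training the generator:", round(meta_loss(params), 4))
```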

ECAI Conference 2020 Conference Paper

Learning to Continually Learn

  • Shawn Beaulieu
  • Lapo Frati
  • Thomas Miconi
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune
  • Nick Cheney

Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network thus also indirectly controls selective plasticity (i.e., the backward pass) of the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
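
A minimal sketch of the gating arrangement described above: a neuromodulatory network produces a per-unit gate that multiplies the prediction network's hidden activations. The single-layer structure and layer sizes are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def forward(x, pln, nm):
    """One ANML-style gated forward pass (simplified sketch).

    The prediction-learning network (PLN) and the neuromodulatory (NM)
    network are each a single dense layer here; the NM output is squashed to
    (0, 1) and multiplies the PLN's hidden activations elementwise, so the NM
    network decides which PLN units are active for this input."""
    hidden = np.maximum(0.0, x @ pln["W1"])            # PLN hidden layer (ReLU)
    gate = 1.0 / (1.0 + np.exp(-(x @ nm["W"])))        # NM gating signal in (0, 1)
    return (hidden * gate) @ pln["W2"]                 # gated activations -> output

rng = np.random.default_rng(0)
pln = {"W1": rng.normal(size=(10, 32)), "W2": rng.normal(size=(32, 5))}
nm = {"W": rng.normal(size=(10, 32))}
print(forward(rng.normal(size=10), pln, nm).shape)     # (5,)
```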

ICML Conference 2018 Conference Paper

Differentiable plasticity: training plastic neural networks with backpropagation

  • Thomas Miconi
  • Kenneth O. Stanley
  • Jeff Clune

How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning. We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional (1000+ pixels) natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task. We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.
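
A small numpy sketch of a plastic recurrent step in the spirit of the paper: each connection combines a fixed weight, a learned plasticity coefficient, and a Hebbian trace that changes during the network's lifetime. The constants and sizes here are illustrative:

```python
import numpy as np

def plastic_step(x, w, alpha, hebb, eta=0.1):
    """One recurrent step of a differentiable-plasticity layer (sketch).

    Each connection has a fixed weight w[i, j], a plasticity coefficient
    alpha[i, j], and a Hebbian trace hebb[i, j]; w and alpha are the
    parameters gradient descent would optimize, while hebb changes within a
    lifetime according to a simple decaying Hebbian rule."""
    y = np.tanh(x @ (w + alpha * hebb))
    hebb = (1 - eta) * hebb + eta * np.outer(x, y)
    return y, hebb

rng = np.random.default_rng(0)
n = 16
w = rng.normal(scale=0.1, size=(n, n))
alpha = rng.normal(scale=0.1, size=(n, n))
hebb = np.zeros((n, n))
y = rng.normal(size=n)
for _ in range(20):                                  # run the recurrent dynamics
    y, hebb = plastic_step(y, w, alpha, hebb)
print("mean |Hebbian trace|:", round(float(np.abs(hebb).mean()), 4))
```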

EAAI Journal 2011 Journal Article

Pareto-based evolutionary computational approach for wireless sensor placement

  • Shafaq B. Chaudhry
  • Victor C. Hung
  • Ratan K. Guha
  • Kenneth O. Stanley

Wireless sensor networks (WSNs) have become increasingly appealing in recent years for the purpose of data acquisition, surveillance, event monitoring, etc. Optimal positioning of wireless sensor nodes is an important issue for small networks of relatively expensive sensing devices. For such networks, the placement problem requires that multiple objectives be met. These objectives are usually conflicting, e.g., achieving maximum coverage and maximum connectivity while minimizing the network energy cost. A flexible algorithm for sensor placement (FLEX) is presented that uses an evolutionary computational approach to solve this multiobjective sensor placement optimization problem when the number of sensor nodes is not fixed and the maximum number of nodes is not known a priori. FLEX starts with an initial population of simple WSNs and complexifies their topologies over generations. It keeps track of new genes through historical markings, which are used in later generations to assess two networks’ compatibility and also to align genes during crossover. It uses Pareto-dominance to approach Pareto-optimal layouts with respect to the objectives. Speciation is employed to aid the survival of gene innovations and facilitate networks to compete with similar networks. Elitism ensures that the best solutions are carried over to the next generation. The flexibility of the algorithm is illustrated by solving the device/node placement problem for different applications like facility surveillance, coverage with and without obstacles, preferential surveillance, and forming a clustering hierarchy.
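
The historical markings mentioned above can be pictured as NEAT-style innovation numbers used to decide how compatible two network genomes are (and hence whether they belong to the same species). A toy sketch with illustrative coefficients, not the paper's values:

```python
def compatibility(genome_a, genome_b, c_disjoint=1.0, c_weight=0.4):
    """NEAT-style compatibility between two sensor-network genomes (sketch).

    Each genome is a dict mapping a historical marking (innovation number) to
    a gene parameter; genes that share a marking are 'matching', the rest are
    disjoint.  Lower values mean more similar genomes."""
    matching = set(genome_a) & set(genome_b)
    disjoint = len(set(genome_a) ^ set(genome_b))
    weight_diff = (sum(abs(genome_a[m] - genome_b[m]) for m in matching) / len(matching)
                   if matching else 0.0)
    n = max(len(genome_a), len(genome_b), 1)
    return c_disjoint * disjoint / n + c_weight * weight_diff

# Two small genomes sharing markings 1 and 2 but differing on 3 and 4:
print(compatibility({1: 0.5, 2: -0.3, 3: 0.9}, {1: 0.4, 2: -0.1, 4: 0.2}))
```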

IROS Conference 2011 Conference Paper

Task switching in multirobot learning through indirect encoding

  • David B. D'Ambrosio
  • Joel Lehman
  • Sebastian Risi
  • Kenneth O. Stanley

Multirobot domains are a challenge for learning algorithms because they require robots to learn to cooperate to achieve a common goal. The challenge only becomes greater when robots must perform heterogeneous tasks to reach that goal. Multiagent HyperNEAT is a neuroevolutionary method (i.e., a method that evolves neural networks) that has proven successful in several cooperative multiagent domains by exploiting the concept of policy geometry, which means the policies of team members are learned as a function of how they relate to each other based on canonical starting positions. This paper extends the multiagent HyperNEAT algorithm by introducing situational policy geometry, which allows each agent to encode multiple policies that can be switched depending on the agent's state. This concept is demonstrated both in simulation and in real Khepera III robots in a patrol and return task, where robots must cooperate to cover an area and return home when called. Robot teams that are trained with situational policy geometry are compared to teams that are not, and are shown to find solutions more consistently; these solutions are also able to transfer to the real world.
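
A very loose sketch of the switching idea: every team member's policy weights come from one shared generator queried with the member's canonical position plus a situation index, so flipping the situation swaps policies team-wide while preserving the position-based policy geometry. The generator below is an arbitrary stand-in for the evolved network HyperNEAT would actually query:

```python
import numpy as np

def team_policies(generator, positions, situation):
    """Build each team member's policy weights from its canonical starting
    position and the current situation index (sketch of 'situational policy
    geometry').  `generator` is any function (position, situation) -> weights."""
    return [generator(pos, situation) for pos in positions]

def act(weights, observation):
    """A member's action is a simple linear policy over its observation."""
    return np.tanh(weights @ observation)

# Illustrative stand-in generator: weights vary smoothly with position and
# change wholesale when the situation flips from 'patrol' (0) to 'return' (1).
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 2, 4))                   # one weight bank per situation
generator = lambda pos, situation: base[situation] * (1.0 + 0.1 * pos)

positions = [0.0, 0.5, 1.0]                         # canonical starting positions
patrol = team_policies(generator, positions, situation=0)
go_home = team_policies(generator, positions, situation=1)
print(act(patrol[0], np.ones(4)), act(go_home[0], np.ones(4)))
```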

JMLR Journal 2010 Journal Article

Evolving Static Representations for Task Transfer

  • Phillip Verbancsics
  • Kenneth O. Stanley

An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Previous approaches to transfer in Keepaway have focused on transforming the original representation to fit the new task. In contrast, this paper explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To demonstrate this point, a bird's eye view (BEV) representation is introduced that can represent different tasks on the same two-dimensional map. For example, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV. Yet the problem is that a raw two-dimensional map is high-dimensional and unstructured. This paper shows how this problem is addressed naturally by an idea from evolutionary computation called indirect encoding, which compresses the representation by exploiting its geometry. The result is that the BEV learns a Keepaway policy that transfers without further learning or manipulation. It also facilitates transferring knowledge learned in a different domain, Knight Joust, into Keepaway. Finally, the indirect encoding of the BEV means that its geometry can be changed without altering the solution. Thus static representations facilitate several kinds of transfer.
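
A small sketch of what a bird's-eye-view encoding might look like: player positions rasterized onto a fixed-size grid so that 3 vs. 2 and 4 vs. 3 states share one representation. The grid and field sizes are arbitrary illustrative choices, not the paper's:

```python
import numpy as np

def birds_eye_view(keepers, takers, grid=20, field=25.0):
    """Render a Keepaway state as a fixed-size bird's-eye-view map (sketch).

    Positions are (x, y) on a `field` x `field` pitch; keepers and takers are
    drawn into separate channels of the same grid, so states with different
    numbers of players share one representation."""
    bev = np.zeros((2, grid, grid))
    for channel, players in enumerate((keepers, takers)):
        for x, y in players:
            i = min(int(x / field * grid), grid - 1)
            j = min(int(y / field * grid), grid - 1)
            bev[channel, i, j] = 1.0
    return bev

state_3v2 = birds_eye_view(keepers=[(2, 2), (20, 3), (10, 22)],
                           takers=[(12, 12), (13, 10)])
state_4v3 = birds_eye_view(keepers=[(2, 2), (20, 3), (10, 22), (5, 12)],
                           takers=[(12, 12), (13, 10), (18, 18)])
print(state_3v2.shape == state_4v3.shape)   # True: same map for both tasks
```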

AAAI Conference 2006 Conference Paper

Real-Time Evolution of Neural Networks in the NERO Video Game

  • Kenneth O. Stanley
  • Igor Karpov

A major goal for AI is to allow users to interact with agents that learn in real time, making new kinds of interactive simulations, training applications, and digital entertainment possible. This paper describes such a learning technology, called real-time NeuroEvolution of Augmenting Topologies (rtNEAT), and describes how rtNEAT was used to build the NeuroEvolving Robotic Operatives (NERO) video game. This game represents a new genre of machine learning games where the player trains agents in real time to perform challenging tasks in a virtual environment. Providing laymen the capability to effectively train agents in real time with no prior knowledge of AI or machine learning has broad implications, both in promoting the field of AI and making its achievements accessible to the public at large.
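
A compact sketch of the real-time replacement loop that distinguishes rtNEAT from generational neuroevolution: periodically the worst sufficiently old agent is removed and replaced by offspring of fitter agents, while everyone else keeps acting in the game. `evaluate` and `reproduce` are stand-ins for the game's fitness bookkeeping and NEAT's crossover/mutation:

```python
def rtneat_tick(population, evaluate, reproduce, min_age=20):
    """One real-time replacement step in the spirit of rtNEAT (sketch).

    Agents keep playing every tick; only the worst agent that has lived at
    least `min_age` ticks is swapped out for an offspring of two of the
    better agents, so evolution proceeds without pausing the population."""
    for agent in population:
        agent["fitness"] = evaluate(agent)
        agent["age"] += 1
    eligible = [a for a in population if a["age"] >= min_age]
    if not eligible:
        return population
    worst = min(eligible, key=lambda a: a["fitness"])
    parents = sorted(population, key=lambda a: a["fitness"], reverse=True)[:2]
    population.remove(worst)
    population.append({"genome": reproduce(parents[0], parents[1]),
                       "fitness": 0.0, "age": 0})
    return population

# Toy usage: fitness is the genome value itself, reproduction averages parents.
pop = [{"genome": g, "fitness": 0.0, "age": 25} for g in (0.1, 0.9, 0.5)]
pop = rtneat_tick(pop, evaluate=lambda a: a["genome"],
                  reproduce=lambda p1, p2: (p1["genome"] + p2["genome"]) / 2)
print([round(a["genome"], 2) for a in pop])
```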

AAAI Conference 2006 System Paper

Real-Time Interactive Learning in the NERO Video Game

  • Kenneth O. Stanley
  • Risto Miikkulainen

In the NeuroEvolving Robotic Operatives (NERO) video game, the player trains a team of virtual robots for combat against other players’ teams. The virtual robots learn in real time through interacting with the player. Since NERO was originally released in June 2005, it has been downloaded over 50,000 times, appeared on Slashdot, and won several honors. The real-time NeuroEvolution of Augmenting Topologies (rtNEAT) method, which can evolve increasingly complex artificial neural networks in real time as a game is being played, drives the robots’ learning, making possible this entirely new genre of video game. The live demo will show how agents in NERO adapt in real time as they interact with the player. In the future, rtNEAT may allow new kinds of educational and training applications through interactive and adapting games.