
Author name cluster

Joel Lehman

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

14 papers
2 author rows

Possible papers (14)

ICLR 2024 · Conference Paper

OMNI: Open-endedness via Models of human Notions of Interestingness

  • Jenny Zhang
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune

Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, which in turn contains infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles' heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize foundation models (FMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that FM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.
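
The selection rule the abstract describes can be sketched as a two-stage filter. A minimal sketch: the task encoding, the learning-progress proxy, and `fm_is_interesting` (a stand-in for a prompted foundation model) are all illustrative assumptions, not the paper's implementation.

```python
import random

def learning_progress(task, history):
    """Toy proxy (an assumption): recent success rate minus older success rate."""
    recent, past = history.get(task, ([0.0], [0.0]))
    return sum(recent) / len(recent) - sum(past) / len(past)

def fm_is_interesting(task, learned_tasks):
    """Hypothetical stand-in for prompting a foundation model (the MoI)."""
    return not any(task.startswith(t) for t in learned_tasks)  # placeholder heuristic

def select_next_task(candidates, history, learned_tasks):
    # Stage 1: keep tasks the agent is currently making progress on (learnable).
    learnable = [t for t in candidates if learning_progress(t, history) > 0]
    # Stage 2: of those, keep only tasks a model of interestingness endorses.
    interesting = [t for t in learnable if fm_is_interesting(t, learned_tasks)]
    return random.choice(interesting) if interesting else None
```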

ICML 2024 · Conference Paper

Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization

  • Li Ding 0010
  • Jenny Zhang
  • Jeff Clune
  • Lee Spector
  • Joel Lehman

Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, RLHF is commonly used to optimize for average human preferences, which is a drawback in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF's scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.
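
The core move, inferring a diversity metric from human similarity judgments, can be sketched with a triplet loss. The feature dimension, descriptor size, and random data below are assumptions for illustration, not the paper's setup.

```python
import torch

# Infer a 2-D diversity descriptor from similarity triplets
# (anchor, positive, negative), as the QDHF abstract describes.
emb = torch.nn.Linear(128, 2)          # projects solution features -> descriptor
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)
triplet = torch.nn.TripletMarginLoss(margin=1.0)

anchors, positives, negatives = (torch.randn(256, 128) for _ in range(3))
for _ in range(100):
    loss = triplet(emb(anchors), emb(positives), emb(negatives))
    opt.zero_grad(); loss.backward(); opt.step()

# The learned descriptor can then index a QD archive (e.g., a grid of cells).
descriptor = emb(torch.randn(1, 128)).detach()
```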

ICLR 2024 · Conference Paper

Quality-Diversity through AI Feedback

  • Herbie Bradley
  • Andrew Dai 0001
  • Hannah Benita Teufel
  • Jenny Zhang
  • Koen Oostermeijer
  • Marco Bellagente
  • Jeff Clune
  • Kenneth O. Stanley
  • Joel Lehman

In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
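
A skeletal version of that loop might look as follows. The three `llm_*` functions are hypothetical stand-ins for prompting a language model; the real system's prompts, diversity bins, and archive structure will differ.

```python
import random

def llm_quality(text):
    """Hypothetical stand-in for prompting an LM to score quality in [0, 1]."""
    return min(1.0, len(set(text.split())) / 20)

def llm_diversity_bin(text, n_bins=10):
    """Hypothetical stand-in for an LM rating a qualitative diversity axis."""
    return hash(text) % n_bins

def llm_mutate(text):
    """Hypothetical stand-in for an LM rewriting text to produce variation."""
    return text + " and then ..."

# MAP-Elites-style archive: keep the best-quality candidate found per bin.
archive = {}
population = ["a quiet morning", "storm over the harbor"]
for _ in range(1000):
    parent = random.choice(population + list(archive.values()))
    child = llm_mutate(parent)
    b = llm_diversity_bin(child)
    if b not in archive or llm_quality(archive[b]) < llm_quality(child):
        archive[b] = child
```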

ICML 2021 · Conference Paper

Reinforcement Learning Under Moral Uncertainty

  • Adrien Ecoffet
  • Joel Lehman

An ambitious goal for machine learning is to create agents that behave ethically: The capacity to abide by human moral norms would greatly expand the context in which autonomous agents could be practically and safely deployed, e.g., fully autonomous vehicles will encounter charged moral decisions that complicate their deployment. While ethical agents could be trained by rewarding correct behavior under a specific moral theory (e.g., utilitarianism), there remains widespread disagreement about the nature of morality. Acknowledging such disagreement, recent work in moral philosophy proposes that ethical behavior requires acting under moral uncertainty, i.e., taking into account when acting that one's credence is split across several plausible ethical theories. This paper translates such insights to the field of reinforcement learning, proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty. The results illustrate (1) how such uncertainty can help curb extreme behavior from commitment to single theories and (2) several technical complications arising from attempting to ground moral philosophy in RL (e.g., how a principled trade-off between two competing but incomparable reward functions can be reached). The aim is to catalyze progress towards morally-competent agents and highlight the potential of RL to contribute towards the computational grounding of moral philosophy.
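
One textbook way to formalize acting under split credence is credence-weighted "expected choiceworthiness." The sketch below is an assumption for illustration, not necessarily either of the paper's two training methods; the credences and reward tables are made up.

```python
import numpy as np

credences = np.array([0.6, 0.4])               # belief in each ethical theory
rewards = np.array([[1.0, -5.0, 0.2],          # theory 0's reward for each action
                    [0.0,  3.0, 0.5]])         # theory 1's reward for each action

# Normalize per theory so no theory dominates merely through reward scale:
# one crude answer to the incomparable-reward-functions problem noted above.
z = (rewards - rewards.mean(axis=1, keepdims=True)) / rewards.std(axis=1, keepdims=True)
action = int(np.argmax(credences @ z))         # act under moral uncertainty
```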

ICML 2020 · Conference Paper

Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

  • Rui Wang 0052
  • Joel Lehman
  • Aditya Rawal
  • Jiale Zhi
  • Yulun Li
  • Jeff Clune
  • Kenneth O. Stanley

Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
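
The generate-challenge/optimize/transfer loop can be caricatured in a few lines. Everything below (the one-dimensional "difficulty" and "skill" numbers, thresholds, and update rules) is a toy assumption meant only to show the loop's shape, not Enhanced POET's actual components.

```python
import random

pairs = [(0.1, 0.0)]                       # (environment difficulty, paired agent skill)
seen = [0.1]                               # difficulties generated so far

def solves(skill, diff):                   # minimal criterion: not too easy, not too hard
    return abs(skill - diff) < 0.3

for step in range(200):
    if step % 10 == 0:                     # 1) invent a new challenge; keep it if novel
        diff, skill = random.choice(pairs)
        child = max(0.0, diff + random.uniform(-0.2, 0.4))
        if min(abs(child - d) for d in seen) > 0.05:
            pairs.append((child, skill))
            seen.append(child)
    # 2) optimize each agent a little against its paired environment
    pairs = [(d, s + 0.05 * (1 if d > s else -1)) for d, s in pairs]
    # 3) transfer (goal-switch): any environment solvable by the best agent adopts it
    best = max(s for _, s in pairs)
    pairs = [(d, best if solves(best, d) else s) for d, s in pairs]
```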

ICML 2020 · Conference Paper

Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

  • Felipe Petroski Such
  • Aditya Rawal
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune

This paper investigates the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula in order to help AI agents rapidly learn. We show that such algorithms are possible via Generative Teaching Networks (GTNs), a general approach that is, in theory, applicable to supervised, unsupervised, and reinforcement learning, although our experiments only focus on the supervised case. GTNs are deep neural networks that generate data and/or training environments that a learner (e.g., a freshly initialized neural network) trains on for a few SGD steps before being tested on a target task. We then differentiate through the entire learning process via meta-gradients to update the GTN parameters to improve performance on the target task. This paper introduces GTNs, discusses their potential, and showcases that they can substantially accelerate learning. We also demonstrate a practical and exciting application of GTNs: accelerating the evaluation of candidate architectures for neural architecture search (NAS). GTN-NAS improves the NAS state of the art, finding higher-performing architectures when controlling for the search proposal mechanism. GTN-NAS is also competitive with the overall state-of-the-art approaches, achieving top performance while using orders of magnitude less computation than typical NAS methods. Speculating forward, GTNs may represent a first step toward the ambitious goal of algorithms that generate their own training data and, in doing so, open a variety of interesting new research questions and directions.
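
The differentiate-through-training idea can be shown on a linear toy problem. The generator architecture, inner-loop length, and synthetic-label scheme below are illustrative assumptions, not the paper's setup.

```python
import torch

# Toy GTN sketch: a generator emits synthetic (data, label) pairs, a fresh
# linear learner takes a few SGD steps on them, and the generator is updated
# by backpropagating through the learner's entire inner training loop.
G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 5))
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
x_real, y_real = torch.randn(64, 4), torch.randn(64, 1)     # the target task

for _ in range(100):
    w = torch.zeros(4, 1, requires_grad=True)               # freshly initialized learner
    for _ in range(3):                                      # inner loop on synthetic data
        syn = G(torch.randn(32, 8))
        x_syn, y_syn = syn[:, :4], syn[:, 4:]               # generator emits data AND labels
        inner_loss = ((x_syn @ w) - y_syn).pow(2).mean()
        (grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - 0.1 * grad                                  # differentiable SGD step
    meta_loss = ((x_real @ w) - y_real).pow(2).mean()       # evaluate on the real task
    g_opt.zero_grad(); meta_loss.backward(); g_opt.step()   # meta-gradient to G
```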

ECAI 2020 · Conference Paper

Learning to Continually Learn

  • Shawn Beaulieu
  • Lapo Frati
  • Thomas Miconi
  • Joel Lehman
  • Kenneth O. Stanley
  • Jeff Clune
  • Nick Cheney

Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network thus also indirectly controls selective plasticity (i.e., the backward pass) of the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
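
The gating mechanism itself is simple to sketch. The layer sizes and fully connected structure below are assumptions; the paper's networks differ.

```python
import torch

# Minimal sketch of ANML-style neuromodulation: a NM network emits a per-unit
# gate in (0, 1) that multiplies the PLN's activations, so plasticity
# (the backward pass) is gated as well.
class GatedNet(torch.nn.Module):
    def __init__(self, d_in=28 * 28, d_hid=256, n_classes=10):
        super().__init__()
        self.nm = torch.nn.Sequential(torch.nn.Linear(d_in, d_hid), torch.nn.Sigmoid())
        self.pln_in = torch.nn.Linear(d_in, d_hid)    # prediction learning network (PLN)
        self.pln_out = torch.nn.Linear(d_hid, n_classes)

    def forward(self, x):
        gate = self.nm(x)                             # context-dependent, per-unit gate
        h = torch.relu(self.pln_in(x)) * gate         # gates the PLN's forward pass...
        return self.pln_out(h)                        # ...and hence its gradients too

logits = GatedNet()(torch.randn(4, 28 * 28))
```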

IJCAI 2019 · Conference Paper

An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents

  • Felipe Petroski Such
  • Vashisht Madhavan
  • Rosanne Liu
  • Rui Wang
  • Pablo Samuel Castro
  • Yulun Li
  • Jiale Zhi
  • Ludwig Schubert
  • Joel Lehman

Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the Arcade Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.

UAI 2019 · Conference Paper

Learning Belief Representations for Imitation Learning in POMDPs

  • Tanmay Gangwani
  • Joel Lehman
  • Qiang Liu 0001
  • Jian Peng 0001

We consider the problem of imitation learning from expert demonstrations in partially observable Markov decision processes (POMDPs). Belief representations, which characterize the distribution over the latent states in a POMDP, have been modeled using recurrent neural networks and probabilistic latent variable models, and shown to be effective for reinforcement learning in POMDPs. In this work, we investigate the belief representation learning problem for generative adversarial imitation learning in POMDPs. Instead of training the belief module and the policy separately as suggested in prior work, we learn the belief module jointly with the policy, using a task-aware imitation loss to ensure that the representation is more aligned with the policy's objective. To improve the robustness of the representation, we introduce several informative belief regularization techniques, including multi-step prediction of dynamics and action sequences. Evaluated on various partially observable continuous-control locomotion tasks, our belief-module imitation learning approach (BMIL) substantially outperforms several baselines, including the original GAIL algorithm and the task-agnostic belief learning algorithm. Extensive ablation analysis indicates the effectiveness of task-aware belief learning and belief regularization. Code for the project is available online.
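
A stripped-down version of "train the belief module jointly with the policy" might look as follows. This uses a plain behavior-cloning loss in place of the paper's adversarial (GAIL-style) objective, and all dimensions and data are assumptions.

```python
import torch

# A GRU summarizes the observation history into a belief; the policy head sits
# on top of it, so the belief gradient is shaped by the (task-aware) policy loss.
obs_dim, act_dim, belief_dim = 16, 4, 32
belief = torch.nn.GRU(obs_dim, belief_dim, batch_first=True)
policy = torch.nn.Linear(belief_dim, act_dim)
predictor = torch.nn.Linear(belief_dim, obs_dim)       # regularizer: predict next obs
opt = torch.optim.Adam([*belief.parameters(), *policy.parameters(),
                        *predictor.parameters()])

expert_obs = torch.randn(8, 50, obs_dim)               # batch of expert trajectories
expert_act = torch.randn(8, 50, act_dim)
h, _ = belief(expert_obs)                              # beliefs for every timestep
bc_loss = (policy(h) - expert_act).pow(2).mean()       # task-aware imitation loss
dyn_loss = (predictor(h[:, :-1]) - expert_obs[:, 1:]).pow(2).mean()
(bc_loss + dyn_loss).backward(); opt.step()            # joint update
```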

NeurIPS 2018 · Conference Paper

An intriguing failing of convolutional neural networks and the CoordConv solution

  • Rosanne Liu
  • Joel Lehman
  • Piero Molino
  • Felipe Petroski Such
  • Eric Frank
  • Alex Sergeev
  • Jason Yosinski

Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x, y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task. CoordConv solves the coordinate transform problem with perfect generalization, 150 times faster and with 10-100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse, as the transform between high-level spatial latents and pixels became easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IOU when using CoordConv, and in the reinforcement learning (RL) domain agents playing Atari games benefit significantly from the use of CoordConv layers.
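
The fix is compact enough to show directly: append normalized coordinate channels before an ordinary convolution. Channel counts and sizes below are arbitrary examples.

```python
import torch

# Minimal CoordConv layer: concatenate (x, y) coordinate channels in [-1, 1]
# onto the input, then apply a standard convolution.
class CoordConv2d(torch.nn.Module):
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

out = CoordConv2d(3, 8, kernel_size=3, padding=1)(torch.randn(2, 3, 32, 32))
```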

NeurIPS 2018 · Conference Paper

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

  • Edoardo Conti
  • Vashisht Madhavan
  • Felipe Petroski Such
  • Joel Lehman
  • Kenneth Stanley
  • Jeff Clune

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g., hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e., contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
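
The novelty signal at the heart of NS-ES is easy to state in code. The behavior characterization used here (a final (x, y) position), the archive contents, and k are illustrative assumptions.

```python
import numpy as np

def novelty(bc, archive, k=10):
    dists = np.linalg.norm(archive - bc, axis=1)  # distance to every archived BC
    return np.sort(dists)[:k].mean()              # mean distance to k nearest neighbors

archive = np.random.randn(500, 2)                 # BCs of previously evaluated policies
print(novelty(np.array([3.0, -1.0]), archive))    # ES then ascends this score instead
                                                  # of (or blended with) the reward
```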

IS 2014 · Journal Article

An Anarchy of Methods: Current Trends in How Intelligence Is Abstracted in AI

  • Joel Lehman
  • Jeff Clune
  • Sebastian Risi

Artificial intelligence (AI) is a sprawling field encompassing a diversity of approaches to machine intelligence and disparate perspectives on how intelligence should be viewed. Because researchers often engage only within their own specialized area of AI, there are many interesting broad questions about AI as a whole that often go unanswered. How should intelligence be abstracted in AI research? Which subfields, techniques, and abstractions are most promising? Why do researchers bet their careers on the particular abstractions and techniques of their chosen subfield of AI? Should AI research be "bio-inspired" and remain faithful to the process that produced intelligence (evolution) or the biological substrate that enables it (networks of neurons)? Discussing these big-picture questions motivated us to organize an AAAI Fall Symposium, which gathered participants across AI subfields to present and debate their views. This article distills the resulting insights.

IROS 2011 · Conference Paper

Task switching in multirobot learning through indirect encoding

  • David B. D'Ambrosio
  • Joel Lehman
  • Sebastian Risi
  • Kenneth O. Stanley

Multirobot domains are a challenge for learning algorithms because they require robots to learn to cooperate to achieve a common goal. The challenge only becomes greater when robots must perform heterogeneous tasks to reach that goal. Multiagent HyperNEAT is a neuroevolutionary method (i.e., a method that evolves neural networks) that has proven successful in several cooperative multiagent domains by exploiting the concept of policy geometry, which means the policies of team members are learned as a function of how they relate to each other based on canonical starting positions. This paper extends the multiagent HyperNEAT algorithm by introducing situational policy geometry, which allows each agent to encode multiple policies that can be switched depending on the agent's state. This concept is demonstrated both in simulation and in real Khepera III robots in a patrol-and-return task, where robots must cooperate to cover an area and return home when called. Robot teams trained with situational policy geometry are compared to teams trained without it; the former find solutions more consistently, and those solutions also transfer to the real world.

AAMAS 2010 · Conference Paper

Evolving Policy Geometry for Scalable Multiagent Learning

  • David D'Ambrosio
  • Joel Lehman
  • Sebastian Risi
  • Kenneth Stanley

A major challenge for traditional approaches to multiagent learning is to train teams that easily scale to include additional agents. The problem is that such approaches typically encode each agent's policy separately. Such separation means that computational complexity explodes as the number of agents in the team increases, and also leads to the problem of reinvention: Skills that should be shared among agents must be rediscovered separately for each agent. To address this problem, this paper presents an alternative evolutionary approach to multiagent learning called multiagent HyperNEAT that encodes the team as a pattern of related policies rather than as a set of individual agents. To capture this pattern, a policy geometry is introduced to describe the relationship between each agent's policy and its canonical geometric position within the team. Because policy geometry can encode variations of a shared skill across all of the policies it represents, the problem of reinvention is avoided. Furthermore, because the policy geometry of a particular team can be sampled at any resolution, it acts as a heuristic for generating policies for teams of any size, producing a powerful new capability for multiagent learning. In this paper, multiagent HyperNEAT is tested in predator-prey and room-clearing domains. In both domains the results are effective teams that can be successfully scaled to larger team sizes without any further training.
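
The policy-geometry idea, one pattern that yields a policy for any position in the team, can be caricatured as below. The MLP stands in for HyperNEAT's CPPN-based indirect encoding, and all sizes are assumptions.

```python
import torch

# One network maps an agent's normalized position in the team to that agent's
# policy parameters, so a team of any size can be sampled from the same pattern.
geometry = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                               torch.nn.Linear(64, 8 * 2))   # 8 obs -> 2 actions

def team_policies(n_agents):
    pos = torch.linspace(-1, 1, n_agents).unsqueeze(1)       # canonical positions
    return geometry(pos).view(n_agents, 2, 8)                # per-agent weight matrices

small, large = team_policies(3), team_policies(12)           # same pattern, any team size
```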