Arrow Research search

Author name cluster

Lee Spector

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

RLC Conference 2025 Conference Paper

Pareto Optimal Learning from Preferences with Hidden Context

  • Ryan Bahlous-Boldi
  • Li Ding
  • Lee Spector
  • Scott Niekum

Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) leverages human preferences to achieve this alignment. However, when preferences are sourced from diverse populations, point estimates of reward can result in suboptimal performance or be unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which enables pluralistic alignment by framing discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes lexicase selection, an iterative process that selects diverse and Pareto-optimal solutions. Our theoretical and empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions and policies, effectively catering to distinct groups without access to group numbers or membership labels. We verify the performance of POPL on a stateless preference learning setting, a Minigrid RL domain, Metaworld robotics benchmarks, as well as large language model (LLM) fine-tuning. We illustrate that POPL can also serve as a foundation for techniques optimizing specific notions of group fairness, ensuring safe and equitable AI model alignment.
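Lexicase selection, the selection mechanism POPL builds on, can be sketched as a simple candidate-filtering loop. The function and the toy error matrix below are our illustration, not code from the paper: training cases are visited in a random order, and at each case only the candidates with best-so-far error survive.

```python
import random

def lexicase_select(errors, rng=random):
    """Select one candidate index via lexicase selection.

    errors[i][j] is the error of candidate i on training case j
    (lower is better). Cases are considered in a random order, and
    only candidates tied for the best error on each case survive.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case ordering per selection event
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # remaining ties broken uniformly

# Toy example: candidate 2 is best on case 0, candidate 0 on case 1,
# and candidate 1 is best on neither, so it can never be selected.
errs = [[3.0, 0.0], [2.0, 2.0], [0.0, 3.0]]
winner = lexicase_select(errs)
```

Because different case orderings favor different specialists, repeated selection events tend to return a diverse set of Pareto-optimal candidates, which is the property POPL exploits.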

RLJ Journal 2025 Journal Article

Pareto Optimal Learning from Preferences with Hidden Context

  • Ryan Bahlous-Boldi
  • Li Ding
  • Lee Spector
  • Scott Niekum

Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) leverages human preferences to achieve this alignment. However, when preferences are sourced from diverse populations, point estimates of reward can result in suboptimal performance or be unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which enables pluralistic alignment by framing discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes lexicase selection, an iterative process that selects diverse and Pareto-optimal solutions. Our theoretical and empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions and policies, effectively catering to distinct groups without access to group numbers or membership labels. We verify the performance of POPL on a stateless preference learning setting, a Minigrid RL domain, Metaworld robotics benchmarks, as well as large language model (LLM) fine-tuning. We illustrate that POPL can also serve as a foundation for techniques optimizing specific notions of group fairness, ensuring safe and equitable AI model alignment.

ICML Conference 2024 Conference Paper

Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization

  • Li Ding
  • Jenny Zhang
  • Jeff Clune
  • Lee Spector
  • Joel Lehman

Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, RLHF is commonly used to optimize for average human preferences, which has drawbacks, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF’s scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.
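The core idea of inferring a diversity metric from similarity judgments can be sketched with a tiny triplet-learning loop. This is a minimal stand-in under our own assumptions (a linear projection, a hinge-style triplet loss, synthetic judgments), not QDHF's actual model: given judgments "a is more similar to p than to n", fit a projection under which judged-similar pairs land closer together.

```python
import numpy as np

def fit_diversity_metric(X, triplets, dim=1, lr=0.1, steps=200, seed=0):
    """Fit a linear projection W so that, for each triplet (a, p, n),
    X[a] ends up closer to X[p] than to X[n] in the projected space.

    The projected coordinates X @ W then act as learned diversity
    descriptors for binning solutions in a QD archive.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], dim))
    for _ in range(steps):
        for a, p, n in triplets:
            za, zp, zn = X[a] @ W, X[p] @ W, X[n] @ W
            d_pos = np.sum((za - zp) ** 2)
            d_neg = np.sum((za - zn) ** 2)
            if d_pos + 1.0 > d_neg:  # margin violated -> gradient step
                grad = (2 * np.outer(X[a] - X[p], za - zp)
                        - 2 * np.outer(X[a] - X[n], za - zn))
                W -= lr * grad
    return W

# Toy data: item 0 is judged more similar to item 1 than to item 2.
X = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]])
triplets = [(0, 1, 2)]
W = fit_diversity_metric(X, triplets)
```

In a full QD loop the descriptor `x @ W` would replace a hand-crafted behavior characterization; QDHF additionally refines the metric progressively as new human judgments arrive.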

ICLR Conference 2022 Conference Paper

Optimizing Neural Networks with Gradient Lexicase Selection

  • Li Ding
  • Lee Spector

One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead both to stagnation at local optima and to poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how the general idea of lexicase selection can fit into the context of deep learning to improve generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Experimental results show that the proposed method improves the generalization performance of various popular deep neural network architectures on three image classification benchmarks. Qualitative analysis also indicates that our method helps the networks learn more diverse representations.
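The combination of gradient descent and lexicase selection described above can be sketched on a toy model. Everything below is our illustration under stated assumptions, not the paper's implementation: the "network" is a linear regressor, "mutation" is one SGD epoch over a child-specific shuffle of the data, and the next parent is chosen by lexicase selection over per-example squared errors.

```python
import numpy as np

def gradient_lexicase_step(w, X, y, rng, n_children=4, lr=0.05):
    """One generation: spawn children by SGD with different data
    orders, then pick a survivor via lexicase over per-case errors."""
    children = []
    for _ in range(n_children):
        cw = w.copy()
        for i in rng.permutation(len(X)):  # child-specific SGD order
            cw -= lr * 2 * (X[i] @ cw - y[i]) * X[i]
        children.append(cw)
    errors = [(X @ cw - y) ** 2 for cw in children]  # per-case errors
    survivors = list(range(n_children))
    for case in rng.permutation(len(X)):  # lexicase selection
        best = min(errors[i][case] for i in survivors)
        survivors = [i for i in survivors if errors[i][case] <= best]
        if len(survivors) == 1:
            break
    return children[rng.choice(survivors)]

# Noiseless toy regression: the loop should drive the loss near zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(30):
    w = gradient_lexicase_step(w, X, y, rng)
```

Selecting on sequences of individual case errors, rather than their mean, is what lets a child that fixes hard cases survive even if its average loss is momentarily worse.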

AAAI Conference 1994 Conference Paper

Criticism, Culture, and the Automatic Generation of Artworks

  • Lee Spector

Researchers wishing to create computational systems that themselves generate artworks face two interacting challenges. The first is that the standards by which artistic output is judged are notoriously difficult to quantify. The larger AI community is currently involved in a rich internal dialogue on methodological issues, standards, and rigor, and hence murkiness with regard to the assessment of output must be faced squarely. The second challenge is that any artwork exists within an extraordinarily rich cultural and historical context, and it is rare that an artist who is ignorant of this context will produce acceptable works. In this paper we assert that these considerations argue for case-based AI/Art systems that take critical criteria as parameters. We describe an example system that produces new bebop jazz melodies from a case-base of melodies, using genetic programming techniques and a fitness function based on user-provided critical criteria. We discuss the role that such techniques may play in future work on AI and the arts.

AAAI Conference 1994 Conference Paper

Genetic Programming and AI Planning Systems

  • Lee Spector

Genetic programming (GP) is an automatic programming technique that has recently been applied to a wide range of problems including blocks-world planning. This paper describes a series of illustrative experiments in which GP techniques are applied to traditional blocks-world planning problems. We discuss genetic planning in the context of traditional AI planning systems, and comment on the costs and benefits to be expected from further work.

AAAI Conference 1994 Conference Paper

Ordering Relations in Human and Machine Planning

  • Lee Spector

Analytical results from AI planning research provide the motivation for this experimental study of ordering relationships in human planning. We examine timings of humans performing specific tasks from the AI planning literature and present evidence that normal human planners, like “state of the art” AI planning systems, use partial-order plan representations. We also describe ongoing experiments that are designed to shed light on the plan representations used by children and by adults with planning deficits due to brain damage. Several points of interest for collaboration between AI scientists and neuropsychologists are noted, as are impacts that we feel this research may have on future work in AI planning.

ICAPS Conference 1994 Conference Paper

The Use of Supervenience in Dynamic-world Planning

  • Lee Spector
  • James A. Hendler

This paper describes the use of supervenience in integrating planning and reaction in complex, dynamic environments. Supervenience is a form of abstraction with affinities both to abstraction in AI planning systems and to partitioning schemes in hierarchical control systems. The use of supervenience can be distilled to an easy-to-state constraint on the design of multilevel dynamic-world planning systems: world knowledge up, goals down. We present the supervenience architecture, which embodies this constraint, and contrast it to the subsumption architecture of Brooks. We describe the performance of an implementation of the supervenience architecture on a problem in the HomeBot domain, and we conclude with a discussion of the role that supervenience can play in future dynamic-world planning systems. Supervenience is a species of abstraction that we believe to be important for systems that must integrate high-level reasoning with real-time action. Simplification abstraction is a special case of supervenience, and the search-reduction benefits of ABSTRIPS-style systems are sometimes available in supervenient planning systems as well. The generality of supervenience also allows, however, for uses of abstraction similar to those available in blackboard architectures and in other multilevel control systems. The central idea of supervenience is that representations at lower levels of abstraction are epistemologically "closer to the world" than those at higher levels, and that the representations at higher levels therefore depend on those at lower levels. The higher levels may contain representations that are simplifications of low-level, sensory reports, but they may just as well contain representations that are complex, structurally rich aggregates that have no unified representation at lower levels. In contrast to ABSTRIPS-style systems, in which higher levels must be simplifications of the lower levels, levels of supervenience may be dissimilar in various ways so long as the proper dependence relation holds.

The thesis is that it is this dependence, and not the more restrictive notion of simplification, that allows for the flexible integration of deliberation and reaction. The concept of supervenience applies naturally to multilevel computational architectures in which the higher levels are coupled to the world through the lower levels. In such cases the privileged status of the lower levels (vis-à-vis access to the world) can be used to advantage. We have formalized the supervenience relation in the context of nonmonotonic reasoning systems, using the concept of "defeasibility" in nonmonotonic systems to spell out the appropriate notion of "dependence." Supervenience is defined to be the case in which lower levels can defeat higher-level facts but not vice versa (Spector 1992); this can be abbreviated as "assertions up, assumptions down," and reformulated for implementation purposes as "world knowledge up, goals down." The bottom line for system-builders is this: Low levels should "know" enough to be right about, and to act upon, their assertions. High levels should configure (e.g., provide goals for) lower levels, but should not override knowledge determined to be true by the lower levels. Lower levels may need to monitor for goal-changes from above, but not for changes in world knowledge from above.
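The "world knowledge up, goals down" constraint can be illustrated with a toy two-level stack. The class, method names, and door example below are our own sketch, not the paper's architecture: queries are answered by the lowest level holding a belief, so lower-level assertions defeat higher-level assumptions but never the reverse, while goals flow only downward.

```python
class SupervenientStack:
    """Minimal sketch of 'world knowledge up, goals down'.

    Levels are indexed from 0 (closest to the world) upward. A query
    is answered by the lowest level with a belief about the
    proposition, so lower-level assertions defeat higher-level
    assumptions; goals posted at a level configure the level below.
    """
    def __init__(self, n_levels):
        self.facts = [dict() for _ in range(n_levels)]  # per-level beliefs
        self.goals = [list() for _ in range(n_levels)]  # per-level goal queues

    def assert_fact(self, level, prop, value):
        self.facts[level][prop] = value

    def query(self, prop, default=None):
        # assertions up: the level nearest the world wins any conflict
        for level_facts in self.facts:
            if prop in level_facts:
                return level_facts[prop]
        return default

    def post_goal(self, level, goal):
        # goals down: configure the level below; facts are never pushed down
        if level > 0:
            self.goals[level - 1].append(goal)

stack = SupervenientStack(2)
stack.assert_fact(1, "door_open", True)    # high-level assumption
stack.assert_fact(0, "door_open", False)   # low-level sensor report defeats it
stack.post_goal(1, "open_door")            # high level configures the low level
status = stack.query("door_open")          # resolved by the lowest level
```

Note that the high level never writes into the low level's belief store; it can only post goals, which is exactly the "assertions up, assumptions down" discipline the abstract describes.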