Arrow Research

Author name cluster

Robert Wright

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

ICML Conference 2025 Conference Paper

Novelty Detection in Reinforcement Learning with World Models

  • Geigh Zollicoffer
  • Kenneth Eaton 0002
  • Jonathan C. Balloch
  • Julia M. Kim
  • Wei Zhou
  • Robert Wright
  • Mark O. Riedl

Reinforcement learning (RL) using world models has found significant recent success. However, when a sudden change to world mechanics or properties occurs, agent performance and reliability can decline dramatically. We refer to such sudden changes in visual properties or state transitions as novelties. Implementing novelty detection within world model frameworks is a crucial task for protecting the agent once deployed. In this paper, we propose straightforward bounding approaches that incorporate novelty detection into world model RL agents, using the misalignment between the world model’s hallucinated states and the true observed states as a novelty score. We provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
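
The bounding idea in the abstract can be made concrete with a short sketch. Below is a minimal illustration, assuming a world_model.predict(state, action) interface and vector-valued states (both assumptions of this sketch, not the paper's API): the novelty score is the distance between the model's predicted ("hallucinated") next state and the observed one, and the detection threshold is calibrated as a bound on scores seen under nominal transitions.

```python
import numpy as np

def calibrate_threshold(world_model, transitions, quantile=0.99):
    """Fit a novelty threshold from nominal (pre-novelty) transitions.
    `transitions` is a list of (state, action, next_state) tuples and
    `world_model.predict(state, action)` returns the model's expected
    next state; both are assumed interfaces for this sketch."""
    scores = [np.linalg.norm(world_model.predict(s, a) - s_next)
              for (s, a, s_next) in transitions]
    return np.quantile(scores, quantile)

def novelty_score(world_model, state, action, observed_next):
    """Misalignment between the model's hallucinated next state
    and the true observed next state, used as the novelty score."""
    predicted_next = world_model.predict(state, action)
    return np.linalg.norm(predicted_next - observed_next)

def is_novel(world_model, state, action, observed_next, threshold):
    """Flag a transition as novel when its score exceeds the bound."""
    return novelty_score(world_model, state, action, observed_next) > threshold
```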

ICRA Conference 2025 Conference Paper

PrivilegedDreamer: Explicit Imagination of Privileged Information for Rapid Adaptation of Learned Policies

  • Morgan Byrd
  • Jack L. Crandell
  • Mili Das
  • Jessica Inman
  • Robert Wright
  • Sehoon Ha

Numerous real-world control problems involve dynamics and objectives affected by unobservable hidden parameters, ranging from autonomous driving to robotic manipulation, which cause performance degradation during sim-to-real transfer. To represent these kinds of domains, we adopt hidden-parameter Markov decision processes (HIP-MDPs), which model sequential decision problems where hidden variables parameterize transition and reward functions. Existing approaches, such as domain randomization, domain adaptation, and meta-learning, simply treat the effect of hidden parameters as additional variance and often struggle to effectively handle HIP-MDP problems, especially when the rewards are parameterized by hidden variables. We introduce PrivilegedDreamer, a model-based reinforcement learning framework that extends the existing model-based approach by incorporating an explicit parameter estimation module. PrivilegedDreamer features a novel dual recurrent architecture that explicitly estimates hidden parameters from limited historical data and enables us to condition the model, actor, and critic networks on these estimated parameters. Our empirical analysis on five diverse HIP-MDP tasks demonstrates that PrivilegedDreamer outperforms state-of-the-art model-based, model-free, and domain adaptation learning algorithms. Additionally, we conduct ablation studies to justify the inclusion of each component in the proposed architecture.
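
To illustrate the dual recurrent design the abstract describes, here is a minimal PyTorch sketch. The module names, layer sizes, and the GRU choice are assumptions made for illustration; only the overall shape (a second recurrent branch that explicitly regresses hidden parameters, whose estimate then conditions the model, actor, and critic) follows the abstract.

```python
import torch
import torch.nn as nn

class HiddenParamEstimator(nn.Module):
    """Sketch of a dual recurrent architecture: one recurrent branch
    produces features for the latent dynamics model, a second one
    explicitly regresses the hidden parameters from a short history."""

    def __init__(self, obs_dim, act_dim, param_dim, hidden=128):
        super().__init__()
        self.dynamics_rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.param_rnn = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.param_head = nn.Linear(hidden, param_dim)

    def forward(self, obs_seq, act_seq):
        x = torch.cat([obs_seq, act_seq], dim=-1)   # (B, T, obs+act)
        dyn_feats, _ = self.dynamics_rnn(x)         # features for the world model
        _, h_param = self.param_rnn(x)
        est_params = self.param_head(h_param[-1])   # (B, param_dim)
        return dyn_feats, est_params

# The actor, critic, and world model would then be conditioned on
# est_params, e.g. by concatenating it to their inputs; in simulation,
# where ground-truth parameters exist, est_params can be supervised
# with a plain MSE loss (an illustrative training choice).
```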

AAMAS Conference 2024 Conference Paper

LgTS: Dynamic Task Sampling using LLM-generated Sub-Goals for Reinforcement Learning Agents

  • Yash Shukla
  • Wenchang Gao
  • Vasanth Sarathy
  • Alvaro Velasquez
  • Robert Wright
  • Jivko Sinapov

Recent advancements in the reasoning abilities of Large Language Models (LLMs) have promoted their usage in problems that require high-level planning for artificial agents. However, current techniques that utilize LLMs for such planning tasks make certain key assumptions, such as access to datasets that permit fine-tuning, meticulously engineered prompts that provide only relevant and essential information to the LLM, and, most importantly, a deterministic approach for executing the LLM responses, either in the form of existing policies or plan operators. In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs to provide a graphical representation of sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment. The RL agent uses a Teacher-Student learning algorithm to learn a set of successful policies for reaching the goal state from the start state while simultaneously minimizing the number of environmental interactions. Unlike previous methods that utilize LLMs, our approach does not assume access to a fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM. Through experiments on a gridworld-based DoorKey domain and a search-and-rescue inspired domain, we show that an LLM-proposed graphical structure for sub-goals combined with a Teacher-Student RL algorithm achieves sample-efficient policies. More details at https://llm-guided-task-sampling.github.io/
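
A rough sketch of the teacher's side of such a loop, assuming the LLM has already produced a sub-goal DAG: the hypothetical sample_task below picks an unsolved sub-goal whose prerequisites are solved, preferring the one the student is currently weakest on. The frontier rule and the tie-breaking heuristic are illustrative assumptions, not the paper's exact sampler.

```python
def predecessors(graph, node):
    """Sub-goals with an edge into `node` in the LLM-proposed DAG,
    where graph[u] lists the successors of sub-goal u."""
    return [u for u, succs in graph.items() if node in succs]

def sample_task(graph, success_rate, solved_threshold=0.9):
    """Teacher step: propose the next sub-goal for the student to
    train on, given empirical success rates per sub-goal."""
    solved = {g for g, r in success_rate.items() if r >= solved_threshold}
    # Frontier: unsolved sub-goals whose prerequisites are all solved.
    frontier = [g for g in graph
                if g not in solved
                and all(p in solved for p in predecessors(graph, g))]
    if not frontier:
        return None  # every sub-goal on the way to the goal is solved
    # Illustrative tie-break: train where the student succeeds least.
    return min(frontier, key=lambda g: success_rate.get(g, 0.0))
```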

ICAPS Conference 2024 Conference Paper

Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

  • Yash Shukla
  • Tanushree Burman
  • Abhishek Kulkarni
  • Robert Wright
  • Alvaro Velasquez
  • Jivko Sinapov

Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTLf) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from Logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms these baselines in terms of sample efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.
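
One simple way to realize "dynamically samples promising tasks" is a bandit-style teacher over the automaton-induced tasks. The UCB rule below is an illustrative assumption standing in for LSTS's actual sampling strategy:

```python
import math

def ucb_task_choice(tasks, successes, attempts, c=1.0):
    """Among the currently reachable automaton-induced tasks, pick the
    one with the best upper confidence bound on success. `tasks` is a
    list of task ids; `successes` and `attempts` map task id to counts.
    Treating the teacher as a UCB bandit is an assumption of this sketch."""
    total = sum(attempts.get(t, 0) for t in tasks) + 1

    def ucb(t):
        n = attempts.get(t, 0)
        if n == 0:
            return float("inf")  # try every task at least once
        return successes.get(t, 0) / n + c * math.sqrt(math.log(total) / n)

    return max(tasks, key=ucb)
```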

IROS Conference 2023 Conference Paper

A Framework for Few-Shot Policy Transfer Through Observation Mapping and Behavior Cloning

  • Yash Shukla
  • Bharat Kesari
  • Shivam Goel
  • Robert Wright
  • Jivko Sinapov

Despite recent progress in Reinforcement Learning for robotics applications, many tasks remain prohibitively difficult to solve because of the expensive interaction cost. Transfer learning reduces training time in a target domain by reusing knowledge learned in a source domain; Sim2Real transfer in particular carries knowledge from a simulated robotic domain to a physical target domain, where the cost of interactions is high. However, most existing approaches assume exact correspondence between the task structure and the physical properties of the two domains. This work proposes a framework for Few-Shot Policy Transfer between two domains through Observation Mapping and Behavior Cloning. We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map observations between the source and target domains, and later use this learned mapping to clone the successful source task behavior policy to the target domain. We observe successful behavior policy transfer with limited target task interactions, even in cases where the source and target tasks are semantically dissimilar.
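
The transfer recipe lends itself to a short sketch: translate source-domain observations through the learned GAN mapping, keep the source actions as labels, and behavior-clone in the target domain. The obs_mapper generator, the trajectory layout, and the MSE objective below are assumptions of this sketch, not the paper's code.

```python
import torch
import torch.nn.functional as F

def make_bc_dataset(source_trajs, obs_mapper):
    """Build a target-domain behavior-cloning dataset from successful
    source trajectories. `source_trajs` is a list of [(obs, action), ...]
    and `obs_mapper` is a learned source-to-target observation mapping
    (e.g., a CycleGAN-style generator); both are assumed interfaces."""
    pairs = []
    for traj in source_trajs:
        for obs, action in traj:
            with torch.no_grad():
                target_obs = obs_mapper(obs)    # translate to target domain
            pairs.append((target_obs, action))  # keep the source action label
    return pairs

def bc_loss(policy, batch_obs, batch_actions):
    """Standard behavior-cloning objective on the mapped observations."""
    return F.mse_loss(policy(batch_obs), batch_actions)
```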

ICAPS Conference 2023 Conference Paper

Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

  • Yash Shukla
  • Abhishek Kulkarni
  • Robert Wright
  • Alvaro Velasquez
  • Jivko Sinapov

Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from a logical specification remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g., teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g., Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task specification.
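
Since the curriculum is a DAG whose edges give the direction of knowledge transfer, consuming it amounts to training tasks in topological order while seeding each task from its predecessors. A minimal sketch, with the DFA-to-DAG construction omitted and train_task/transfer left as assumed callables:

```python
from graphlib import TopologicalSorter

def curriculum_order(dag):
    """Dependency-first ordering of curriculum tasks. `dag` maps each
    task to the set of tasks it depends on (graphlib's convention);
    how AGCL builds this DAG from the DFA + OOMDP is omitted here."""
    return list(TopologicalSorter(dag).static_order())

def run_curriculum(dag, train_task, transfer):
    """Train tasks in order, seeding each task with knowledge from its
    predecessors. `train_task(task, init)` and `transfer(policies)` are
    assumed callables standing in for the learner and the transfer step."""
    policies = {}
    for task in curriculum_order(dag):
        init = [policies[p] for p in dag.get(task, ()) if p in policies]
        policies[task] = train_task(task, transfer(init))
    return policies
```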

AAMAS Conference 2023 Conference Paper

Neuro-Symbolic World Models for Adapting to Open World Novelty

  • Jonathan C. Balloch
  • Zhiyu Lin
  • Xiangyu Peng
  • Mustafa Hussain
  • Aarun Srinivas
  • Robert Wright
  • Julia M. Kim
  • Mark O. Riedl

Most reinforcement learning (RL) methods assume that the world is a closed, fixed process, when in reality most real-world problems are open, changing over time. To address this, we introduce WorldCloner, an end-to-end trainable neuro-symbolic world model that learns an efficient symbolic model of transitions and uses this world model to improve novelty adaptation. We show that the symbolic world model helps WorldCloner adapt its policy more efficiently than neural-only reinforcement learning methods.
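
A toy version of the symbolic half of such a world model, assuming states are sets of facts and transition rules are precondition/effect pairs (both assumptions of this sketch): on a prediction mismatch, i.e., a novelty, the rule set is repaired to reproduce the observed transition, so subsequent policy updates can use imagined rollouts instead of costly real interactions.

```python
def predict(rules, state, action):
    """Apply the first symbolic rule whose precondition matches.
    `rules` maps an action to (precondition, effect) frozenset pairs
    over state facts; this rule format is an illustrative assumption."""
    for precond, effect in rules.get(action, []):
        if precond <= state:                 # precondition facts hold
            return (state - precond) | effect
    return state                             # default: no change

def adapt(rules, state, action, observed_next):
    """On a mismatch between prediction and observation (a novelty),
    repair the rule set so it reproduces the observed transition."""
    if predict(rules, state, action) != observed_next:
        deleted = state - observed_next      # facts removed by the action
        added = observed_next - state        # facts added by the action
        rules.setdefault(action, []).insert(0, (frozenset(deleted), frozenset(added)))
    return rules
```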

AAAI Conference 2019 Conference Paper

Beyond Speech: Generalizing D-Vectors for Biometric Verification

  • Jacob Baldwin
  • Ryan Burnham
  • Andrew Meyer
  • Robert Dora
  • Robert Wright

Deep learning based automatic feature extraction methods have radically transformed speaker identification and facial recognition. Current approaches are typically specialized for individual domains, such as Deep Vectors (D-Vectors) for speaker identification. We provide two distinct contributions: a generalized framework for biometric verification inspired by D-Vectors and novel models that outperform current state-of-the-art approaches. Our approach supports substitution of various feature extraction models and improves the robustness of verification tests across domains. We demonstrate the framework and models for two different behavioral biometric verification problems: keystroke and mobile gait. We present a comprehensive empirical analysis comparing our framework to the state-of-the-art in both domains. Our models perform verification with higher accuracy using orders of magnitude less data than state-of-the-art approaches in both domains. We believe that the combination of high accuracy and practical data requirements will enable application of behavioral biometric models outside of the laboratory in support of much-needed improvements to cyber security.
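
The verification framework itself reduces to a small amount of code once a feature extractor is chosen, which is exactly the substitution point the abstract highlights. A minimal sketch, where embed, the cosine-similarity decision rule, and the 0.7 threshold are illustrative assumptions:

```python
import numpy as np

def enroll(embed, samples):
    """Build a user template by averaging embeddings of enrollment
    samples. `embed` is any feature-extraction model mapping a raw
    biometric sample (keystroke window, gait segment, ...) to a vector."""
    vecs = np.stack([embed(s) for s in samples])
    template = vecs.mean(axis=0)
    return template / np.linalg.norm(template)

def verify(embed, template, sample, threshold=0.7):
    """Accept when the cosine similarity between the claimed user's
    template and the test sample's embedding clears a threshold that
    would be tuned on held-out data."""
    v = embed(sample)
    v = v / np.linalg.norm(v)
    return float(template @ v) >= threshold
```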

AAAI Conference 2018 Conference Paper

Diverse Exploration for Fast and Safe Policy Improvement

  • Andrew Cohen
  • Lei Yu
  • Robert Wright

We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains. As its solution, we propose a novel exploration strategy - diverse exploration (DE), which learns and deploys a diverse set of safe policies to explore the environment. We provide DE theory explaining why diversity in behavior policies enables effective exploration without sacrificing exploitation. Our empirical study shows that an online policy improvement algorithm framework implementing the DE strategy can achieve both fast policy improvement and safe online performance.
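
A schematic of one DE deployment round, assuming a safety test and a pairwise behavioral-dissimilarity measure are available (is_safe and diversity below are assumed callables, and the greedy subset selection is an illustrative choice):

```python
import random

def diverse_exploration_round(candidates, is_safe, diversity, k=4):
    """Keep only candidate policies that pass a safety test (e.g., a
    high-confidence off-policy improvement check), then greedily pick
    a behaviorally diverse subset of k to deploy for data collection."""
    safe = [p for p in candidates if is_safe(p)]
    if not safe:
        return []
    chosen = [random.choice(safe)]
    while len(chosen) < min(k, len(safe)):
        rest = [p for p in safe if p not in chosen]
        # Add the policy farthest (in total) from those already chosen.
        chosen.append(max(rest, key=lambda p: sum(diversity(p, q) for q in chosen)))
    return chosen
```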

AAAI Conference 2015 Conference Paper

Improving Approximate Value Iteration with Complex Returns by Bounding

  • Robert Wright
  • Xingye Qiao
  • Steven Loscalzo
  • Lei Yu

Approximate value iteration (AVI) is a widely used technique in reinforcement learning. Most AVI methods do not take full advantage of the sequential relationship between samples within a trajectory in deriving value estimates, due to the challenges in dealing with the inherent bias and variance in the n-step returns. We propose a bounding method which uses a negatively biased but relatively low variance estimator generated from a complex return to provide a lower bound on the observed value of a traditional one-step return estimator. In addition, we develop a new Bounded FQI algorithm, which efficiently incorporates the bounding method into an AVI framework. Experiments show that our method produces more accurate value estimates than existing approaches, resulting in improved policies.
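
The bounding step itself is a one-liner once both estimators are in hand: because the complex return is negatively biased, taking the maximum can tighten noisy one-step targets from below without introducing positive bias. A minimal sketch, using a lambda-return as an illustrative stand-in for the paper's complex return:

```python
import numpy as np

def bounded_target(one_step, complex_return):
    """Core of the bounding idea: the complex return is a negatively
    biased, low-variance lower bound on the one-step TD target, so an
    elementwise max only raises underestimates."""
    return np.maximum(one_step, complex_return)

def lambda_return(rewards, values, gamma=0.99, lam=0.9):
    """One common complex return over a trajectory; `values` has one
    more entry than `rewards` (the bootstrap value at the end)."""
    g = values[-1]
    out = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        out[t] = g
    return out
```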

JAAMAS Journal 2014 Journal Article

Predictive feature selection for genetic policy search

  • Steven Loscalzo
  • Robert Wright
  • Lei Yu

Automatic learning of control policies is becoming increasingly important to allow autonomous agents to operate alongside, or in place of, humans in dangerous and fast-paced situations. Reinforcement learning (RL), including genetic policy search algorithms, comprises a promising technology area capable of learning such control policies. Unfortunately, RL techniques can take prohibitively long to learn a sufficiently good control policy in environments described by many sensors (features). We argue that in many cases only a subset of available features are needed to learn the task at hand, since others may represent irrelevant or redundant information. In this work, we propose a predictive feature selection framework that analyzes data obtained during execution of a genetic policy search algorithm to identify relevant features online. This serves to constrain the policy search space and reduces the time needed to locate a sufficiently good policy by embedding feature selection into the process of learning a control policy. We explore this framework through an instantiation called predictive feature selection embedded in NeuroEvolution of Augmenting Topologies (NEAT), or PFS-NEAT. In an empirical study, we demonstrate that PFS-NEAT is capable of enabling NEAT to successfully find good control policies in two benchmark environments, and show that it can outperform three competing feature selection algorithms, FS-NEAT, FD-NEAT, and SAFS-NEAT, in several variants of these environments.
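
The predictive step can be mimicked with any supervised learner over data collected during policy search. In the sketch below, a random forest's impurity-based importances stand in for PFS-NEAT's actual relevance criterion (an illustrative assumption), and the top-ranked features are what would constrain NEAT's input space:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predictive_feature_ranking(states, returns):
    """Fit a supervised model on data gathered during genetic policy
    search and rank input features by how much they help predict the
    learning signal. `states` is an (N, d) array of observed states
    and `returns` the associated (N,) outcome values."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(states, returns)
    return np.argsort(model.feature_importances_)[::-1]

def select_features(states, returns, k):
    """Keep the top-k features; the policy search (e.g., NEAT's input
    nodes) would then be restricted to this subset."""
    return predictive_feature_ranking(states, returns)[:k]
```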

ICAART Conference 2009 Conference Paper

State Aggregation for Reinforcement Learning using Neuroevolution

  • Robert Wright
  • Nathaniel Gemelli

In this paper, we present a new machine learning algorithm, RL-SANE, which uses a combination of neuroevolution (NE) and traditional reinforcement learning (RL) techniques to improve learning performance. RL-SANE is an innovative combination of the neuroevolutionary algorithm NEAT (Stanley, 2004) and the RL algorithm Sarsa(λ) (Sutton and Barto, 1998). It uses the special ability of NEAT to generate and train customized neural networks that provide a means for reducing the size of the state space through state aggregation. Reducing the size of the state space through aggregation enables Sarsa(λ) to be applied to much more difficult problems than standard tabular-based approaches. Previous similar work in this area, such as that of Whiteson and Stone (2006) and Stanley and Miikkulainen (2001), has shown positive and promising results. This paper gives a brief overview of neuroevolutionary methods, introduces the RL-SANE algorithm, presents a comparative analysis of RL-SANE against other neuroevolutionary algorithms, and concludes with a discussion of enhancements that need to be made to RL-SANE.
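
The state-aggregation idea is easy to sketch: the evolved network compresses the raw state to a scalar, the scalar is binned, and tabular Sarsa(λ) runs over the bins. The net interface, the bin count, and the replacing-traces update below are illustrative assumptions:

```python
import numpy as np

def aggregate(net, state, n_bins):
    """RL-SANE's key move: a NEAT-evolved network (assumed interface)
    maps the raw state to a scalar in [0, 1], which is binned so that
    tabular methods apply."""
    s = float(net(state))
    return min(int(s * n_bins), n_bins - 1)

def sarsa_lambda_update(Q, E, s, a, r, s2, a2,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular Sarsa(lambda) update over aggregated states, with
    replacing eligibility traces; Q and E are (n_bins, n_actions)
    arrays and the constants are illustrative."""
    delta = r + gamma * Q[s2, a2] - Q[s, a]
    E[s, a] = 1.0            # replacing trace
    Q += alpha * delta * E   # propagate the TD error along the trace
    E *= gamma * lam         # decay all traces
    return Q, E
```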