Arrow Research search

Author name cluster

Ryan Sullivan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

AAAI Conference 2025 Conference Paper

Robust Multi-Objective Preference Alignment with Online DPO

  • Raghav Gupta
  • Ryan Sullivan
  • Yunxuan Li
  • Samrat Phatale
  • Abhinav Rastogi

Multi-objective preference alignment of large language models (LLMs) is critical for developing AI systems that are more configurable, personalizable, helpful, and safe. However, optimizing model outputs to satisfy diverse objectives with variable weights at inference time for truly personalized models presents a significant challenge. Existing approaches are either computationally expensive to train or do not sufficiently steer model behaviors. This paper introduces the Multi-Objective Online DPO (MO-ODPO) algorithm, designed to robustly and efficiently align model behaviors with multiple, potentially conflicting human preferences. Our approach incorporates a prompt conditioning mechanism, allowing us to train a single preference-conditional policy that can adapt to new preference combinations at inference. Experiments on two popular benchmarks show that MO-ODPO Pareto-dominates existing baselines while providing excellent inference-time steerability between diverse objectives.
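
As a hedged illustration of the conditioning idea the abstract describes, the sketch below scalarizes per-objective reward scores with a sampled preference-weight vector, orders the response pair accordingly, and applies a standard DPO loss. All names and tensors (mo_odpo_loss, the example scores) are hypothetical stand-ins, not the paper's implementation.

```python
# Hedged sketch of preference-conditioned online DPO: scalarize objectives
# with sampled weights, pick chosen/rejected, apply the standard DPO loss.
import torch
import torch.nn.functional as F

def mo_odpo_loss(logp_policy, logp_ref, weights, rewards_a, rewards_b, beta=0.1):
    # logp_policy / logp_ref: (2,) summed log-probs of responses (a, b).
    # weights: (k,) per-prompt preference weights; in the scheme the abstract
    # describes these would also be serialized into the prompt, so a single
    # policy remains steerable at inference time.
    score_a = torch.dot(weights, rewards_a)  # weighted multi-objective score
    score_b = torch.dot(weights, rewards_b)
    chosen, rejected = (0, 1) if score_a >= score_b else (1, 0)
    ratios = logp_policy - logp_ref          # log-ratio vs. reference model
    return -F.logsigmoid(beta * (ratios[chosen] - ratios[rejected]))

loss = mo_odpo_loss(
    torch.tensor([-12.3, -14.1]), torch.tensor([-12.0, -13.0]),
    torch.tensor([0.7, 0.3]), torch.tensor([1.2, 0.4]), torch.tensor([0.8, 0.9]),
)
```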

RLJ Journal 2025 Journal Article

Syllabus: Portable Curricula for Reinforcement Learning Agents

  • Ryan Sullivan
  • Ryan Pégoud
  • Ameen Ur Rehman
  • Xinchen Yang
  • Junyun Huang
  • Aayush Verma
  • Nistha Mitra
  • John P Dickerson

Curriculum learning has been a quiet yet crucial component of many high-profile successes of reinforcement learning. Despite this, it is still a niche topic that is not directly supported by any of the major reinforcement learning libraries. These methods can improve the capabilities and generalization of RL agents, but often require complex changes to training code. We introduce Syllabus, a portable curriculum learning library, as a solution to this problem. Syllabus provides a universal API for curriculum learning, modular implementations of popular automatic curriculum learning methods, and infrastructure that allows them to be easily integrated with asynchronous training code in nearly any RL library. Syllabus provides a minimal API for core curriculum learning components, making it easier to design new algorithms and adapt existing ones to new environments. We demonstrate this by evaluating the algorithms in Syllabus on several new environments, each using agents written in a different RL library. We present the first examples of automatic curriculum learning in NetHack and Neural MMO, two of the most challenging RL benchmarks, and find evidence that existing methods do not easily transfer to new environments.
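
A hypothetical sketch of the decoupled curriculum pattern the abstract describes: the curriculum only exchanges task choices and episode returns with the training loop, so it stays library-agnostic. Class and method names here are illustrative, not Syllabus's actual API.

```python
# Illustrative curriculum object: samples tasks, adapts weights from returns.
import random

class Curriculum:
    """Samples tasks and adapts sampling weights from episode returns."""
    def __init__(self, tasks):
        self.tasks = tasks
        self.scores = {t: 0.0 for t in tasks}

    def sample(self):
        # Weight tasks by recent return magnitude, a crude stand-in for a
        # real learning-progress signal.
        weights = [1.0 + abs(self.scores[t]) for t in self.tasks]
        return random.choices(self.tasks, weights=weights)[0]

    def update(self, task, episode_return):
        # Exponential moving average of returns per task.
        self.scores[task] = 0.9 * self.scores[task] + 0.1 * episode_return

# The training loop never touches curriculum internals (env simulated here).
curriculum = Curriculum(["easy", "medium", "hard"])
for _ in range(100):
    task = curriculum.sample()
    episode_return = {"easy": 1.0, "medium": 0.5, "hard": 0.1}[task] + random.random()
    curriculum.update(task, episode_return)
```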

RLC Conference 2025 Conference Paper

Syllabus: Portable Curricula for Reinforcement Learning Agents

  • Ryan Sullivan
  • Ryan Pégoud
  • Ameen Ur Rehman
  • Xinchen Yang
  • Junyun Huang
  • Aayush Verma
  • Nistha Mitra
  • John P Dickerson

Curriculum learning has been a quiet yet crucial component of many high-profile successes of reinforcement learning. Despite this, it is still a niche topic that is not directly supported by any of the major reinforcement learning libraries. These methods can improve the capabilities and generalization of RL agents, but often require complex changes to training code. We introduce Syllabus, a portable curriculum learning library, as a solution to this problem. Syllabus provides a universal API for curriculum learning, modular implementations of popular automatic curriculum learning methods, and infrastructure that allows them to be easily integrated with asynchronous training code in nearly any RL library. Syllabus provides a minimal API for core curriculum learning components, making it easier to design new algorithms and adapt existing ones to new environments. We demonstrate this by evaluating the algorithms in Syllabus on several new environments, each using agents written in a different RL library. We present the first examples of automatic curriculum learning in NetHack and Neural MMO, two of the most challenging RL benchmarks, and find evidence that existing methods do not easily transfer to new environments.

NeurIPS Conference 2023 Conference Paper

Gradient Informed Proximal Policy Optimization

  • Sanghyun Son
  • Laura Zheng
  • Ryan Sullivan
  • Yi-Ling Qiao
  • Ming Lin

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an α-policy that stands as a locally superior policy. By adaptively modifying the α value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.
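
A hedged sketch of the core mechanism: blend an analytical environment gradient with a PPO gradient, and shrink the blend weight (alpha) when the analytic estimates look high-variance. The variance metric and halving schedule below are illustrative, not the paper's exact α-policy rule.

```python
# Blend analytic and PPO gradients, gated by a crude variance check.
import torch

def blended_step(params, ppo_grad, analytic_grads, alpha, lr=3e-4,
                 var_threshold=1.0):
    stacked = torch.stack(analytic_grads)      # one gradient per rollout
    analytic_mean = stacked.mean(dim=0)
    variance = stacked.var(dim=0).mean()       # simple variance metric
    if variance > var_threshold:               # distrust noisy gradients
        alpha = max(0.0, alpha * 0.5)
    update = alpha * analytic_mean + (1.0 - alpha) * ppo_grad
    return params - lr * update, alpha

params = torch.zeros(4)
params, alpha = blended_step(
    params, ppo_grad=torch.randn(4),
    analytic_grads=[torch.randn(4) for _ in range(8)], alpha=0.5,
)
```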

NeurIPS Conference 2023 Conference Paper

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

  • Joseph Suarez
  • David Bloomin
  • Kyoung Whan Choe
  • Hao Xiang Li
  • Ryan Sullivan
  • Nishaanth Kanna
  • Daniel Scott
  • Rose Shuman

Neural MMO 2.0 is a massively multi-agent and multi-task environment for reinforcement learning research. This version features a novel task system that broadens the range of training settings and poses a new challenge in generalization: evaluation on and against tasks, maps, and opponents never seen during training. Maps are procedurally generated with 128 agents in the standard setting and 1-1024 supported overall. Version 2.0 is a complete rewrite of its predecessor with three-fold improved performance, effectively addressing simulation bottlenecks in online training. Enhancements to compatibility enable training with standard reinforcement learning frameworks designed for much simpler environments. Neural MMO 2.0 is free and open-source with comprehensive documentation available at neuralmmo.github.io and an active community Discord. To spark initial research on this new platform, we are concurrently running a competition at NeurIPS 2023.
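
An illustrative sketch of the generalization protocol the abstract describes (not Neural MMO's API): hold out map and task seeds so evaluation covers settings never seen during training.

```python
# Hold-out split over procedural seeds; make_episode is a hypothetical
# stand-in for generating a map and task from a seed.
import random

rng = random.Random(0)
seeds = list(range(1000))
rng.shuffle(seeds)
train_seeds, eval_seeds = seeds[:800], seeds[800:]

def make_episode(seed):
    return {"map_seed": seed, "task": f"task_{seed % 16}", "num_agents": 128}

train_episode = make_episode(rng.choice(train_seeds))
eval_episode = make_episode(rng.choice(eval_seeds))  # unseen at train time
```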

NeurIPS Conference 2023 Conference Paper

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks

  • Ryan Sullivan
  • Akarsh Kumar
  • Shengyi Huang
  • John Dickerson
  • Joseph Suarez

Most reinforcement learning methods rely heavily on dense, well-normalized environment rewards. DreamerV3 recently introduced a model-based method with a number of tricks that mitigate these limitations, achieving state-of-the-art on a wide range of benchmarks with a single set of hyperparameters. This result sparked discussion about the generality of the tricks, since they appear to be applicable to other reinforcement learning algorithms. Our work applies DreamerV3's tricks to PPO and is the first such empirical study outside of the original work. Surprisingly, we find that the tricks presented do not transfer as general improvements to PPO. We use a high quality PPO reference implementation and present extensive ablation studies totaling over 10,000 A100 hours on the Arcade Learning Environment and the DeepMind Control Suite. Though our experiments demonstrate that these tricks do not generally outperform PPO, we identify cases where they succeed and offer insight into the relationship between the implementation tricks. In particular, PPO with these tricks performs comparably to PPO on Atari games with reward clipping and significantly outperforms PPO without reward clipping.
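
Among the DreamerV3 tricks in question is symlog squashing, which compresses large-magnitude targets so learning is robust to reward scale. A minimal sketch of the transform pair itself (shown in isolation, not as the paper's full PPO integration):

```python
# symlog/symexp: a sign-preserving log squashing and its exact inverse.
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x: torch.Tensor) -> torch.Tensor:
    return torch.sign(x) * torch.expm1(torch.abs(x))

x = torch.tensor([-1000.0, -1.0, 0.0, 1.0, 1000.0])
assert torch.allclose(symexp(symlog(x)), x)  # inverse pair
```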

ICML Conference 2022 Conference Paper

Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments

  • Ryan Sullivan
  • J. K. Terry 0001
  • Benjamin Black
  • John Dickerson 0001

Visualizing optimization landscapes has resulted in many fundamental insights in numeric optimization, specifically regarding novel improvements to optimization techniques. However, visualizations of the objective that reinforcement learning optimizes (the "reward surface") have only ever been generated for a small number of narrow contexts. This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. We also explore reward surfaces in the policy gradient direction and show for the first time that many popular reinforcement learning environments have frequent "cliffs" (sudden large drops in expected reward). We demonstrate that A2C often "dives off" these cliffs into low reward regions of the parameter space while PPO avoids them, confirming a popular intuition for PPO’s improved performance over previous methods. We additionally introduce a highly extensible library that allows researchers to easily generate these visualizations in the future. Our findings provide new intuition to explain the successes and failures of modern RL methods, and our visualizations concretely characterize several failure modes of reinforcement learning agents in novel ways.
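
A sketch of the visualization recipe the abstract describes: sample two random parameter directions and evaluate expected return on a grid of perturbed policies. Here evaluate_return is a hypothetical rollout helper standing in for averaging episode returns in an environment.

```python
# Evaluate return over a 2D slice of parameter space to reveal "cliffs".
import numpy as np

def reward_surface(theta, evaluate_return, radius=1.0, steps=11, seed=0):
    rng = np.random.default_rng(seed)
    d1 = rng.standard_normal(theta.shape)
    d2 = rng.standard_normal(theta.shape)
    d1 /= np.linalg.norm(d1)
    d2 /= np.linalg.norm(d2)
    grid = np.linspace(-radius, radius, steps)
    surface = np.empty((steps, steps))
    for i, a in enumerate(grid):
        for j, b in enumerate(grid):
            surface[i, j] = evaluate_return(theta + a * d1 + b * d2)
    return surface  # plot as a heatmap to spot sudden drops in reward

# Toy usage with a synthetic objective standing in for environment rollouts:
surface = reward_surface(np.zeros(8), lambda p: -np.sum(p ** 2))
```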

NeurIPS Conference 2021 Conference Paper

PettingZoo: Gym for Multi-Agent Reinforcement Learning

  • J K. Terry
  • Benjamin Black
  • Nathaniel Grammel
  • Mario Jayakumar
  • Ananth Hari
  • Ryan Sullivan
  • Luis S Santos
  • Clemens Dieffendahl

This paper introduces the PettingZoo library and the accompanying Agent Environment Cycle ("AEC") games model. PettingZoo is a library of diverse sets of multi-agent environments with a universal, elegant Python API. PettingZoo was developed with the goal of accelerating research in Multi-Agent Reinforcement Learning ("MARL"), by making work more interchangeable, accessible and reproducible akin to what OpenAI's Gym library did for single-agent reinforcement learning. PettingZoo's API, while inheriting many features of Gym, is unique amongst MARL APIs in that it's based around the novel AEC games model. We argue, in part through case studies on major problems in popular MARL environments, that the popular game models are poor conceptual models of the games commonly used with MARL, that they promote severe bugs that are hard to detect, and that the AEC games model addresses these problems.
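
The canonical AEC interaction loop, per PettingZoo's documentation (API shown as of recent releases; the 2021-era env.last() returned a single done flag instead of separate termination/truncation flags):

```python
# Agents act strictly in turn; a finished agent must step with None.
from pettingzoo.classic import rps_v2

env = rps_v2.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```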

ICRA Conference 1998 Conference Paper

Active Laser Radar for High Performance Measurements

  • John A. Hancock
  • Dirk Langer
  • Martial Hebert
  • Ryan Sullivan
  • Darin Ingimarson
  • Eric Hoffmann
  • Markus Mettenleiter
  • Christoph Fröhlich

Laser scanners, or laser radars (ladar), have been used for a number of years for mobile robot navigation and inspection tasks. Although previous scanners were sufficient for low speed applications, they often did not have the range or angular resolution necessary for mapping at long distances. Many also did not provide an ample field of view with high accuracy and high precision. In this paper we present the development of state-of-the-art, high speed, high accuracy, 3D laser radar technology. This work has been a joint effort between CMU, K2T, and Z+F. The scanner mechanism provides an unobstructed 360° horizontal field of view and a 70° vertical field of view. Resolution of the scanner is variable, with a maximum resolution of approximately 0.06 degrees per pixel in both azimuth and elevation. The laser is amplitude-modulated, continuous-wave with an ambiguity interval of 52 m, a range resolution of 1.6 mm, and a maximum pixel rate of 625 kHz. This paper focuses on the design and performance of the laser radar and discusses several potential applications for the technology. It reports measured performance data for the system, including noise, drift over time, precision, and accuracy. The influence of ambient light, target surface material, and ambient temperature on range accuracy is discussed. Example application data are shown and possible improvements discussed.
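
A worked check of the amplitude-modulated continuous-wave (AMCW) ranging relations behind the abstract's numbers: the ambiguity interval of a phase-based rangefinder is c / (2 · f_mod), so a 52 m interval implies a modulation frequency near 2.9 MHz. The phase value below is an arbitrary example.

```python
# Recover the modulation frequency implied by a 52 m ambiguity interval,
# and convert a measured phase shift to range within that interval.
import math

c = 299_792_458.0          # speed of light, m/s
ambiguity_interval = 52.0  # m, from the abstract

f_mod = c / (2 * ambiguity_interval)
print(f"modulation frequency ≈ {f_mod / 1e6:.2f} MHz")  # ≈ 2.88 MHz

# Range follows from the measured phase shift phi (radians) and is only
# unique within one ambiguity interval:
phi = 3.1
range_m = (phi / (2 * math.pi)) * ambiguity_interval
print(f"range ≈ {range_m:.2f} m")  # ≈ 25.66 m
```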