James E. Kostas Papers

RLC Conference 2024 Conference Paper

The Cliff of Overcommitment with Policy Gradient Step Sizes

Scott M. Jordan
Samuel Neumann
James E. Kostas
Adam White
Philip S. Thomas

Policy gradient methods form the basis for many successful reinforcement learning algorithms, but their success depends heavily on selecting an appropriate step size and many other hyperparameters. While many adaptive step size methods exist, none are both free of hyperparameter tuning and able to converge quickly to an optimal policy. It is unclear why these methods are insufficient, so we aim to uncover what needs to be addressed to make an effective adaptive step size for policy gradient methods. Our extensive empirical investigation reveals that when the step size is above optimal, the policy overcommits to sub-optimal actions, leading to longer training times. These findings suggest the need for a new kind of policy optimization that can prevent or recover from entropy collapses.
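The overcommitment effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup: the two-armed bandit, the reward values, the softmax parameterization, and the function names are all assumptions chosen for illustration. The sketch applies a single REINFORCE update after the sub-optimal arm happens to be sampled first, and compares the policy entropy that results from a small and a large step size.

```python
import math


def softmax(prefs):
    """Softmax over action preferences, shifted by the max for stability."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reinforce_step(prefs, action, reward, step_size):
    """One REINFORCE update (no baseline) for a softmax policy.

    The gradient of log pi(action) with respect to preference i is
    1[i == action] - pi(i).
    """
    probs = softmax(prefs)
    return [
        pref + step_size * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, pref in enumerate(prefs)
    ]


# Hypothetical bandit: arm 0 is optimal, arm 1 is slightly worse.
REWARDS = [1.0, 0.8]
SUBOPTIMAL = 1


def entropy_after_one_step(step_size):
    """Entropy of the policy after one update on a sub-optimal sample."""
    prefs = reinforce_step([0.0, 0.0], SUBOPTIMAL, REWARDS[SUBOPTIMAL], step_size)
    return entropy(softmax(prefs))
```

With a small step size the policy stays close to uniform, so the optimal arm is still sampled often. With a very large step size the same single update drives the probability of the optimal arm to nearly zero, so the agent rarely observes the better reward and recovery is slow, which is one way an entropy collapse can lengthen training.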

RLJ Journal 2024 Journal Article

The Cliff of Overcommitment with Policy Gradient Step Sizes

Scott M. Jordan
Samuel Neumann
James E. Kostas
Adam White
Philip S. Thomas

Policy gradient methods form the basis for many successful reinforcement learning algorithms, but their success depends heavily on selecting an appropriate step size and many other hyperparameters. While many adaptive step size methods exist, none are both free of hyperparameter tuning and able to converge quickly to an optimal policy. It is unclear why these methods are insufficient, so we aim to uncover what needs to be addressed to make an effective adaptive step size for policy gradient methods. Our extensive empirical investigation reveals that when the step size is above optimal, the policy overcommits to sub-optimal actions, leading to longer training times. These findings suggest the need for a new kind of policy optimization that can prevent or recover from entropy collapses.

ICML Conference 2021 Conference Paper

High Confidence Generalization for Reinforcement Learning

James E. Kostas
Yash Chandak
Scott M. Jordan
Georgios Theocharous
Philip S. Thomas

We present several classes of reinforcement learning algorithms that safely generalize to Markov decision processes (MDPs) not seen during training. Specifically, we study the setting in which some set of MDPs is accessible for training. The goal is to generalize safely to MDPs that are sampled from the same distribution, but which may not be in the set accessible for training. For various definitions of safety, our algorithms give probabilistic guarantees that agents can safely generalize to MDPs that are sampled from the same distribution but are not necessarily in the training set. These algorithms are a type of Seldonian algorithm (Thomas et al., 2019), which is a class of machine learning algorithms that return models with probabilistic safety guarantees for user-specified definitions of safety.
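A probabilistic safety guarantee of the kind mentioned in the abstract is often implemented as a safety test on held-out data. The sketch below is a generic illustration, not the paper's algorithm: it assumes performance values bounded in [0, 1], one value per held-out MDP sampled i.i.d. from the training distribution, and it uses a one-sided Hoeffding bound. The function names, the threshold, and the choice of bound are assumptions.

```python
import math


def hoeffding_lower_bound(samples, delta, lo=0.0, hi=1.0):
    """Lower bound on the true mean that holds with probability >= 1 - delta.

    Assumes the samples are i.i.d. and bounded in [lo, hi].
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean - (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def passes_safety_test(returns, threshold, delta):
    """Accept the candidate policy only if its high-confidence lower bound
    on expected performance meets the user-specified threshold."""
    return hoeffding_lower_bound(returns, delta) >= threshold
```

For example, 200 held-out MDPs with performance 0.9 each pass a threshold of 0.8 at delta = 0.05, while only 20 such MDPs do not, because the confidence interval is too wide. In the second case a Seldonian-style algorithm would decline to return the policy rather than risk violating the safety constraint.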

ICML Conference 2020 Conference Paper

Asynchronous Coagent Networks

James E. Kostas
Chris Nota
Philip S. Thomas

Coagent policy gradient algorithms (CPGAs) are reinforcement learning algorithms for training a class of stochastic neural networks called coagent networks. In this work, we prove that CPGAs converge to locally optimal policies. Additionally, we extend prior theory to encompass asynchronous and recurrent coagent networks. These extensions facilitate the straightforward design and analysis of hierarchical reinforcement learning algorithms like the option-critic, and eliminate the need for complex derivations of customized learning rules for these algorithms.
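As a rough illustration of the local learning rule behind coagent networks, the sketch below treats each coagent as a Bernoulli-logistic unit that performs its own policy gradient update, using only its local input, its sampled output, and the shared return. The single-weight units, the function names, and the two-unit chain in the usage note are assumptions for illustration; the asynchronous and recurrent extensions studied in the paper are not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def coagent_update(weight, local_input, output, ret, step_size):
    """Local policy gradient step for one Bernoulli-logistic coagent.

    The coagent fires (output = 1) with probability sigmoid(weight * input).
    The gradient of the log-probability of its sampled output with respect
    to the weight is (output - p) * input; this is scaled by the shared
    return, which is the only global signal the coagent receives.
    """
    p = sigmoid(weight * local_input)
    grad_log_prob = (output - p) * local_input
    return weight + step_size * ret * grad_log_prob
```

In a two-unit chain, the first coagent would take the state as its input and the second would take the first coagent's output as its input. Each calls `coagent_update` with the same return, so no gradient needs to be propagated through the stochastic units.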

ICML Conference 2019 Conference Paper

Learning Action Representations for Reinforcement Learning

Yash Chandak
Georgios Theocharous
James E. Kostas
Scott M. Jordan
Philip S. Thomas

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
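The policy decomposition described in the abstract can be sketched as two functions: an internal policy that outputs a point in a low-dimensional representation space, and a mapping from that point to a discrete action. The sketch below is only illustrative. The fixed 2-D embeddings, the linear internal policy, and the nearest-neighbour mapping are assumptions; the paper learns the representations rather than fixing them.

```python
def internal_policy(state, weights):
    """Hypothetical linear internal policy: maps a state vector to a point
    in the action representation space (one weight row per dimension)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_action(point, embeddings):
    """Map a representation to the index of the action whose embedding is
    closest (nearest-neighbour lookup, assumed here for illustration)."""
    return min(
        range(len(embeddings)),
        key=lambda i: squared_distance(point, embeddings[i]),
    )


def overall_policy(state, weights, embeddings):
    """Compose the two components into a policy over discrete actions."""
    return to_action(internal_policy(state, weights), embeddings)
```

Because actions with nearby embeddings are selected by nearby points, feedback about one action shifts the internal policy in a way that also affects similar actions, which is the generalization benefit the abstract describes for large action sets.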
