Arrow Research search

Author name cluster

Ivan Gavran

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

AAAI Conference 2022 Conference Paper

Reinforcement Learning with Stochastic Reward Machines

  • Jan Corazza
  • Ivan Gavran
  • Daniel Neider

Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards have to be free of noise. To overcome this practical limitation, we introduce a novel type of reward machines, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This algorithm can easily be paired with existing reinforcement learning algorithms for reward machines and guarantees to converge to an optimal policy in the limit. We demonstrate the effectiveness of our algorithm in two case studies and show that it outperforms both existing methods and a naive approach for handling noisy reward functions.

AAAI Conference 2021 Conference Paper

Advice-Guided Reinforcement Learning in a non-Markovian Environment

  • Daniel Neider
  • Jean-Raphael Gaglione
  • Ivan Gavran
  • Ufuk Topcu
  • Bo Wu
  • Zhe Xu

We study a class of reinforcement learning tasks in which the agent receives its reward for complex, temporally-extended behaviors sparsely. For such tasks, the problem is how to augment the state-space so as to make the reward function Markovian in an efficient way. While some existing solutions assume that the reward function is explicitly provided to the learning algorithm (e. g. , in the form of a reward machine), the others learn the reward function from the interactions with the environment, assuming no prior knowledge provided by the user. In this paper, we generalize both approaches and enable the user to give advice to the agent, representing the user’s best knowledge about the reward function, potentially fragmented, partial, or even incorrect. We formalize advice as a set of DFAs and present a reinforcement learning algorithm that takes advantage of such advice, with optimal convergence guarantee. The experiments show that using wellchosen advice can reduce the number of training steps needed for convergence to optimal policy, and can decrease the computation time to learn the reward function by up to two orders of magnitude.

AAAI Conference 2021 Conference Paper

Choosing the Initial State for Online Replanning

  • Maximilian Fickert
  • Ivan Gavran
  • Ivan Fedotov
  • Jörg Hoffmann
  • Rupak Majumdar
  • Wheeler Ruml

The need to replan arises in many applications. However, in the context of planning as heuristic search, it raises an annoying problem: if the previous plan is still executing, what should the new plan search take as its initial state? If it were possible to accurately predict how long replanning would take, it would be easy to find the appropriate state at which control will transfer from the previous plan to the new one. But as planning problems can vary enormously in their difficulty, this prediction can be challenging. Many current systems merely use a manually chosen constant duration. In this paper, we show how such ad hoc solutions can be avoided by integrating the choice of the appropriate initial state into the search process itself. The search is initialized with multiple candidate initial states and a time-aware evaluation function is used to prefer plans whose total goal achievement time is minimal. Experimental results show that this approach yields better behavior than either guessing a constant or trying to predict replanning time in advance. By making replanning more effective and easier to implement, this work aids in creating planning systems that can better handle the inevitable exigencies of real-world execution.

ICAPS Conference 2020 Conference Paper

Joint Inference of Reward Machines and Policies for Reinforcement Learning

  • Zhe Xu 0005
  • Ivan Gavran
  • Yousef Ahmad
  • Rupak Majumdar
  • Daniel Neider
  • Ufuk Topcu
  • Bo Wu 0005

Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machines that encode non-Markovian reward functions. We focus on a setting in which this knowledge is a priori not available to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis reward machine and a sample of RL episodes. It uses a separate q-function defined for each state of the current hypothesis reward machine to determine the policy and performs RL to update the q-functions. While performing RL, the algorithm updates the sample by adding RL episodes along which the obtained rewards are inconsistent with the rewards based on the current hypothesis reward machine. In the next iteration, the algorithm infers a new hypothesis reward machine from the updated sample. Based on an equivalence relation between states of reward machines, we transfer the q-functions between the hypothesis reward machines in consecutive iterations. We prove that the proposed algorithm converges almost surely to an optimal policy in the limit. The experiments show that learning high-level knowledge in the form of reward machines leads to fast convergence to optimal policies in RL, while the baseline RL methods fail to converge to optimal policies after a substantial number of training steps.