Arrow Research search

Author name cluster

Eric Hansen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

5

AAAI Conference 2016 Conference Paper

General Error Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

  • Eric Hansen
  • Ibrahim Abdoulahi

We consider recently-derived error bounds that can be used to bound the quality of solutions found by heuristic search algorithms for stochastic shortest path problems. In their original form, the bounds can only be used for problems with positive action costs. We show how to generalize the bounds so that they can be used in solving any stochastic shortest path problem, regardless of cost structure. In addition, we introduce a simple new heuristic search algorithm that performs as well or better than previous algorithms for this class of problems, while being easier to implement and analyze.

AAAI Conference 2015 Conference Paper

Efficient Bounds in Heuristic Search Algorithms for Stochastic Shortest Path Problems

  • Eric Hansen
  • Ibrahim Abdoulahi

Fully observable decision-theoretic planning problems are commonly modeled as stochastic shortest path (SSP) problems. For this class of planning problems, heuristic search algorithms (including LAO*, RTDP, and related algorithms), as well as the value iteration algorithm on which they are based, lack an efficient test for convergence to an -optimal policy (except in the special case of discounting). We introduce a simple and efficient test for convergence that applies to SSP problems with positive action costs. The test can detect whether a policy is proper, that is, whether it achieves the goal state with probability 1. If proper, it gives error bounds that can be used to detect convergence to an -optimal solution. The convergence test incurs no extra overhead besides computing the Bellman residual, and the performance guarantee it provides substantially improves the utility of this class of planning algorithms.

AAAI Conference 2011 Conference Paper

Memory-Efficient Dynamic Programming for Learning Optimal Bayesian Networks

  • Brandon Malone
  • Changhe Yuan
  • Eric Hansen

We describe a memory-efficient implementation of a dynamic programming algorithm for learning the optimal structure of a Bayesian network from training data. The algorithm leverages the layered structure of the dynamic programming graphs representing the recursive decomposition of the problem to reduce the memory requirements of the algorithm from O(n2n ) to O(C(n, n/2)), where C(n, n/2) is the binomial coefficient. Experimental results show that the approach runs up to an order of magnitude faster and scales to datasets with more variables than previous approaches.

NeurIPS Conference 1997 Conference Paper

An Improved Policy Iteration Algorithm for Partially Observable MDPs

  • Eric Hansen

A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The pa(cid: 173) per's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the trans(cid: 173) formation of a finite-state controller into an improved finite-state con(cid: 173) troller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.

NeurIPS Conference 1996 Conference Paper

Reinforcement Learning for Mixed Open-loop and Closed-loop Control

  • Eric Hansen
  • Andrew Barto
  • Shlomo Zilberstein

Closed-loop control relies on sensory feedback that is usually as(cid: 173) sumed to be free. But if sensing incurs a cost, it may be cost(cid: 173) effective to take sequences of actions in open-loop mode. We de(cid: 173) scribe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Al(cid: 173) though we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the pa(cid: 173) per is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy.