Arrow Research search

Author name cluster

Malcolm J. A. Strens

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full author-disambiguation profile.

8 papers
2 author rows

Possible papers (8)

ICAPS 2006 (Conference Paper)

Combining Stochastic Task Models with Reinforcement Learning for Dynamic Scheduling

  • Malcolm J. A. Strens

We view dynamic scheduling as a sequential decision problem. Firstly, we introduce a generalized planning operator, the stochastic task model (STM), which predicts the effects of executing a particular task on state, time and reward using a general procedural format (pure stochastic function). Secondly, we show that effective planning under uncertainty can be obtained by combining adaptive horizon stochastic planning with reinforcement learning (RL) in a hybrid system. The benefits of the hybrid approach are evaluated using a repeatable job shop scheduling task.
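The abstract's core idea, a stochastic task model as a pure stochastic function mapping (state, task) to (next state, elapsed time, reward), combined with rollout planning up to a time horizon, can be sketched as follows. This is a toy illustration, not the paper's model: the `stm` dynamics, the task fields, and the Monte-Carlo planner are all hypothetical.

```python
import random

def stm(state, task, rng):
    """Hypothetical stochastic task model (STM): executing `task` in
    `state` yields (next_state, elapsed_time, reward) via a pure
    stochastic function. Toy dynamics: a machine whose load grows
    with each task, making later tasks less valuable."""
    load, clock = state
    duration = task["base_time"] * rng.uniform(0.8, 1.2)  # stochastic duration
    reward = task["value"] - 0.1 * load                   # busier machine, lower value
    return (load + 1, clock + duration), duration, reward

def plan_first_task(state, tasks, horizon_time, n_rollouts=200, seed=0):
    """Monte-Carlo rollout planner: simulate random task orderings
    through the STM, truncating each rollout at the time horizon,
    and return the first task of the best-scoring ordering."""
    rng = random.Random(seed)
    best_task, best_value = None, float("-inf")
    for _ in range(n_rollouts):
        order = rng.sample(tasks, len(tasks))
        s, total, elapsed = state, 0.0, 0.0
        for task in order:
            s, dt, r = stm(s, task, rng)
            elapsed += dt
            if elapsed > horizon_time:   # horizon cuts the rollout short
                break
            total += r
        if total > best_value:
            best_task, best_value = order[0], total
    return best_task

# Usage with two hypothetical tasks:
tasks = [{"base_time": 1.0, "value": 1.0}, {"base_time": 1.0, "value": 3.0}]
chosen = plan_first_task((0, 0.0), tasks, horizon_time=10.0)
```

In the hybrid system described above, an RL component would replace or bias the random orderings sampled here; this sketch shows only the planning half.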

JMLR 2002 (Journal Article)

Policy Search using Paired Comparisons

  • Malcolm J. A. Strens
  • Andrew W. Moore

Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng and Jordan, 2000). We evaluate Pegasus, and new paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
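The Pegasus-style mechanism in the abstract, fixing start states and random number sequences so that two policies are compared on identical noise realizations, can be sketched in a few lines. The task, policies, and reward here are a hypothetical 1-D goal-reaching problem, not the mountain car or pursuer-evader domains from the paper.

```python
import random

def rollout_return(policy, start_state, seed, horizon=50):
    """Total reward of one episode under a fixed random seed.
    Toy 1-D task: the policy chooses a step, seeded Gaussian noise is
    added, and reward is negative distance from a goal at 1.0.
    Fixing the seed makes the noisy objective deterministic."""
    rng = random.Random(seed)          # fixed seed: the Pegasus trick
    state, total = start_state, 0.0
    for _ in range(horizon):
        state += policy(state) + rng.gauss(0.0, 0.1)
        total += -abs(state - 1.0)
    return total

def paired_compare(policy_a, policy_b, scenarios):
    """Paired comparison: evaluate both policies on the same
    (start state, seed) scenarios and average the differences.
    Pairing removes between-scenario variance, so fewer trials
    are needed to rank the policies."""
    diffs = [rollout_return(policy_a, s0, seed) - rollout_return(policy_b, s0, seed)
             for (s0, seed) in scenarios]
    return sum(diffs) / len(diffs)     # > 0 suggests policy_a is better

# Fixed evaluation scenarios shared by every policy comparison.
rng = random.Random(7)
scenarios = [(rng.uniform(-1.0, 1.0), seed) for seed in range(20)]

greedy = lambda s: 1.0 - s             # jumps straight toward the goal
cautious = lambda s: 0.3 * (1.0 - s)   # moves a fraction of the way
```

An optimizer (e.g. differential evolution, as the abstract mentions) would use `paired_compare` as its selection test between candidate policy parameters.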