Arrow Research search

Author name cluster

Mark Roberts

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Probabilistic Hierarchical Goal Network Planning with UCT

  • David H. Chan
  • Mark Roberts
  • Dana S. Nau

Hierarchical goal networks (HGNs) provide a framework for goal-directed planning by decomposing high-level goals into ordered subgoals. While prior work has examined non-determinism for hierarchical planning (specifically, HTNs), scant work studies how HGNs can help in stochastic settings. We introduce a formalism for probabilistic HGN planning with action-insertion semantics, enabling probabilistic planners to incorporate domain knowledge from goal decomposition methods. We design and evaluate two UCT-based algorithms for solving probabilistic HGN planning problems: an asymptotically optimal approach and a compressed, shared-value approach that optimizes separately for each goal within the goal-subgoal hierarchy. We compare our two UCT-based HGN search algorithms experimentally on modified benchmark domains from the FOND HTN literature. Our results demonstrate that on larger problems, the compressed search converges more quickly and outperforms the asymptotically optimal search. This suggests that HGNs can be effective in probabilistic planning, and compression may yield better performance on large problems in anytime settings with stochastic action outcomes.

AAMAS Conference 2025 Conference Paper

Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network

  • Vincent Hsiao
  • Mark Roberts
  • Laura M. Hiatt
  • George Konidaris
  • Dana S. Nau

A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance in some target task. We introduce SEBNs (Skill-Environment Bayesian Networks) which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the inferred estimates of agent success from an SEBN to weigh the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that SEBN-based curricula frequently outperform other baselines.

ECAI Conference 2025 Conference Paper

Landmark-Assisted Monte Carlo Planning

  • David H. Chan
  • Mark Roberts
  • Dana S. Nau

Landmarks—conditions that must be satisfied at some point in every solution plan—have contributed to major advancements in classical planning, but they have seldom been used in stochastic domains. We formalize probabilistic landmarks and adapt the UCT algorithm to leverage them as subgoals to decompose MDPs; core to the adaptation is balancing between greedy landmark achievement and final goal achievement. Our results in benchmark domains show that well-chosen landmarks can significantly improve the performance of UCT in online probabilistic planning, while the best balance of greedy versus long-term goal achievement is problem-dependent. The results suggest that landmarks can provide helpful guidance for anytime algorithms solving MDPs.
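For readers unfamiliar with UCT, the sketch below shows the standard UCB1 child-selection rule that UCT applies at each tree node. It is a generic illustration only, not the landmark-aware adaptation described in the abstract; the function name `ucb1_select`, the `(total_reward, visits)` tuple layout, and the default exploration constant are assumptions made for this example.

```python
import math


def ucb1_select(children, exploration=math.sqrt(2)):
    """Return the index of the child to visit next under UCB1.

    `children` is a list of (total_reward, visits) tuples. This layout is a
    hypothetical choice for illustration, not taken from the paper.
    """
    total_visits = sum(visits for _, visits in children)
    best_index, best_score = None, -math.inf
    for index, (total_reward, visits) in enumerate(children):
        if visits == 0:
            # Unvisited children are tried first so every action is sampled once.
            return index
        mean_reward = total_reward / visits
        # The bonus grows for rarely visited children, which encourages exploration.
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        score = mean_reward + bonus
        if score > best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    # The second child has a lower mean reward but far fewer visits.
    print(ucb1_select([(6.0, 10), (0.5, 1)]))
```

A larger `exploration` value favors rarely tried actions, while a smaller one favors the action with the best average reward so far.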

AAMAS Conference 2025 Conference Paper

Uncertainty Expression for Human-Robot Task Communication

  • David Porfirio
  • Mark Roberts
  • Laura M. Hiatt

An underlying assumption of many existing approaches to human-robot task communication is that the robot possesses a sufficient amount of environmental domain knowledge, including the locations of task-critical objects. This assumption is unrealistic if the locations of known objects change or have not yet been discovered by the robot. In this work, our key insight is that in many scenarios, robot end users possess more scene insight than the robot and need ways to express it. Presently, there is a lack of research on how solutions for collecting end-user scene insight should be designed. We thereby created an Uncertainty Expression System (UES) to investigate how best to elicit end-user scene insight. The UES allows end users to convey their knowledge of object uncertainty using one of three interfaces: (1) a precision interface that allows meticulous expression of scene insight; (2) a painting interface by which users create a heat map of possible object locations; and (3) a ranking interface by which end users express object locations via an ordered list. We then conducted a user study to compare the effectiveness of these approaches based on the accuracy of scene insight conveyed to the robot, the efficiency at which end users are able to express this scene insight, and both usability and task load. Results indicate that the rank interface is more user friendly and efficient than the precision interface, and that the paint interface is the least accurate.

AAAI Conference 2018 Short Paper

Comparing Reward Shaping, Visual Hints, and Curriculum Learning

  • Rey Pocius
  • David Isele
  • Mark Roberts
  • David Aha

Common approaches to learn complex tasks in reinforcement learning include reward shaping, environmental hints, or a curriculum. Yet few studies examine how they compare to each other, when one might prefer one approach, or how they may complement each other. As a first step in this direction, we compare reward shaping, hints, and curricula for a Deep RL agent in the game of Minecraft. We seek to answer whether reward shaping, visual hints, or the curricula have the most impact on performance, which we measure as the time to reach the target, the distance from the target, the cumulative reward, or the number of actions taken. Our analyses show that performance is most impacted by the curriculum used and visual hints; shaping had less impact. For similar navigation tasks, the results suggest that designing an effective curriculum and providing appropriate hints most improve the performance.

ICAPS Conference 2017 Conference Paper

A New Approach to Temporal Planning with Rich Metric Temporal Properties

  • Son Thanh To
  • Benjamin Johnson
  • Mark Roberts
  • David W. Aha

Temporal logics have been used in autonomous planning to represent and reason about temporal planning problems. However, such techniques have typically been restricted to either (1) representing actions, events, and goals with temporal properties or (2) planning for temporally-extended goals under restrictive assumptions. We introduce Mixed Propositional Metric Temporal Logic (MPMTL) where formulae are built over mixed binary and continuous real variables. We introduce a planner, MTP, that solves MPMTL problems and includes a SAT-solver, model checker for a polynomial fragment of MPMTL, and a forward search algorithm. We extend PDDL 2.1 with MPMTL syntax to create MPDDL and an associated parser. The empirical study shows that MTP outperforms the state-of-the-art PDDL+ planner SMTPlan+ on several domains it performed best on and MTP performs and scales on problem size well for challenging domains with rich temporal properties we create.

IJCAI Conference 2016 Conference Paper

Hierarchical Planning: Relating Task and Goal Decomposition with Task Sharing

  • Ron Alford
  • Vikas Shivashankar
  • Mark Roberts
  • Jeremy Frank
  • David W. Aha

Considerable work has focused on enhancing the semantics of Hierarchical Task Networks (HTNs) in order to advance the state-of-the-art in hierarchical planning. For instance, the Hierarchical Goal Network (HGN) formalism operates over a hierarchy of goals to facilitate tighter integration of decompositional planning with classical planning. Another example is the Action Notation Markup Language (ANML) which adds aspects of generative planning and task-sharing to the standard HTN semantics. The aim of this work is to formally analyze the effects of these modifications to HTN semantics on the computational complexity and expressivity of HTN planning. To facilitate analysis, we unify goal and task planning into Goal-Task Network (GTN) planning. GTN models use HTN and HGN constructs, but have a solution-preserving mapping back to HTN planning. We then show theoretical results that provide new insights into both the expressivity as well as computational complexity of GTN planning under a number of different semantics. Our work lays a firm footing to clarify exact semantics for recent planners based on ANML, HGNs, and similar hierarchical languages.

ICAPS Conference 2014 Conference Paper

Evaluating Diversity in Classical Planning

  • Mark Roberts
  • Adele E. Howe
  • Indrajit Ray

Applications that require alternative plans challenge the single solution, single quality metric assumptions upon which many classical planners are designed and evaluated. To evaluate the distinctness of alternative plans (i.e., plan sets), researchers have created diversity metrics that often measure the set difference between the actions of plans. Many approaches for generating plan sets embed the same diversity metric in a weighted evaluation function to guide the search mechanism, thus confounding the search process with its evaluation. We discover that two diversity metrics fail to distinguish similar plans from each other or to identify plans with extraneous actions, so we introduce two new diversity metrics, uniqueness and overlap, to capture these cases. We then examine the tradeoffs of producing diverse plans while we control for plan length and metric interaction and confirm that metric interaction can significantly impact search performance. We show that planners searching for plan sets must consider a third metric, parsimony, that prefers shorter plans while maximizing diversity. We evaluate three existing approaches for generating diverse plans and two new algorithms that are designed to explicitly manage diversity and interaction between the diversity and quality metrics. Our findings synthesize and extend recent results in plan diversity.
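To make the phrase "set difference between the actions of plans" concrete, here is a minimal sketch of one common way to score plan distance: the normalized symmetric difference of the two plans' action sets, averaged over all pairs in a plan set. It is an assumed, generic formulation for illustration; it is not the uniqueness, overlap, or parsimony metric from the paper, and the function names and action strings are hypothetical.

```python
from itertools import combinations


def action_set_distance(plan_a, plan_b):
    """Normalized symmetric difference between the action sets of two plans.

    Returns 0.0 for plans with identical action sets and 1.0 for plans that
    share no actions. Action ordering and repetition are ignored.
    """
    actions_a, actions_b = set(plan_a), set(plan_b)
    union = actions_a | actions_b
    if not union:
        return 0.0  # two empty plans are treated as identical
    return len(actions_a ^ actions_b) / len(union)


def plan_set_diversity(plans):
    """Average pairwise distance over a plan set (0.0 if fewer than two plans)."""
    pairs = list(combinations(plans, 2))
    if not pairs:
        return 0.0
    return sum(action_set_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    base = ["load", "drive", "unload"]
    padded = ["load", "drive", "unload", "idle"]  # same plan plus one extraneous action
    print(action_set_distance(base, padded))  # nonzero although the plans are essentially the same
    print(plan_set_diversity([base, padded, ["fly", "unload"]]))
```

The padded plan in the usage example scores as different from the base plan purely because of an extraneous action, which is the kind of weakness in set-difference metrics that the abstract points to.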

ICAPS Conference 2008 Conference Paper

What Makes Planners Predictable?

  • Mark Roberts
  • Adele E. Howe
  • Brandon Wilson
  • Marie desJardins

In recent work we showed that models constructed from planner performance data over a large suite of benchmark problems are surprisingly accurate; 91-99% accuracy for success and 3-496 seconds RMSE for runtime. In this paper, we examine the underlying causes of these accurate models. We deconstruct the learned models to assess how the features, the planners, the search space topology and the amount of training data facilitate predicting planner performance. We find that the models can be learned from relatively little training data (e.g., performance on 10% of the problems in some cases). Generally, having more features improves accuracy. However, the effect is often planner-dependent: in some cases, adding features degrades performance. We identify that the most prominent features in the models are domain features, though we find that the runtime models still have a need for better features. In the last part of the paper, we examine explanatory models to refine the planner dependencies and to identify linkages between problem structure and specific planners' performance.

ICAPS Conference 2004 Conference Paper

Trading Places: How to Schedule More in a Multi-Resource Oversubscribed Scheduling Problem

  • Laura Barbulescu
  • Adele E. Howe
  • L. Darrell Whitley
  • Mark Roberts

Oversubscribed scheduling problems require removing tasks when enough resources are not available. Prior AI approaches have mostly been constructive or repair-based heuristic search. In contrast, we have found a genetic algorithm (GA) to be the best approach to the overconstrained problem of Air Force Satellite Control Network scheduling. We present empirical results that elucidate sources of difficulty in the application and partially explain why the GA is well suited to this problem. We show that the task interaction compels changes involving many tasks simultaneously and the GA appears to be learning domain specific patterns in the data.