Arrow Research search

Author name cluster

Nick Hawes

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

64 papers
2 author rows

Possible papers

64

AAAI Conference 2026 Conference Paper

Scalable Solution Methods for Dec-POMDPs with Deterministic Dynamics

  • Yang You
  • Alex Schutz
  • Zhikun Li
  • Bruno Lacerda
  • Robert Skilton
  • Nick Hawes

Many high-level multi-agent planning problems, such as multi-robot navigation and path planning, can be modeled with deterministic actions and observations. In this work, we focus on such domains and introduce the class of Deterministic Decentralized POMDPs (Det-Dec-POMDPs)—a subclass of Dec-POMDPs with deterministic transitions and observations given the state and joint actions. We then propose a practical solver, Iterative Deterministic POMDP Planning (IDPP), based on the classic Joint Equilibrium Search for Policies framework, specifically optimized to handle large-scale Det-Dec-POMDPs that existing Dec-POMDP solvers cannot handle efficiently.
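Determinism means the model's dynamics collapse to a lookup: given the current state and the joint action, the successor state and the joint observation are fixed. A minimal sketch of that structure (the table-based representation and all names here are illustrative, not the paper's API):

```python
def det_step(transitions, state, joint_action):
    """One step of a Det-Dec-POMDP.

    transitions: dict mapping (state, joint_action) -> (next_state, joint_obs).
    Unlike a general Dec-POMDP, there is no distribution to sample from:
    the successor state and observations are uniquely determined.
    """
    return transitions[(state, tuple(joint_action))]

# Toy two-agent example: both agents move right along a corridor.
T = {
    (0, ("right", "right")): (1, ("wall_far", "wall_far")),
    (1, ("right", "right")): (2, ("wall_near", "wall_near")),
}
state, obs = det_step(T, 0, ["right", "right"])
```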

IJCAI Conference 2025 Conference Paper

A Finite-State Controller Based Offline Solver for Deterministic POMDPs

  • Alex Schutz
  • Yang You
  • Matías Mattamala
  • Ipek Caliskanelli
  • Bruno Lacerda
  • Nick Hawes

Deterministic partially observable Markov decision processes (DetPOMDPs) often arise in planning problems where the agent is uncertain about its environmental state but can act and observe deterministically. In this paper, we propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs, which builds policies in the form of finite-state controllers (FSCs). DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs. We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.
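A finite-state controller maps each internal node to an action, and each observation to a successor node; in a DetPOMDP, executing one is entirely deterministic. A hedged sketch with a toy environment (the dict layout is my own, not DetMCVI's policy representation):

```python
def run_fsc(fsc, start_node, env_step, steps):
    """Execute a finite-state controller (FSC).

    fsc: dict node -> {"action": a, "next": {observation: next_node}}
    env_step: action -> observation (deterministic for a DetPOMDP)
    """
    node, trace = start_node, []
    for _ in range(steps):
        action = fsc[node]["action"]
        obs = env_step(action)          # act, then observe
        trace.append((action, obs))
        node = fsc[node]["next"][obs]   # the observation selects the next node
    return trace

# Toy controller alternating two actions; the environment echoes the action.
fsc = {
    0: {"action": "a", "next": {"a": 1}},
    1: {"action": "b", "next": {"b": 0}},
}
trace = run_fsc(fsc, 0, env_step=lambda a: a, steps=4)
# trace == [("a", "a"), ("b", "b"), ("a", "a"), ("b", "b")]
```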

IROS Conference 2025 Conference Paper

Decremental Dynamics Planning for Robot Navigation

  • Yuanjie Lu
  • Tong Xu
  • Linji Wang
  • Nick Hawes
  • Xuesu Xiao

Most, if not all, robot navigation systems employ a decomposed planning framework that includes global and local planning. To trade off onboard computation against plan quality, current systems must limit all robot dynamics considerations to the local planner, while using an extremely simplified robot representation (e.g., a point-mass holonomic model without dynamics) at the global level. However, such an artificial decomposition based on either full or zero consideration of robot dynamics can lead to gaps between the two levels, e.g., a global path based on a holonomic point-mass model may not be realizable by a non-holonomic robot, especially in highly constrained obstacle environments. Motivated by this limitation, we propose a novel paradigm, Decremental Dynamics Planning (DDP), that integrates dynamic constraints into the entire planning process, with high-fidelity dynamics modeling at the beginning and a gradual fidelity reduction as planning progresses. To validate the effectiveness of this paradigm, we augment three different planners with DDP and show overall improved planning performance. We also develop a new DDP-based navigation system, which achieved second place in both the simulation phase and the real-world phase of the 2025 BARN Challenge. Both simulated and physical experiments validate DDP's hypothesized benefits.

ICRA Conference 2025 Conference Paper

Generating Causal Explanations of Vehicular Agent Behavioural Interactions with Learnt Reward Profiles

  • Rhys Howard
  • Nick Hawes
  • Lars Kunze

Transparency and explainability are important features that responsible autonomous vehicles should possess, particularly when interacting with humans, and causal reasoning offers a strong basis to provide these qualities. However, even if one assumes agents act to maximise some concept of reward, it is difficult to make accurate causal inferences of agent planning without capturing what is of importance to the agent. Thus our work aims to learn a weighting of reward metrics for agents such that explanations for agent interactions can be causally inferred. We validate our approach quantitatively and qualitatively across three real-world driving datasets, demonstrating a functional improvement over previous methods and competitive performance across evaluation metrics.

NeurIPS Conference 2025 Conference Paper

Improving Regret Approximation for Unsupervised Dynamic Environment Generation

  • Harry Mead
  • Bruno Lacerda
  • Jakob Foerster
  • Nick Hawes

Unsupervised Environment Design (UED) seeks to automatically generate training curricula for reinforcement learning (RL) agents, with the goal of improving generalisation and zero-shot performance. However, designing effective curricula remains a difficult problem, particularly in settings where small subsets of environment parameterisations result in significant increases in the complexity of the required policy. Current methods struggle with a difficult credit assignment problem and rely on regret approximations that fail to identify challenging levels, both of which are compounded as the size of the environment grows. We propose Dynamic Environment Generation for UED (DEGen) to enable a denser level generator reward signal, reducing the difficulty of credit assignment and allowing for UED to scale to larger environment sizes. We also introduce a new regret approximation, Maximised Negative Advantage (MNA), as a significantly improved metric to optimise for, that better identifies more challenging levels. We show empirically that MNA outperforms current regret approximations and, when combined with DEGen, consistently outperforms existing methods, especially as the size of the environment grows. We have made all our code available here: \url{https://github.com/HarryMJMead/Dynamic-Environment-Generation-for-UED}.

ICRA Conference 2025 Conference Paper

LUMOS: Language-Conditioned Imitation Learning with World Models

  • Iman Nematollahi
  • Branton DeMoss
  • Akshay L. Chandra
  • Nick Hawes
  • Wolfram Burgard
  • Ingmar Posner

We introduce LUMOS, a language-conditioned multi-task imitation learning framework for robotics. LUMOS learns skills by practicing them over many long-horizon rollouts in the latent space of a learned world model and transfers these skills zero-shot to a real robot. By learning on-policy in the latent space of the learned world model, our algorithm mitigates policy-induced distribution shift which most offline imitation learning methods suffer from. LUMOS learns from unstructured play data with fewer than 1% hindsight language annotations but is steerable with language commands at test time. We achieve this coherent long-horizon performance by combining latent planning with both image- and language-based hindsight goal relabeling during training, and by optimizing an intrinsic reward defined in the latent space of the world model over multiple time steps, effectively reducing covariate shift. In experiments on the difficult long-horizon CALVIN benchmark, LUMOS outperforms prior learning-based methods with comparable approaches on chained multi-task evaluations. To the best of our knowledge, we are the first to learn language-conditioned continuous visuomotor control for a real-world robot within an offline world model. Videos, dataset and code are available at http://lumos.cs.uni-freiburg.de.

IROS Conference 2025 Conference Paper

Multi-Agent Pickup and Delivery with Mobile Pickups

  • Benedetta Flammini
  • Nick Hawes
  • Bruno Lacerda

In Multi-Agent Pickup and Delivery (MAPD), a team of agents must find collision-free paths to service an online stream of tasks, which are composed of pickup and delivery locations that have to be visited sequentially. This paper addresses the novel problem of MAPD with mobile pickups, which involves two types of agents, the suppliers and the deliverers. Suppliers are large robots that can transport many items, but cannot navigate tight spaces or manipulate objects, while deliverers can navigate to rooms to deliver items, but can only carry one item at a time. Deliverers have to collect items from the suppliers, and bring them to the assigned delivery locations. This introduces a new challenge which is not tackled in classical MAPD: deciding where and when the exchange of items should happen. We propose Token Passing with Exchange Locations (TP-EL), an extension of the widely used Token Passing (TP) algorithm with a task allocation mechanism that considers which supplier to pick items from, and when and where to do so. We experiment in several simulated domains, demonstrating the superiority of TP-EL over baselines that do not consider mobile pickups or use alternative methods to decide pickup locations.

ICML Conference 2025 Conference Paper

Return Capping: Sample Efficient CVaR Policy Gradient Optimisation

  • Harry Mead
  • Clarissa Costen
  • Bruno Lacerda
  • Nick Hawes

When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem by capping the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation of the problem results in consistently improved performance compared to baselines. We have made all our code available here: https://github.com/HarryMJMead/cvar-return-capping.
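The difference between discarding and capping can be seen in a few lines of a REINFORCE-style loss. This is an illustrative sketch of the idea only (the function names and the plain Monte-Carlo estimator are mine, not the paper's implementation):

```python
import numpy as np

def cvar_pg_loss_discard(returns, log_probs, alpha):
    # Standard CVaR policy gradient: only the worst alpha-fraction of
    # trajectories receive gradient signal; the rest are thrown away.
    var = np.quantile(returns, alpha)
    tail = returns <= var
    return -np.mean(log_probs[tail] * (returns[tail] - var))

def cvar_pg_loss_capped(returns, log_probs, cap):
    # Return capping: every trajectory contributes, but returns above the
    # cap are clipped, so good trajectories cannot dominate the objective.
    return -np.mean(log_probs * np.minimum(returns, cap))
```

With the cap set at an appropriate quantile of the return distribution, the capped objective shares its optimum with the discarded one while using every sampled trajectory.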

TAAS Journal 2024 Journal Article

A Framework for Simultaneous Task Allocation and Planning under Uncertainty

  • Fatma Faruq
  • Bruno Lacerda
  • Nick Hawes
  • David Parker

We present novel techniques for simultaneous task allocation and planning in multi-robot systems operating under uncertainty. By performing task allocation and planning simultaneously, allocations are informed by individual robot behaviour, creating more efficient team behaviour. We go beyond existing work by planning for task reallocation across the team given a model of partial task satisfaction under potential robot failures and uncertain action outcomes. We model the problem using Markov decision processes, with tasks encoded in co-safe linear temporal logic, and optimise for the expected number of tasks completed by the team. To avoid the inherent complexity of joint models, we propose an alternative model that simultaneously considers task allocation and planning, but in a sequential fashion. We then build a joint policy from the sequential policy obtained from our model, thus allowing for concurrent policy execution. Furthermore, to enable adaptation in the case of robot failures, we consider replanning from failure states and propose an approach to preemptively replan in an anytime fashion, replanning for more probable failure states first. Our method also allows us to quantify the performance of the team by providing an analysis of properties, such as the expected number of completed tasks under concurrent policy execution. We implement and extensively evaluate our approach on a range of scenarios. We compare its performance to a state-of-the-art baseline in decoupled task allocation and planning: sequential single-item auctions. Our approach outperforms the baseline in terms of computation time and the number of times replanning is required on robot failure.

ECAI Conference 2024 Conference Paper

Hierarchical Planning for Resource-Constrained Long-Term Monitoring Missions in Time-Varying Environments

  • Alex Stephens
  • Bruno Lacerda
  • Nick Hawes

We consider autonomous robots deployed on long-term monitoring missions in unknown environments. The planning objective is to maximise the value of observations obtained over the course of a mission, subject to resource constraints which demand periodic visits to depots where resources can be replenished. Effective planning in this setting requires reasoning over long horizons based on sparse observational data, and flexible management of the constrained resources. We present a hierarchical planning approach to this problem, using a spatiotemporal Gaussian process environment model at different levels of abstraction for short- and long-horizon planning. We empirically evaluate our approach on a series of synthetic domains, and a wildfire monitoring scenario based on real data.

AAMAS Conference 2024 Conference Paper

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

  • Alexander Rutherford
  • Benjamin Ellis
  • Matteo Gallici
  • Jonathan Cook
  • Andrei Lupu
  • Garðar Ingvarsson
  • Timon Willi
  • Akbir Khan

Benchmarks play an important role in the development of machine learning algorithms, with reinforcement learning (RL) research having been heavily influenced by the available environments. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research: first, multiple agents must be considered at each environment step, adding computational burden, and second, sample complexity is increased due to non-stationarity, decentralised partial observability, and other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU-enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.

NeurIPS Conference 2024 Conference Paper

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

  • Alexander Rutherford
  • Benjamin Ellis
  • Matteo Gallici
  • Jonathan Cook
  • Andrei Lupu
  • Garðar Ingvarsson
  • Timon Willi
  • Ravi Hammond

Benchmarks are crucial in the development of machine learning algorithms, significantly influencing reinforcement learning (RL) research through the available environments. Traditionally, RL environments run on the CPU, which limits their scalability with the computational resources typically available in academia. However, recent advancements in JAX have enabled the wider use of hardware acceleration, enabling massively parallel RL training pipelines and environments. While this has been successfully applied to single-agent RL, it has not yet been widely adopted for multi-agent scenarios. In this paper, we present JaxMARL, the first open-source, easy-to-use code base that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments and popular baseline algorithms. Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is up to 12,500 times faster than existing approaches. This enables efficient and thorough evaluations, potentially alleviating the evaluation crisis in the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. The code is available at https://github.com/flairox/jaxmarl.

AAMAS Conference 2024 Conference Paper

Multi-Robot Allocation of Assistance from a Shared Uncertain Operator

  • Clarissa Costen
  • Anna Gautier
  • Nick Hawes
  • Bruno Lacerda

Shared autonomy systems allow robots to either operate autonomously or request assistance from a human operator. In such settings, the human operator may exhibit sub-optimal behaviours, influenced by latent variables such as attention level or task proficiency. In this paper, we consider shared autonomy systems composed of multiple robots and one human. In this setting, we aim to synthesise a controller that selects, at each decision step, the actions to be taken by each robot and which (if any) robot the human operator should assist. To efficiently allocate the human operator to a robot at any given time, we propose a controller that reasons about the uncertainty over the latent variables impacting the human operator’s performance. To ensure scalability, we use an online bidding system, where each robot plans while considering its belief over the human’s performance, and bids according to the direct benefit of human assistance and how much information will be gained by the system about the human. We experiment on two domains, where we outperform approaches for allocation of human assistance that do not consider the human’s latent variables, and show that the performance of the overall system increases when robots consider the information gained by requesting human assistance when bidding.

NeurIPS Conference 2024 Conference Paper

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

  • Alex Rutherford
  • Michael Beukman
  • Timon Willi
  • Bruno Lacerda
  • Nick Hawes
  • Jakob Foerster

What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics. Surprisingly, despite methods aiming to maximise regret in theory, the practical approximations do not correlate with regret but with success rate. As a result, a significant portion of an agent's experience comes from environments it has already mastered, offering little to no contribution toward enhancing its abilities. Put differently, current methods fail to predict intuitive measures of learnability. Specifically, they are unable to consistently identify those scenarios that the agent can sometimes solve, but not always. Based on our analysis, we develop a method that directly trains on scenarios with high learnability. This simple and intuitive approach outperforms existing UED methods in several binary-outcome environments, including the standard domain of Minigrid and a novel setting closely inspired by a real-world robotics problem. We further introduce a new adversarial evaluation procedure for directly measuring robustness, closely mirroring the conditional value at risk (CVaR). We open-source all our code and present visualisations of final policies here: https://github.com/amacrutherford/sampling-for-learnability.
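A level the agent "can sometimes solve, but not always" is exactly one whose binary success probability is far from both 0 and 1. One natural score with that property is sketched below under the assumption of binary outcomes; this is a simplification for illustration, not necessarily the exact scoring used in the paper:

```python
def learnability_score(success_rate):
    # p * (1 - p) is zero for levels that are always solved or never solved,
    # and maximal at p = 0.5, the "sometimes solvable" regime.
    p = success_rate
    return p * (1.0 - p)

# Rank candidate levels by the score (success rates here are hypothetical).
levels = {"easy": 1.0, "impossible": 0.0, "frontier": 0.5, "hard": 0.2}
ranked = sorted(levels, key=lambda k: learnability_score(levels[k]),
                reverse=True)
# ranked[0] == "frontier"
```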

IROS Conference 2024 Conference Paper

Planning for Long-Term Monitoring Missions in Time-Varying Environments

  • Alex Stephens
  • Bruno Lacerda
  • Nick Hawes

Recent years have seen autonomous robots deployed in long-term missions across an ever-increasing breadth of domains. We consider robots deployed over a sequence of finite-horizon missions in the same environment, with the objective of maximising the value from observations of some unknown spatiotemporal process. This work is motivated by applications such as ecological monitoring, in which a robot might be repeatedly deployed in the field over weeks or months with the task of modelling processes of scientific interest. We formalise the problem of long-term monitoring over multiple finite-horizon missions as a Markov decision process with a partially unknown state, and present an online planning approach to address it. Our approach uses a spatiotemporal Gaussian process to model the environment and make predictions about unvisited states, integrating this with a belief-based Monte Carlo tree search algorithm which decides where the robot should go next. We demonstrate the strengths of our framework empirically through a series of experiments using synthetic data as well as real acoustic data from monitoring of bioactivity in coral reefs.

JAIR Journal 2024 Journal Article

Right Place, Right Time: Proactive Multi-Robot Task Allocation Under Spatiotemporal Uncertainty

  • Charlie Street
  • Bruno Lacerda
  • Manuel Mühlig
  • Nick Hawes

For many multi-robot problems, tasks are announced during execution, where task announcement times and locations are uncertain. To synthesise multi-robot behaviour that is robust to early announcements and unexpected delays, multi-robot task allocation methods must explicitly model the stochastic processes that govern task announcement. In this paper, we model task announcement using continuous-time Markov chains which predict when and where tasks will be announced. We then present a task allocation framework which uses the continuous-time Markov chains to allocate tasks proactively, such that robots are near or at the task location upon its announcement. Our method seeks to minimise the expected total waiting duration for each task, i.e. the duration between task announcement and a robot beginning to service the task. Our framework can be applied to any multi-robot task allocation problem where robots complete spatiotemporal tasks which are announced stochastically. We demonstrate the efficacy of our approach in simulation, where we outperform baselines which do not allocate tasks proactively, or do not fully exploit our task announcement models.
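In a continuous-time Markov chain, the time until the next announcement at each location is exponentially distributed, and the soonest event "wins the race". A minimal sampler illustrating this (the rates and location names are hypothetical, and this is a sketch of the modelling idea, not the paper's framework):

```python
import random

def sample_next_announcement(rates, rng):
    # rates: dict location -> exponential rate (expected announcements
    # per unit time). Each location draws an Exp(rate) delay; the earliest
    # announcement is the one that actually fires.
    delays = {loc: rng.expovariate(rate) for loc, rate in rates.items()}
    loc = min(delays, key=delays.get)
    return loc, delays[loc]

rng = random.Random(0)
loc, delay = sample_next_announcement({"ward_a": 2.0, "ward_b": 0.5}, rng)
```

A proactive allocator can draw many such samples to estimate where the next task is likely to appear, and position robots accordingly before the announcement arrives.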

AAAI Conference 2024 Conference Paper

Stop! Planner Time: Metareasoning for Probabilistic Planning Using Learned Performance Profiles

  • Matthew Budd
  • Bruno Lacerda
  • Nick Hawes

The metareasoning framework aims to enable autonomous agents to factor in planning costs when making decisions. In this work, we develop the first non-myopic metareasoning algorithm for planning with Markov decision processes. Our method learns the behaviour of anytime probabilistic planning algorithms from performance data. Specifically, we propose a novel model for metareasoning, based on contextual performance profiles that predict the value of the planner's current solution given the time spent planning, the state of the planning algorithm's internal parameters, and the difficulty of the planning problem being solved. This model removes the need to assume that the current solution quality is always known, broadening the class of metareasoning problems that can be addressed. We then employ deep reinforcement learning to learn a policy that decides, at each timestep, whether to continue planning or start executing the current plan, and how to set hyperparameters of the planner to enhance its performance. We demonstrate our algorithm's ability to perform effective metareasoning in two domains.

NeurIPS Conference 2023 Conference Paper

Monte Carlo Tree Search with Boltzmann Exploration

  • Michael Painter
  • Mohamed Baioumy
  • Nick Hawes
  • Bruno Lacerda

Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.
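The Boltzmann policy at the heart of MENTS, BTS and DENTS samples actions in proportion to exponentiated values, so exploration is controlled by a temperature. A minimal sketch (a plain softmax sampler; the papers' search-specific temperature schedules and Alias-method sampling are not reproduced here):

```python
import numpy as np

def boltzmann_policy(q_values, temperature):
    # Softmax over action values: high temperature -> near-uniform
    # exploration, low temperature -> near-greedy exploitation.
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()          # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

probs = boltzmann_policy([1.0, 2.0, 3.0], temperature=1.0)
```

As the temperature decays (the "Decaying ENtropy" of DENTS), the policy concentrates on the greedy action while early search still explores.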

AAAI Conference 2023 Conference Paper

Multi-Unit Auctions for Allocating Chance-Constrained Resources

  • Anna Gautier
  • Bruno Lacerda
  • Nick Hawes
  • Michael Wooldridge

Sharing scarce resources is a key challenge in multi-agent interaction, especially when individual agents are uncertain about their future consumption. We present a new auction mechanism for preallocating multi-unit resources among agents, while limiting the chance of resource violations. By planning for a chance constraint, we strike a balance between worst-case approaches, which under-utilise resources, and expected-case approaches, which lack formal guarantees. We also present an algorithm that allows agents to generate bids via multi-objective reasoning, which are then submitted to the auction. We then discuss how the auction can be extended to non-cooperative scenarios. Finally, we demonstrate empirically that our auction outperforms state-of-the-art techniques for chance-constrained multi-agent resource allocation in complex settings with up to hundreds of agents.

NeurIPS Conference 2023 Conference Paper

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is not feasible. In such domains, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-averse. An additional challenge of offline RL is avoiding distributional shift, i.e., ensuring that state-action pairs visited by the policy remain near those in the dataset. Previous offline RL algorithms that consider risk combine offline RL techniques (to avoid distributional shift), with risk-sensitive RL algorithms (to achieve risk-aversion). In this work, we propose risk-aversion as a mechanism to jointly address both of these issues. We propose a model-based approach, and use an ensemble of models to estimate epistemic uncertainty, in addition to aleatoric uncertainty. We train a policy that is risk-averse, and avoids high uncertainty actions. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that are risky due to environment stochasticity. Thus, by considering epistemic uncertainty via a model ensemble and introducing risk-aversion, our algorithm (1R2R) avoids distributional shift in addition to achieving risk-aversion to aleatoric risk. Our experiments show that 1R2R achieves strong performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.
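The mechanism can be illustrated with a one-line risk-averse value estimate over an ensemble: averaging only the worst predictions penalises both stochastic outcomes and model disagreement. An illustrative sketch of that principle, not the 1R2R training loop itself:

```python
import numpy as np

def risk_averse_value(ensemble_values, alpha):
    # ensemble_values: predicted values for one state-action pair, one per
    # ensemble member (and/or per sampled outcome). CVaR_alpha here is the
    # mean of the worst alpha-fraction of predictions.
    v = np.sort(np.asarray(ensemble_values, dtype=float))
    k = max(1, int(np.ceil(alpha * v.size)))
    return v[:k].mean()

# A disagreeing (out-of-distribution) ensemble is valued below an agreeing
# ensemble with the same mean, steering the policy back toward the data.
in_dist = risk_averse_value([2.0, 2.0, 2.0, 2.0], alpha=0.5)   # -> 2.0
ood     = risk_averse_value([0.0, 1.0, 3.0, 4.0], alpha=0.5)   # -> 0.5
```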

AAAI Conference 2023 Conference Paper

Planning with Hidden Parameter Polynomial MDPs

  • Clarissa Costen
  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

For many applications of Markov Decision Processes (MDPs), the transition function cannot be specified exactly. Bayes-Adaptive MDPs (BAMDPs) extend MDPs to consider transition probabilities governed by latent parameters. To act optimally in BAMDPs, one must maintain a belief distribution over the latent parameters. Typically, this distribution is described by a set of sample (particle) MDPs, and associated weights which represent the likelihood of a sample MDP being the true underlying MDP. However, as the number of dimensions of the latent parameter space increases, the number of sample MDPs required to sufficiently represent the belief distribution grows exponentially. Thus, maintaining an accurate belief in the form of a set of sample MDPs over complex latent spaces is computationally intensive, which in turn affects the performance of planning for these models. In this paper, we propose an alternative approach for maintaining the belief over the latent parameters. We consider a class of BAMDPs where the transition probabilities can be expressed in closed form as a polynomial of the latent parameters, and outline a method to maintain a closed-form belief distribution for the latent parameters which results in an accurate belief representation. Furthermore, the closed-form representation does away with the need to tune the number of sample MDPs required to represent the belief. We evaluate two domains and empirically show that the polynomial, closed-form, belief representation results in better plans than a sampling-based belief representation.

ECAI Conference 2023 Conference Paper

Reinforcement Learning for Bandits with Continuous Actions and Large Context Spaces

  • Paul Duckworth
  • Katherine A. Vallis
  • Bruno Lacerda
  • Nick Hawes

We consider the challenging scenario of contextual bandits with continuous actions and large context spaces. This is an increasingly important application area in personalised healthcare where an agent is requested to make dosing decisions based on a patient’s single image scan. In this paper, we first adapt a reinforcement learning (RL) algorithm for continuous control to outperform contextual bandit algorithms specifically hand-crafted for continuous action spaces. We empirically demonstrate this on a suite of standard benchmark datasets for vector contexts. Secondly, we demonstrate that our RL agent can generalise problems with continuous actions to large context spaces, providing results that outperform previous methods on image contexts. Thirdly, we introduce a new contextual bandits test domain with multi-dimensional continuous action space and image contexts which existing tree-based methods cannot handle. We provide initial results with our RL agent.

AAMAS Conference 2023 Conference Paper

Risk-Constrained Planning for Multi-Agent Systems with Shared Resources

  • Anna Gautier
  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes
  • Michael Wooldridge

Planning under uncertainty requires complex reasoning about future events, and this complexity increases with the addition of multiple agents. One problem faced when considering multi-agent systems under uncertainty is the handling of shared resources. Adding a resource constraint limits the actions that agents can take, forcing collaborative decision making on who gets to use what resources. Prior work has considered different formulations, such as satisfying a resource constraint in expectation or ensuring that a resource constraint is met some percent of the time. However, these formulations of constrained planning ignore important distributional information about resource usage. Namely, they do not consider how bad the worst cases can get. In this paper, we formulate a risk-constrained shared resource problem and aim to limit the risk of excessive use of such resources. We focus on optimising for reward while constraining the Conditional Value-at-Risk (CVaR) of the shared resource. While CVaR is well studied in the single-agent setting, we consider the challenges that arise from the state and action space explosion in the multi-agent setting. In particular, we exploit risk contributions, a measure introduced in finance research which quantifies how much individual agents affect the joint risk. We present an algorithm that uses risk contributions to iteratively update single-agent policies until the joint risk constraint is satisfied. We evaluate our algorithm on two synthetic domains.
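The decomposition behind risk contributions can be sketched empirically: an agent's contribution is its mean usage over the worst joint outcomes, and the contributions sum to the joint CVaR. A hedged numpy sketch (sample-based attribution for intuition only, not the paper's iterative algorithm):

```python
import numpy as np

def risk_contributions(usage, alpha):
    # usage: array of shape (n_samples, n_agents), sampled resource usage.
    # Select the worst alpha-fraction of *joint* outcomes, then attribute
    # the tail's mean usage back to the individual agents.
    total = usage.sum(axis=1)
    threshold = np.quantile(total, 1.0 - alpha)
    tail = usage[total >= threshold]
    return tail.mean(axis=0)

rng = np.random.default_rng(0)
usage = rng.exponential(scale=[1.0, 3.0], size=(10000, 2))
contribs = risk_contributions(usage, alpha=0.1)
# contribs.sum() matches the empirical CVaR of total usage, and the heavier
# user (scale 3.0) accounts for most of the joint risk.
```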

ICRA Conference 2023 Conference Paper

VP-STO: Via-point-based Stochastic Trajectory Optimization for Reactive Robot Behavior

  • Julius Jankowski
  • Lara Brudermüller
  • Nick Hawes
  • Sylvain Calinon

Achieving reactive robot behavior in complex dynamic environments is still challenging as it relies on being able to solve trajectory optimization problems quickly enough, such that we can replan the future motion at frequencies which are sufficiently high for the task at hand. We argue that current limitations in Model Predictive Control (MPC) for robot manipulators arise from inefficient, high-dimensional trajectory representations and the neglect of time-optimality in the trajectory optimization process. Therefore, we propose a motion optimization framework that optimizes jointly over space and time, generating smooth and time-optimal robot trajectories in joint-space. While being task-agnostic, our formulation can incorporate additional task-specific requirements, such as collision avoidance, and yet maintain real-time control rates, demonstrated in simulation and real-world robot experiments on closed-loop manipulation. For additional material, please visit https://sites.google.com/oxfordrobotics.institute/vp-sto.

AAMAS Conference 2022 Conference Paper

Context-Aware Modelling for Multi-Robot Systems Under Uncertainty

  • Charlie Street
  • Bruno Lacerda
  • Michal Staniaszek
  • Manuel Mühlig
  • Nick Hawes

Formal models of multi-robot behaviour are fundamental to planning, simulation, and model checking techniques. However, existing models are invalidated by strong assumptions that fail to capture execution-time multi-robot behaviour, such as simplistic duration models or synchronisation constraints. In this paper we propose a novel multi-robot Markov automaton formulation which models asynchronous multi-robot execution in continuous time. Robot dynamics are captured using phase-type distributions over action durations. Moreover, we explicitly model the effects of robot interactions, as they are a key factor for the duration of action execution. We also present a scalable discrete-event simulator which yields realistic statistics over execution-time robot behaviour by sampling through the Markov automaton. We validate our model and simulator against a Gazebo simulation in a range of multi-robot navigation scenarios, demonstrating that our model accurately captures high-level multi-robot behaviour.

AAMAS Conference 2022 Conference Paper

Negotiated Path Planning for Non-Cooperative Multi-Robot Systems

  • Anna Gautier
  • Alex Stephens
  • Bruno Lacerda
  • Nick Hawes
  • Michael Wooldridge

As autonomous systems are deployed at a large scale in both public and private spaces, robots owned and operated by competing organisations will be required to interact. Interactions in such settings will be inherently non-cooperative. In this paper, we address the problem of non-cooperative multi-agent path finding. We design an auction mechanism that allows a group of agents to reach their goals whilst minimising the total cost of the system. In particular, we aim to design a mechanism such that rational agents are incentivised to participate. Our privileged knowledge auction consists of a modified combinatorial Vickrey-Clarke-Groves auction. Our approach limits the initial number of bids in the Vickrey-Clarke-Groves auction, then uses the privileged knowledge of the auctioneer to identify and solve path conflicts. In order to maintain agent autonomy in the non-cooperative system, individual agents are provided with final say over paths. The mechanism provides a heuristic method to maximise social welfare whilst remaining computationally efficient. We also consider single-agent bid generation and propose a similarity metric to use in dissimilar shortest path generation. We then show this bid generation method increases the success likelihood of both the limited-bid VCG auction and our novel approach on synthetic data. In our experiments on synthetic data, our approach outperforms existing work on the non-cooperative problem.

ICAPS Conference 2022 Conference Paper

Planning for Risk-Aversion and Expected Value in MDPs

  • Marc Rigter
  • Paul Duckworth
  • Bruno Lacerda
  • Nick Hawes

Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). However, optimising the CVaR alone may result in poor performance in expectation. In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. This motivates us to propose a lexicographic approach which minimises the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on four domains. Our results demonstrate that our lexicographic approach improves the expected cost compared to the state-of-the-art algorithm, while achieving the optimal CVaR.

IROS Conference 2022 Conference Paper

Probabilistic Planning for AUV Data Harvesting from Smart Underwater Sensor Networks

  • Matthew Budd
  • Georgios Salavasidis
  • Izzat Kamarudzaman
  • Catherine A. Harris
  • Alexander B. Phillips
  • Paul Duckworth
  • Nick Hawes
  • Bruno Lacerda

Harvesting valuable ocean data, ranging from climate and marine life analysis to industrial equipment monitoring, is an extremely challenging real-world problem. Sparse underwater sensor networks are a promising approach to scale to larger and deeper environments, but these have difficulty offloading their data without external assistance. Traditionally, offloading data has been achieved by costly, fixed communication infrastructure. In this paper, we propose a planning under uncertainty method that enables an autonomous underwater vehicle (AUV) to adaptively collect data from smart sensor networks in underwater environments. Our novel solution exploits the ability of sensor nodes to provide the AUV with time-of-flight acoustic localisation, and is able to prioritise nodes with the most valuable data. In both simulated experiments and a real-world field trial, we demonstrate that our method outperforms the type of hand-designed behaviours that has previously been used in the context of underwater data harvesting.

NeurIPS Conference 2022 Conference Paper

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. We formulate the problem as a two-player zero-sum game against an adversarial environment model. The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and adversarially optimising the model. The problem formulation that we address is theoretically grounded, resulting in a probably approximately correct (PAC) performance guarantee and a pessimistic value function which lower bounds the value function in the true environment. We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that it outperforms existing state-of-the-art baselines.

IJCAI Conference 2022 Conference Paper

Shared Autonomy Systems with Stochastic Operator Models

  • Clarissa Costen
  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

We consider shared autonomy systems where multiple operators (AI and human), can interact with the environment, e.g., by controlling a robot. The decision problem for the shared autonomy system is to select which operator takes control at each timestep, such that a reward specifying the intended system behaviour is maximised. The performance of the human operator is influenced by unobserved factors, such as fatigue or skill level. Therefore, the system must reason over stochastic models of operator performance. We present a framework for stochastic operators in shared autonomy systems (SO-SAS), where we represent operators using rich, partially observable models. We formalise SO-SAS as a mixed-observability Markov decision process, where environment states are fully observable and internal operator states are hidden. We test SO-SAS on a simulated domain and a computer game, empirically showing it results in better performance compared to traditional formulations of shared autonomy systems.

IROS Conference 2022 Conference Paper

Unbiased Active Inference for Classical Control

  • Mohamed Baioumy
  • Corrado Pezzato
  • Riccardo M. G. Ferrari
  • Nick Hawes

Active inference is a mathematical framework that originated in computational neuroscience. Recently, it has been demonstrated as a promising approach for constructing goal-driven behavior in robotics. Specifically, the active inference controller (AIC) has been successful on several continuous control and state-estimation tasks. Despite its relative success, some established design choices lead to a number of practical limitations for robot control. These include having a biased estimate of the state, and only an implicit model of control actions. In this paper, we highlight these limitations and propose an extended version of the unbiased active inference controller (u-AIC). The u-AIC maintains all the compelling benefits of the AIC and removes its limitations. Simulation results on a 2-DOF arm and experiments on a real 7-DOF manipulator show the improved performance of the u-AIC with respect to the standard AIC. The code can be found at https://github.com/cpezzato/unbiasedaic.

ICRA Conference 2021 Conference Paper

Active Inference for Integrated State-Estimation, Control, and Learning

  • Mohamed Baioumy
  • Paul Duckworth
  • Bruno Lacerda
  • Nick Hawes

This work presents an approach for control, state estimation, and the learning of model (hyper)parameters for robotic manipulators. It is based on the active inference framework, prominent in computational neuroscience as a theory of the brain, where behaviour arises from minimizing variational free-energy. First, we show there is a direct relationship between active inference controllers and classic methods such as PID control. We demonstrate its application for adaptive and robust behaviour of a robotic manipulator that rivals the state of the art. Additionally, we show that by learning specific hyperparameters, our approach can deal with unmodeled dynamics, damps oscillations, and is robust against poor initial parameters. The approach is validated on the ‘Franka Emika Panda’ 7-DoF manipulator. Finally, we highlight limitations of active inference controllers for robotic systems.

AAAI Conference 2021 Conference Paper

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

The parameters for a Markov Decision Process (MDP) often cannot be specified exactly. Uncertain MDPs (UMDPs) capture this model ambiguity by defining sets which the parameters belong to. Minimax regret has been proposed as an objective for planning in UMDPs to find robust policies which are not overly conservative. In this work, we focus on planning for Stochastic Shortest Path (SSP) UMDPs with uncertain cost and transition functions. We introduce a Bellman equation to compute the regret for a policy. We propose a dynamic programming algorithm that utilises the regret Bellman equation, and show that it optimises minimax regret exactly for UMDPs with independent uncertainties. For coupled uncertainties, we extend our approach to use options to enable a trade off between computation and solution quality. We evaluate our approach on both synthetic and real-world domains, showing that it significantly outperforms existing baselines.

NeurIPS Conference 2021 Conference Paper

Risk-Averse Bayes-Adaptive Reinforcement Learning

  • Marc Rigter
  • Bruno Lacerda
  • Nick Hawes

In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the epistemic uncertainty due to the prior distribution over MDPs, and the aleatoric uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.

ICAPS Conference 2020 Conference Paper

Convex Hull Monte-Carlo Tree-Search

  • Michael Painter
  • Bruno Lacerda
  • Nick Hawes

This work investigates Monte-Carlo planning for agents in stochastic environments, with multiple objectives. We propose the Convex Hull Monte-Carlo Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multi-objective planning solutions as a contextual multi-armed bandits problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget.

IROS Conference 2020 Conference Paper

Long-Run Multi-Robot Planning under Uncertain Action Durations for Persistent Tasks

  • Carlos Azevedo
  • Bruno Lacerda
  • Nick Hawes
  • Pedro U. Lima

This paper presents an approach for multi-robot long-term planning under uncertainty over the duration of actions. The proposed methodology takes advantage of generalized stochastic Petri nets with rewards (GSPNR) to model multi-robot problems. A GSPNR allows for unified modeling of action selection, uncertainty on the duration of action execution, and for goal specification through the use of transition rewards and rewards per time unit. Our approach relies on the interpretation of the GSPNR model as an equivalent embedded Markov reward automaton (MRA). We then build on a state-of-the-art method to compute the long-run average reward over MRAs, extending it to enable the extraction of the optimal policy. We provide an empirical evaluation of the proposed approach on a simulated multi-robot monitoring problem, evaluating its performance and scalability. The results show that the synthesized policy outperforms a policy obtained from an infinite horizon discounted reward formulation as well as a carefully hand-crafted policy.

IROS Conference 2020 Conference Paper

Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes

  • Matthew Budd
  • Bruno Lacerda
  • Paul Duckworth
  • Andrew West
  • Barry Lennox
  • Nick Hawes

When exploring an unknown environment, a mobile robot must decide where to observe next. It must do this whilst minimising the risk of failure, by only exploring areas that it expects to be safe. In this context, safety refers to the robot remaining in regions where critical environment features (e.g., terrain steepness, radiation levels) are within ranges the robot is able to tolerate. More specifically, we consider a setting where a robot explores an environment modelled with a Markov decision process, subject to bounds on the values of one or more environment features which can only be sensed at runtime. We use a Gaussian process to predict the value of the environment feature in unvisited regions, and propose an estimated Markov decision process, a model that integrates the Gaussian process predictions with the environment model transition probabilities. Building on this model, we propose an exploration algorithm that, contrary to previous approaches, considers probabilistic transitions and explicitly reasons about the uncertainty over the Gaussian process predictions. Furthermore, our approach increases the speed of exploration by selecting locations to visit further away from the currently explored area. We evaluate our approach on a real-world gamma radiation dataset, tackling the challenge of a nuclear material inspection robot exploring an a priori unknown area.

IJCAI Conference 2019 Conference Paper

Multi-Robot Planning Under Uncertain Travel Times and Safety Constraints

  • Masoumeh Mansouri
  • Bruno Lacerda
  • Nick Hawes
  • Federico Pecora

We present a novel modelling and planning approach for multi-robot systems under uncertain travel times. The approach uses generalised stochastic Petri nets (GSPNs) to model desired team behaviour, and allows safety constraints and rewards to be specified. The GSPN is interpreted as a Markov decision process (MDP) for which we can generate policies that optimise the requirements. This representation is more compact than the equivalent multi-agent MDP, allowing us to scale better. Furthermore, it naturally allows for asynchronous execution of the generated policies across the robots, yielding smoother team behaviour. We also describe how the integration of the GSPN with a lower-level team controller allows for accurate expectations on team performance. We evaluate our approach on an industrial scenario, showing that it outperforms hand-crafted policies used in current practice.

IROS Conference 2018 Conference Paper

Simultaneous Task Allocation and Planning Under Uncertainty

  • Fatma Faruq
  • David Parker 0001
  • Bruno Lacerda
  • Nick Hawes

We propose novel techniques for task allocation and planning in multi-robot systems operating in uncertain environments. Task allocation is performed simultaneously with planning, which provides more detailed information about individual robot behaviour, but also exploits independence between tasks to do so efficiently. We use Markov decision processes to model robot behaviour and linear temporal logic to specify tasks and safety constraints. Building upon techniques and tools from formal verification, we show how to generate a sequence of multi-robot policies, iteratively refining them to reallocate tasks if individual robots fail, and providing probabilistic guarantees on the performance (and safe operation) of the team of robots under the resulting policy. We implement our approach and evaluate it on a benchmark multi-robot example.

IROS Conference 2017 Conference Paper

Learning deep visual object models from noisy web data: How to make it work

  • Nizar Massouh
  • Francesca Babiloni
  • Tatiana Tommasi
  • Jay Young
  • Nick Hawes
  • Barbara Caputo

Deep networks thrive when trained on large scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging, as well as dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments which must train on the objects they encounter there. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of noisy labels to the expected drop in accuracy, for two deep architectures, on two different types of noise, that clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot.

ICAPS Conference 2017 Conference Paper

Multi-Objective Policy Generation for Mobile Robots under Probabilistic Time-Bounded Guarantees

  • Bruno Lacerda
  • David Parker 0001
  • Nick Hawes

We present a methodology for the generation of mobile robot controllers which offer probabilistic time-bounded guarantees on successful task completion, whilst also trying to satisfy soft goals. The approach is based on a stochastic model of the robot’s environment and action execution times, a set of soft goals, and a formal task specification in co-safe linear temporal logic, which are analysed using multi-objective model checking techniques for Markov decision processes. For efficiency, we propose a novel two-step approach. First, we explore policies on the Pareto front for minimising expected task execution time whilst optimising the achievement of soft goals. Then, we use this to prune a model with more detailed timing information, yielding a time-dependent policy for which more fine-grained probabilistic guarantees can be provided. We illustrate and evaluate the generation of policies on a delivery task in a care home scenario, where the robot also tries to engage in entertainment activities with the patients.

ICRA Conference 2017 Conference Paper

Semantic web-mining and deep vision for lifelong object discovery

  • Jay Young
  • Lars Kunze
  • Valerio Basile
  • Elena Cabrio
  • Nick Hawes
  • Barbara Caputo

Autonomous robots that are to assist humans in their daily lives must recognize and understand the meaning of objects in their environment. However, the open nature of the world means robots must be able to learn and extend their knowledge about previously unknown objects on-line. In this work we investigate the problem of unknown object hypotheses generation, and employ a semantic Web-mining framework along with deep-learning-based object detectors. This allows us to make use of both visual and semantic features in combined hypotheses generation. Experiments on data from mobile robots in real world application deployments show that this combination improves performance over the use of either method in isolation.

IROS Conference 2016 Conference Paper

A Poisson-spectral model for modelling temporal patterns in human data observed by a robot

  • Ferdian Jovan
  • Jeremy L. Wyatt
  • Nick Hawes
  • Tomás Krajník

The efficiency of autonomous robots depends on how well they understand their operating environment. While most of the traditional environment models focus on the spatial representation, long-term mobile robot operation in human populated environments requires that the robots have a basic model of human behaviour.

IROS Conference 2016 Conference Paper

Experimental analysis of a variable autonomy framework for controlling a remotely operating mobile robot

  • Manolis Chiou
  • Rustam Stolkin
  • Goda Bieksaite
  • Nick Hawes
  • Kimron L. Shapiro
  • Timothy S. Harrison

This paper presents a principled experimental analysis of a variable autonomy control approach to mobile robot navigation. A Human-Initiative (HI) variable autonomy system is investigated, in which a human operator is able to switch the Level of Autonomy (LOA) between teleoperation (joystick control) and autonomous control (robot navigates autonomously towards waypoints selected by the human) on-the-fly. Our hypothesis is that the HI system will enable superior navigation performance compared to either teleoperation or autonomy alone, especially in scenarios where the performance of both the human and the robot may at times become degraded. We evaluate our hypothesis through carefully controlled and repeatable experiments using a significant number of human test-subjects.

ECAI Conference 2016 Conference Paper

Partial Order Temporal Plan Merging for Mobile Robot Tasks

  • Lenka Mudrová
  • Bruno Lacerda
  • Nick Hawes

For many mobile service robot applications, planning problems are based on deciding how and when to navigate to certain locations and execute certain tasks. Typically, many of these tasks are independent from one another, and the main objective is to obtain plans that efficiently take into account where these tasks can be executed and when execution is allowed. In this paper, we present an approach, based on merging of partial order plans with durative actions, that can quickly and effectively generate a plan for a set of independent goals. This plan exploits some of the synergies of the plans for each single task, such as common locations where certain actions should be executed. We evaluate our approach in benchmarking domains, comparing it with state-of-the-art planners and showing how it provides a good trade-off between the approach of sequencing the plans for each task (which is fast but produces poor results), and the approach of planning for a conjunction of all the goals (which is slow but produces good results).

ECAI Conference 2016 Conference Paper

Towards Lifelong Object Learning by Integrating Situated Robot Perception and Semantic Web Mining

  • Jay Young
  • Valerio Basile
  • Lars Kunze
  • Elena Cabrio
  • Nick Hawes

Autonomous robots that are to assist humans in their daily lives are required, among other things, to recognize and understand the meaning of task-related objects. However, given an open-ended set of tasks, the set of everyday objects that robots will encounter during their lifetime is not foreseeable. That is, robots have to learn and extend their knowledge about previously unknown objects on-the-job. Our approach automatically acquires parts of this knowledge (e.g., the class of an object and its typical location) in form of ranked hypotheses from the Semantic Web using contextual information extracted from observations and experiences made by robots. Thus, by integrating situated robot perception and Semantic Web mining, robots can continuously extend their object knowledge beyond perceptual models which allows them to reason about task-related objects, e.g., when searching for them, robots can infer the most likely object locations. An evaluation of the integrated system on long-term data from real office observations, demonstrates that generated hypotheses can effectively constrain the meaning of objects. Hence, we believe that the proposed system can be an essential component in a lifelong learning framework which acquires knowledge about objects from real world observations.

AAAI Conference 2015 Conference Paper

A Comparison of Qualitative and Metric Spatial Relation Models for Scene Understanding

  • Akshaya Thippur
  • Chris Burbridge
  • Lars Kunze
  • Marina Alberti
  • John Folkesson
  • Patric Jensfelt
  • Nick Hawes

Object recognition systems can be unreliable when run in isolation depending on only image based features, but their performance can be improved when taking scene context into account. In this paper, we present techniques to model and infer object labels in real scenes based on a variety of spatial relations – geometric features which capture how objects co-occur – and compare their efficacy in the context of augmenting perception based object classification in real-world table-top scenes. We utilise a long-term dataset of office tabletops for qualitatively comparing the performances of these techniques. On this dataset, we show that more intricate techniques have superior performance but do not generalise well on small training data. We also show that techniques using coarser information perform crudely but sufficiently well in standalone scenarios and generalise well on small training data. We conclude the paper, expanding on the insights we have gained through these comparisons and comment on a few fundamental topics with respect to long-term autonomous robots.

ICRA Conference 2015 Conference Paper

Now or later? Predicting and maximising success of navigation actions from long-term experience

  • Jaime Pulido Fentanes
  • Bruno Lacerda
  • Tomás Krajník
  • Nick Hawes
  • Marc Hanheide

In planning for deliberation or navigation in real-world robotic systems, one of the big challenges is to cope with change. It lies in the nature of planning that it has to make assumptions about the future state of the world, and the robot's chances of successfully accomplishing actions in this future. Hence, a robot's plan can only be as good as its predictions about the world. In this paper, we present a novel approach to specifically represent changes that stem from periodic events in the environment (e.g., a door being opened or closed), which impact on the success probability of planned actions. We show that our approach to model the probability of action success as a set of superimposed periodic processes allows the robot to predict action outcomes in long-term data obtained in two real-life offices better than a static model. We furthermore discuss and showcase how the knowledge gathered can be successfully employed in a probabilistic planning framework to devise better navigation plans. The key contributions of this paper are (i) the formation of the spectral model of action outcomes from non-uniform sampling, (ii) the analysis of its predictive power using two long-term datasets, and (iii) the application of the predicted outcomes in an MDP-based planning framework.

IJCAI Conference 2015 Conference Paper

Optimal Policy Generation for Partially Satisfiable Co-Safe LTL Specifications

  • Bruno Lacerda
  • David Parker
  • Nick Hawes

We present a method to calculate cost-optimal policies for task specifications in co-safe linear temporal logic over a Markov decision process model of a stochastic system. Our key contribution is to address scenarios in which the task may not be achievable with probability one. We formalise a task progression metric and, using multi-objective probabilistic model checking, generate policies that are formally guaranteed to, in decreasing order of priority: maximise the probability of finishing the task; maximise progress towards completion, if this is not possible; and minimise the expected time or cost required. We illustrate and evaluate our approach in a robot task planning scenario, where the task is to visit a set of rooms that may be inaccessible during execution.

ICRA Conference 2015 Conference Paper

Task scheduling for mobile robots using interval algebra

  • Lenka Mudrová
  • Nick Hawes

We present a novel task scheduling algorithm for use on mobile robots in real environments. The scheduling problem is formalised as a mixed integer program, which is a standard approach in the scheduling community. Our contribution is the use of Allen's interval algebra to prune the search to be performed by the mixed integer program. This significantly speeds up the algorithm. The proposed algorithm has been used on several mobile robots in long-term autonomy scenarios, where it schedules large sets containing a variety of tasks. The proposed algorithm outperforms the state of the art by at least one order of magnitude on both these real tasks and synthetic datasets.

IROS Conference 2014 Conference Paper

Combining top-down spatial reasoning and bottom-up object class recognition for scene understanding

  • Lars Kunze
  • Chris Burbridge
  • Marina Alberti
  • Akshaya Thippur
  • John Folkesson
  • Patric Jensfelt
  • Nick Hawes

Many robot perception systems are built to only consider intrinsic object features to recognise the class of an object. By integrating both top-down spatial relational reasoning and bottom-up object class recognition, the overall performance of a perception system can be improved. In this paper we present a unified framework that combines a 3D object class recognition system with learned, spatial models of object relations. In robot experiments we show that our combined approach improves the classification results on real-world office desks compared to pure bottom-up perception. Hence, by using spatial knowledge during object class recognition, perception becomes more efficient and robust, and robots can understand scenes more effectively.
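The combination idea can be sketched with made-up numbers (a naive Bayes-style fusion; the paper's actual models are learned from data):

```python
# Hedged sketch: fuse bottom-up class likelihoods from a recogniser with
# top-down priors from learned spatial relations, so context re-ranks
# ambiguous detections. All scores below are illustrative.

def fuse(bottom_up, spatial_prior):
    """Multiply per-class scores and renormalise (naive Bayes-style fusion)."""
    combined = {c: bottom_up[c] * spatial_prior.get(c, 1e-6) for c in bottom_up}
    total = sum(combined.values())
    return {c: v / total for c, v in combined.items()}

# The recogniser finds 'mug' and 'pen holder' equally likely, but the object
# sits where mugs are far more common on the learned desk layouts.
bottom_up = {"mug": 0.5, "pen holder": 0.5}
spatial_prior = {"mug": 0.8, "pen holder": 0.2}
posterior = fuse(bottom_up, spatial_prior)
```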

IROS Conference 2014 Conference Paper

Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications

  • Bruno Lacerda
  • David Parker
  • Nick Hawes

We present a method to specify tasks and synthesise cost-optimal policies for Markov decision processes using co-safe linear temporal logic. Our approach incorporates a dynamic task handling procedure which allows for the addition of new tasks during execution and provides the ability to re-plan an optimal policy on-the-fly. This new policy minimises the cost to satisfy the conjunction of the current tasks and the new one, taking into account how much of the current tasks has already been executed. We illustrate our approach by applying it to motion planning for a mobile service robot.
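The dynamic task handling can be illustrated with toy automata (hypothetical DFAs, not the paper's synthesis machinery): when a new co-safe task arrives mid-execution, its automaton is conjoined with the current one via a product, so progress already made on existing tasks is preserved when re-planning.

```python
# Illustrative sketch: conjoin co-safe task automata by a product so that a
# task added during execution keeps the progress of the current tasks.

def make_visit_dfa(goal):
    """DFA for 'eventually <goal>'; 'acc' is the absorbing accepting state."""
    def step(q, label):
        return "acc" if q == "acc" or label == goal else "q0"
    return step

def product_step(qs, label, steps):
    """Advance the conjunction of all task DFAs on one observed label."""
    return tuple(step(q, label) for q, step in zip(qs, steps))

# Midway through "visit A" (already satisfied), the task "visit B" arrives.
steps = [make_visit_dfa("A"), make_visit_dfa("B")]
state = ("acc", "q0")                      # A done, B still pending
state = product_step(state, "B", steps)    # the robot reaches B
```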

ICRA Conference 2014 Conference Paper

Using Qualitative Spatial Relations for indirect object search

  • Lars Kunze
  • Keerthi Kumar Doreswamy
  • Nick Hawes

Finding objects in human environments requires autonomous mobile robots to reason about potential object locations and to plan to perceive them accordingly. By using information about the 3D structure of the environment, knowledge about landmark objects and their spatial relationship to the sought object, search can be improved by directing the robot towards the most likely object locations. In this paper we have designed, implemented and evaluated an approach for searching for objects on the basis of Qualitative Spatial Relations (QSRs) such as left-of and in-front-of. On the basis of QSRs between landmarks and the sought object we generate metric poses of potential object locations using an extended version of the ternary point calculus and employ this information for view planning. Preliminary results show that search methods based on QSRs are faster and more reliable than methods not considering them.
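Turning a qualitative relation into metric view targets can be sketched as below. This simplified version (my assumption; the paper's extended ternary point calculus uses origin/relatum pairs rather than a single landmark pose) samples positions inside the acceptance cone of a relation such as 'left-of':

```python
import math

# Hedged sketch: generate metric candidate positions satisfying a qualitative
# spatial relation w.r.t. a landmark, for use in view planning. The cone
# parameters are illustrative.

RELATION_ANGLE = {"in-front-of": 0.0, "left-of": math.pi / 2,
                  "behind": math.pi, "right-of": -math.pi / 2}

def candidate_positions(landmark_xy, landmark_yaw, relation,
                        distance=0.5, half_cone=math.pi / 8, n=5):
    """Sample n metric positions satisfying `relation` w.r.t. the landmark."""
    lx, ly = landmark_xy
    base = landmark_yaw + RELATION_ANGLE[relation]
    poses = []
    for i in range(n):
        theta = base - half_cone + (2 * half_cone) * i / max(n - 1, 1)
        poses.append((lx + distance * math.cos(theta),
                      ly + distance * math.sin(theta)))
    return poses
```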

AAAI Conference 2012 Conference Paper

Towards a Cognitive System that Can Recognize Spatial Regions Based on Context

  • Nick Hawes
  • Matthew Klenk
  • Kate Lockwood
  • Graham Horn
  • John Kelleher

In order to collaborate with people in the real world, cognitive systems must be able to represent and reason about spatial regions in human environments. Consider the command “go to the front of the classroom”. The spatial region mentioned (the front of the classroom) is not perceivable using geometry alone. Instead it is defined by its functional use, implied by nearby objects and their configuration. In this paper, we define such areas as context-dependent spatial regions and present a cognitive system able to learn them by combining qualitative spatial representations, semantic labels, and analogy. The system is capable of generating a collection of qualitative spatial representations describing the configuration of the entities it perceives in the world. It can then be taught context-dependent spatial regions using anchor points defined on these representations. From this we then demonstrate how an existing computational model of analogy can be used to detect context-dependent spatial regions in previously unseen rooms. To evaluate this process we compare detected regions to annotations made on maps of real rooms by human volunteers.

IROS Conference 2011 Conference Paper

A system for interactive learning in dialogue with a tutor

  • Danijel Skocaj
  • Matej Kristan
  • Alen Vrecko
  • Marko Mahnic
  • Miroslav Janícek
  • Geert-Jan M. Kruijff
  • Marc Hanheide
  • Nick Hawes

In this paper we present representations and mechanisms that facilitate continuous learning of visual concepts in dialogue with a tutor and show the implemented robot system. We present how beliefs about the world are created by processing visual and linguistic information and show how they are used for planning system behaviour with the aim of satisfying its internal drive - to extend its knowledge. The system facilitates different kinds of learning initiated by the human tutor or by the system itself. We demonstrate these principles in the case of learning about object colours and basic shapes.

IJCAI Conference 2011 Conference Paper

Exploiting Probabilistic Knowledge under Uncertain Sensing for Efficient Robot Behaviour

  • Marc Hanheide
  • Charles Gretton
  • R. Dearden
  • Nick Hawes
  • Jeremy Wyatt
  • Andrzej Pronobis
  • Alper Aydemir
  • Moritz Göbelbecker

Robots must perform tasks efficiently and reliably while acting under uncertainty. One way to achieve efficiency is to give the robot common-sense knowledge about the structure of the world. Reliable robot behaviour can be achieved by modelling the uncertainty in the world probabilistically. We present a robot system that combines these two approaches and demonstrate the improvements in efficiency and reliability that result. Our first contribution is a probabilistic relational model integrating common-sense knowledge about the world in general, with observations of a particular environment. Our second contribution is a continual planning system which is able to plan in the large problems posed by that model, by automatically switching between decision-theoretic and classical procedures. We evaluate our system on object search tasks in two different real-world indoor environments. By reasoning about the trade-offs between possible courses of action with different informational effects, and exploiting the cues and general structures of those environments, our robot is able to consistently demonstrate efficient and reliable goal-directed behaviour.

ICRA Conference 2011 Conference Paper

Home alone: Autonomous extension and correction of spatial representations

  • Nick Hawes
  • Marc Hanheide
  • Jack Hargreaves
  • Ben Page
  • Hendrik Zender
  • Patric Jensfelt

In this paper we present an account of the problems faced by a mobile robot given an incomplete tour of an unknown environment, and introduce a collection of techniques which can generate successful behaviour even in the presence of such problems. Underlying our approach is the principle that an autonomous system must be motivated to act to gather new knowledge, and to validate and correct existing knowledge. This principle is embodied in Dora, a mobile robot which features the aforementioned techniques: shared representations, non-monotonic reasoning, and goal generation and management. To demonstrate how well this collection of techniques works in real-world situations we present a comprehensive analysis of the Dora system's performance over multiple tours in an indoor environment. In this analysis Dora successfully completed 18 of 21 attempted runs, with all but 3 of these successes requiring one or more of the integrated techniques to recover from problems.

AAMAS Conference 2010 Conference Paper

Dora The Explorer: A Motivated Robot

  • Nick Hawes
  • Marc Hanheide
  • Kristoffer Sjöö
  • Alper Aydemir
  • Patric Jensfelt
  • Moritz Göbelbecker

Dora the Explorer is a mobile robot with a sense of curiosity and a drive to explore its world. Given an incomplete tour of an indoor environment, Dora is driven by internal motivations to probe the gaps in her spatial knowledge. She actively explores regions of space which she hasn't previously visited but which she expects will lead her to further unexplored space. She will also attempt to determine the categories of rooms through active visual search for functionally important objects, and through ontology-driven inference on the results of this search.

IROS Conference 2009 Conference Paper

A computer vision integration model for a multi-modal cognitive system

  • Alen Vrecko
  • Danijel Skocaj
  • Nick Hawes
  • Ales Leonardis

We present a general method for integrating visual components into a multi-modal cognitive system. The integration is very generic and can work with an arbitrary set of modalities. We illustrate our integration approach with a specific instantiation of the architecture schema that focuses on integration of vision and language: a cognitive system able to collaborate with a human, learn and display some understanding of its surroundings. As examples of cross-modal interaction we describe mechanisms for clarification and visual learning.

IJCAI Conference 2007 Conference Paper

  • Michael Brenner
  • Nick Hawes
  • John Kelleher
  • Jeremy Wyatt

In human-robot interaction (HRI) it is essential that the robot interprets and reacts to a human's utterances in a manner that reflects their intended meaning. In this paper we present a collection of novel techniques that allow a robot to interpret and execute spoken commands describing manipulation goals involving qualitative spatial constraints (e.g. "put the red ball near the blue cube"). The resulting implemented system integrates computer vision, potential field models of spatial relationships, and action planning to mediate between the continuous real world and the discrete, qualitative representations used for symbolic reasoning.
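A potential field model of a spatial relation can be sketched as below (the Gaussian form and its width are illustrative assumptions, not the paper's learned model): the applicability of 'near' decays smoothly with distance, so the system can score candidate placements and pick the best one.

```python
import math

# Hedged sketch of a potential field for the relation 'near': applicability
# in (0, 1] decays with the distance between target and landmark. The sigma
# value is illustrative only.

def near_potential(target, landmark, sigma=0.3):
    """Applicability of 'target near landmark'."""
    d = math.dist(target, landmark)
    return math.exp(-(d * d) / (2 * sigma * sigma))

# Score candidate placements for "put the red ball near the blue cube".
cube = (1.0, 0.0)
candidates = [(1.1, 0.0), (1.5, 0.5), (0.0, 0.0)]
best = max(candidates, key=lambda p: near_potential(p, cube))
```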

AAAI Conference 2007 Conference Paper

Towards an Integrated Robot with Multiple Cognitive Functions

  • Nick Hawes
  • Jeremy Wyatt
  • Henrik Jacobsson
  • Michael Brenner

We present integration mechanisms for combining heterogeneous components in a situated information processing system, illustrated by a cognitive robot able to collaborate with a human and display some understanding of its surroundings. These mechanisms include an architectural schema that encourages parallel and incremental information processing, and a method for binding information from distinct representations that when faced with rapid change in the world can maintain a coherent, though distributed, view of it. Provisional results are demonstrated in a robot combining vision, manipulation, language, planning and reasoning capabilities interacting with a human and manipulable objects.