Arrow Research search

Author name cluster

Alessandro Farinelli

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

91 papers
2 author rows

Possible papers

91

AAAI Conference 2026 Conference Paper

On the Probabilistic Learnability of Compact Neural Network Preimage Bounds

  • Luca Marzari
  • Manuele Bicego
  • Ferdinando Cicalese
  • Alessandro Farinelli

Although recent provable methods have been developed to compute preimage bounds for neural networks, their scalability is fundamentally limited by the #P-hardness of the problem. In this work, we adopt a novel probabilistic perspective, aiming to deliver solutions with high-confidence guarantees and bounded error. To this end, we investigate the potential of bootstrap-based and randomized approaches that are capable of capturing complex patterns in high-dimensional spaces, including input regions where a given output property holds. In detail, we introduce Random Forest Property Verifier (RF-ProVe), a method that exploits an ensemble of randomized decision trees to generate candidate input regions satisfying a desired output property and refines them through active resampling. Our theoretical derivations offer formal statistical guarantees on region purity and global coverage, providing a practical, scalable solution for computing compact preimage approximations in cases where exact solvers fail to scale.

ECAI Conference 2025 Conference Paper

Advancing Neural Network Verification Through Hierarchical Safety Abstract Interpretation

  • Luca Marzari
  • Isabella Mastroeni
  • Alessandro Farinelli

Traditional methods for formal verification (FV) of deep neural networks (DNNs) are constrained by a binary encoding of safety properties, where a model is classified as either safe or unsafe (robust or not robust). This binary encoding fails to capture the nuanced safety levels within a model, often resulting in either overly restrictive or too permissive requirements. In this paper, we introduce a novel problem formulation called ABSTRACT DNN-VERIFICATION, which verifies a hierarchical structure of unsafe outputs, providing a more granular analysis of the safety aspect for a given DNN. Crucially, by leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the FV process, requiring the same (in the worst case) or even potentially less computational effort than the traditional binary verification approach. Specifically, we demonstrate how this formulation allows ranking adversarial inputs according to their abstract safety level violation, offering a more detailed evaluation of the model’s safety and robustness. Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches that employ abstract interpretation for robustness verification, a complexity analysis of the newly introduced problem, and an empirical evaluation considering both a complex deep reinforcement learning task (based on Habitat 3.0) and standard DNN-Verification benchmarks.

ECAI Conference 2025 Conference Paper

An Approximate Embedding for Designing Ethical Reinforcement Learning Environments

  • Arnau Mayoral-Macau
  • Manel Rodriguez-Soto
  • Enrico Marchesini
  • Martí Sánchez-Fibla
  • Maite López-Sánchez
  • Juan A. Rodríguez-Aguilar
  • Alessandro Farinelli

This paper introduces the Approximate Ethical Embedding Process, an algorithm for automating the design of ethical environments for learning agents. Our algorithm helps build environments wherein multiple agents learn policies that align with an ethical (moral) value while simultaneously pursuing their individual objectives. Therefore, we contribute to endowing environment designers with algorithmic tools for building ethical environments. We demonstrate the ethical design process for two different settings of an environment where agents have to adhere to beneficence to promote the collective survival of the population. Our experiments show that our approximate embedding process successfully generates environments that incentivise the learning of value-aligned policies.

AAAI Conference 2025 Conference Paper

Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach

  • Daniele Meli
  • Alberto Castellini
  • Alessandro Farinelli

Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow modeling state uncertainty as a belief probability distribution. Approximate solvers based on Monte Carlo sampling have shown great success in relaxing the computational demand and performing online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics tailored for the specific application domain. We propose to learn high-quality heuristics from POMDP traces of execution generated by any solver. We convert the belief-action pairs to a logical semantics and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems involving large action spaces and long planning horizons, namely rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., increasing the number of rocks and the grid size in rocksample, and increasing the size of the map and the aggressiveness of ghosts in pocman).

NeSy Conference 2025 Conference Paper

Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time

  • Celeste Veronese
  • Daniele Meli
  • Alessandro Farinelli

This paper proposes an integration of temporal logical reasoning and Partially Observable Markov Decision Processes (POMDPs) to achieve interpretable decision-making under uncertainty with macro-actions. Our method leverages a fragment of Linear Temporal Logic (LTL) based on Event Calculus (EC) to generate persistent (i.e., constant) macro-actions, which guide Monte Carlo Tree Search (MCTS)-based POMDP solvers over a time horizon, significantly reducing inference time while ensuring robust performance. Such macro-actions are learnt via Inductive Logic Programming (ILP) from a few traces of execution (belief-action pairs), thus eliminating the need for manually designed heuristics and requiring only the specification of the POMDP transition model. In the Pocman and Rocksample benchmark scenarios, our learned macro-actions demonstrate increased expressiveness and generality compared to time-independent heuristics, while offering substantial computational efficiency improvements.

AAMAS Conference 2025 Conference Paper

Monte Carlo Tree Search with Velocity Obstacles for Safe and Efficient Motion Planning in Dynamic Environments

  • Lorenzo Bonanni
  • Daniele Meli
  • Alberto Castellini
  • Alessandro Farinelli

Online motion planning is a challenging problem for intelligent robots moving in dense environments with dynamic obstacles, e.g., crowds. In this work, we propose a novel approach for optimal and safe online motion planning with minimal information about dynamic obstacles. Specifically, our approach requires only the current position of the obstacles and their maximum speed, but it does not need any information about their exact trajectories or dynamic model. The proposed methodology combines Monte Carlo Tree Search (MCTS), for online optimal planning via model simulations, with Velocity Obstacles (VO), for obstacle avoidance. We perform experiments in a cluttered simulated environment with walls, and up to 40 dynamic obstacles moving with random velocities and directions. With an ablation study, we show the key contribution of VO in scaling up the efficiency of MCTS, selecting the safest and most rewarding actions in the tree of simulations. Moreover, we show the superiority of our methodology with respect to state-of-the-art planners, including Non-linear Model Predictive Control (NMPC), in terms of collision rate, computational effort, and task performance.
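The worst-case velocity-obstacle test sketched in the abstract uses exactly the information the method assumes: each obstacle's current position and its maximum speed. A minimal illustration of that idea follows; the function names (`vo_safe`, `safe_actions`) and parameters are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def vo_safe(p_robot, v_cand, p_obs, v_obs_max, r_sum, horizon=2.0, steps=50):
    """Worst-case velocity-obstacle check. Knowing only the obstacle's current
    position and maximum speed, its reachable set at time t is a disc of
    radius v_obs_max * t around p_obs. A candidate robot velocity is safe if
    the robot stays outside that growing disc (inflated by the combined
    robot/obstacle radii r_sum) over the planning horizon."""
    for t in np.linspace(0.0, horizon, steps):
        dist = np.linalg.norm(p_robot + v_cand * t - p_obs)
        if dist <= r_sum + v_obs_max * t:
            return False
    return True

def safe_actions(p_robot, candidates, obstacles, r_sum):
    """Prune a planner's candidate velocities to the VO-safe subset,
    e.g. before action selection in an MCTS node."""
    return [v for v in candidates
            if all(vo_safe(p_robot, v, p_o, v_max, r_sum)
                   for p_o, v_max in obstacles)]
```

In an MCTS integration, such a filter would restrict the actions expanded at each node, which is one plausible reading of how VO "scales up the efficiency of MCTS" here.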

JAIR Journal 2025 Journal Article

Probabilistically Tightened Linear Relaxation-based Perturbation Analysis for Neural Network Verification

  • Luca Marzari
  • Ferdinando Cicalese
  • Alessandro Farinelli

We present Probabilistically Tightened Linear Relaxation-based Perturbation Analysis (PT-LiRPA), a novel framework that combines over-approximation techniques from LiRPA-based approaches with a sampling-based method to compute tight intermediate reachable sets. In detail, we show that, with negligible computational overhead, PT-LiRPA exploits the estimated reachable sets to significantly tighten the lower and upper linear bounds of a neural network's output, reducing the computational cost of formal verification tools while providing probabilistic guarantees on verification soundness. Extensive experiments on standard formal verification benchmarks, including the International Verification of Neural Networks Competition, show that our PT-LiRPA-based verifier improves robustness certificates, i.e., the certified lower bound of ε perturbation tolerated by the models, by up to 3.31X and 2.26X compared to related work. Importantly, our probabilistic approach results in a valuable solution for challenging competition entries where state-of-the-art formal verification methods fail, allowing us to provide answers with high confidence (i.e., at least 99%).
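The sampling side of this abstract — estimating intermediate reachable sets by propagating input samples through the network — can be illustrated with a toy ReLU network in NumPy. This is a sketch of the generic idea only, not the PT-LiRPA implementation:

```python
import numpy as np

def sampled_intermediate_bounds(weights, biases, lo, hi, n=10000, seed=0):
    """Estimate per-layer reachable sets of a feedforward ReLU network by
    sampling the input box [lo, hi] and recording the min/max pre-activation
    seen at each layer. Such empirical bounds are necessarily under-estimates
    of the true reachable set, which is why methods of this family pair them
    with probabilistic soundness guarantees."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n, len(lo)))
    bounds = []
    for W, b in zip(weights, biases):
        z = x @ W.T + b                       # pre-activations for all samples
        bounds.append((z.min(axis=0), z.max(axis=0)))
        x = np.maximum(z, 0.0)                # ReLU
    return bounds
```

A LiRPA-style verifier could then use these estimated intermediate intervals in place of looser analytically propagated ones to tighten its final linear output bounds.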

JAIR Journal 2025 Journal Article

Scaling Safe Policy Improvement: Monte Carlo Tree Search and Policy Iteration Strategies

  • Federico Bianchi
  • Alberto Castellini
  • Edoardo Zorzi
  • Thiago D. Simão
  • Matthijs T. J. Spaan
  • Alessandro Farinelli

Offline Reinforcement Learning (RL) allows policies to be trained on pre-collected datasets without requiring additional interactions with the environment. This approach bypasses the need for real-time data acquisition in real-world applications, which can be impractical due to the safety issues inherent in the learning process. However, offline RL faces significant challenges, such as distributional shifts and extrapolation errors, and the resulting policies might underperform compared to the baseline policy. Safe policy improvement algorithms mitigate these issues, enabling the reliable deployment of RL approaches in real-world scenarios where historical data is available, guaranteeing that any policy changes will not result in worse performance compared to the baseline policy used to collect training data. In this paper, we propose MCTS-SPIBB, an algorithm that leverages Monte Carlo Tree Search (MCTS) for scaling safe policy improvement to large domains. We theoretically prove that the policy generated by MCTS-SPIBB converges to the optimal safely improved policy produced by Safe Policy Improvement with Baseline Bootstrapping (SPIBB) as the number of simulations increases. Additionally, we introduce SDP-SPIBB, a novel extension of SPIBB designed to address the scalability limitations of the standard algorithm via Scalable Dynamic Programming. Our empirical analysis across four benchmark domains demonstrates that MCTS-SPIBB and SDP-SPIBB significantly enhance the scalability of safe policy improvement, providing robust and efficient algorithms for large-scale applications. These contributions represent a significant step towards the deployment of safe RL algorithms in complex real-world environments.

RLC Conference 2025 Conference Paper

Seldonian Reinforcement Learning for Ad Hoc Teamwork

  • Edoardo Zorzi
  • Alberto Castellini
  • Leonidas Bakopoulos
  • Georgios Chalkiadakis
  • Alessandro Farinelli

Most offline RL algorithms return optimal policies but do not provide statistical guarantees on desirable behaviors. This could generate reliability issues in safety-critical applications, such as in some multiagent domains where agents, and possibly humans, need to interact to reach their goals without harming each other. In this work, we propose a novel offline RL approach, inspired by Seldonian optimization, which returns policies with good performance and statistically guaranteed properties with respect to predefined desirable behaviors. In particular, our focus is on Ad Hoc Teamwork settings, where agents must collaborate with new teammates without prior coordination. Our method requires only a pre-collected dataset, a set of candidate policies for our agent, and a specification about the possible policies followed by the other players---it does not require further interactions, training, or assumptions on the type and architecture of the policies. We test our algorithm in Ad Hoc Teamwork problems and show that it consistently finds reliable policies while improving sample efficiency with respect to standard ML baselines.

RLJ Journal 2025 Journal Article

Seldonian Reinforcement Learning for Ad Hoc Teamwork

  • Edoardo Zorzi
  • Alberto Castellini
  • Leonidas Bakopoulos
  • Georgios Chalkiadakis
  • Alessandro Farinelli

Most offline RL algorithms return optimal policies but do not provide statistical guarantees on desirable behaviors. This could generate reliability issues in safety-critical applications, such as in some multiagent domains where agents, and possibly humans, need to interact to reach their goals without harming each other. In this work, we propose a novel offline RL approach, inspired by Seldonian optimization, which returns policies with good performance and statistically guaranteed properties with respect to predefined desirable behaviors. In particular, our focus is on Ad Hoc Teamwork settings, where agents must collaborate with new teammates without prior coordination. Our method requires only a pre-collected dataset, a set of candidate policies for our agent, and a specification about the possible policies followed by the other players---it does not require further interactions, training, or assumptions on the type and architecture of the policies. We test our algorithm in Ad Hoc Teamwork problems and show that it consistently finds reliable policies while improving sample efficiency with respect to standard ML baselines.

TIST Journal 2025 Journal Article

Verifying Online Safety Properties for Safe Deep Reinforcement Learning

  • Luca Marzari
  • Ferdinando Cicalese
  • Alessandro Farinelli
  • Christopher Amato
  • Enrico Marchesini

Ensuring safety in reinforcement learning (RL) is critical for deploying agents in real-world applications. During training, current safe RL approaches often rely on indicator cost functions that provide sparse feedback, resulting in two key limitations: (i) poor sample efficiency due to the lack of safety information in neighboring states, and (ii) dependence on cost-value functions, leading to brittle convergence and suboptimal performance. After training, safety is guaranteed via formal verification (FV) methods for deep neural networks, whose computational complexity hinders their application during training. We address the limitations of using cost functions via verification by proposing a safe RL method based on a violation value—the risk associated with policy decisions in a portion of the state space. Our approach verifies safety properties (i.e., state-action pairs) that may lead to unsafe behavior, and quantifies the size of the state space where properties are violated. This violation value is then used to penalize the agent during training to encourage safer policy behavior. Given the NP-hard nature of FV, we propose an efficient, sample-based approximation with probabilistic guarantees to compute the violation value. Extensive experiments on standard benchmarks and real-world robotic navigation tasks show that violation-augmented approaches significantly improve safety by reducing the number of unsafe states encountered while achieving superior performance compared to existing methods.
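The violation value described above — the fraction of a state region in which the policy chooses an unsafe action — admits a simple Monte Carlo sketch; `policy` and `unsafe_action` below are hypothetical placeholders, not the paper's API:

```python
import numpy as np

def violation_value(policy, lo, hi, unsafe_action, n_samples=2000, seed=None):
    """Monte Carlo estimate of a violation value: sample states uniformly from
    the box [lo, hi] and return the fraction where the policy's action is
    flagged unsafe. The paper pairs such a sample-based approximation with
    probabilistic guarantees; this sketch shows only the estimator itself."""
    rng = np.random.default_rng(seed)
    states = rng.uniform(lo, hi, size=(n_samples, len(lo)))
    violations = sum(unsafe_action(policy(s)) for s in states)
    return violations / n_samples

# Training-time use (schematic): penalize the reward by the violation value,
# biasing the agent away from regions where its decisions are risky.
# shaped_reward = reward - penalty_coef * violation_value(policy, lo, hi, unsafe)
```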

RLC Conference 2024 Conference Paper

Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning

  • Davide Corsi
  • Davide Camponogara
  • Alessandro Farinelli

An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markovian properties of the transition model. To address this challenge, we propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is problematic even for state-of-the-art DRL approaches, which may struggle to generate reliable policies in terms of generalization power and safety. Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and encourage collaboration in the field.

RLJ Journal 2024 Journal Article

Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning

  • Davide Corsi
  • Davide Camponogara
  • Alessandro Farinelli

An exciting and promising frontier for Deep Reinforcement Learning (DRL) is its application to real-world robotic systems. While modern DRL approaches have achieved remarkable successes in many robotic scenarios (including mobile robotics, surgical assistance, and autonomous driving), unpredictable and non-stationary environments can pose critical challenges to such methods. These features can significantly undermine fundamental requirements for a successful training process, such as the Markovian properties of the transition model. To address this challenge, we propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and DRL. In more detail, we show that our benchmarking environment is problematic even for state-of-the-art DRL approaches, which may struggle to generate reliable policies in terms of generalization power and safety. Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques (such as curriculum learning and learnable hyperparameters). Our extensive empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results. Our simulation environment and training baselines are freely available to facilitate further research on this open problem and encourage collaboration in the field.

AAAI Conference 2024 Conference Paper

Enumerating Safe Regions in Deep Neural Networks with Provable Probabilistic Guarantees

  • Luca Marzari
  • Davide Corsi
  • Enrico Marchesini
  • Alessandro Farinelli
  • Ferdinando Cicalese

Identifying safe areas is a key point to guarantee trust for systems that are based on Deep Neural Networks (DNNs). To this end, we introduce the AllDNN-Verification problem: given a safety property and a DNN, enumerate the set of all the regions of the property input domain which are safe, i.e., where the property does hold. Due to the #P-hardness of the problem, we propose an efficient approximation method called ε-ProVe. Our approach exploits a controllable underestimation of the output reachable sets obtained via statistical prediction of tolerance limits, and can provide a tight —with provable probabilistic guarantees— lower estimate of the safe areas. Our empirical evaluation on different standard benchmarks shows the scalability and effectiveness of our method, offering valuable insights for this new type of verification of DNNs.
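The statistical tolerance limits mentioned in the abstract rest on a standard order-statistic fact: the probability that the maximum of n i.i.d. samples fails to cover a (1 − ε) fraction of the distribution is (1 − ε)^n. A minimal sketch of the resulting sample-size rule follows; this is the generic statistical bound, not the paper's exact derivation:

```python
import math

def samples_needed(epsilon, alpha):
    """Smallest n such that the maximum of n i.i.d. samples upper-bounds at
    least a (1 - epsilon) fraction of the underlying distribution with
    confidence 1 - alpha. Derived from (1 - epsilon)^n <= alpha, i.e.
    n >= ln(alpha) / ln(1 - epsilon)."""
    return math.ceil(math.log(alpha) / math.log(1.0 - epsilon))
```

For example, certifying coverage of 99% of the distribution with 99% confidence requires a few hundred samples, which is what makes tolerance-limit approaches practical where exact enumeration is #P-hard.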

JAIR Journal 2024 Journal Article

Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach

  • Daniele Meli
  • Alberto Castellini
  • Alessandro Farinelli

Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow modeling state uncertainty as a belief probability distribution. Approximate solvers based on Monte Carlo sampling have shown great success in relaxing the computational demand and performing online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics tailored for the specific application domain. We propose to learn high-quality heuristics from POMDP traces of execution generated by any solver. We convert the belief-action pairs to a logical semantics and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems involving large action spaces and long planning horizons, namely rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., increasing the number of rocks and the grid size in rocksample, and increasing the size of the map and the aggressiveness of ghosts in pocman).

IROS Conference 2024 Conference Paper

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

  • Francesco Taioli
  • Stefano Rosa
  • Alberto Castellini
  • Lorenzo Natale
  • Alessio Del Bue
  • Alessandro Farinelli
  • Marco Cristani
  • Yiming Wang 0002

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given by humans can contain errors when describing a spatial environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do not address this scenario, making the state-of-the-art methods in VLN-CE fragile in the presence of erroneous instructions from human users. For the first time, we propose a novel benchmark dataset that introduces various types of instruction errors considering potential human causes. This benchmark provides valuable insight into the robustness of VLN systems in continuous environments. We observe a noticeable performance drop (up to −25%) in Success Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark. Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset. We also propose an effective method, based on a cross-modal transformer architecture, that achieves the best performance in error detection and localization, compared to baselines. Surprisingly, our proposed method has revealed errors in the validation set of the two commonly used datasets for VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in other tasks.

IROS Conference 2024 Conference Paper

Path Re-Planning with Stochastic Obstacle Modeling: A Monte Carlo Tree Search Approach

  • Francesco Trotti
  • Alessandro Farinelli
  • Riccardo Muradore

Path re-planning and repairing are key topics for robust planning and navigation in open dynamic environments, finding applications in various domains such as fleet control of Unmanned Ground Vehicles (UGVs) in warehouses. The use of UGVs in open and dynamic environments requires flexible cooperation between human operators and the UGV fleet within a shared environment. In this paper, we propose a local strategy to re-plan the path of robots encountering unexpected and dynamic obstacles. Specifically, starting from a given Multi-Agent path, we model the re-planning problem as a Markov Decision Process (MDP) considering a stochastic obstacle lifespan, and we propose two local approaches based on Monte Carlo Tree Search to re-plan the path of the robots that encounter obstacles. We compare these approaches with traditional Multi-Agent Path Finding (MAPF) algorithms to obtain new collision-free paths when an obstacle is detected. The evaluation is performed in simulation using benchmarking instances of warehouses and experimentally in a research facility with a scaled-down Industry 4.0 production line.

ECAI Conference 2024 Conference Paper

Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations

  • Luca Marzari
  • Francesco Leofante
  • Ferdinando Cicalese
  • Alessandro Farinelli

We study the problem of assessing the robustness of counterfactual explanations for deep learning models. We focus on plausible model shifts altering model parameters and propose a novel framework to reason about the robustness property in this setting. To motivate our solution, we begin by showing for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete. As this (practically) rules out the existence of scalable algorithms for exactly computing robustness, we propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees while preserving scalability. Remarkably, and differently from existing solutions targeting plausible model shifts, our approach does not impose requirements on the network to be analyzed, thus enabling robustness analysis on a wider range of architectures. Experiments on four binary classification datasets indicate that our method improves the state of the art in generating robust explanations, outperforming existing methods on a range of metrics.

ICML Conference 2024 Conference Paper

Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

  • Federico Bianchi 0002
  • Edoardo Zorzi
  • Alberto Castellini
  • Thiago D. Simão
  • Matthijs T. J. Spaan
  • Alessandro Farinelli

In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains, exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions using a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
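The variable-elimination step behind Max-Plus can be illustrated for two agents with a factored value function Q(a1, a2) = Q1(a1) + Q2(a2) + Q12(a1, a2). This toy sketch shows plain elimination only and omits the safety constraints on local actions that are the paper's contribution:

```python
import numpy as np

def eliminate_two_agents(Q1, Q2, Q12):
    """Variable elimination for joint action selection: eliminate agent 2 by
    computing its best response to every action of agent 1, fold that value
    back into agent 1's choice, then recover agent 2's action. Q1, Q2 are
    local payoff vectors; Q12 is the pairwise payoff matrix (|A1| x |A2|)."""
    resp = (Q2[None, :] + Q12).argmax(axis=1)          # best a2 for each a1
    vals = Q1 + Q2[resp] + Q12[np.arange(len(Q1)), resp]
    a1 = int(vals.argmax())
    return a1, int(resp[a1])
```

In a constrained variant, the inner maximization would range only over actions satisfying the safety criteria, which is roughly where the paper's extension plugs in.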

PRL Workshop 2024 Workshop Paper

Towards Neurosymbolic RL via Inductive Learning of Answer Set Programs

  • Celeste Veronese
  • Daniele Meli
  • Alessandro Farinelli

We present a novel approach that combines symbolic, logic-based reasoning with Reinforcement Learning (RL) to improve training outcomes without compromising execution time. By integrating Inductive Logic Programming (ILP) with model-free RL, we learn human-interpretable policy heuristics that can bias the exploration process in RL algorithms. Utilizing Answer Set Programming (ASP) for logical representation of policy heuristics, and incremental ILP for efficient data stream management, we demonstrate the effectiveness of our approach on an approximate Q-learner for the Pac-Man scenario. Preliminary results show improved training outcomes when offline learned specifications are used to bias the exploration phase. Moreover, the use of ASP heuristics does not significantly impact the training time, and incremental ILP paves the way towards a novel approach to neurosymbolic RL.

IROS Conference 2023 Conference Paper

Constrained Reinforcement Learning and Formal Verification for Safe Colonoscopy Navigation

  • Davide Corsi
  • Luca Marzari
  • Ameya Pore
  • Alessandro Farinelli
  • Alicia Casals
  • Paolo Fiorini
  • Diego Dall'Alba

The field of robotic Flexible Endoscopes (FEs) has progressed significantly, offering a promising solution to reduce patient discomfort. However, the limited autonomy of most robotic FEs results in non-intuitive and challenging manoeuvres, constraining their application in clinical settings. While previous studies have employed lumen tracking for autonomous navigation, they fail to adapt to the presence of obstructions and sharp turns when the endoscope faces the colon wall. In this work, we propose a Deep Reinforcement Learning (DRL)-based navigation strategy that eliminates the need for lumen tracking. However, the use of DRL methods poses safety risks as they do not account for potential hazards associated with the actions taken. To ensure safety, we exploit a Constrained Reinforcement Learning (CRL) method to restrict the policy in a predefined safety regime. Moreover, we present a model selection strategy that utilises Formal Verification (FV) to choose a policy that is entirely safe before deployment. We validate our approach in a virtual colonoscopy environment and report that out of the 300 trained policies, we could identify three policies that are entirely safe. Our work demonstrates that CRL, combined with model selection through FV, can improve the robustness and safety of robotic behaviour in surgical applications.

AAMAS Conference 2023 Conference Paper

Learning Logic Specifications for Soft Policy Guidance in POMCP

  • Giulio Mazzi
  • Daniele Meli
  • Alberto Castellini
  • Alessandro Farinelli

Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs). It allows scaling to large state spaces by computing an approximation of the optimal policy locally and online, using a Monte Carlo Tree Search based strategy. However, POMCP suffers from sparse reward functions, namely, rewards achieved only when the final goal is reached, particularly in environments with large state spaces and long horizons. Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements. However, such policy-related rules require manual definition by domain experts, especially in real-world scenarios. In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner. Specifically, we learn rules expressed in the paradigm of answer set programming. We then integrate them inside POMCP to provide soft policy bias toward promising actions. In the context of two benchmark scenarios, rocksample and battery, we show that the integration of learned rules from small task instances can improve performance with fewer Monte Carlo simulations and in larger task instances. We make our modified version of POMCP publicly available at https://github.com/GiuMaz/pomcp_clingo.git.

ICRA Conference 2023 Conference Paper

Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation

  • Luca Marzari
  • Enrico Marchesini
  • Alessandro Farinelli

Safety is essential for deploying Deep Reinforcement Learning (DRL) algorithms in real-world scenarios. Recently, verification approaches have been proposed to quantify the number of violations of a DRL policy over input-output relationships, called properties. However, such properties are hard-coded and require task-level knowledge, making their application intractable in challenging safety-critical tasks. To this end, we introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time. CROP employs a cost signal to identify unsafe interactions and uses them to shape safety properties. We then propose a refinement strategy to combine properties that model similar unsafe interactions. Our evaluation compares the benefits of computing the number of violations using standard hard-coded properties and those generated with CROP. We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP allows higher returns and fewer violations than previous Safe DRL approaches.

AAMAS Conference 2023 Conference Paper

Safe Deep Reinforcement Learning by Verifying Task-Level Properties

  • Enrico Marchesini
  • Luca Marzari
  • Alessandro Farinelli
  • Christopher Amato

Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL). However, the cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space. Such an encoding requires the agent to visit numerous unsafe states to learn a cost-value function to drive the learning process toward safety, increasing the number of unsafe interactions and decreasing sample efficiency. In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric. This metric is computed by verifying task-level properties, shaped as input-output conditions, and is used as a penalty to bias the policy away from unsafe states without learning an additional value function. We investigate the benefits of using the violation metric in standard Safe DRL benchmarks and robotic mapless navigation tasks. The navigation experiments bridge the gap between Safe DRL and robotics, introducing a framework that allows rapid testing on real robots. Our experiments show that policies trained with the violation penalty achieve higher performance than Safe DRL baselines and significantly reduce the number of visited unsafe states.
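As a minimal illustration of using a violation metric as a training penalty (not the paper's implementation), the sketch below uses a hypothetical 1-D stand-in: the "violation" is the fraction of the policy's reachable output interval that falls outside a safe interval, and it is subtracted from the task reward.

```python
def violation_metric(output_interval, safe_interval):
    """Toy 1-D violation metric: fraction of the reachable output
    interval that lies outside the safe interval (illustrative
    simplification of verifying an input-output property)."""
    lo, hi = output_interval
    s_lo, s_hi = safe_interval
    if hi == lo:  # degenerate interval: safe or unsafe as a whole
        return 0.0 if s_lo <= lo <= s_hi else 1.0
    unsafe = max(0.0, s_lo - lo) + max(0.0, hi - s_hi)
    return min(1.0, unsafe / (hi - lo))

def shaped_reward(task_reward, violation, penalty_weight=1.0):
    """Penalize the task reward by the violation metric, biasing the
    policy away from unsafe states without a learned cost-value function."""
    return task_reward - penalty_weight * violation
```

The key design point mirrors the abstract: the penalty is computed by verification rather than learned, so no extra value function (and no extra unsafe visits to fit it) is required.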

ICML Conference 2023 Conference Paper

Scalable Safe Policy Improvement via Monte Carlo Tree Search

  • Alberto Castellini
  • Federico Bianchi 0002
  • Edoardo Zorzi
  • Thiago D. Simão
  • Alessandro Farinelli
  • Matthijs T. J. Spaan

Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.

IJCAI Conference 2023 Conference Paper

The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks

  • Luca Marzari
  • Davide Corsi
  • Ferdinando Cicalese
  • Alessandro Farinelli

Deep Neural Networks are increasingly adopted in critical tasks that require a high level of safety, e.g., autonomous driving. While state-of-the-art verifiers can be employed to check whether a DNN is unsafe w.r.t. some given property (i.e., whether there is at least one unsafe input configuration), their yes/no output is not informative enough for other purposes, such as shielding, model selection, or training improvements. In this paper, we introduce the #DNN-Verification problem, which involves counting the number of input configurations of a DNN that result in a violation of a particular safety property. We analyze the complexity of this problem and propose a novel approach that returns the exact count of violations. Due to the #P-completeness of the problem, we also propose a randomized, approximate method that provides a provable probabilistic bound on the correct count while significantly reducing computational requirements. We present experimental results on a set of safety-critical benchmarks that demonstrate the effectiveness of our approximate method and evaluate the tightness of the bound.
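A randomized counter of this flavor can be sketched with plain Monte Carlo sampling and a Hoeffding-style sample-size bound; this is a generic illustration of an (epsilon, delta) estimate, not the paper's algorithm, and the `is_unsafe`/`sample_input` callables are assumed interfaces.

```python
import math

def estimate_violation_rate(is_unsafe, sample_input, epsilon=0.05, delta=0.01):
    """Monte Carlo estimate of the fraction of unsafe inputs.

    With probability at least 1 - delta the estimate is within epsilon
    of the true violation rate, by Hoeffding's inequality, which gives
    the sample size n >= ln(2/delta) / (2 * epsilon**2).
    """
    n = math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
    hits = sum(1 for _ in range(n) if is_unsafe(sample_input()))
    return hits / n, n
```

Note the practical appeal: with epsilon = 0.05 and delta = 0.01 the bound requires 1060 samples regardless of the input dimension, whereas exact counting is #P-complete.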

AAMAS Conference 2022 Conference Paper

Active Generation of Logical Rules for POMCP Shielding

  • Giulio Mazzi
  • Alberto Castellini
  • Alessandro Farinelli

We consider the popular Partially Observable Monte-Carlo Planning (POMCP) algorithm and propose a methodology, called Active XPOMCP, for generating compact logical rules that represent properties of the control policy. These rules are then used as shields to prevent POMCP from selecting unexpected actions, with useful implications for the security and trustworthiness of the algorithm. Contrary to state-of-the-art methods, Active XPOMCP does not require a previously generated set of belief-action pairs to generate the logical rules, but actively generates this data in an information-efficient way by querying the algorithm. Active XPOMCP reduces the number of beliefs needed to generate accurate rules with respect to state-of-the-art methods, and it produces more accurate shields when few belief-action samples are available.

PRL Workshop 2022 Workshop Paper

An attention model for the formation of collectives in real-world domains

  • Adrià Fenoy Barceló
  • Filippo Bistaffa
  • Alessandro Farinelli

We consider the problem of forming collectives of agents for real-world applications aligned with Sustainable Development Goals (e.g., shared mobility, cooperative learning). We propose a general approach for the formation of collectives based on a novel combination of an attention model and an integer linear program (ILP). In more detail, we propose an attention encoder-decoder model that transforms a collective formation instance into a weighted set packing problem, which is then solved by an ILP. Results on two real-world domains (i.e., ridesharing and team formation for cooperative learning) show that our approach provides solutions that are comparable (in terms of quality) to those produced by state-of-the-art approaches specific to each domain. Moreover, our solution outperforms the most recent general approach for forming collectives based on Monte Carlo tree search.

ICRA Conference 2022 Conference Paper

Enhancing Deep Reinforcement Learning Approaches for Multi-Robot Navigation via Single-Robot Evolutionary Policy Search

  • Enrico Marchesini
  • Alessandro Farinelli

Recent Multi-Agent Deep Reinforcement Learning approaches factorize a global action-value to address non-stationarity and favor cooperation. These methods, however, hinder exploration by introducing constraints (e.g., additive value-decomposition) to guarantee the factorization. Our goal is to enhance exploration and improve the sample efficiency of multi-robot mapless navigation by incorporating a periodical Evolutionary Policy Search (EPS). In detail, the multi-agent training “specializes” the robots' policies to learn the collision avoidance skills that are mandatory for the task. Concurrently, we propose the use of Evolutionary Algorithms to explore different regions of the policy space in an environment with only a single robot. The idea is that core navigation skills, seeded from the multi-robot policies using mutation operators, improve faster in the single-robot EPS. Hence, policy parameters can be injected into the multi-robot setting using crossovers, leading to improved performance and sample efficiency. Experiments in tasks with up to 12 robots confirm the beneficial transfer of navigation skills from the EPS to the multi-robot setting, improving the performance of prior methods.

AAAI Conference 2022 Conference Paper

Exploring Safer Behaviors for Deep Reinforcement Learning

  • Enrico Marchesini
  • Davide Corsi
  • Alessandro Farinelli

We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such a formalization is addressed in the literature using constrained optimization on the cost, limiting exploration and leading to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage the exploration benefits of evolutionary methods to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifications with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop, driving the learning process toward safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid degrading the return while improving the safety of the policy.

IROS Conference 2021 Conference Paper

Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation

  • Enrico Marchesini
  • Davide Corsi
  • Alessandro Farinelli

We propose a novel benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation. Aquatic navigation is an extremely challenging task due to the non-stationary environment and the uncertainties of the robotic platform, hence it is crucial to consider the safety aspect of the problem by analyzing the behavior of the trained network to avoid dangerous situations (e.g., collisions). To this end, we consider value-based and policy-gradient Deep Reinforcement Learning (DRL), and we propose a crossover-based strategy that combines gradient-based and gradient-free DRL to improve sample efficiency. Moreover, we propose a verification strategy based on interval analysis that checks the behavior of the trained models over a set of desired properties. Our results show that the crossover-based training outperforms prior DRL approaches, while our verification allows us to quantify the number of configurations that violate the behaviors described by the properties. Crucially, this will serve as a benchmark for future research in this domain of applications.

IROS Conference 2021 Conference Paper

Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation

  • Enrico Marchesini
  • Alessandro Farinelli

We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm. This problem is challenging when each robot considers its path without explicitly sharing observations with the other robots, and can lead to non-stationarity issues in Deep Reinforcement Learning (DRL). The typical CTDE algorithm factorizes the joint action-value function into individual ones to favor cooperation and achieve decentralized execution. Such factorization involves constraints (e.g., monotonicity) that limit the emergence of novel behaviors in an individual, as each agent is trained starting from a joint action-value. In contrast, we propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value, which is used to inject global state information into the value-based updates of the agents. Consequently, each model computes its gradient update for the weights considering the overall state of the environment. Our idea follows the insights of Dueling Networks: separately estimating the joint state-value both improves sample efficiency and provides each robot with information on whether the global state is (or is not) valuable. Experiments in a robotic navigation task with 2, 4, and 8 robots confirm the superior performance of our approach over prior CTDE methods (e.g., VDN, QMIX).
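The Dueling-style combination described above can be sketched in a few lines; this is a toy illustration (not the paper's network), where a shared, centralized state-value V(s) is combined with each agent's own mean-centered advantages to form per-agent Q-values.

```python
def joint_q_values(state_value, agent_advantages):
    """Dueling-style combination, per agent i:
        Q_i(s, a) = V(s) + A_i(s, a) - mean_a A_i(s, a)

    state_value: scalar V(s) from the centralized state-value network.
    agent_advantages: {agent_id: [advantage per action]} from each
    agent's own (decentralized) advantage head.
    """
    q = {}
    for agent, adv in agent_advantages.items():
        mean_adv = sum(adv) / len(adv)  # center for identifiability
        q[agent] = [state_value + a - mean_adv for a in adv]
    return q
```

Centering the advantages is the standard Dueling trick to keep the V/A split identifiable; here the shared V(s) is the channel through which global state information enters every agent's value-based update.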

UAI Conference 2021 Conference Paper

Formal verification of neural networks for safety-critical tasks in deep reinforcement learning

  • Davide Corsi
  • Enrico Marchesini
  • Alessandro Farinelli

In recent years, neural networks have achieved groundbreaking successes in a wide variety of applications. However, for safety-critical tasks, such as robotics and healthcare, it is necessary to provide some specific guarantees before deployment in a real-world context. Even in these scenarios, where high-cost equipment and human safety are involved, the evaluation of the models is usually performed with standard metrics (i.e., cumulative reward or success rate). In this paper, we introduce a novel metric for the evaluation of models in safety-critical tasks, the violation rate. We build our work upon the concept of formal verification for neural networks, providing a new formulation for the safety properties that aims to ensure that the agent always makes rational decisions. To perform this evaluation, we present ProVe (Property Verifier), a novel approach based on interval algebra, designed for the analysis of our novel behavioral properties. We apply our method to different domains (i.e., mapless navigation for mobile robots, trajectory generation for manipulators, and the standard ACAS benchmark). Results show that the violation rate computed by ProVe provides a good evaluation of the safety of trained models.
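The interval-algebra idea this line of work builds on can be sketched by propagating an input box through one affine layer followed by a ReLU; this is a generic interval-arithmetic illustration, not ProVe's actual code.

```python
def interval_affine(lower, upper, weights, bias):
    """Propagate an input box [lower, upper] through y = Wx + b using
    interval arithmetic: a positive weight pairs lo with lo and hi
    with hi, a negative weight swaps them."""
    out_lo, out_hi = [], []
    for w_row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(w_row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def interval_relu(lower, upper):
    """ReLU is monotone, so it maps bounds to bounds elementwise."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]
```

Chaining these two steps layer by layer yields a sound (over-approximated) output box; a behavioral property is then checked against that box rather than against individual inputs.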

ICLR Conference 2021 Conference Paper

Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning

  • Enrico Marchesini
  • Davide Corsi
  • Alessandro Farinelli

The combination of Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) has been recently proposed to merge the benefits of both solutions. Existing mixed approaches, however, have been successfully applied only to actor-critic methods and present significant overhead. We address these issues by introducing a novel mixed framework that exploits a periodical genetic evaluation to soft update the weights of a DRL agent. The resulting approach is applicable with any DRL method and, in a worst-case scenario, does not exhibit detrimental behaviours. Experiments in robotic applications and continuous control benchmarks demonstrate the versatility of our approach, which significantly outperforms prior DRL, EA, and mixed approaches. Finally, we employ formal verification to confirm the policy improvement, mitigating the inefficient exploration and hyper-parameter sensitivity of DRL.

AAMAS Conference 2021 Conference Paper

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: A Rule-Based Approach

  • Giulio Mazzi
  • Alberto Castellini
  • Alessandro Farinelli

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps, i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge on the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement of the Area Under Curve up to 47% has been achieved in our tests.

IROS Conference 2021 Conference Paper

POMP++: Pomcp-based Active Visual Search in unknown indoor environments

  • Francesco Giuliari
  • Alberto Castellini
  • Riccardo Berra
  • Alessio Del Bue
  • Alessandro Farinelli
  • Marco Cristani
  • Francesco Setti
  • Yiming Wang 0002

In this paper, we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy that enables the use of POMCP with a dynamically growing state space to address the online generation of the floor map. We evaluate our method on two public benchmark datasets: AVD, acquired by real robotic platforms, and Habitat ObjectNav, rendered from real 3D scene scans, achieving the best success rate with an improvement of over 10% against state-of-the-art methods.

IS Journal 2021 Journal Article

Predictive Model Generation for Load Forecasting in District Heating Networks

  • Alberto Castellini
  • Federico Bianchi
  • Alessandro Farinelli

District heating networks (DHNs) are promising technologies to increase the efficiency and reduce emissions of heat distribution to residential and commercial buildings. The advent of the smart grid paradigm has introduced the usage of heating load forecasting tools in DHNs. They provide estimates of future heating load, improving the planning of heat production and power station maintenance. In this article, we propose a methodology based on the integrated use of regularized regression and clustering for generating predictive models of future heating load in DHNs. The methodology is tested on a real case study based on a dataset provided by AGSM, an Italian utility company that manages a DHN in the city of Verona, Italy. We generate a set of multiple-equation models having different degrees of complexity and show that models generated by the proposed approach outperform those trained by standard methods. Moreover, we provide an interpretation of patterns encoded by these models, and show that they identify real operational states of the network. The approach is completely data driven.

ICAPS Conference 2021 Conference Paper

Rule-based Shielding for Partially Observable Monte-Carlo Planning

  • Giulio Mazzi
  • Alberto Castellini
  • Alessandro Farinelli

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. The first method is based on Satisfiability Modulo Theory (SMT). It inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by the expert. The second contribution is a module that uses the logical formulas online to identify anomalous actions selected by POMCP and substitutes those actions with actions that satisfy the logical formulas, fulfilling expert knowledge. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the shielded POMCP outperforms the standard POMCP in a case study in which a wrong parameter of POMCP makes it select wrong actions from time to time. Moreover, we show that the approach maintains good performance even if the parameters of the logical formula are optimized using trajectories containing some wrong actions.

IROS Conference 2021 Conference Paper

Safe Reinforcement Learning using Formal Verification for Tissue Retraction in Autonomous Robotic-Assisted Surgery

  • Ameya Pore
  • Davide Corsi
  • Enrico Marchesini
  • Diego Dall'Alba
  • Alicia Casals
  • Alessandro Farinelli
  • Paolo Fiorini

Deep Reinforcement Learning (DRL) is a viable solution for automating repetitive surgical subtasks due to its ability to learn complex behaviours in a dynamic environment. This task automation could lead to a reduced cognitive workload for the surgeon, increased precision in critical aspects of the surgery, and fewer patient-related complications. However, current DRL methods do not guarantee any safety criteria as they maximise cumulative rewards without considering the risks associated with the actions performed. Due to this limitation, the application of DRL in the safety-critical paradigm of robot-assisted Minimally Invasive Surgery (MIS) has been constrained. In this work, we introduce a Safe-DRL framework that incorporates safety constraints for the automation of surgical subtasks via DRL training. We validate our approach in a virtual scene that replicates a tissue retraction task commonly occurring in multiple phases of an MIS. Furthermore, to evaluate the safe behaviour of the robotic arms, we formulate a formal verification tool for DRL methods that provides the probability of unsafe configurations. Our results indicate that a formal analysis guarantees safety with high confidence such that the robotic instruments operate within the safe workspace and avoid hazardous interaction with other anatomical structures.

EUMAS Conference 2020 Conference Paper

A Game of Double Agents: Repeated Stackelberg Games with Role Switch

  • Matteo Murari
  • Alessandro Farinelli
  • Riccardo Sartea

We introduce a novel variation of the widely used 2-player Stackelberg game formalism. In our variation, a master player can decide to act as a leader or as a follower across the iterations of the game. This model naturally arises in many real-world applications and particularly in cyber-security scenarios, where an analyzer agent can arbitrarily decide which role to play in each iteration. We propose a first solution approach for this model assuming bounded rationality for the players and adopting a Monte Carlo Tree Search approach to devise the analyzer’s strategy. We empirically show the effectiveness of our method in two experimental domains, i.e., synthetic game instances (using randomly generated games) and malware analysis (using real malware samples).

EUMAS Conference 2020 Conference Paper

Decentralised Multi-intersection Congestion Control for Connected Autonomous Vehicles

  • Huan Vu
  • Samir Aknine
  • Sarvapali D. Ramchurn
  • Alessandro Farinelli

This paper presents a decentralised mechanism for traffic control of connected autonomous vehicles in settings where multiple road intersections have to be managed and optimised. We propose a solution based on the distributed constraint optimisation (DCOP) approach. We build upon a state-of-the-art algorithm for single-intersection management in order to manage congestion both across and within intersections. Furthermore, to solve the DCOP, we propose an improved node ordering policy for the Max-sum_AD_VP algorithm. Empirical evaluation of our model and algorithm demonstrates that our approach outperforms existing benchmarks by up to 32% in terms of average delay for both single and multiple intersection setups.

ICRA Conference 2020 Conference Paper

Discrete Deep Reinforcement Learning for Mapless Navigation

  • Enrico Marchesini
  • Alessandro Farinelli

Our goal is to investigate whether discrete state space algorithms are a viable alternative to continuous ones for mapless navigation. To this end, we present an approach based on Double Deep Q-Network and employ parallel asynchronous training and a multi-batch Priority Experience Replay to reduce the training time. Experiments show that our method trains faster and outperforms both the continuous Deep Deterministic Policy Gradient and Proximal Policy Optimization algorithms. Moreover, we train the models in a custom environment built on the recent Unity learning toolkit and show that they can be exported to the TurtleBot3 simulator and to the real robot without further training. Overall, our optimized method is 40% faster compared to the original discrete algorithm. This setting significantly reduces training times with respect to the continuous algorithms while maintaining a similar success rate, hence being a viable alternative for mapless navigation.

EUMAS Conference 2020 Conference Paper

Explaining the Influence of Prior Knowledge on POMCP Policies

  • Alberto Castellini
  • Enrico Marchesini
  • Giulio Mazzi
  • Alessandro Farinelli

Partially Observable Monte Carlo Planning (POMCP) is a recently proposed online planning algorithm which makes use of Monte Carlo Tree Search to solve Partially Observable Markov Decision Processes. This solver is very successful because of its capability to scale to large uncertain environments, a very important property for current real-world planning problems. In this work we propose three main contributions related to POMCP usage and interpretability. First, we introduce a new planning problem related to mobile robot collision avoidance in paths with uncertain segment difficulties, and we show how POMCP performance in this context can take advantage of prior knowledge about segment difficulty relationships. This problem has direct real-world applications, such as safety management in industrial environments where human-robot interaction is a crucial issue. Then, we present an experimental analysis of the relationships between the prior knowledge provided to the algorithm and the performance improvement, showing that in our case study prior knowledge affects two main properties, namely, the distance between the belief and the real state, and the mutual information between segment difficulty and the action taken in the segment. This analysis aims to improve POMCP explainability, following the line of recently proposed eXplainable AI and, in particular, eXplainable planning. Finally, we analyze results on a synthetic case study and show how the proposed measures can improve the understanding of internal planning mechanisms.

IROS Conference 2019 Conference Paper

A Comparative Analysis on the use of Autoencoders for Robot Security Anomaly Detection

  • Matteo Olivato
  • Omar Cotugno
  • Lorenzo Brigato
  • Domenico Daniele Bloisi
  • Alessandro Farinelli
  • Luca Iocchi

While robots are increasingly deployed among people in public spaces, the impact of cyber-security attacks is significantly increasing. Most consumer and professional robotic systems are affected by multiple vulnerabilities, and research in this field has only just started. This paper addresses the problem of automatically detecting anomalous behaviors possibly coming from cyber-security attacks. The proposed solution is based on extracting system logs from a set of internal variables of a robotic system, transforming such data into images, and training different Autoencoder architectures to classify robot behaviors and detect anomalies. Experimental results in two different scenarios (autonomous boats and social robots) show the effectiveness and general applicability of the proposed method.

AAMAS Conference 2019 Conference Paper

Agent Behavioral Analysis Based on Absorbing Markov Chains

  • Riccardo Sartea
  • Alessandro Farinelli
  • Matteo Murari

We propose a novel technique to identify known behaviors of intelligent agents acting within uncertain environments. We employ Markov chains to represent the observed behavioral models of the agents and formulate the problem as a classification task. In particular, we propose to use the long-term transition probabilities of moving between states of the Markov chain as features. Additionally, we transform our models into absorbing Markov chains, enabling the use of standard techniques to compute such features. The empirical evaluation considers two scenarios: the identification of given strategies in classical games, and the detection of malicious behaviors in malware analysis. Results show that our approach can provide informative features to successfully identify known behavioral patterns. In more detail, we show that focusing on the long-term transition probabilities makes it possible to diminish the error introduced by noisy states and transitions that may be present in an observed behavioral model. We pay particular attention to the case of noise that may be intentionally introduced by a target agent to deceive an observer agent.
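The "standard techniques" for absorbing chains boil down to the textbook fundamental-matrix computation: with the chain in canonical form (transient-to-transient block Q, transient-to-absorbing block R), the absorption probabilities are B = (I - Q)^{-1} R, and the entries of B (and of N = (I - Q)^{-1}) serve as long-term features. The sketch below is a generic implementation of that formula, not the authors' code.

```python
def absorption_probabilities(Q, R):
    """Compute B = (I - Q)^{-1} R for an absorbing Markov chain in
    canonical form, via Gauss-Jordan elimination on [I - Q | R].
    Pure Python, intended for small matrices only."""
    t = len(Q)  # number of transient states
    # Augmented matrix [I - Q | R], one row per transient state.
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(t)] + list(R[i])
         for i in range(t)]
    for col in range(t):
        # Partial pivoting for numerical stability.
        piv = max(range(col, t), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(t):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[t:] for row in A]  # the B block
```

For a symmetric random walk on two transient states, each with probability 0.5 of stepping to the other transient state and 0.5 of being absorbed on its own side, B correctly assigns probability 2/3 to absorption on the starting side; each row of B sums to 1, since absorption is certain.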

IROS Conference 2019 Conference Paper

Data Flow ORB-SLAM for Real-time Performance on Embedded GPU Boards

  • Stefano Aldegheri
  • Nicola Bombieri
  • Domenico Daniele Bloisi
  • Alessandro Farinelli

The use of embedded boards on robots, including unmanned aerial and ground vehicles, is increasing thanks to the availability of GPU equipped low-cost embedded boards in the market. Porting algorithms originally designed for desktop CPUs on those boards is not straightforward due to hardware limitations. In this paper, we present how we modified and customized the open source SLAM algorithm ORB-SLAM2 to run in real-time on the NVIDIA Jetson TX2. We adopted a data flow paradigm to process the images, obtaining an efficient CPU/GPU load distribution that results in a processing speed of about 30 frames per second. Quantitative experimental results on four different sequences of the KITTI datasets demonstrate the effectiveness of the proposed approach. The source code of our data flow ORB-SLAM2 algorithm is publicly available on GitHub.

AAMAS Conference 2019 Conference Paper

eXplainable Modeling (XM): Data Analysis for Intelligent Agents

  • Alberto Castellini
  • Francesco Masillo
  • Riccardo Sartea
  • Alessandro Farinelli

Intelligent agents perform key tasks in several application domains by processing sensor data and taking actions that maximize reward functions based on internal models of the environment and the agent itself. In this paper we present eXplainable Modeling (XM), a Python software tool which supports data analysis for intelligent agents. XM enables the analysis of state-models, namely models of the agent's states discovered from sensor traces by data-driven methods, and their interpretation for improved situation awareness. The main features of the tool are described through the analysis of a real case study concerning aquatic drones for water monitoring.

IJCAI Conference 2019 Conference Paper

Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning

  • Alberto Castellini
  • Georgios Chalkiadakis
  • Alessandro Farinelli

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.

JAIR Journal 2018 Journal Article

A COP Model For Graph-Constrained Coalition Formation

  • Filippo Bistaffa
  • Alessandro Farinelli

We consider Graph-Constrained Coalition Formation (GCCF), a widely studied subproblem of coalition formation in which the set of valid coalitions is restricted by a graph. We propose COP-GCCF, a novel approach that models GCCF as a COP, and we solve such COP with a highly-parallel approach based on Bucket Elimination executed on the GPU, which is able to exploit the high constraint tightness of COP-GCCF. Results show that our approach outperforms state-of-the-art algorithms (i.e., DyCE and IDPG) by at least one order of magnitude on realistic graphs, i.e., a crawl of the Twitter social graph, both in terms of runtime and memory.

IJCAI Conference 2018 Conference Paper

A COP Model for Graph-Constrained Coalition Formation (Extended Abstract)

  • Filippo Bistaffa
  • Alessandro Farinelli

We focus on Graph-Constrained Coalition Formation (GCCF), a widely studied subproblem of coalition formation where the set of valid coalitions is constrained by a graph. We propose COP-GCCF, a novel approach that models GCCF as a COP. We then solve such COP with a highly-parallel GPU implementation of Bucket Elimination, which is able to exploit the high constraint tightness of COP-GCCF. Results on realistic graphs, i.e., a crawl of the Twitter social graph, show that our approach outperforms state-of-the-art algorithms (i.e., DyCE and IDPG) by at least one order of magnitude, both in terms of runtime and memory.

AAMAS Conference 2018 Conference Paper

Detection of Intelligent Agent Behaviors Using Markov Chains

  • Riccardo Sartea
  • Alessandro Farinelli

We consider the problem of detecting the behavior of intelligent agents operating in stochastic environments. In particular, we focus on a scenario where we are given two models for agent behaviors and we are interested in detecting whether one model appears within the other model. We use Markov chains to represent the behavioral models of the agents and we propose to extract the long-run probabilities as features that can be used to detect if one model is contained in the other. Results show that our approach is capable of detecting known strategies for agents interacting within classical games and of categorizing malware behaviors.

IJCAI Conference 2017 Conference Paper

A Monte Carlo Tree Search approach to Active Malware Analysis

  • Riccardo Sartea
  • Alessandro Farinelli

Active Malware Analysis (AMA) focuses on acquiring knowledge about dangerous software by executing actions that trigger a response in the malware. A key problem for AMA is to design strategies that select the most informative actions for the analysis. To devise such actions, we model AMA as a stochastic game between an analyzer agent and a malware sample, and we propose a reinforcement learning algorithm based on Monte Carlo Tree Search. Crucially, our approach does not require a pre-specified malware model; in contrast to most existing analysis techniques, we generate such a model while interacting with the malware. We evaluate our solution using clustering techniques on models generated by analyzing real malware samples. Results show that our approach learns faster than existing techniques even without any prior information on the samples.

TIST Journal 2017 Journal Article

Algorithms for Graph-Constrained Coalition Formation in the Real World

  • Filippo Bistaffa
  • Alessandro Farinelli
  • Jesús Cerquides
  • Juan Rodríguez-Aguilar
  • Sarvapali D. Ramchurn

Coalition formation typically involves the coming together of multiple, heterogeneous, agents to achieve both their individual and collective goals. In this article, we focus on a special case of coalition formation known as Graph-Constrained Coalition Formation (GCCF) whereby a network connecting the agents constrains the formation of coalitions. We focus on this type of problem given that in many real-world applications, agents may be connected by a communication network or only trust certain peers in their social network. We propose a novel representation of this problem based on the concept of edge contraction, which allows us to model the search space induced by the GCCF problem as a rooted tree. Then, we propose an anytime solution algorithm (Coalition Formation for Sparse Synergies (CFSS)), which is particularly efficient when applied to a general class of characteristic functions called m + a functions. Moreover, we show how CFSS can be efficiently parallelised to solve GCCF using a nonredundant partition of the search space. We benchmark CFSS on both synthetic and realistic scenarios, using a real-world dataset consisting of the energy consumption of a large number of households in the UK. Our results show that, in the best case, the serial version of CFSS is four orders of magnitude faster than the state of the art, while the parallel version is 9.44 times faster than the serial version on a 12-core machine. Moreover, CFSS is the first approach to provide anytime approximate solutions with quality guarantees for very large systems of agents (i.e., with more than 2,700 agents).

ECAI Conference 2016 Conference Paper

CUBE: A CUDA Approach for Bucket Elimination on GPUs

  • Filippo Bistaffa
  • Nicola Bombieri
  • Alessandro Farinelli

We consider Bucket Elimination (BE), a popular algorithmic framework to solve Constraint Optimisation Problems (COPs). We focus on the parallelisation of the most computationally intensive operations of BE, i.e., join sum and maximisation, which are key ingredients in several close variants of the BE framework (including Belief Propagation on Junction Trees and Distributed COP techniques such as ActionGDL and DPOP). In particular, we propose CUBE, a highly-parallel GPU implementation of such operations, which adopts an efficient memory layout allowing all threads to independently locate their input and output addresses in memory, hence achieving a high computational throughput. We compare CUBE with the most recent GPU implementation of BE. Our results show that CUBE achieves significant speed-ups (up to two orders of magnitude) w.r.t. the counterpart approach, showing a dramatic decrease of the runtime w.r.t. the serial version (i.e., up to 652× faster). More importantly, such speed-ups increase when the complexity of the problem grows, showing that CUBE correctly exploits the additional degree of parallelism inherent in the problem.
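The join-sum and maximisation steps that CUBE parallelises on the GPU can be expressed, on a CPU, as plain tensor operations. A toy NumPy sketch (variable names and table sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two utility tables over variables (x, y) and (y, z), each with domain size 3.
f_xy = rng.integers(0, 10, size=(3, 3))
f_yz = rng.integers(0, 10, size=(3, 3))

# Join sum: align the tables on the shared variable y and add utilities,
# yielding a table over (x, y, z).
joined = f_xy[:, :, None] + f_yz[None, :, :]

# Maximisation: eliminate y by maximising it out, yielding a message over (x, z).
msg_xz = joined.max(axis=1)
```

The GPU implementation in the paper parallelises exactly these two operations; the memory-layout trick it describes lets each thread compute its own output cell independently, which the broadcasting above mirrors in spirit.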

JAAMAS Journal 2016 Journal Article

Interacting with team oriented plans in multi-robot systems

  • Alessandro Farinelli
  • Masoume M. Raeissi
  • Paul Scerri

Team oriented plans have become a popular tool for operators to control teams of autonomous robots to pursue complex objectives in complex environments. Such plans allow an operator to specify high level directives and allow the team to autonomously determine how to implement such directives. However, the operators will often want to interrupt the activities of individual team members to deal with particular situations, such as a danger to a robot that the robot team cannot perceive. Previously, after such interrupts, the operator would usually need to restart the team plan to ensure its success. In this paper, we present an approach to encoding how interrupts can be smoothly handled within a team plan. Building on a team plan formalism that uses Colored Petri Nets, we describe a mechanism that allows a range of interrupts to be handled smoothly, allowing the team to efficiently continue with its task after the operator intervention. We validate the approach with an application of robotic watercraft and show improved overall efficiency. In particular, we consider a situation where several platforms should travel through a set of pre-specified locations, and we identify three specific cases that require the operator to interrupt the plan execution: (i) a boat must be pulled out; (ii) all boats should stop the plan and move to a pre-specified assembly position; (iii) a set of boats must synchronize to traverse a dangerous area one after the other. Our experiments show that the use of our interrupt mechanism decreases the time to complete the plan (up to 48% reduction) and decreases the operator load (up to 80% reduction in the number of user actions). Moreover, we performed experiments with real robotic platforms to validate the applicability of our mechanism in the actual deployment of robotic watercraft.

ECAI Conference 2016 Conference Paper

Skeleton-Based Orienteering for Level Set Estimation

  • Lorenzo Bottarelli
  • Manuele Bicego
  • Jason Blum
  • Alessandro Farinelli

In recent years, the use of unmanned vehicles for monitoring spatial environmental phenomena has gained increasing attention. Within this context, an interesting problem is level set estimation, where the goal is to identify regions of space where the analyzed phenomenon (for example the PH value in a body of water) is above or below a given threshold level. Typically, in the literature this problem is approached with techniques which search for the most interesting sampling locations to collect the desired information (i.e., locations where we can gain the most information by sampling). However, the common assumption underlying this class of approaches is that acquiring a sample is expensive (e.g., in terms of consumed energy and time). In this paper, we take a different perspective on the same problem by considering the case where a mobile vehicle can continuously acquire measurements with a negligible cost, through high rate sampling sensors. In this scenario, it is crucial to reduce the path length that the mobile platform executes to collect the data. To address this issue, we propose a novel algorithm, called Skeleton-Based Orienteering for Level Set Estimation (SBOLSE). Our approach starts from the LSE formulation introduced in [10] and formulates the level set estimation problem as an orienteering problem. This allows one to determine informative locations while considering the length of the path. To combat the complexity associated with the orienteering approach, we propose a heuristic approach based on the concept of topological skeletonization. We evaluate our algorithm by comparing it with the state of the art approaches (i.e., LSE and LSE-batch) both on a real world dataset extracted from mobile platforms and on a synthetic dataset extracted from CO2 maps. Results show that our approach achieves a near optimal classification accuracy while significantly reducing the travel distance (up to 70% w.r.t. LSE and 30% w.r.t. LSE-batch).

ECAI Conference 2016 Conference Paper

Using Petri Net Plans for Modeling UAV-UGV Cooperative Landing

  • Andrea Bertolaso
  • Masoume M. Raeissi
  • Alessandro Farinelli
  • Riccardo Muradore

The use of cooperative multi-vehicle teams including aerial and ground vehicles has been growing rapidly over the last years, with applications ranging from search and rescue to logistics. In this paper, we consider a cooperative landing task, where an unmanned aerial vehicle (UAV) must land on an unmanned ground vehicle (UGV) while the ground vehicle is moving in the environment to execute its own mission. To solve this challenging problem we consider the Petri Net Plans (PNPs) framework, an advanced planning specification framework, to effectively use different controllers in different conditions and to monitor the evolution of the system during mission execution so that the best controller is always used even in the face of unexpected situations. Empirical simulation results show that our system can properly monitor the joint mission carried out by the UAV/UGV team, hence confirming that the use of a formal planning language significantly helps in the design of such complex scenarios.

IJCAI Conference 2015 Conference Paper

Biclustering Gene Expressions Using Factor Graphs and the Max-Sum Algorithm

  • Matteo Denitto
  • Alessandro Farinelli
  • Manuele Bicego

Biclustering is an intrinsically challenging and highly complex problem, particularly studied in the biology field, where the goal is to simultaneously cluster genes and samples of an expression data matrix. In this paper we present a novel approach to gene expression biclustering by providing a binary Factor Graph formulation of the problem. In more detail, we reformulate biclustering as a sequential search for single biclusters and use an efficient optimization procedure based on the Max-Sum algorithm. Such an approach drastically alleviates the scaling issues of previous Factor Graph-based approaches to biclustering, obtaining significantly more accurate results on synthetic datasets. A further analysis on two real-world datasets confirms the potential of the proposed methodology when compared to alternative state-of-the-art methods.

AAAI Conference 2015 Conference Paper

Sharing Rides with Friends: A Coalition Formation Algorithm for Ridesharing

  • Filippo Bistaffa
  • Alessandro Farinelli
  • Sarvapali Ramchurn

We consider the Social Ridesharing (SR) problem, where a set of commuters, connected through a social network, arrange one-time rides at short notice. In particular, we focus on the associated optimisation problem of forming cars to minimise the travel cost of the overall system, modelling such problem as a graph-constrained coalition formation (GCCF) problem, where the set of feasible coalitions is restricted by a graph (i.e., the social network). Moreover, we significantly extend the state-of-the-art algorithm for GCCF, i.e., the CFSS algorithm, to solve our GCCF model of the SR problem. Our empirical evaluation uses a real dataset for both spatial (GeoLife) and social data (Twitter) to validate the applicability of our approach in a realistic application scenario. Empirical results show that our approach computes optimal solutions for systems of medium scale (up to 100 agents) providing significant cost reductions (up to −36.22%). Moreover, we can provide approximate solutions for very large systems (i.e., up to 2000 agents) with good quality guarantees (i.e., with an approximation ratio of 1.41 in the worst case) within minutes (i.e., 100 seconds).

IROS Conference 2014 Conference Paper

A directional visual descriptor for large-scale coverage problems

  • Marco Tamassia
  • Alessandro Farinelli
  • Vittorio Murino
  • Alessio Del Bue

Visual coverage of large scale environments is a challenging problem that has many practical applications such as large scale 3D reconstruction, search and rescue and active video surveillance. In this paper, we consider a setting where mobile robots must acquire visual information using standard cameras, while minimizing associated movement costs. The main source of complexity for such a scenario is the lack of a priori knowledge of 3D structures for the surrounding environment. To address this problem, we propose a novel descriptor for visual coverage that aims at measuring the orientation dependent visual information of an area, based on a regular discretization of the 3D environment in voxels. Next, we use the proposed visual descriptor to define an autonomous cooperative exploration approach, which controls the robot movements so as to maximize information accuracy while minimizing movement costs. We empirically evaluate our approach in a simulation scenario based on real data for large scale 3D environments, and on widely used robotic tools (such as ROS and Stage). Experimental results show that the proposed method significantly outperforms a baseline random approach and an uncoordinated one, thus being a valid proposal for visual coverage in large scale outdoor scenarios.

IJCAI Conference 2013 Conference Paper

C-Link: A Hierarchical Clustering Approach to Large-Scale Near-Optimal Coalition Formation

  • Alessandro Farinelli
  • Manuele Bicego
  • Sarvapali Ramchurn
  • Mauro Zucchelli

Coalition formation is a fundamental approach to multi-agent coordination. In this paper we address the specific problem of coalition structure generation, and focus on providing good-enough solutions using a novel heuristic approach that is based on data clustering methods. In particular, we propose a hierarchical agglomerative clustering approach (C-Link), which uses a similarity criterion between coalitions based on the gain that the system achieves if two coalitions merge. We empirically evaluate C-Link on a synthetic benchmark dataset as well as in collective energy purchasing settings. Our results show that the C-Link approach performs very well against an optimal benchmark based on Mixed-Integer Programming, achieving solutions which are in the worst case about 80% of the optimal (in the synthetic dataset), and 98% of the optimal (in the energy dataset). Thus we show that C-Link can return solutions for problems involving thousands of agents within minutes.
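The agglomerative idea behind C-Link can be sketched with a tiny greedy merge loop: start from singletons and repeatedly merge the pair of coalitions whose union gives the largest gain in value. The characteristic function below is illustrative, not the paper's benchmark:

```python
from itertools import combinations

def value(coalition):
    # Illustrative characteristic function: synergy grows with coalition size,
    # dampened by a quadratic crowding penalty.
    s = len(coalition)
    return sum(coalition) + 2 * s * (s - 1) - 0.5 * s ** 2

def greedy_link(agents):
    # Start from singleton coalitions; merge the best pair while some merge
    # still has a positive gain (a simplified flavour of the C-Link criterion).
    structure = [frozenset([a]) for a in agents]
    while len(structure) > 1:
        best_gain, best_pair = 0.0, None
        for c1, c2 in combinations(structure, 2):
            gain = value(c1 | c2) - value(c1) - value(c2)
            if gain > best_gain:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break  # no merge improves the coalition structure
        c1, c2 = best_pair
        structure = [c for c in structure if c not in (c1, c2)] + [c1 | c2]
    return structure

result = greedy_link([1, 2, 3, 4])
```

With this superadditive toy function every merge is profitable, so the loop ends in the grand coalition; with a less synergistic function it stops earlier, returning a partition of the agents.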

AAMAS Conference 2013 Conference Paper

RMASBench: A Benchmarking System for Multi-Agent Coordination in Urban Search and Rescue

  • Fabio Maffioletti
  • Riccardo Reffato
  • Alessandro Farinelli
  • Alexander Kleiner
  • Sarvapali Ramchurn
  • Bing Shi

This demonstration paper illustrates RMASBench, a new benchmarking system based on the RoboCup Rescue Agent simulator. The aim of the system is to facilitate benchmarking of coordination approaches in controlled settings for dynamic rescue scenarios. In particular, the key features of the system are: i) programming interfaces to plug in coordination algorithms without the need for implementing and tuning low-level agents’ behaviors, ii) implementations of state-of-the-art coordination approaches: DSA and Max-Sum, iii) a large scale crowd simulator, which exploits the GPU's parallel architecture to simulate the behaviour of thousands of agents in real time.

AAMAS Conference 2012 Conference Paper

Decentralised stable coalition formation among energy consumers in the smart grid

  • Filippo Bistaffa
  • Alessandro Farinelli
  • Meritxell Vinyals
  • Alex Rogers

The vision of the Smart Grid includes demand-side peak shaving strategies, such as real-time pricing or profile-based tariffs, to encourage consumption patterns that flatten the peaks in demand. To date, most works along this line have focused on optimising individual user consumption via the scheduling of home appliances or micro-storage. Alternatively, in this demonstration we propose to exploit the consumers' social side by allowing them to self-organise into coalitions of energy users with complementary needs. To this end, we present an agent-based Java simulation of a social network of energy consumers (based on the domestic electricity market and usage patterns of homes in the UK) that converges to stable energy coalitions.

AAMAS Conference 2012 Conference Paper

Decentralized Bayesian Reinforcement Learning for Online Agent Collaboration

  • Luke Teacy
  • Georgios Chalkiadakis
  • Alessandro Farinelli
  • Alex Rogers
  • Nick Jennings
  • Sally McClean
  • Gerard Parr

Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural, as on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; and on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system by devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralised exploration in both model-based and model-free BRL, and in the latter case, provide a closed form solution for VPI, correcting a decade-old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method, representative of the state-of-the-art.

AAAI Conference 2010 Conference Paper

A Distributed Algorithm for Optimising over Pure Strategy Nash Equilibria

  • Archie Chapman
  • Alessandro Farinelli
  • Enrique Munoz de Cote
  • Alex Rogers
  • Nicholas Jennings

We develop an efficient algorithm for computing pure strategy Nash equilibria that satisfy various criteria (such as the utilitarian or Nash–Bernoulli social welfare functions) in games with sparse interaction structure. Our algorithm, called Valued Nash Propagation (VNP), integrates the optimisation problem of maximising a criterion with the constraint satisfaction problem of finding a game’s equilibria to construct a criterion that defines a c-semiring. Given a suitably compact game structure, this criterion can be efficiently optimised using message passing. To this end, we first show that VNP is complete in games whose interaction structure forms a hypertree. Then, we go on to provide theoretical and empirical results justifying its use on games with arbitrary structure; in particular, we show that it computes the optimum >82% of the time and otherwise selects an equilibrium that is always within 2% of the optimum on average.

AAMAS Conference 2010 Conference Paper

Coalition Formation with Spatial and Temporal Constraints

  • Sarvapali D. Ramchurn
  • Maria Polukarov
  • Alessandro Farinelli
  • Cuong Truong
  • Nicholas R. Jennings

The coordination of emergency responders and robots to undertake a number of tasks in disaster scenarios is a grand challenge for multi-agent systems. Central to this endeavour is the problem of forming the best teams (coalitions) of responders to perform the various tasks in the area where the disaster has struck. Moreover, these teams may have to form, disband, and reform in different areas of the disaster region. This is because in most cases there will be more tasks than agents. Hence, agents need to schedule themselves to attempt each task in turn. Second, the tasks themselves can be very complex: requiring the agents to work on them for different lengths of time and having deadlines by when they need to be completed. The problem is complicated still further when different coalitions perform tasks with different levels of efficiency. Given all these facets, we define this as the Coalition Formation with Spatial and Temporal constraints problem (CFSTP). We show that this problem is NP-hard; in particular, it contains the well-known complex combinatorial problem of Team Orienteering as a special case. Based on this, we design a Mixed Integer Program to optimally solve small-scale instances of the CFSTP and develop new anytime heuristics that can, on average, complete 97% of the tasks for large problems (20 agents and 300 tasks). In so doing, our solutions represent the first results for CFSTP.

AAMAS Conference 2010 Conference Paper

Efficient Multi-Agent Coordination Using Resource-Aware Junction Trees

  • Nicolas Stefanovitch
  • Alessandro Farinelli
  • Alex Rogers
  • Nicholas R. Jennings

In this paper we address efficient decentralised coordination for cooperative multi-agent systems (framed as DCOPs) by taking into account the communication and computational resources of the system. We focus on techniques that exploit structural independence among agents' actions to provide optimal solutions to the coordination problem, and, in particular, we use the Generalized Distributive Law (GDL) algorithm. In this setting, we propose a novel resource-aware heuristic to build junction trees and to schedule GDL computations across the agents. Our approach aims at directly minimising the total running time of the coordination process, rather than the theoretical complexity of the computation, by considering the computational and communication capabilities of agents.

NeurIPS Conference 2010 Conference Paper

Worst-case bounds on the quality of max-product fixed-points

  • Meritxell Vinyals
  • Jesús Cerquides
  • Alessandro Farinelli
  • Juan Rodríguez-Aguilar

We study worst-case bounds on the quality of any fixed point assignment of the max-product algorithm for Markov Random Fields (MRF). We start by proving a bound independent of the MRF structure and parameters. Afterwards, we show how this bound can be improved for MRFs with particular structures such as bipartite graphs or grids. Our results provide interesting insight into the behavior of max-product. For example, we prove that max-product provides very good results (at least 90% of the optimal) on MRFs with large variable-disjoint cycles (MRFs in which all cycles are variable-disjoint, namely they do not share any edge, and in which each cycle contains at least 20 variables).

IJCAI Conference 2009 Conference Paper

  • Ruben Stranders
  • Alessandro Farinelli
  • Alex Rogers
  • Nicholas R. Jennings

In this paper, we introduce an on-line, decentralised coordination algorithm for monitoring and predicting the state of spatial phenomena by a team of mobile sensors. These sensors have their application domain in disaster response, where strict time constraints prohibit path planning in advance. The algorithm enables sensors to coordinate their movements with their direct neighbours to maximise the collective information gain, while predicting measurements at unobserved locations using a Gaussian process. It builds upon the max-sum message passing algorithm for decentralised coordination, for which we present two new generic pruning techniques that result in speed-ups of up to 92% for 5 sensors. We empirically evaluate our algorithm against several on-line adaptive coordination mechanisms, and report a reduction in root mean squared error of up to 50% compared to a greedy strategy.

AAMAS Conference 2008 Conference Paper

A Decentralized Approach to Cooperative Situation Assessment in Multi-Robot Systems

  • Giuseppe Settembre
  • Alessandro Farinelli
  • Paul Scerri
  • Katia Sycara
  • Daniele Nardi

To act effectively under uncertainty, multi-robot teams need to accurately estimate the state of the environment. Although individual robots, with uncertain sensors, may not be able to accurately determine the current situation, the team as a whole should have the capability to perform situation assessment. However, sharing all information with all other team mates is not scalable, nor is the centralization of all information possible. This paper presents a decentralized approach to cooperative situation assessment that balances the use of communication bandwidth with the need for good situation assessment. When a robot believes locally that a particular plan should be executed, it sends a proposal for that plan to one of its team mates. The robot receiving the plan proposal can either agree with the plan and forward it on, or it can provide sensor information to suggest that an alternative plan might have higher expected utility. Once sufficiently many robots agree with the proposal, the plan is initiated. The algorithm successfully balances the value of cooperative sensing against the cost of sharing large volumes of information. Experiments verify the utility of the approach, showing that the algorithm dramatically outperforms individual decision-making and obtains performance similar to a centralized approach.

AAMAS Conference 2008 Conference Paper

Decentralised Coordination of Low-Power Embedded Devices Using the Max-Sum Algorithm

  • Alessandro Farinelli
  • Alex Rogers
  • Adrian Petcu
  • Nick Jennings

This paper considers the problem of performing decentralised coordination of low-power embedded devices (as is required within many environmental sensing and surveillance applications). Specifically, we address the generic problem of maximising social welfare within a group of interacting agents. We propose a novel representation of the problem, as a cyclic bipartite factor graph, composed of variable and function nodes (representing the agents’ states and utilities respectively). We show that such a representation allows us to use an extension of the max-sum algorithm to generate approximate solutions to this global optimisation problem through local decentralised message passing. We empirically evaluate this approach on a canonical coordination problem (graph colouring), and benchmark it against state-of-the-art approximate and complete algorithms (DSA and DPOP). We show that our approach is robust to lossy communication, that it generates solutions closer to those of DPOP than DSA is able to, and that it does so with a communication cost (in terms of total message size) that scales very well with the number of agents in the system (compared to the exponential increase of DPOP). Finally, we describe a hardware implementation of our algorithm operating on low-power Chipcon CC2431 System-on-Chip sensor nodes.
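On such a bipartite factor graph, a function-to-variable max-sum message maximises the function plus the incoming variable-to-function messages over the other variables. A single round of updates on a two-variable graph; the payoff table below is illustrative, not the paper's benchmark:

```python
import numpy as np

# Pairwise utility U(x1, x2) over two binary variables; the table favours
# the joint assignment (x1=0, x2=1).
U = np.array([[0, 2],
              [1, 0]])

# Incoming variable-to-function messages, initialised to zero.
q1 = np.zeros(2)  # from x1
q2 = np.zeros(2)  # from x2

# Function-to-variable messages: maximise U plus the other variable's
# message over that variable.
r_to_x1 = (U + q2[None, :]).max(axis=1)
r_to_x2 = (U + q1[:, None]).max(axis=0)

# Each variable picks the value maximising the sum of its incoming messages.
x1 = int(np.argmax(r_to_x1))
x2 = int(np.argmax(r_to_x2))
```

In the full algorithm these updates iterate over all edges of the factor graph and are exchanged as network messages between devices; here one round already recovers the jointly optimal assignment.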

IJCAI Conference 2007 Conference Paper

  • Alessandro Farinelli
  • Paul Scerri
  • Alberto Ingenito
  • Daniele Nardi

A prerequisite to efficient behavior by a multi-robot team is the ability to accurately perceive the environment. In this paper, we present an approach to deal with sensing uncertainty at the coordination level. Specifically, robots attach information regarding the features that caused the initiation of a course of action to any coordination message for that activity. Further information regarding such features, acquired by the team, is then combined and the expected utility of the started action is re-evaluated accordingly. Experiments show that the approach makes it possible to coordinate a large group of robots, addressing sensing uncertainty in a tractable way.

IJCAI Conference 2007 Conference Paper

  • Alessandro Farinelli
  • Alberto Finzi
  • Thomas Lukasiewicz

In this paper, we present the agent programming language TeamGolog, a novel approach to programming a team of cooperative agents under partial observability. Every agent is associated with a partial control program in Golog, which is completed by the TeamGolog interpreter in an optimal way by assuming a decision-theoretic semantics. The approach is based on the key concepts of a synchronization state and a communication state, which allow the agents to passively and actively, respectively, coordinate their behavior, while keeping their belief states, observations, and activities invisible to the other agents. We show the usefulness of the approach in a simulated rescue domain.

ICRA Conference 2007 Conference Paper

Heterogeneous Feature State Estimation with Rao-Blackwellized Particle Filters

  • Gian Diego Tipaldi
  • Alessandro Farinelli
  • Luca Iocchi
  • Daniele Nardi

In this paper we present a novel technique to estimate the state of heterogeneous features from inaccurate sensors. The proposed approach exploits the reliability of the feature extraction process in the sensor model and uses a Rao-Blackwellized particle filter to address the data association problem. Experimental results show that using reliability improves performance by allowing the approach to perform better data association among detected features. Moreover, the method has been tested on a real robot during an exploration task in a non-planar environment; this last experiment shows an improvement in correctly detecting and classifying features of interest for navigation purposes.
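The Rao-Blackwellized idea — sample the discrete data association, track each feature's continuous state analytically — can be sketched as follows. This is a hedged 1-D toy under assumed Gaussian models; the reliability term inflating the measurement noise is our simplification of how extractor reliability might enter the sensor model, not the paper's formulation.

```python
# Hedged RBPF sketch: particles sample which map feature a measurement is
# associated with; conditioned on that choice, the feature's 1-D Gaussian
# state is updated analytically with a Kalman filter. A low extraction
# reliability inflates the measurement noise. All numbers are illustrative.
import math
import random

random.seed(0)

def kalman_update(mean, var, z, r):
    # Standard 1-D Kalman measurement update.
    k = var / (var + r)
    return mean + k * (z - mean), (1 - k) * var

def likelihood(mean, var, z, r):
    # Gaussian likelihood of z under N(mean, var + r), up to a constant.
    s = var + r
    return math.exp(-0.5 * (z - mean) ** 2 / s) / math.sqrt(s)

PRIOR = [(0.0, 1.0), (5.0, 1.0)]  # two features: (mean, variance)
R_BASE = 0.5                      # nominal measurement noise
RELIABILITY = 0.8                 # low reliability -> noisier measurement

particles = [{"features": list(PRIOR), "w": 1.0} for _ in range(200)]

def step(z):
    r = R_BASE / RELIABILITY
    for p in particles:
        # Sample the association proportionally to each feature's likelihood.
        ls = [likelihood(m, v, z, r) for m, v in p["features"]]
        j = random.choices([0, 1], weights=ls)[0]
        p["w"] *= ls[j]
        p["features"][j] = kalman_update(*p["features"][j], z, r)

step(0.2)  # a measurement near feature 0
total = sum(p["w"] for p in particles)
# Weighted posterior mean of feature 0 moves toward the measurement.
est = sum(p["w"] * p["features"][0][0] for p in particles) / total
print(round(est, 2))
```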

ICRA Conference 2005 Conference Paper

Task Assignment with Dynamic Perception and Constrained Tasks in a Multi-Robot System

  • Alessandro Farinelli
  • Luca Iocchi
  • Daniele Nardi
  • Vittorio A. Ziparo

In this paper we present an asynchronous distributed mechanism for allocating tasks in a team of robots. Tasks to be allocated are dynamically perceived from the environment and can be tied by execution constraints. Conflicts among teammates arise when an uncontrolled number of robots execute the same task, resulting in wasted effort and spatial conflicts. The critical aspect of task allocation in multi-robot systems relates to the conflicts generated by the limited and noisy perception capabilities of real robots, which requires significant extensions to the task allocation techniques developed for software agents. The proposed approach successfully allocates tasks to robots, avoiding conflicts among teammates while maintaining low communication overhead. We implemented our method on AIBO robots and performed a quantitative analysis in a simulated environment.
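A hedged sketch of the conflict-avoidance idea (our simplification, not the paper's asynchronous protocol): robots broadcast utility bids for the tasks they perceive, and each task is executed only by the highest bidder, so no two teammates duplicate work on the same task.

```python
# Illustrative utility-bid allocation: every robot bids its utility for each
# perceived task; a task goes to the single highest bidder, avoiding the
# duplicated effort described in the abstract. Names are illustrative.

def allocate(bids):
    # bids: {robot: {task: utility}}; returns {task: winning robot}.
    winners = {}
    for robot, utilities in bids.items():
        for task, u in utilities.items():
            if task not in winners or u > winners[task][1]:
                winners[task] = (robot, u)
    return {task: robot for task, (robot, _) in winners.items()}

bids = {
    "aibo1": {"reach_ball": 0.9, "defend_goal": 0.2},
    "aibo2": {"reach_ball": 0.6, "defend_goal": 0.8},
}
print(allocate(bids))  # {'reach_ball': 'aibo1', 'defend_goal': 'aibo2'}
```

A real asynchronous deployment would additionally need message timestamps and re-bidding when perception changes, which this centralised toy elides.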

IROS Conference 2003 Conference Paper

Design and evaluation of multi agent systems for rescue operations

  • Alessandro Farinelli
  • Giorgio Grisetti
  • Luca Iocchi
  • Sergio Lo Cascio
  • Daniele Nardi

The activities of search and rescue of victims in large-scale disasters are highly relevant social problems and, from a scientific viewpoint, raise many different technical problems in the fields of artificial intelligence, robotics, and multi-agent systems. In this paper we describe the development of a multi-agent system, based on the RoboCup Rescue simulator, that provides the monitoring and decision support needed in a rescue operation. Two significant accomplishments are reported in this paper: the first is a framework for cognitive agent development that provides capabilities for information fusion, planning, and coordination; the second is a methodology for evaluating multi-agent systems in this scenario that aims at measuring not only the efficiency of a system, but also its robustness when conditions in the environment change.