Author name cluster

Guy Shani

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

49 papers

2 author rows

AIJ Journal 2026 Journal Article

Factored planning in partially observable and deterministic multi-agent domains

Shashank Shekhar
Ronen I. Brafman
Guy Shani

We consider the problem of solving qualitative decentralized partially observable Markov decision processes (QDec-POMDPs) with deterministic actions. QDec-POMDPs model dynamic systems consisting of a collaborative team of agents acting under uncertainty and partial observability, attempting to reach a desirable goal state. They can be viewed as the multi-agent version of the contingent planning model. In this work, we extend the idea of factored planning from fully observable multi-agent planning to partial observability. Our method operates as follows: First, we simplify the multi-agent planning (MAP) problem by reducing it to a single-agent planning problem. Then, we use the solution to this single-agent problem as a skeleton plan, that each agent attempts to complete separately. We describe different variants of this idea, and in particular, suggest a method that models information about each agent’s knowledge and incorporates the idea of signaling information to other agents through actions. We perform an extensive empirical evaluation over new and old domains, demonstrating the enhanced scalability of our method.

ECAI Conference 2024 Conference Paper

Heuristics for Partially Observable Stochastic Contingent Planning

Guy Shani

Acting to complete tasks in stochastic partially observable domains is an important problem in artificial intelligence, and is often formulated as a goal-based POMDP. Goal-based POMDPs can be solved using the RTDP-BEL algorithm, that operates by running forward trajectories from the initial belief to the goal. These trajectories can be guided by a heuristic, and more accurate heuristics can result in significantly faster convergence. In this paper, we develop a heuristic function that leverages the structured representation of domain models. We compute, in a relaxed space, a plan to achieve the goal, while taking into account the value of information, as well as the stochastic effects. We provide experiments showing that while our heuristic is slower to compute, it requires an order of magnitude less trajectories before convergence. Overall, it thus speeds up RTDP-BEL, particularly in problems where significant information gathering is needed.

IROS Conference 2024 Conference Paper

Online Planning for Multi Agent Path Finding in Inaccurate Maps

Nir Malka
Guy Shani
Roni Stern

In multi-agent path finding (MAPF), agents navigate to their target positions without conflict within an environment, typically represented as a graph. Traditionally, the input graph is assumed to be accurate. We investigate MAPF scenarios where the input graph may be inaccurate, containing non-existent edges or missing edges present in the environment. Agents can verify the existence or non-existence of an edge only by moving close to it. To navigate such maps, we propose an online approach where planning and execution are interleaved. As agents gather new information about the environment over time, they replan accordingly. To minimize replanning efforts, we developed methods to identify and replan only for agents affected by observed changes. To scale to larger problems, we defer conflicts resolution expected only in the distant future and adapt single-agent path-finding algorithms to account for map inaccuracies. Experimental results show impressive scalability, solving problems involving over 1000 agents in under 3 minutes.

ICAPS Conference 2023 Conference Paper

Multi Agent Path Finding under Obstacle Uncertainty

Bar Shofer
Guy Shani
Roni Stern

In multi-agent path finding (MAPF), several agents must move from their current positions to their target positions without colliding. Prior work on MAPF commonly assumed perfect knowledge of the environment. We consider a MAPF setting where this is not the case, and the planner does not know a-priori whether some positions are blocked or not. To sense whether such a position is traversable, an agent must move close to it and adapt its behavior accordingly. In this work we focus on solving this type of MAPF problem, for cases where planning is centralized but cannot be done during execution. In this setting, a solution can be formulated as a plan tree for each agent, branching on the observations. We propose algorithms for finding such plans trees for two modes of executions: centralized, where the agents share information concerning observed obstacles during execution, a decentralized, where such communication is not allowed. The proposed algorithms are complete and can be configured to optimize solution cost, measured for either the best case or the worst case. We implemented these algorithms and provide experimental results demonstrating how our approach scales with respect to the number of agents and the number of positions we are uncertain about. The results show that our algorithms can solve non-trivial problems, but also highlight that this type of MAPF problems is significantly harder than classical MAPF.

ECAI Conference 2023 Conference Paper

Online Evaluation of Tail Project Boosting in Citizen Science

Amit Sultan
Avi Segal
Guy Shani
Darlene Cavalier
Kobi Gal

In citizen science, regular people provide invaluable information by contributing to scientific projects. Citizen science platforms, such as SciStarter, provide easy access to numerous such projects. Often, users contribute mainly to a relatively small set of popular projects, while it is difficult for many projects to draw the attention of users. Thus, increasing the contribution of users to such low-popularity projects may increase scientific and societal impact. In this paper, we explore the power of a recommender system to draw attention to less popular projects. Standard use of recommendation systems often leads to limited exposure of less popular (tail) projects. We thus propose a re-ranking approach based on “lift boosting, ” which uses the statistical lift measure to enhance the exposure of tail projects. By combining lift and traditional relevance measures, our method re-ranks the recommendation list to emphasize projects that are both relevant to the user while also have a high lift value. We implement our approach on SciStarter, one of the biggest citizen science platforms on the web. We conduct an online experiment involving over 2000 real users. Our results show a positive shift towards less popular projects without compromising overall contribution rates. This work demonstrates the potential of our lift-boosting method for promoting the discovery of tail projects in citizen science platforms, thereby fostering a more diverse range of scientific contributions.

SoCS Conference 2022 Conference Paper

An Online Approach for Multi-Agent Path Finding Under Movement Uncertainty (Extended Abstract)

Elad Levy
Guy Shani
Roni Stern

In this work, we address the problem of finding paths for multiple agents while avoiding collisions between them, where agents

JAAMAS Journal 2022 Journal Article

Privacy preserving planning in multi-agent stochastic environments

Tommy Hefner
Guy Shani
Roni Stern

Abstract Collaborative privacy preserving planning ( cppp ) gained much attention in the past decade. cppp aims to create solutions for multi agent planning problems where cooperation is required to achieve an efficient solution, without exposing information that the agent considers private in the process. To date, cppp has focused on domains with deterministic action effects. However, in real-world problems action effects are often non-deterministic, and actions can have multiple possible effects with varying probabilities. In this paper, we introduce Stochastic cppp ( scppp ), which is an extension of cppp to domains with stochastic action effects. We show how scppp can be modeled as a Markov decision process ( mdp ) and how the value-iteration algorithm can be adapted to solve it. This adaptation requires extending value-iteration to support multiple agents and privacy. Then, we present two adaptions of the real-time dynamic programming ( rtdp ) algorithm, a popular algorithm for solving mdp s, designed to solve scppp problems. The first rtdp adaptation, called distributed rtdp ( drtdp ), yields identical behavior to applying rtdp in a centralized manner on the joint problem. To preserve privacy, drtdp uses a message passing mechanism adopted from the mafs algorithm. The second rtdp adaptation is an approximation of drtdp called public synchronization rtdp ( ps - rtdp ). ps - rtdp differs from drtdp mainly in its message passing mechanism, where ps - rtdp sends significantly fewer messages than drtdp. We experimented on domains adapted from the deterministic cppp literature by adding different stochastic effects to different actions. The results show that ps - rtdp can reduce the amount of messages compared to drtdp by orders of magnitude thus improving run-time, while producing policies with similar expected costs.

JAAMAS Journal 2022 Journal Article

Reducing disclosed dependencies in privacy preserving planning

Rotem Lev Lehman
Guy Shani
Roni Stern

Abstract In collaborative privacy preserving planning ( cppp ), a group of agents jointly creates a plan to achieve a set of goals while preserving each others’ privacy. In state of the art cppp algorithms, the agents avoid explicitly sharing the value of private state variables. However, they may implicitly reveal dependencies between actions, that is, which action facilitates achieving the preconditions of another action. Previous work in cppp did not limit the disclosure of such dependencies. In this paper, we explicitly limit the amount of disclosed dependencies, allowing agents to publish only some of the dependencies between their actions. We investigate different strategies for deciding which dependencies to publish, and how they affect the ability to find solutions. We evaluate the ability of two solvers — distribute forward search and centralized planning based on a single-agent projection — to produce plans under this constraint. Experiments over standard cppp domains show that the proposed dependency-sharing strategies enable generating plans while sharing only a small fraction of all dependencies.

JAAMAS Journal 2022 Journal Article

Unavoidable deadends in deterministic partially observable contingent planning

Lera Shtutland
Dorin Shmaryahu
Guy Shani

Abstract Traditionally, a contingent plan, branching on the observations an agent obtains throughout plan execution, must reach a goal state from every possible initial state. However, in many real world problems, no such plan exists. Yet, there are plans that reach the goal from some initial states only. From the other initial states, they eventually reach a deadend—a state from which the goal can not be achieved. Deadends that cannot be avoided by resorting to a different plan, are called unavoidable deadends. In this paper we study planning with unavoidable deadends in belief space. We distinguish between two types of such deadends, and adapt offline and online contingent planners to identify and handle unavoidable deadends, using two approaches—an active approach that begins by distinguishing between the solvable and deadend states, and a lazy approach, that plans to achieve the goal, identifying deadends as they occur. We empirically analyze how each approach performs in different cases.

TAAS Journal 2021 Journal Article

Computing Contingent Plan Graphs using Online Planning

Shlomi Maliah
Radimir Komarnitski
Guy Shani

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. Recent successful approaches translate the partially observable contingent problem into a non-deterministic fully observable problem, and then use a planner for non-deterministic planning. However, the translation may become very large, encumbering the task of the non-deterministic planner. We suggest a different approach—using an online contingent solver repeatedly to construct a plan tree. We execute the plan returned by the online solver until the next observation action, and then branch on the possible observed values, and replan for every branch independently. In many cases a plan tree can have an exponential width in the number of state variables, but the tree may have a structure that allows us to compactly represent it using a directed graph. We suggest a mechanism for tailoring such a graph that reduces both the computational effort and the storage space. Our method also handles non-deterministic domains, by identifying cycles in the plans. We present a set of experiments, showing our approach to scale better than state-of-the-art offline planners.

AAAI Conference 2021 Conference Paper

Improved Knowledge Modeling and Its Use for Signaling in Multi-Agent Planning with Partial Observability

Shashank Shekhar
Ronen I. Brafman
Guy Shani

Collaborative Multi-Agent Planning (MAP) problems with uncertainty and partial observability are often modeled as Dec-POMDPs. Yet, in deterministic domains, Qualitative Dec-POMDPs can scale up to much larger problem sizes. The best current QDec solver (QDec-FP) reduces MAP problems to multiple single-agent problems. In this paper we describe a planner that uses richer information about agents’ knowledge to improve upon QDec-FP. With this change, the planner not only scales up to larger problems with more objects, but it can also support signaling, where agents signal information to each other by changing the state of the world.

AAAI Conference 2021 Conference Paper

Intelligent Recommendations for Citizen Science

Daniel Ben Zaken
Kobi Gal
Guy Shani
Avi Segal
Darlene Cavalier

Citizen science refers to scientKobiific research that is carried out by volunteers, often in collaboration with professional scientists. The spread of the internet has allowed volunteers to contribute to citizen science projects in dramatically new ways while creating scientific value and gaining pedagogical and social benefits. Given the sheer size of available projects, finding the right project, which best suits the user preferences and capabilities, has become a major challenge and is essential for keeping volunteers motivated and active contributors. We address this challenge by developing a system for personalizing project recommendations which was fully deployed in the wild. We adapted several recommendation algorithms to the citizen science domain from the literature based on memory-based and model-based collaborative filtering approaches. The algorithms were trained on historical data of users’ interactions in the SciStarter platform - a leading citizen science site - as well as their contributions to different projects. The trained algorithms were evaluated in SciStarter and involved hundreds of users who were provided with personalized recommendations for new projects they had not contributed to before. The results show that using the new recommendation system led people to increased participation in new SciStarter projects when compared to groups that were recommended projects using nonpersonalized recommendation approaches, and compared to behavior before recommendations. In particular, the group of volunteers receiving recommendations created by an SVD algorithm (matrix factorization) exhibited the highest levels of contributions to new projects, when compared to the other cohorts. A follow-up survey conducted with the SciStarter community confirmed that users felt that the recommendations matched their personal interests and goals. Based on these results, our recommendation system is now fully integrated into the SciStarter portal, positively affecting hundreds of users each week, and leading to social and educational benefits.

AAMAS Conference 2021 Conference Paper

Partial Disclosure of Private Dependencies in Privacy Preserving Planning

Rotem Lev Lehman
Guy Shani
Roni Stern

AIJ Journal 2021 Journal Article

Using POMDPs for learning cost sensitive decision trees

Shlomi Maliah
Guy Shani

In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real world cases testing the value of an attribute incurs a cost. Furthermore, there can also be a cost associated with the misclassification of an instance. Cost sensitive classification attempts to minimize the expected cost of classification, by deciding after each observed attribute value, which attribute to measure next. In this paper we suggest Partially Observable Markov Decision Processes (POMDPs) as a modeling tool for cost sensitive classification. POMDPs are typically solved through a policy over belief states. We show how a relatively small set of potentially important belief states can be identified, and define an MDP over these belief states. To identify these potentially important belief states, we construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our tree-based MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classification accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassification costs.

ICAPS Conference 2020 Conference Paper

Privacy Preserving Planning in Stochastic Environments

Guy Shani
Roni Stern
Tommy Hefner

Collaborative privacy preserving planning (cppp) has gained much attention in the past decade. To date, cppp has focused on domains with deterministic action effects. In this paper, we extend cppp to domains with stochastic action effects. We show how such environments can be modeled as an mdp. We then focus on the popular Real-Time Dynamic Programming (RTDP) algorithm for computing value functions for mdps, extending it to the stochastic cppp setting. We provide two versions of RTDP: a complete version identical to executing centralized RTDP, and an approximate version that sends significantly fewer messages and computes competitive policies in practice. We experiment on domains adapted from the deterministic cppp literature.

ICAPS Conference 2019 Conference Paper

A Factored Approach to Deterministic Contingent Multi-Agent Planning

Shashank Shekhar 0002
Ronen I. Brafman
Guy Shani

Collaborative Multi-Agent Planning (MAP) under uncertainty with partial observability is a notoriously difficult problem. Such MAP problems are often modeled as DecPOMDPs, or its qualitative variant, QDec-POMDP, which is essentially a MAP version of contingent planning. The QDecPOMDP model was introduced with the hope that its simpler, non-probabilistic structure will allow for better scalability. Indeed, at least with deterministic actions, the recent IMAP algorithm scales much better than comparable DecPOMDP algorithms (Bazinin and Shani 2018). In this work we suggest a new approach to solving Deterministic QDecPOMDPs based on problem factoring. First, we find a solution to a MAP problem where the results of any observation is available to all agents. This is essentially a single-agent planning problem for the entire team. Then, we project the solution tree into sub-trees, one per agent, and let each agent transform its projected tree into a legal local tree. If all agents succeed, we combine the trees into a valid joint-plan. Otherwise, we continue to explore the space of team solutions. This approach is sound, complete, and as our empirical evaluation demonstrates, scales much better than the IMAP algorithm.

JAAMAS Journal 2019 Journal Article

Comparative criteria for partially observable contingent planning

Dorin Shmaryahu
Guy Shani
Jörg Hoffmann

Abstract In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. The solution can be represented as a plan tree or graph, branching on various possible observations. Typically in contingent planning one seeks a satisfying plan leading to a goal state at each leaf. In many applications, however, one may prefer some satisfying plans to others, such as plans that lead to the goal with a lower average cost. However, methods such as average cost make an implicit assumption concerning the probabilities of outcomes, which may not apply when the stochastic dynamics of the environment are unknown. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria—plan simplicity, dominance, and best and worst plan costs. We also argue that in some cases certain branches of the plan correspond to an unlikely combination of mishaps, and can be ignored, and provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria, and discuss their validity, correlations, and application to real world problems. We also suggest efficient algorithms for computing the comparative criteria where needed. We provide experimental results, showing that existing contingent planners provide diverse plans, that can be compared using these criteria.

AAMAS Conference 2019 Conference Paper

Comparative Criteria for Partially Observable Contingent Planning: JAAMAS Track

Dorin Shmaryahu
Jörg Hoffmann
Guy Shani

In contingent planning under partial observability with sensing actions, the solution can be represented as a plan tree, branching on various possible observations. Typically, one seeks a satisfying plan leading to a goal state at each leaf. In many applications, however, one may prefer some satisfying plans to others. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria — plan simplicity, dominance, and best and worst plan costs. In some cases certain branches of the plan correspond to an unlikely combination of mishaps, and can be ignored, and we provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria, and discuss their validity, correlations, and application to real world problems. We suggest efficient algorithms for computing the comparative criteria. We provide experimental results, showing that plans computed by existing contingent planners can be compared using the suggested criteria.

JAAMAS Journal 2018 Journal Article

Action dependencies in privacy-preserving multi-agent planning

Shlomi Maliah
Guy Shani
Roni Stern

Abstract Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. In many CPPP algorithms, the individual agents reason about a projection of the multi-agent problem onto a single-agent classical planning problem. For example, an agent can plan as if it controls the public actions of other agents, ignoring any private preconditions and effects theses actions may have, and use the cost of this plan as a heuristic estimate of the cost of the full, multi-agent plan. Using such a projection, however, ignores some dependencies between agents’ public actions. In particular, it does not contain dependencies between public actions of other agents caused by their private facts. We propose a projection in which these private dependencies are maintained. The benefit of our dependency-preserving projection is demonstrated by using it to produce high-level plans in a new privacy-preserving planner, and as a heuristic for guiding forward search privacy-preserving algorithms. Both are able to solve more benchmark problems than any other state-of-the-art privacy-preserving planner. This more informed projection does not explicitly expose any private fact, action, or precondition. In addition, we show that even if an adversary agent knows that an agent has some private objects of a given type (e. g. , trucks), it cannot infer the number of such private objects that the agent controls. This introduces a novel form of strong privacy, which we call object-cardinality privacy, that is motivated by real-world requirements.

IJCAI Conference 2018 Conference Paper

Advances and Challenges in Privacy Preserving Planning

Guy Shani

Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. CPPP has gained attention in recent years as an important sub area of multi agent planning, presenting new challenges to the planning community. In this paper we describe recent advancements, and outline open problems and future directions in this field. We begin with describing different models of privacy, such as weak and strong privacy, agent privacy, and cardinality preserving privacy. We then discuss different solution approaches, focusing on the two prominent methods --- joint creation of a global coordination scheme first, followed by independent planning to extend the global scheme with private actions; and collaborative local planning where agents communicate information concerning their planning process. In both cases a heuristic is needed to guide the search process. We describe several adaptations of well known classical planning heuristic to CPPP, focusing on the difficulties in computing the heuristic without disclosing private information.

JAAMAS Journal 2018 Journal Article

Landmark-based heuristic online contingent planning

Shlomi Maliah
Guy Shani
Ronen I. Brafman

Abstract In contingent planning problems, agents have partial information about their state and use sensing actions to learn the value of some variables. When sensing and actuation are separated, plans for such problems can often be viewed as a tree of sensing actions, separated by conformant plans consisting of non-sensing actions that enable the execution of the next sensing action. We propose a heuristic, online method for contingent planning which focuses on identifying the next useful sensing action. We select the next sensing action based on a landmark heuristic, adapted from classical planning. We discuss landmarks for plan trees, providing several alternative definitions and discussing their merits. The key part of our planner is the novel landmarks-based heuristic, together with a projection method that uses classical planning to solve the intermediate conformant planning problems. The resulting heuristic contingent planner solves many more problems than state-of-the-art, translation-based online contingent planners, and in most cases, much faster, up to 3 times faster on simple problems, and 200 times faster on non-simple domains.

AAAI Conference 2018 Conference Paper

MDP-Based Cost Sensitive Classification Using Decision Trees

Shlomi Maliah
Guy Shani

In classiﬁcation, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real world cases testing the value of an attribute incurs a cost. Furthermore, there can also be a cost associated with the misclassiﬁcation of an instance. Cost sensitive classiﬁcation attempts to minimize the expected cost of classiﬁcation, by deciding after each observed attribute value, which attribute to measure next. In this paper we suggest Markov Decision Processes as a modeling tool for cost sensitive classiﬁcation. We construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classiﬁcation accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassiﬁcation costs.

ICAPS Conference 2018 Conference Paper

Simulated Penetration Testing as Contingent Planning

Dorin Shmaryahu
Guy Shani
Jörg Hoffmann 0001
Marcel Steinmetz

In penetration testing (pentesting), network administrators attack their own network to identify and fix vulnerabilities. Planning-based simulated pentesting can achieve much higher testing coverage than manual pentesting. A key challenge is for the attack planning to imitate human hackers as faithfully as possible. POMDP models have been proposed to this end, yet they are computationally very hard, and it is unclear how to acquire the models in practice. At the other extreme, classical planning models are scalable and simple to obtain, yet completely ignore the incomplete knowledge characteristic of hacking. We propose contingent planning as a new middle ground, feasible in both computation burden and model acquisition effort while allowing for a representation of incomplete knowledge. We design the model, show how to adapt available solvers, and show how to acquire the model from real network scans in practice. We experiment on real networks and show that our approach scales to practical input sizes.

ICAPS Conference 2017 Conference Paper

Increased Privacy with Reduced Communication in Multi-Agent Planning

Shlomi Maliah
Ronen I. Brafman
Guy Shani

Multi-agent forward search (MAFS) is a state-of-the-art privacy-preserving planning algorithm. We describe a new variant of MAFS, called multi-agent forward-backward search (MAFBS) that uses both forward and backward messages to reduce the number of messages sent and obtain new privacy properties. While MAFS requires agents to send a state s produced by an action a to all agents that can apply any action in s, MAFBS sends such messages forward only to agents that have an action that requires one of the effects of a. To achieve completeness, it sends messages backward to agents that can supply a missing precondition. This more focused message passing scheme reduces states exchanged, and requires that agents be aware only of other agents that they directly interact with, leading to agent privacy.

TIST Journal 2016 Journal Article

Anytime Algorithms for Recommendation Service Providers

David Ben-Shimon
Lior Rokach
Guy Shani
Bracha Shapira

Recommender systems (RS) can now be found in many commercial Web sites, often presenting customers with a short list of additional products that they might purchase. Many commercial sites do not typically have the ability and resources to develop their own system and may outsource the RS to a third party. This had led to the growth of a recommendation as a service industry, where companies, referred to as RS providers, provide recommendation services. These companies must carefully balance the cost of building recommendation models and the payment received from the e-business, as these payments are expected to be low. In such a setting, restricting the computational time required for model building is critical for the RS provider to be profitable. In this article, we propose anytime algorithms as an attractive method for balancing computational time and the recommendation model performance, thus tackling the RS provider problem. In an anytime setting, an algorithm can be stopped after any amount of computational time, always ensuring that a valid, although suboptimal, solution will be returned. Given sufficient time, however, the algorithm should converge to an optimal solution. In this setting, it is important to evaluate the quality of the returned solution over time, monitoring quality improvement. This is significantly different from traditional evaluation methods, which mostly estimate the performance of the algorithm only after its convergence is given sufficient time. We show that the popular item-item top-N recommendation approach can be brought into the anytime framework by smartly considering the order by which item pairs are being evaluated. We experimentally show that the time-accuracy trade-off can be significantly improved for this specific problem.

JAAMAS Journal 2016 Journal Article

Collaborative privacy preserving multi-agent planning

Shlomi Maliah
Guy Shani
Roni Stern

Abstract In many cases several entities, such as commercial companies, need to work together towards the achievement of joint goals, while hiding certain private information. To collaborate effectively, some sort of plan is needed to coordinate the different entities. We address the problem of automatically generating such a coordination plan while preserving the agents’ privacy. Maintaining privacy is challenging when planning for multiple agents, especially when tight collaboration is needed and a global high-level view of the plan is required. In this work we present the Greedy Privacy-Preserving Planner (GPPP), a privacy preserving planning algorithm in which the agents collaboratively generate an abstract and approximate global coordination plan and then individually extend the global plan to executable plans. To guide GPPP, we propose two domain independent privacy preserving heuristics based on landmarks and pattern databases, which are classical heuristics for single agent search. These heuristics, called privacy-preserving landmarks and privacy preserving PDBs, are agnostic to the planning algorithm and can be used by other privacy-preserving planning algorithms. Empirically, we demonstrate on benchmark domains the benefits of using these heuristics and the advantage of GPPP over existing privacy preserving planners for the multi-agent STRIPS formalism.

AAAI Conference 2016 Conference Paper

Computing Contingent Plans Using Online Replanning

Radimir Komarnitsky
Guy Shani

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. For this class of problems the solution can be represented as a plan tree, branching on various possible observations. Recent successful approaches translate the partially observable contingent problem into a non-deterministic fully observable problem, and then use a planner for non-deterministic planning. While this approach has been successful in many domains, the translation may become very large, encumbering the task of the nondeterministic planner. In this paper we suggest a different approach — using an online contingent solver repeatedly to construct a plan tree. We execute the plan returned by the online solver until the next observation action, and then branch on the possible observed values, and replan for every branch independently. In many cases a plan tree can be exponential in the number of state variables, but still, the tree has a structure that allows us to compactly represent it using a directed graph. We suggest a mechanism for tailoring such a graph that reduces both the computational effort and the storage space. Furthermore, unlike recent state of the art ofﬂine planners, our approach is not bounded to a speciﬁc class of contingent problems, such as limited problem width, or simple contingent problems. We present a set of experiments, showing our approach to scale better than state of the art of- ﬂine planners.

AIJ Journal 2016 Journal Article

Online belief tracking using regression for contingent planning

Ronen I. Brafman
Guy Shani

In online contingent planning under partial observability an agent decides at each time step on the next action to execute, given its initial knowledge of the world, the actions executed so far, and the observation made. Such agents require some representation of their belief state to determine which actions are valid, or whether the goal has been achieved. Efficient maintenance of a belief state is, given its potential exponential size, a key research challenge in this area. In this paper we develop the theory of regression as a useful tool for belief-state maintenance. We provide a formal description of regression, discussing various alternatives and optimization techniques, and analyze its space and time complexity. In particular, we show that, with some care, the regressed formula will contain variables relevant to the current query only, rather than all variables in the problem description. Consequently, under suitable assumptions, the complexity of regression queries is at most exponential in its contextual width. This parameter is always upper bounded by Bonet and Geffner's width parameter, introduced in their state-of-the-art factored belief tracking (FBT) method. In addition, we show how to obtain a poly-sized circuit representation for the online regression formula even with non-deterministic actions. We provide an empirical comparison of regression with FBT-based belief maintenance, showing the power of regression for online belief tracking. We also suggest caching techniques for regression, and demonstrate their value in reducing runtime in current benchmarks.

ICAPS Conference 2016 Conference Paper

Online Macro Generation for Privacy Preserving Planning

Shlomi Maliah
Guy Shani
Ronen I. Brafman

Agents that use Multi-Agent Forward Search (MAFS) todo privacy-preserving planning, often repeatedly develop similar paths. We describe a simple technique for online macro generation allowing agents to reuse successful previous action sequences. By focusing on specific sequences that end with a single public action only, we are able to address the utility problem -- our technique has negligible cost, yet provides both speedups and reduced communication in domains where agents have a reasonable amount of private actions. We describe two variants of our approach, both with attractive privacy preserving properties, and demonstrate the value of macros empirically. We also show that one variant is equivalent to secure MAFS.

ICAPS Conference 2016 Conference Paper

Stronger Privacy Preserving Projections for Multi-Agent Planning

Shlomi Maliah
Guy Shani
Roni Stern

Collaborative privacy-preserving planning (CPPP) is a multi-agent planning task in which agents need to achieve a common set of goals without revealing certain private information. In many CPPP algorithms the individual agents reason about a projection of the multiagent problem onto a single-agent classical planning problem. For example, an agent can plan as if it controls the public actions of other agents, ignoring their unknown private preconditions and effects, and use the cost of this plan as a heuristic for the cost of the full, multi-agent plan. Using such a projection, however, ignores some dependencies between agents’ public actions. In particular, it does not contain dependencies between actions of other agents caused by their private facts. We propose a projection in which these private dependencies are maintained. The benefit of our dependency-preserving projection is demonstrated by using it to produce high level plans in a new privacy preserving planner that is able to solve more benchmark problems than any other state-of-the-art privacy preserving planner. This more informed projection does not explicitly share private information. In addition, we show that even if an adversary agent knows that an agent has some private objects of a given type (e. g. , trucks), it cannot infer how many such private objects the agent controls. This introduces a novel strong form of privacy that is motivated by real-world requirements.

ECAI Conference 2014 Conference Paper

On The Properties of Belief Tracking for Online Contingent Planning using Regression

Ronen I. Brafman
Guy Shani

Planning under partial observability typically requires some representation of the agent's belief state - either online to determine which actions are valid, or offline for planning. Due to its potential exponential size, efficient maintenance of a belief state is, thus, a key research challenge in this area. The state-of-the-art factored belief tracking (FBT) method addresses this problem by maintaining multiple smaller projected belief states, each involving only a subset of the variable set. Its complexity is exponential in the size of these subsets, as opposed to the entire variable set, without jeopardizing completeness. In this paper we develop the theory of regression to serve as an alternative tool for belief-state maintenance. Regression is a well known technique enjoying similar, and potentially even better worst-case complexity, as its complexity depends on the actions and observations that actually took place, rather than all actions and potential observations, as in the FBT method. On the other hand, FBT is likely to have better amortized complexity if the number of queries to the belief state is very large. An empirical comparison of regression with FBT-based belief maintenance is carried out, showing that the two perform similarly.

ICAPS Conference 2014 Conference Paper

Partially Observable Online Contingent Planning Using Landmark Heuristics

Shlomi Maliah
Ronen I. Brafman
Erez Karpas
Guy Shani

In contingent planning problems, agents have partial information about their state anduse sensing actions to learn the value of some variables. When sensing and actuation are separated, plans for such problems can often be viewed as a tree of sensing actions, separated by conformant plans consisting of non-sensing actions that enable the execution of the next sensing action. This leads us to propose a heuristic, online method for contingent planning which focuses on identifying thenext useful sensing action. The key part of our planner is a novel landmarks-based heuristic for selecting the next sensing action, together with a projection method that uses classical planning to solve the intermediate conformant planning problems. This allows our planner to operate without an explicit model of belief space or the use of existing translation techniques, both of which can require exponential space. The resulting Heuristic Contingent Planner (HCP) solves many more problems than state-of-the-art, translation-based online contingent planners, and in most cases much faster.

ECAI Conference 2014 Conference Paper

Privacy Preserving Landmark Detection

Shlomi Maliah
Guy Shani
Roni Stern

In many cases several entities, such as commercial companies, need to work together towards the achievement of joint goals, while hiding certain private information. Multi-agent STRIPS (MA-STRIPS) is a new and attractive model for describing collaborative multi-agent privacy preserving planning, which is appropriate for such problems. In single agent classical planning, landmarks are key to constructing strong heuristics for state space search. In this paper we propose a method for identifying landmarks in MA-STRIPS in a privacy preserving distributed setting. The agents collaborate to find sound landmarks without revealing their private actions or goals. In addition, we also propose a novel MA-STRIPS planner that uses these landmarks. We empirically show that our detected landmarks improve the performance of previous approaches, and that our new planner is faster than all existing planners for multi-agent problems.

AAAI Conference 2013 Conference Paper

Qualitative Planning under Partial Observability in Multi-Agent Domains

Ronen Brafman
Guy Shani
Shlomo Zilberstein

Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more “classical” in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.

AAAI Conference 2012 Conference Paper

A Multi-Path Compilation Approach to Contingent Planning

Ronen Brafman
Guy Shani

We describe a new sound and complete method for compiling contingent planning problems with sensing actions into classical planning. Our method encodes conditional plans within a linear, classical plan. This allows our planner, MPSR, to reason about multiple future outcomes of sensing actions, and makes it less susceptible to dead-ends. MPRS, however, generates very large classical planning problems. To overcome this, we use an incomplete variant of the method, based on state sampling, within an online replanner. On most current domains, MPSR finds plans faster, although its plans are often longer. But on a new challenging variant of Wumpus with dead-ends, it finds smaller plans, faster, and scales better.

JAAMAS Journal 2012 Journal Article

A survey of point-based POMDP solvers

Guy Shani
Joelle Pineau
Robert Kaplow

Abstract The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where past solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough was mainly due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates for this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function, and is thus applicable for domains with longer horizons, even with relatively large state spaces. Many extensions were suggested to this basic idea, focusing on various aspects of the algorithm—mainly the selection of the belief space subset, and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. Then, we survey the major extensions to the basic algorithm, discussing their merits. Finally, we include an extensive empirical analysis using well known benchmarks, in order to shed light on the strengths and limitations of the various approaches.

IJCAI Conference 2011 Conference Paper

Replanning in Domains with Partial Information and Sensing Actions

Guy Shani
Ronen I. Brafman

Replanning via determinization is a recent, popular approach for onlineplanning in MDPs. In this paper we adapt this idea to classical, non-stochastic domains with partial information and sensing actions. At eachstep we generate a candidate plan which solves a classical planning probleminduced by the original problem. We execute this plan as long as it is safeto do so. When this is no longer the case, we replan. The classical planning problem we generate is based on the T0 translation, in which the classical state captures the knowledge state of theagent. We overcome the non-determinism in sensing actions, and the large domain size introduced by T0 by using state sampling. Our planner also employs a novel, lazy, regression-based method for querying the belief state.

PDF Details DOI

AAMAS Conference 2010 Conference Paper

Augmenting Appearance-Based Localization and Navigation using Belief Update

George Chrysanthakopoulos
Guy Shani

Appearance-based localization compares the current image takenfrom a robot's camera to a set of pre-recorded images in order toestimate the current location of the robot. Such techniques oftenmaintain a graph of images, modeling the dynamics of the imagesequence. This graph is used to navigate in the space of images. In this paper we bring a set of techniques together, includingPartially-Observable Markov Decision Processes, hierarchical staterepresentations, visual homing, human-robot interactions, and soforth, into the appearance-based approach. Our approach providesa complete solution to the deployment of a robot in a relativelysmall environment, such as a house, or a work place, allowing therobot to robustly navigate the environment after minimal training. We demonstrate our approach in two environments using a realrobot, showing how after a short training session, the robot is ableto navigate well in the environment.

AAMAS Conference 2010 Conference Paper

High-level Reinforcement Learning in Strategy Games

Christopher Amato
Guy Shani

Video games provide a rich testbed for artificial intelligence methods. In particular, creating automated opponents that perform well in strategy games is a difficult task. For instance, human players rapidly discover and exploit the weaknesses of hard coded strategies. To build better strategies, we suggest a reinforcement learning approach for learning a policy that switches between high-level strategies. These strategies are chosen based on different game situations and a fixed opponent strategy. Our learning agents are able to rapidly adapt to fixed opponents and improve deficiencies in the hard coded strategies, as the results demonstrate.

IJCAI Conference 2009 Conference Paper

Scott Sanner
Robby Goetschalckx
Kurt Driessens
Guy Shani

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted, by focusing dynamic programming on the envelope of states reachable from an initial state set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efﬁciency through various optimizations, including maintaining upper and lower bounds to both govern trial termination and prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP leads to an improvement over state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the amount of time and number of visited states required to achieve comparable policy performance.

IJCAI Conference 2009 Conference Paper

Jilles Steeve Dibangoye
Guy Shani
Brahim Chaib-draa
Abdel-Illah Mouaddib

Over the past few years, point-based POMDP solvers scaled up to produce approximate solutions to mid-sized domains. However, to solve real world problems, solvers must exploit the structure of the domain. In this paper we focus on the topological structure of the problem, where the state space contains layers of states. We present here the Topological Order Planner (TOP) that utilizes the topological structure of the domain to compute belief space trajectories. TOP rapidly produces trajectories focused on the solveable regions of the belief space, thus reducing the number of redundant backups considerably. We demonstrate TOP to produce good quality policies faster than any other pointbased algorithm on domains with sufﬁcient structure.

JMLR Journal 2009 Journal Article

A Survey of Accuracy Evaluation Metrics of Recommendation Tasks

Asela Gunawardana
Guy Shani

Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems, and classify a set of appropriate well known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments. [abs] [ pdf ][ bib ] &copy JMLR 2009. ( edit, beta )

NeurIPS Conference 2009 Conference Paper

Improving Existing Fault Recovery Policies

Guy Shani
Christopher Meek

Automated recovery from failures is a key component in the management of large data centers. Such systems typically employ a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we explain how to use data gathered from the interactions of the hand-made controller with the system, to create an optimized controller. We suggest learning an indefinite horizon Partially Observable Markov Decision Process, a model for decision making under uncertainty, and solve it using a point-based algorithm. We describe the complete process, starting with data gathering, model learning, model checking procedures, and computing a policy. While our paper focuses on a specific domain, our method is applicable to other systems that use a hand-coded, imperfect controllers.

ICAPS Conference 2008 Conference Paper

Efficient ADD Operations for Point-Based Algorithms

Guy Shani
Pascal Poupart
Ronen I. Brafman
Solomon Eyal Shimony

During the past few years, point-based POMDP solvers have gradually scaled up to handle medium sized domains through better selection of the set of points and efficient backup methods. Point-based research has focused on flat, explicit representation of the state space, yet in many realistic domains a factored representation is more appropriate. The latter have exponentially large state-spaces, and current methods are unlikely to handle models of reasonable size. Thus, adapting point-based methods to factored representations by modeling propositional state spaces better, e. g. by using Algebraic Decision Diagrams (ADDs) is needed. While a straightforward ADD-based implementation can effectively tackle large factored POMDPs, we propose several techniques to further improve scalability. In particular, we show how ADDs can be used successfully in factored domains that exhibit reasonable locality. Our algorithms are several orders of magnitude faster than current point-based algorithms used with flat representations.

IJCAI Conference 2007 Conference Paper

Guy Shani
Ronen I. Brafman
Solomon E. Shimony

Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods which quickly converge to an approximate solution for medium-sized problems. Of this family HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towards rewards, finding sequences of useful backups, and show how it scales up better than HSVI on larger benchmarks.

JMLR Journal 2005 Journal Article

An MDP-Based Recommender System

Guy Shani
David Heckerman
Ronen I. Brafman

Typical recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential optimization problem and, consequently, that Markov decision processes (MDPs) provide a more appropriate model for recommender systems. MDPs introduce two benefits: they take into account the long-term effects of each recommendation and the expected value of each recommendation. To succeed in practice, an MDP-based recommender system must employ a strong initial model, must be solvable quickly, and should not consume too much memory. In this paper, we describe our particular MDP model, its initialization using a predictive model, the solution and update algorithm, and its actual performance on a commercial site. We also describe the particular predictive model we used which outperforms previous models. Our system is one of a small number of commercially deployed recommender systems. As far as we know, it is the first to report experimental analysis conducted on a real commercial site. These results validate the commercial value of recommender systems, and in particular, of our MDP-based approach. [abs] [ pdf ][ bib ] &copy JMLR 2005. ( edit, beta )

NeurIPS Conference 2004 Conference Paper

Resolving Perceptual Aliasing In The Presence Of Noisy Sensors

Guy Shani
Ronen Brafman

Agents learning to act in a partially observable domain may need to overcome the problem of perceptual aliasing i. e. , different states that appear similar but require different responses. This problem is exacer- bated when the agent's sensors are noisy, i. e. , sensors may produce dif- ferent observations in the same state. We show that many well-known reinforcement learning methods designed to deal with perceptual alias- ing, such as Utile Suffix Memory, finite size history windows, eligibility traces, and memory bits, do not handle noisy sensors well. We suggest a new algorithm, Noisy Utile Suffix Memory (NUSM), based on USM, that uses a weighted classification of observed trajectories. We compare NUSM to the above methods and show it to be more robust to noise.

ICAPS Conference 2003 Conference Paper

Recommendation as a Stochastic Sequential Decision Problem

Ronen I. Brafman
David Heckerman
Guy Shani

Recommender systems -- systems that suggest to users in e-commerce sites items that might interest them -- adopt a static view of the recommendation process and treat it as a prediction problem. In an earlier paper, we argued that it is more appropriate to view the problem of generating recommendations as a sequential decision problem and, consequently, that Markov decision processes (MDPs) provide a more appropriate model for recommender systems. MDPs introduce two benefits: they take into account the long-term effects of each recommendation, and they take into account the expected value of each recommendation. The use of MDPs in a commercial site raises three fundamental problems: providing an adequate initial model, updating this model online as new items (e. g., books) arrive, and coping with the enormous state-space of this model. In past work, we dealt with the first problem. In this paper we consider the second, and especially, the third problem, which is of greater concern to researchers in decision-theoretic planning. We show that although the model we consider has roughly 1011 states, we can quickly provide an approximate solution by utilizing its special structure. Our memory requirements -- a serious concern for commercial online applications -- are modest; and the overall resource requirements of our system are comparable to those of a well-known commercial recommender system that uses a simpler and less accurate model. Our system is one of a handful of deployed commercial recommender systems as well as one of a handful of MDP-based deployed systems. It has been running at www. mitos. co. il, a commercial online bookstore, since August, 2002.

UAI Conference 2002 Conference Paper

An MDP-based Recommender System

Guy Shani
Ronen I. Brafman
David Heckerman

Typical Recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision problem and, consequently, that Markov decision processes (MDP) provide a more appropriate model for Recommender systems. MDPs introduce two benefits: they take into account the long-term effects of each recommendation, and they take into account the expected value of each recommendation. To succeed in practice, an MDP-based Recommender system must employ a strong initial model; and the bulk of this paper is concerned with the generation of such a model. In particular, we suggest the use of an n-gram predictive model for generating the initial MDP. Our n-gram model induces a Markov-chain model of user behavior whose predictive accuracy is greater than that of existing predictive models. We describe our predictive model in detail and evaluate its performance on real data. In addition, we show how the model can be used in an MDP-based Recommender system.