Arrow Research search

Author name cluster

Guni Sharon

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

TMLR Journal 2025 Journal Article

AEAP: A Reinforcement Learning Actor Ensemble Algorithm with Adaptive Pruning

  • Wei Zhang
  • Guni Sharon

Actor ensemble reinforcement learning methods have shown promising performance on dense-reward continuous control tasks. However, they exhibit three primary limitations: (1) diversity collapse when using a shared replay buffer, often necessitating carefully tuned regularization terms; (2) computational overhead from maintaining multiple actors; and (3) analytically intractable policy gradients when using stochastic policies in ensembles, requiring approximations that may compromise performance. To address this third limitation, we restrict the ensemble to deterministic policies and propose Actor Ensemble with Adaptive Pruning (AEAP), a multi-actor deterministic policy gradient algorithm that tackles the remaining limitations through a two-stage approach. First, to alleviate diversity collapse, AEAP employs dual-randomized actor selection that decorrelates exploration and learning by randomly choosing different actors for both environment interaction and policy update. This approach also removes reliance on explicit regularization. Second, when convergence to homogeneous policies still occurs over time, computational efficiency is further achieved through adaptive dual-criterion pruning, which progressively removes underperforming or redundant actors based on critic-estimated value and action-space similarity. Although AEAP introduces four additional hyperparameters compared to TD3 (a baseline single-actor deterministic policy gradient algorithm), we provide two domain-agnostic parameter configurations that perform robustly across environments without requiring tuning. AEAP achieves superior or competitive asymptotic performance compared to baselines across six dense-reward MuJoCo tasks. On sparse-reward Fetch benchmarks, AEAP outperforms deterministic policy gradient methods but falls short of SAC (a baseline stochastic policy gradient algorithm) on one of three tasks. 
When compared to fixed-size multi-actor baselines, AEAP reduces wall-clock time without sacrificing performance, establishing it as an efficient and reliable actor ensemble variant.
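
The two mechanisms described above can be sketched compactly. The interfaces below (actor lists, critic-value estimates, a pairwise action-similarity matrix, and the two thresholds) are illustrative assumptions for exposition, not the paper's implementation:

```python
import random

def select_actors(num_actors, rng=random):
    """Dual-randomized actor selection: independently sample the actor
    that interacts with the environment and the actor that is updated,
    decorrelating exploration from learning."""
    behavior = rng.randrange(num_actors)
    learner = rng.randrange(num_actors)
    return behavior, learner

def prune(actors, values, sims, v_thresh, s_thresh):
    """Dual-criterion pruning sketch: drop actors whose critic-estimated
    value is below v_thresh, or whose action-space similarity to an
    already-kept actor exceeds s_thresh."""
    keep = []
    for i in range(len(actors)):
        if values[i] < v_thresh:
            continue  # underperforming by critic estimate
        if any(sims[i][j] > s_thresh for j in keep):
            continue  # redundant: too similar to a kept actor
        keep.append(i)
    return [actors[i] for i in keep]
```

The thresholds here stand in for the paper's four additional hyperparameters; in AEAP they are tuned once into domain-agnostic configurations.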

TMLR Journal 2025 Journal Article

Policy-Guided Search on Tree-of-Thoughts for Efficient Problem Solving with Bounded Language Model Queries

  • Sumedh Pendurkar
  • Guni Sharon

Recent studies explored integrating state-space search algorithms with Language Models (LM) to perform look-ahead on the token generation process, the "Tree-of-Thoughts" (ToT), generated by LMs, thereby improving performance on problem-solving tasks. However, the affiliated search algorithms often overlook the significant computational costs associated with LM inference, particularly in scenarios with constrained computational budgets. Consequently, we address the problem of improving LM performance on problem-solving tasks under limited computational budgets. We demonstrate how the probabilities assigned to thoughts by LMs can serve as a heuristic to guide search within the ToT framework, thereby reducing the number of thought evaluations. Building on this insight, we adapt a heuristic search algorithm, Levin Tree Search (LTS), to the ToT framework, which leverages LMs as policies to guide the tree exploration efficiently. We extend the theoretical results of LTS by showing that, for ToT (a pruned tree), LTS guarantees a bound on the number of states expanded, and consequently, on the number of thoughts generated. Additionally, we analyze the sensitivity of this bound to the temperature values commonly used in the final softmax layer of the LM. Empirical evaluation under a fixed LM query budget demonstrates that LTS consistently achieves comparable or higher accuracy than baseline search algorithms within the ToT framework, across three domains (Blocksworld, PrOntoQA, Array Sorting) and four distinct LMs. These findings highlight the efficacy of LTS on ToT, particularly in enabling cost-effective and time-efficient problem-solving, making it well-suited for latency-critical and resource-constrained applications.
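
The core mechanism of LTS, expanding nodes in order of depth divided by the policy's path probability, can be sketched as a best-first search. The `expand`/`policy` interface below is a hypothetical stand-in for LM thought generation, not the paper's code:

```python
import heapq

def levin_tree_search(root, is_goal, expand, policy):
    """Best-first search ordered by depth / path-probability, as in
    Levin Tree Search. `expand(state)` yields child states;
    `policy(state)` maps each child to its policy probability."""
    # Entries: (priority, tiebreak, depth, path_probability, state)
    frontier = [(0.0, 0, 0, 1.0, root)]
    counter = 0
    expansions = 0
    while frontier:
        _, _, d, p, s = heapq.heappop(frontier)
        expansions += 1
        if is_goal(s):
            return s, expansions
        probs = policy(s)
        for child in expand(s):
            cp = p * probs.get(child, 0.0)
            if cp <= 0.0:
                continue  # zero-probability children are never reachable
            counter += 1
            heapq.heappush(frontier, ((d + 1) / cp, counter, d + 1, cp, child))
    return None, expansions
```

High-probability paths get small priorities and are explored first, which is what makes the expansion count bounded by the depth-over-probability of the best solution.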

TMLR Journal 2024 Journal Article

Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors

  • Sheelabhadra Dey
  • Guni Sharon

Gaussian Mixture Models (GMMs) have been recently proposed for approximating actors in actor-critic reinforcement learning algorithms. Such GMM-based actors are commonly optimized using stochastic policy gradients along with an entropy maximization objective. In contrast to previous work, we define and study deterministic policy gradients for optimizing GMM-based actors. Similar to stochastic gradient approaches, our proposed method, denoted Gaussian Mixture Deterministic Policy Gradient (Gamid-PG), encourages policy entropy maximization. To this end, we define the GMM entropy gradient using a variational approximation of the KL-divergence between the GMM's constituting Gaussians. We compare Gamid-PG with common stochastic policy gradient methods on benchmark dense-reward MuJoCo tasks and sparse-reward Fetch tasks. We observe that Gamid-PG outperforms stochastic gradient-based methods in 3/6 MuJoCo tasks while performing similarly on the remaining 3 tasks. In the Fetch tasks, Gamid-PG outperforms single-actor deterministic gradient-based methods while performing worse than stochastic policy gradient methods. Consequently, we conclude that GMMs optimized using deterministic policy gradients (1) should be favorably considered over stochastic gradients in dense-reward continuous control tasks, and (2) improve upon single-actor deterministic gradients.

AAMAS Conference 2024 Conference Paper

Continual Optimistic Initialization for Value-Based Reinforcement Learning

  • Sheelabhadra Dey
  • James Ault
  • Guni Sharon

Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms. It enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore “infinitely-often”. To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the shortcoming of optimistic initialization (sensitivity to environment noise). We provide a rigorous theoretical comparison of COIN versus existing popular exploration strategies and prove it provides a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a designed toy domain as well as present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies in four out of six benchmark tasks while performing on par with the best baseline on the other two tasks.

SoCS Conference 2024 Conference Paper

Curriculum Generation for Learning Guiding Functions in State-Space Search Algorithms

  • Sumedh Pendurkar
  • Levi H. S. Lelis
  • Nathan R. Sturtevant
  • Guni Sharon

This paper investigates methods for training parameterized functions for guiding state-space search algorithms. Existing work commonly generates data for training such guiding functions by solving problem instances while leveraging the current version of the guiding function. As a result, as training progresses, the guided search algorithm can solve more difficult instances that are, in turn, used to further train the guiding function. These methods assume that a set of problem instances of varied difficulty is provided. Since previous work was not designed to distinguish the instances that the search algorithm can solve from those that cannot be solved with the current guiding function, the algorithm commonly wastes time attempting and failing to solve many of these instances. In this paper, we improve upon these training methods by generating a curriculum for learning the guiding function that directly addresses this issue. Namely, we propose and evaluate a Teacher-Student Curriculum (TSC) approach where the teacher is an evolutionary strategy that attempts to generate problem instances of "correct difficulty".

AAMAS Conference 2023 Conference Paper

Bilevel Entropy based Mechanism Design for Balancing Meta in Video Games

  • Sumedh Pendurkar
  • Chris Chow
  • Luo Jie
  • Guni Sharon

We address a mechanism design problem where the goal of the designer is to maximize the entropy of a player’s mixed strategy at a Nash equilibrium. This objective is of special relevance to video games where game designers wish to diversify the players’ interaction with the game. To solve this design problem, we propose a bi-level alternating optimization technique that (1) approximates the mixed strategy Nash equilibrium using a Nash Monte-Carlo reinforcement learning approach and (2) applies a gradient-free optimization technique (Covariance-Matrix Adaptation Evolutionary Strategy) to maximize the entropy of the mixed strategy obtained in level (1). The experimental results show that our approach achieves comparable results to the state-of-the-art approach on three benchmark domains “Rock-Paper-Scissors-Fire-Water”, “Workshop Warfare” and “Pokemon Video Game Championship”. Next, we show that, unlike previous state-of-the-art approaches, the computational complexity of our proposed approach scales significantly better in larger combinatorial strategy spaces.
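
The designer's objective, the Shannon entropy of a player's equilibrium mixed strategy, is simple to state in isolation; the sketch below shows only that objective, not the Nash-approximation or CMA-ES layers described in the abstract:

```python
import math

def mixed_strategy_entropy(probs):
    """Shannon entropy of a mixed strategy over pure strategies.
    Maximized by the uniform strategy, i.e. maximal diversity of play."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A designer maximizing this quantity at equilibrium pushes players toward mixing over all their strategies, which is the diversification goal stated above.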

AAAI Conference 2023 Conference Paper

Socially Optimal Non-discriminatory Restrictions for Continuous-Action Games

  • Michael Oesterle
  • Guni Sharon

We address the following mechanism design problem: Given a multi-player Normal-Form Game (NFG) with a continuous action space, find a non-discriminatory (i.e., identical for all players) restriction of the action space which maximizes the resulting Nash Equilibrium with respect to a fixed social utility function. First, we propose a formal model of a Restricted Game and the corresponding restriction optimization problem. We then present an algorithm to find optimal non-discriminatory restrictions under some assumptions. Our experimental results with Braess' Paradox and the Cournot Game show that this method leads to an optimized social utility of the Nash Equilibria, even when the assumptions are not guaranteed to hold. Finally, we outline a generalization of our approach to the much wider scope of Stochastic Games.

ICAPS Conference 2023 Conference Paper

Task Phasing: Automated Curriculum Learning from Demonstrations

  • Vaibhav Bajaj
  • Guni Sharon
  • Peter Stone 0001

Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task-phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations we define a simple initial task. Our task phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on 3 sparse reward domains demonstrate that our task-phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.

TMLR Journal 2023 Journal Article

The (Un)Scalability of Informed Heuristic Function Estimation in NP-Hard Search Problems

  • Sumedh Pendurkar
  • Taoan Huang
  • Brendan Juba
  • Jiapeng Zhang
  • Sven Koenig
  • Guni Sharon

The A* algorithm is commonly used to solve NP-hard combinatorial optimization problems. When provided with a completely informed heuristic function, A* can solve such problems in time complexity that is polynomial in the solution cost and branching factor. In light of this fact, we examine a line of recent publications that propose fitting deep neural networks to the completely informed heuristic function. We assert that these works suffer from inherent scalability limitations since, under the assumption that NP ⊈ P/poly, such approaches result in either (a) network sizes that scale super-polynomially in the instance sizes or (b) the accuracy of the fitted deep neural networks scales inversely with the instance sizes. Complementing our theoretical claims, we provide experimental results for three representative NP-hard search problems. The results suggest that fitting deep neural networks to informed heuristic functions requires network sizes that grow quickly with the problem instance size. We conclude by suggesting that the research community should focus on scalable methods for integrating heuristic search with machine learning, as opposed to methods relying on informed heuristic estimation.

SoCS Conference 2022 Conference Paper

A Discussion on the Scalability of Heuristic Approximators (Extended Abstract)

  • Sumedh Pendurkar
  • Taoan Huang
  • Sven Koenig
  • Guni Sharon

In this work, we examine a line of recent publications that propose to use deep neural networks to approximate the goal distances of states for heuristic search. We present a first step toward showing that this work suffers from inherent scalability limitations since, under the assumption that P ≠ NP, such approaches require network sizes that scale exponentially in the number of states to achieve the necessary (high) approximation accuracy.

IROS Conference 2021 Conference Paper

A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

  • Sheelabhadra Dey
  • Sumedh Pendurkar
  • Guni Sharon
  • Josiah P. Hanna

In various control task domains, existing controllers provide a baseline level of performance that—though possibly suboptimal—should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret with respect to a baseline policy performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) training while leveraging demonstrations from the baseline policy to minimize regret with respect to the baseline policy, and (b) eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to an RL agent. Experimental results show that JIRL effectively accomplishes the aforementioned objectives in several continuous action-space domains. The results demonstrate that JIRL is comparable to a state-of-the-art algorithm in its final performance while incurring significantly lower baseline regret during training. Moreover, the results show a reduction factor of up to 21 in baseline regret over a trust-region-based approach that guarantees monotonic policy improvement.

JAIR Journal 2021 Journal Article

Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies

  • Roberto Capobianco
  • Varun Kompella
  • James Ault
  • Guni Sharon
  • Stacy Jong
  • Spencer Fox
  • Lauren Meyers
  • Peter R. Wurman

The year 2020 saw the COVID-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world have been faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this paper, we study how reinforcement learning (RL) and Bayesian inference can be used to optimize mitigation policies that minimize economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; (2) an RL-based methodology for optimizing fine-grained mitigation policies within this simulator; and (3) a Hidden Markov Model for predicting infected individuals based on partial observations regarding test results, presence of symptoms, and past physical contacts. This article is part of the special track on AI and COVID-19.

IJCAI Conference 2021 Conference Paper

Alleviating Road Traffic Congestion with Artificial Intelligence

  • Guni Sharon

This paper reviews current AI solutions towards road traffic congestion alleviation. Three specific AI technologies are discussed: (1) intersection management protocols for coordinating vehicles through a road intersection in a safe and efficient manner, (2) road pricing protocols that induce optimized traffic flow, and (3) partial or full autonomous driving that can stabilize traffic flow and mitigate adverse traffic shock waves. The paper briefly presents the challenges affiliated with each of these applications along with an overview of state-of-the-art solutions. Finally, real-world implementation gaps and challenges are discussed.

AAMAS Conference 2021 Conference Paper

Multiagent Epidemiologic Inference through Realtime Contact Tracing

  • Guni Sharon
  • James Ault
  • Peter Stone
  • Varun Kompella
  • Roberto Capobianco

This paper addresses an epidemiologic inference problem where, given realtime observation of test results, presence of symptoms, and physical contacts, the most likely infected individuals need to be inferred. The inference problem is modeled as a hidden Markov model where infection probabilities are updated at every time step and evolve between time steps. We suggest a unique inference approach that avoids storing the given observations explicitly. Theoretical justification for the proposed model is provided under specific simplifying assumptions. To complement these theoretical results, a comprehensive experimental study is performed using a custom-built agent-based simulator that models inter-agent contacts. The reported results show the effectiveness of the proposed inference model when considering more realistic scenarios – where the simplifying assumptions do not hold. When pairing the proposed inference model with a simple testing and quarantine policy, promising trends are obtained where the epidemic progression is significantly slowed down while quarantining a bounded number of individuals.

NeurIPS Conference 2021 Conference Paper

Reinforcement Learning Benchmarks for Traffic Signal Control

  • James Ault
  • Guni Sharon

We propose a toolkit for developing and comparing reinforcement learning (RL)-based traffic signal controllers. The toolkit includes implementation of state-of-the-art deep-RL algorithms for signal control along with benchmark control problems that are based on realistic traffic scenarios. Importantly, the toolkit allows a first-of-its-kind comparison between state-of-the-art RL-based signal controllers while providing benchmarks for future comparisons. Consequently, we compare and report the relative performance of current RL algorithms. The experimental results suggest that previous algorithms are not robust to varying sensing assumptions and non-stylized intersection layouts. When more realistic signal layouts and advanced sensing capabilities are assumed, a distributed deep-Q learning approach is shown to outperform previously reported state-of-the-art algorithms in many cases.

AAMAS Conference 2019 Conference Paper

Marginal Cost Pricing with a Fixed Error Factor in Traffic Networks

  • Guni Sharon
  • Stephen D. Boyles
  • Shani Alkoby
  • Peter Stone

It is well known that charging marginal cost tolls (MCT) from self-interested agents participating in a congestion game leads to optimal system performance, i.e., minimal total latency. However, it is not generally possible to calculate the correct marginal cost tolls precisely, and it is not known what the impact is of charging incorrect tolls. This uncertainty could lead to reluctance to adopt such schemes in practice. This paper studies the impact of charging MCT with some fixed factor error on the system’s performance. We prove that under-estimating MCT results in a system performance that is at least as good as that obtained by not applying tolls at all. This result might encourage adoption of MCT schemes with conservative MCT estimations. Furthermore, we prove that no local extrema can exist in the function mapping the error value, r, to the system’s performance, T(r). This result implies that accurately calibrating MCT for a given network can be done by identifying an extremum in T(r) which, consequently, must be the global optimum. Experimental results from simulating several large-scale, real-life traffic networks are presented and provide further support for our theoretical findings.
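
To make the MCT quantity concrete: the marginal cost toll on a link equals the flow times the derivative of the link's latency with respect to flow. The sketch below uses the standard BPR latency function as an illustrative example; it is not taken from the paper:

```python
def bpr_latency(x, t0, cap, alpha=0.15, beta=4):
    """BPR link performance function: travel time at flow x, given
    free-flow travel time t0 and capacity cap."""
    return t0 * (1 + alpha * (x / cap) ** beta)

def marginal_cost_toll(x, t0, cap, alpha=0.15, beta=4):
    """MCT = x * d(latency)/dx; for the BPR function this reduces to
    t0 * alpha * beta * (x / cap) ** beta."""
    return t0 * alpha * beta * (x / cap) ** beta
```

Scaling this toll by a fixed factor r gives the fixed-error MCT whose effect on system performance T(r) the paper analyzes.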

AAAI Conference 2019 Conference Paper

Selecting Compliant Agents for Opt-in Micro-Tolling

  • Josiah P. Hanna
  • Guni Sharon
  • Stephen D. Boyles
  • Peter Stone

This paper examines the impact of tolls on social welfare in the context of a transportation network in which only a portion of the agents are subject to tolls. More specifically, this paper addresses the question: which subset of agents provides the most system benefit if they are compliant with an approximate marginal cost tolling scheme? Since previous work suggests this problem is NP-hard, we examine a heuristic approach. Our experimental results on three real-world traffic scenarios suggest that evaluating the marginal impact of a given agent serves as a particularly strong heuristic for selecting an agent to be compliant. Results from using this heuristic for selecting 7.6% of the agents to be compliant achieved an increase of up to 10.9% in social welfare over not tolling at all. The presented heuristic approach and conclusions can help practitioners target specific agents to participate in an opt-in tolling scheme.

AAAI Conference 2018 Conference Paper

DyETC: Dynamic Electronic Toll Collection for Traffic Congestion Alleviation

  • Haipeng Chen
  • Bo An
  • Guni Sharon
  • Josiah Hanna
  • Peter Stone
  • Chunyan Miao
  • Yeng Soh

To alleviate traffic congestion in urban areas, electronic toll collection (ETC) systems are deployed all over the world. Despite the merits, tolls are usually pre-determined and fixed from day to day, which fail to consider traffic dynamics and thus have limited regulation effect when traffic conditions are abnormal. In this paper, we propose a novel dynamic ETC (DyETC) scheme which adjusts tolls to traffic conditions in realtime. The DyETC problem is formulated as a Markov decision process (MDP), the solution of which is very challenging due to its 1) multi-dimensional state space, 2) multi-dimensional, continuous and bounded action space, and 3) time-dependent state and action values. Due to the complexity of the formulated MDP, existing methods cannot be applied to our problem. Therefore, we develop a novel algorithm, PG-β, which makes three improvements to the traditional policy gradient method by proposing 1) time-dependent value and policy functions, 2) a Beta distribution policy function and 3) state abstraction. Experimental results show that, compared with existing ETC schemes, DyETC increases traffic volume by around 8%, and reduces travel time by around 14.6% during rush hour. Considering the total traffic volume in a traffic network, this contributes to a substantial increase in social welfare.

AAMAS Conference 2018 Conference Paper

Link-based Parameterized Micro-tolling Scheme for Optimal Traffic Management

  • Hamid Mirzaei
  • Guni Sharon
  • Stephen Boyles
  • Tony Givargis
  • Peter Stone

In the micro-tolling paradigm, different toll values are assigned to different links within a congestible traffic network. Self-interested agents then select minimal cost routes, where cost is a function of the travel time and tolls paid. A centralized system manager sets toll values with the objective of inducing a user equilibrium that maximizes the total utility over all agents. A recently proposed algorithm for computing such tolls, denoted ∆-tolling, was shown to yield up to 32% reduction in total travel time in simulated traffic scenarios compared to when there are no tolls. ∆-tolling includes two global parameters: β which is a proportionality parameter, and R which influences the rate of change of toll values across all links. This paper introduces a generalization of ∆-tolling which accounts for different β and R values on each link in the network. While this enhanced ∆-tolling algorithm requires setting significantly more parameters, we show that they can be tuned effectively via policy gradient reinforcement learning. Experimental results from several traffic scenarios indicate that Enhanced ∆-tolling reduces total travel time by up to 28% compared to the original ∆-tolling algorithm, and by up to 45% compared to not tolling.
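
A plausible reading of the β and R parameters described above is an exponential-smoothing update of each link's toll toward a value proportional to its congestion gap; this is a hedged sketch of the ∆-tolling rule as commonly described, not code from the paper:

```python
def delta_toll_update(toll, travel_time, free_flow_time, beta, R):
    """One Delta-tolling step (sketch): move the link's toll toward
    beta * (observed travel time - free-flow travel time), with R in
    (0, 1] controlling the rate of change of the toll."""
    delta = travel_time - free_flow_time
    return R * beta * delta + (1 - R) * toll
```

Under a constant travel time, the toll converges to beta * delta; the enhanced algorithm in this paper learns per-link beta and R values via policy gradient RL instead of using one global pair.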

AAAI Conference 2018 Conference Paper

Traffic Optimization for a Mixture of Self-Interested and Compliant Agents

  • Guni Sharon
  • Michael Albert
  • Tarun Rambha
  • Stephen Boyles
  • Peter Stone

This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing. In the self-interested routing policy each agent selects a path that optimizes its own utility, while under system-optimum routing agents are assigned paths with the goal of maximizing system performance. This paper considers a scenario where a centralized network manager wishes to optimize utilities over all agents, i.e., implement a system-optimum routing policy. In many real-life scenarios, however, the system manager is unable to influence the route assignment of all agents due to limited influence on route choice decisions. Motivated by such scenarios, a computationally tractable method is presented that computes the minimal number of agents that the system manager needs to influence (compliant agents) in order to achieve system-optimal performance. Moreover, this methodology can also determine whether a given set of compliant agents is sufficient to achieve system optimum and compute the optimal route assignment for the compliant agents to do so. Experimental results are presented showing that in several large-scale, realistic traffic networks optimal flow can be achieved with between 13% and 54% of the agents being compliant, depending on the network.

AAMAS Conference 2017 Conference Paper

Multirobot Symbolic Planning under Temporal Uncertainty

  • Shiqi Zhang
  • Yuqian Jiang
  • Guni Sharon
  • Peter Stone

Multirobot symbolic planning (MSP) aims at computing plans, each in the form of a sequence of actions, for a team of robots to achieve their individual goals while minimizing overall cost. Solving MSP problems requires modeling limited domain resources (e.g., corridors that allow at most one robot at a time) and the possibility of action synergy (e.g., multiple robots going through a door after a single door-opening action). However, the temporal uncertainty that propagates over actions, such as delays caused by obstacles in navigation actions, makes it challenging to plan for resource sharing and realizing synergy in a team of robots. This paper, for the first time, introduces the problem of MSP under temporal uncertainty (MSPTU). We present a novel, iterative inter-dependent planning (IIDP) algorithm, including two configurations (simple and enhanced), for solving general MSPTU problems. We then focus on multirobot navigation tasks, presenting a full instantiation of IIDP that includes a new algorithm for computing conditional plan cost under temporal uncertainty and a novel shifted-Poisson distribution for accumulating temporal uncertainty over actions. The algorithms have been implemented both in simulation and on real robots. We observed a significant reduction in overall cost compared to baselines in which robots do not communicate or model temporal uncertainty.

AAMAS Conference 2017 Conference Paper

Real-time Adaptive Tolling Scheme for Optimized Social Welfare in Traffic Networks

  • Guni Sharon
  • Josiah P. Hanna
  • Tarun Rambha
  • Michael W. Levin
  • Michael Albert
  • Stephen D. Boyles
  • Peter Stone

Connected and autonomous vehicle technology has advanced rapidly in recent years. These technologies create possibilities for advanced AI-based traffic management techniques. Developing such techniques is an important challenge and opportunity for the AI community as it requires synergy between experts in game theory, multiagent systems, behavioral science, and flow optimization. This paper takes a step in this direction by considering traffic flow optimization through setting and broadcasting of dynamic and adaptive tolls. Previous tolling schemes either were not adaptive in real-time, not scalable to large networks, or did not optimize traffic flow over an entire network. Moreover, previous schemes made strong assumptions on observable demands, road capacities, and user homogeneity. This paper introduces ∆-tolling, a novel tolling scheme that is adaptive in real-time and able to scale to large networks. We provide theoretical evidence showing that under certain assumptions ∆-tolling is equal to Marginal-Cost Tolling, which provably leads to system-optimal performance, and empirical evidence showing that ∆-tolling increases social welfare (by up to 33%) in two traffic simulators with markedly different modeling assumptions.

SoCS Conference 2017 Conference Paper

Search-Based Optimal Solvers for the Multi-Agent Pathfinding Problem: Summary and Challenges

  • Ariel Felner
  • Roni Stern
  • Solomon Eyal Shimony
  • Eli Boyarski
  • Meir Goldenberg
  • Guni Sharon
  • Nathan R. Sturtevant
  • Glenn Wagner

Multi-agent pathfinding (MAPF) is an area of expanding research interest. At the core of this research area, numerous diverse search-based techniques were developed in the past 6 years for optimally solving MAPF under the sum-of-costs objective function. In this paper we survey these techniques, while placing them into the wider context of the MAPF field of research. Finally, we provide analytical and experimental comparisons that show that no algorithm dominates all others in all circumstances. We conclude by listing important future research directions.

AAAI Conference 2016 Conference Paper

Bidirectional Search That Is Guaranteed to Meet in the Middle

  • Robert Holte
  • Ariel Felner
  • Guni Sharon
  • Nathan Sturtevant

We present MM, the first bidirectional heuristic search algorithm whose forward and backward searches are guaranteed to “meet in the middle”, i.e., never expand a node beyond the solution midpoint. We also present a novel framework for comparing MM, A*, and brute-force search, and identify conditions favoring each algorithm. Finally, we present experimental results that support our theoretical analysis.

SoCS Conference 2016 Conference Paper

Extended Abstract: An Improved Priority Function for Bidirectional Heuristic Search

  • Guni Sharon
  • Robert C. Holte
  • Ariel Felner
  • Nathan R. Sturtevant

Bidirectional search algorithms interleave a search forward from the start state (start) and a search backward (i.e., using reverse operators) from the goal state (goal). We say that the two searches “meet in the middle” if neither search expands a node whose g-value (in the given direction) exceeds C*/2, where C* is the cost of an optimal solution. The only bidirectional heuristic search algorithm that is guaranteed to meet in the middle under all circumstances is the recently introduced MM algorithm (Holte et al. 2016). The feature of MM that provides this guarantee is its unique priority functions for nodes on its open lists. In this short note we present MMe, which enhances MM’s priority function and is expected to expand fewer nodes than MM under most circumstances. We sketch a proof of MMe’s correctness, describe conditions under which MMe will expand fewer nodes than MM and vice versa, and experimentally compare MMe and MM on the 10-Pancake problem.
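MM's meet-in-the-middle guarantee stems from its open-list priority function, pr(n) = max(f(n), 2g(n)), which prevents either frontier from expanding a node with g(n) > C*/2. A minimal sketch of that base priority (MMe's enhanced variant refines this formula and is not reproduced here):

```python
def mm_priority(g: float, h: float) -> float:
    """MM's open-list priority: max(f(n), 2g(n)), where f(n) = g(n) + h(n).
    The 2g(n) term is what keeps a search direction from expanding a node
    past the solution midpoint."""
    return max(g + h, 2 * g)
```

For a node with g = 3 and h = 4 the priority is max(7, 6) = 7; once g grows past f/2, the 2g term dominates and the node is deferred behind shallower nodes.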

AAAI Conference 2016 Conference Paper

Multi-Agent Path Finding with Payload Transfers and the Package-Exchange Robot-Routing Problem

  • Hang Ma
  • Craig Tovey
  • Guni Sharon
  • T. K. Kumar
  • Sven Koenig

We study transportation problems where robots have to deliver packages and can transfer the packages among each other. Specifically, we study the package-exchange robot-routing problem (PERR), where each robot carries one package, any two robots in adjacent locations can exchange their packages, and each package needs to be delivered to a given destination. We prove that exchange operations make all PERR instances solvable. Yet, we also show that PERR is NP-hard to approximate within any factor less than 4/3 for makespan minimization and is NP-hard to solve for flowtime minimization, even when there are only two types of packages. Our proof techniques also generate new insights into other transportation problems, for example, into the hardness of approximating optimal solutions to the standard multi-agent path-finding problem (MAPF). Finally, we present optimal and suboptimal PERR solvers that are inspired by MAPF solvers, namely a flow-based ILP formulation and an adaptation of conflict-based search. Our empirical results demonstrate that these solvers scale well and that PERR instances often have smaller makespans and flowtimes than the corresponding MAPF instances.

ICAPS Conference 2015 Conference Paper

Don't Split, Try To Work It Out: Bypassing Conflicts in Multi-Agent Pathfinding

  • Eli Boyarski
  • Ariel Felner
  • Guni Sharon
  • Roni Stern

Conflict-Based Search (CBS) is a recently introduced algorithm for Multi-Agent Path Finding (MAPF) whose runtime is exponential in the number of conflicts found between the agents' paths. We present an improved version of CBS that bypasses conflicts, thereby reducing the CBS search tree. Experimental results show that this improvement reduces the runtime by an order of magnitude in many cases.

IJCAI Conference 2015 Conference Paper

ICBS: Improved Conflict-Based Search Algorithm for Multi-Agent Pathfinding

  • Eli Boyarski
  • Ariel Felner
  • Roni Stern
  • Guni Sharon
  • David Tolpin
  • Oded Betzalel
  • Eyal Shimony

Conflict-Based Search (CBS) and its enhancements, Meta-Agent CBS and bypassing conflicts, are amongst the strongest newly introduced algorithms for Multi-Agent Path Finding. This paper introduces two new improvements to CBS and incorporates them into a coherent, improved version of CBS, namely ICBS. Experimental results show that each of these improvements further reduces the runtime over the existing CBS-based approaches. When all improvements are combined, an even larger improvement is achieved, producing state-of-the-art results for a number of domains.

SoCS Conference 2015 Conference Paper

ICBS: The Improved Conflict-Based Search Algorithm for Multi-Agent Pathfinding

  • Eli Boyarski
  • Ariel Felner
  • Roni Stern
  • Guni Sharon
  • Oded Betzalel
  • David Tolpin
  • Solomon Eyal Shimony

Conflict-Based Search (CBS) and its generalization, Meta-Agent CBS, are amongst the strongest newly introduced algorithms for Multi-Agent Path Finding. This paper introduces ICBS, an improved version of CBS. ICBS incorporates three orthogonal improvements to CBS which are systematically described and studied. Experimental results show that each of these improvements reduces the runtime over basic CBS by up to 20x in many cases. When all three improvements are combined, an even larger improvement is achieved, producing state-of-the-art results for a number of domains.

AAAI Conference 2015 Conference Paper

Multi-Agent Pathfinding as a Combinatorial Auction

  • Ofra Amir
  • Guni Sharon
  • Roni Stern

This paper proposes a mapping between multi-agent pathfinding (MAPF) and combinatorial auctions (CAs). In MAPF, agents need to reach their goal destinations without colliding. Algorithms for solving MAPF aim at assigning agents non-conflicting paths that minimize agents’ travel costs. In CA problems, agents bid over bundles of items they desire. Auction mechanisms aim at finding an allocation of bundles that maximizes social welfare. In the proposed mapping of MAPF to CAs, agents bid on paths to their goals and the auction allocates non-colliding paths to the agents. Using this formulation, auction mechanisms can be naturally used to solve a range of MAPF problem variants. In particular, auction mechanisms can be applied to non-cooperative settings with self-interested agents while providing optimality guarantees and robustness to manipulations by agents. The paper further shows how to efficiently implement an auction mechanism for MAPF, utilizing methods and representations from both the MAPF and CA literatures.

AAAI Conference 2015 Conference Paper

Optimal Multi-Agent Pathfinding Algorithms

  • Guni Sharon

The multi-agent path finding (MAPF) problem is a generalization of the single-agent path finding problem for k > 1 agents. It consists of a graph and a number of agents. For each agent, a unique start state and a unique goal state are given, and the task is to find paths for all agents from their start states to their goal states, under the constraint that agents cannot collide during their movements. In many cases there is an additional goal of minimizing a cumulative cost function such as the sum of the time steps required for every agent to reach its goal. The goal of my research is to provide new methods for solving MAPF optimally, along with theoretical understanding that will help choose the best solver for a given problem instance.

SoCS Conference 2015 Conference Paper

Partial Domain Search Tree for Constraint-Satisfaction Problems

  • Guni Sharon

The traditional approach for solving Constraint-Satisfaction Problems (CSPs) is searching the assignment space, in which each state represents an assignment to some variables. This paper suggests a new search-space formalization for CSPs, the Partial Domain Search Tree (PDST). Each PDST node considers a unique subset of the original domains; values are excluded from the domains in each node to ensure that a given set of constraints is satisfied. We provide a theoretical analysis of this new approach showing that searching the PDST is beneficial for loosely constrained problems. Experimental results show that this new formalization is a promising direction for future research. In some cases, searching the PDST outperforms the traditional approach by an order of magnitude. Furthermore, PDST can enhance local search techniques, resulting in solutions that violate up to 30% fewer constraints.

SoCS Conference 2014 Conference Paper

Exponential Deepening A* for Real-Time Agent-Centered Search

  • Guni Sharon
  • Ariel Felner
  • Nathan R. Sturtevant

This paper introduces Exponential Deepening A* (EDA*), an Iterative Deepening (ID) algorithm where the threshold between successive depth-first calls is increased exponentially. EDA* can be viewed as a Real-Time Agent-Centered Search (RTACS) algorithm. Unlike most existing RTACS algorithms, EDA* is proven to hold a worst-case bound that is linear in the state space. Experimental results demonstrate up to a 5x reduction over existing RTACS solvers with respect to distance traveled, states expanded, and CPU runtime. A full version of this paper appears in AAAI-14.

AAAI Conference 2014 Conference Paper

Exponential Deepening A* for Real-Time Agent-Centered Search

  • Guni Sharon
  • Ariel Felner
  • Nathan Sturtevant

In the Real-Time Agent-Centered Search (RTACS) problem, an agent has to arrive at a goal location while acting and reasoning in the physical world. Traditionally, RTACS problems are solved by propagating and updating heuristic values of states visited by the agent. In existing RTACS algorithms the agent may revisit each state many times, causing the entire procedure to be quadratic in the state space. We study the Iterative Deepening (ID) approach for solving RTACS and introduce Exponential Deepening A* (EDA*), an RTACS algorithm where the threshold between successive depth-first calls is increased exponentially. EDA* is proven to hold a worst-case bound that is linear in the state space. Experimental results supporting this bound are presented and demonstrate up to a 10x reduction over existing RTACS solvers with respect to distance traveled, states expanded, and CPU runtime.
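The exponential threshold schedule described in the abstract can be sketched as an iterative-deepening driver loop; the bounded depth-first search itself is passed in as a callback (`dfs` is a hypothetical interface for this sketch, and doubling is one natural choice of exponential growth):

```python
def exponential_deepening_search(start, dfs, initial_threshold=1):
    """Run successive cost-bounded depth-first searches, doubling the
    threshold after every failed call -- the exponential schedule that
    distinguishes EDA* from IDA*'s minimal threshold increase.

    `dfs(state, threshold)` must return (found, cost): whether a solution
    within the threshold was found, and its cost if so.
    """
    threshold = initial_threshold
    while True:
        found, cost = dfs(start, threshold)
        if found:
            return cost
        threshold *= 2  # exponential growth bounds the number of restarts
```

Because the threshold doubles, only O(log C*) depth-first passes are needed, which is what underlies the linear worst-case bound the abstract refers to.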

SoCS Conference 2014 Conference Paper

Suboptimal Variants of the Conflict-Based Search Algorithm for the Multi-Agent Pathfinding Problem

  • Max Barer
  • Guni Sharon
  • Roni Stern
  • Ariel Felner

The task in the multi-agent path finding problem (MAPF) is to find paths for multiple agents, each with a different start and goal position, such that agents do not collide. A successful optimal MAPF solver is the conflict-based search (CBS) algorithm. CBS is a two-level algorithm where special conditions ensure it returns the optimal solution. Solving MAPF optimally is proven to be NP-hard, hence CBS and all other optimal solvers do not scale up. We propose several ways to relax the optimality conditions of CBS, trading solution quality for runtime, as well as bounded-suboptimal variants, where the returned solution is guaranteed to be within a constant factor of the optimal solution cost. Experimental results show the benefits of our new approach: a massive reduction in running time at the price of a minor loss in solution quality. Our new algorithms outperform other existing algorithms in most of the cases.

SoCS Conference 2013 Conference Paper

Online Detection of Dead States in Real-Time Agent-Centered Search

  • Guni Sharon
  • Nathan R. Sturtevant
  • Ariel Felner

In this paper we introduce techniques for state pruning at runtime in a priori unknown domains. We describe how to identify states that can be deleted from the state space when looking for both optimal and suboptimal solutions. We discuss general graphs and special cases like 8-connected grids. Experimental results show a speedup of up to an order of magnitude when applying our techniques to real-time agent-centered search problems.

AAAI Conference 2012 Conference Paper

Conflict-Based Search For Optimal Multi-Agent Path Finding

  • Guni Sharon
  • Roni Stern
  • Ariel Felner
  • Nathan Sturtevant

In the multi-agent path finding problem (MAPF), paths should be found for several agents, each with a different start and goal position, such that agents do not collide. Previous optimal solvers applied global A*-based searches. We present a new search algorithm called Conflict-Based Search (CBS). CBS is a two-level algorithm. At the high level, a search is performed on a tree based on conflicts between agents. At the low level, a search is performed only for a single agent at a time. In many cases this reformulation enables CBS to examine fewer states than A* while still maintaining optimality. We analyze CBS and show its benefits and drawbacks. Experimental results on various problems show a speedup of up to a full order of magnitude over previous approaches.
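As a rough illustration of the two levels, here is a minimal CBS sketch that handles vertex conflicts only (edge/swap conflicts and the other refinements from the paper are elided, and the graph encoding and helper names are this sketch's own, not the authors'):

```python
import heapq
from collections import deque
from itertools import count

def low_level(graph, start, goal, constraints, horizon=30):
    """Space-time BFS for a single agent. `constraints` is a set of
    forbidden (location, time) pairs; waiting in place is allowed."""
    queue = deque([(start, 0, (start,))])
    seen = {(start, 0)}
    while queue:
        loc, t, path = queue.popleft()
        # The agent may stop once no future constraint sits on its goal.
        if loc == goal and all((goal, t2) not in constraints
                               for t2 in range(t, horizon)):
            return list(path)
        for nxt in [loc] + graph[loc]:  # [loc] allows a wait action
            state = (nxt, t + 1)
            if t + 1 < horizon and state not in constraints and state not in seen:
                seen.add(state)
                queue.append((nxt, t + 1, path + (nxt,)))
    return None

def first_vertex_conflict(paths):
    """Return (agent_i, agent_j, location, time) for the earliest vertex
    conflict, or None. Finished agents are assumed to wait at their goals."""
    for t in range(max(len(p) for p in paths)):
        at = {}
        for i, p in enumerate(paths):
            loc = p[min(t, len(p) - 1)]
            if loc in at:
                return at[loc], i, loc, t
            at[loc] = i
    return None

def cbs(graph, starts, goals):
    """High level: best-first search over constraint sets, ordered by the
    sum of individual path costs; branch on the first conflict found."""
    n, tie = len(starts), count()
    root = [set() for _ in range(n)]
    paths = [low_level(graph, starts[i], goals[i], root[i]) for i in range(n)]
    open_list = [(sum(len(p) - 1 for p in paths), next(tie), root, paths)]
    while open_list:
        cost, _, cons, paths = heapq.heappop(open_list)
        conflict = first_vertex_conflict(paths)
        if conflict is None:
            return paths  # conflict-free, and minimal by best-first order
        i, j, loc, t = conflict
        for agent in (i, j):  # one child per conflicting agent
            child = [set(c) for c in cons]
            child[agent].add((loc, t))
            new_path = low_level(graph, starts[agent], goals[agent], child[agent])
            if new_path is not None:
                new_paths = list(paths)
                new_paths[agent] = new_path
                heapq.heappush(open_list, (sum(len(p) - 1 for p in new_paths),
                                           next(tie), child, new_paths))
    return None
```

On a plus-shaped graph where two agents must cross the same center vertex, this sketch makes one agent wait a step, returning a conflict-free solution one cost unit above the independent lower bound.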

SoCS Conference 2012 Conference Paper

Conflict-Based Search for Optimal Multi-Agent Path Finding

  • Guni Sharon
  • Roni Stern
  • Ariel Felner
  • Nathan R. Sturtevant

We present a new two-level search algorithm for optimal multi-agent path finding called Conflict-Based Search (CBS). At the high level, a search is performed on a tree based on conflicts between agents. At the low level, a search is performed only for a single agent at a time. Experimental results on various problems show a speedup of up to a full order of magnitude over previous approaches.

SoCS Conference 2012 Conference Paper

Meta-Agent Conflict-Based Search For Optimal Multi-Agent Path Finding

  • Guni Sharon
  • Roni Stern
  • Ariel Felner
  • Nathan R. Sturtevant

The task in the multi-agent path finding problem (MAPF) is to find paths for multiple agents, each with a different start and goal position, such that agents do not collide. It is possible to solve this problem optimally with algorithms that are based on the A* algorithm. Recently, we proposed an alternative algorithm called Conflict-Based Search (CBS) (Sharon et al. 2012), which was shown to outperform the A*-based algorithms in some cases. CBS is a two-level algorithm. At the high level, a search is performed on a tree based on conflicts between agents. At the low level, a search is performed only for a single agent at a time. While in some cases CBS is very efficient, in other cases it is worse than A*-based algorithms. This paper focuses on the latter case by generalizing CBS to Meta-Agent CBS (MA-CBS). The main idea is to couple groups of agents into meta-agents if the number of internal conflicts between them exceeds a given bound. MA-CBS acts as a framework that can run on top of any complete MAPF solver. We analyze our new approach and provide experimental results demonstrating that it outperforms basic CBS and other A*-based optimal solvers in many cases.

SoCS Conference 2012 Conference Paper

Partial-Expansion A* with Selective Node Generation

  • Ariel Felner
  • Meir Goldenberg
  • Guni Sharon
  • Roni Stern
  • Tal Beja
  • Nathan R. Sturtevant
  • Robert C. Holte
  • Jonathan Schaeffer 0001

A* is often described as being 'optimal,' in that it expands the minimum number of unique nodes. But A* may generate many extra nodes which are never expanded. This is a performance loss, especially when the branching factor is large. Partial Expansion A* (PEA*) addresses this problem when expanding a node, n, by generating all the children of n but only storing children with the same f-cost as n. We introduce an enhanced version of PEA* (EPEA*). Given a priori domain knowledge, EPEA* only generates the children with the same f-cost as the parent. State-of-the-art results were obtained for a number of domains. Drawbacks of EPEA* are also discussed. A full version of this paper appears in the proceedings of AAAI-2012.

AAAI Conference 2012 Conference Paper

Partial-Expansion A* with Selective Node Generation

  • Ariel Felner
  • Meir Goldenberg
  • Guni Sharon
  • Roni Stern
  • Tal Beja
  • Nathan Sturtevant
  • Jonathan Schaeffer
  • Robert Holte

A* is often described as being ‘optimal’, in that it expands the minimum number of unique nodes. But, A* may generate many extra nodes which are never expanded. This is a performance loss, especially when the branching factor is large. Partial Expansion A* (PEA*) (Yoshizumi, Miura, and Ishida 2000) addresses this problem when expanding a node, n, by generating all the children of n but only storing children with the same f-cost as n. n is re-inserted into the OPEN list, but with the f-cost of the next best child. This paper introduces an enhanced version of PEA* (EPEA*). Given a priori domain knowledge, EPEA* generates only the children with the same f-cost as the parent. EPEA* is generalized to its iterative-deepening variant, EPE-IDA*. For some domains, these algorithms yield substantial performance improvements. State-of-the-art results were obtained for the pancake puzzle and for some multi-agent pathfinding instances. Drawbacks of EPEA* are also discussed.
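The partial-expansion idea can be sketched as a filter at expansion time. Note one deliberate simplification: EPEA*'s a priori domain knowledge lets it avoid ever generating the pruned children, whereas this sketch generates and then filters them for clarity; `successors` and `h` stand in for a hypothetical domain-specific operator model and heuristic:

```python
def pea_expand(node, g, successors, h, f_parent):
    """Partial expansion of `node`: keep only children whose
    f = g + step_cost + h equals the parent's stored f-value, and
    report the next-best child f so the parent can be re-inserted
    into OPEN with that value (or retired when no children remain).

    `successors(node)` yields (child, step_cost) pairs; `h(child)`
    is the heuristic estimate.
    """
    kept, deferred = [], []
    for child, step_cost in successors(node):
        f_child = g + step_cost + h(child)
        (kept if f_child == f_parent else deferred).append((child, f_child))
    next_f = min((f for _, f in deferred), default=None)
    return kept, next_f
```

Only the `kept` children enter OPEN immediately; the memory saving comes from never storing the deferred children, which is what matters when the branching factor is large.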

SoCS Conference 2011 Conference Paper

Pruning Techniques for the Increasing Cost Tree Search for Optimal Multi-agent Pathfinding

  • Guni Sharon
  • Roni Stern
  • Meir Goldenberg
  • Ariel Felner

We address the problem of optimal path finding for multiple agents, where agents must not collide and their total travel cost should be minimized. Previous work used traditional single-agent search variants of the A* algorithm. In Sharon et al. (2011), we introduced a novel two-level search algorithm framework for this problem. The high level searches a novel search tree called the increasing cost tree (ICT). The low level performs a goal test on each ICT node. The new framework, called ICT search (ICTS), was shown to run faster than the previous state-of-the-art A* approach by up to three orders of magnitude in many cases. In this paper we focus on the low level of ICTS, which performs the goal test. We introduce a number of optional pruning techniques that can significantly speed up the goal test. We discuss these pruning techniques and provide supporting experimental results.

IJCAI Conference 2011 Conference Paper

The Increasing Cost Tree Search for Optimal Multi-Agent Pathfinding

  • Guni Sharon
  • Roni Stern
  • Meir Goldenberg
  • Ariel Felner

We address the problem of optimal path finding for multiple agents where agents must not collide and their total travel cost should be minimized. Previous work used traditional single-agent search variants of the A* algorithm. We present a novel formalization for this problem which includes a search tree called the increasing cost tree (ICT) and a corresponding search algorithm that finds optimal solutions. We analyze this new formalization and compare it to the previous state-of-the-art A*-based approach. Experimental results on various domains show the benefits and drawbacks of this approach. A speedup of up to 3 orders of magnitude was obtained in a number of cases.
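The high-level structure can be sketched as follows: the ICT root holds each agent's individually optimal path cost, each child increments exactly one agent's cost by one, and a breadth-first sweep visits cost vectors in order of increasing total cost. The per-node goal test (whether non-conflicting paths of exactly those costs exist) is the expensive low-level part and is passed in as a callback in this sketch:

```python
from collections import deque

def ict_children(costs):
    """Each child of an ICT node increases exactly one agent's cost by 1."""
    return [tuple(c + (1 if j == i else 0) for j, c in enumerate(costs))
            for i in range(len(costs))]

def icts(root_costs, goal_test):
    """Breadth-first search of the increasing cost tree. Every edge adds 1
    to the total cost, so the first node passing the goal test has the
    minimal sum of costs."""
    frontier = deque([tuple(root_costs)])
    seen = {tuple(root_costs)}
    while frontier:
        node = frontier.popleft()
        if goal_test(node):  # do non-conflicting paths of these costs exist?
            return node
        for child in ict_children(node):
            if child not in seen:  # ICT nodes recur; dedupe cost vectors
                seen.add(child)
                frontier.append(child)
```

With root (2, 3), the children are (3, 3) and (2, 4), both with total cost 6; the BFS frontier at depth d contains exactly the vectors whose total cost is d above the root's.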