Author name cluster

Anton Andreychuk

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers

2 author rows

AAAI Conference 2026 Conference Paper

Enhancing PIBT via Multi-Action Operations

Egor Yukhnevich
Anton Andreychuk

PIBT is a rule-based Multi-Agent Path Finding (MAPF) solver, widely used as a low-level planner or action sampler in many state-of-the-art approaches. Its primary advantage lies in its exceptional speed, enabling action selection for thousands of agents within milliseconds by considering only the immediate next timestep. However, this short-horizon design leads to poor performance in scenarios where agents have orientation and must perform time-consuming rotation actions. In this work, we present an enhanced version of PIBT that addresses this limitation by incorporating multi-action operations. We detail the modifications introduced to improve PIBT's performance while preserving its hallmark efficiency. Furthermore, we demonstrate how our method, when combined with a graph guidance technique and large neighborhood search optimization, achieves state-of-the-art performance in the online LMAPF-T setting.
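
The one-timestep decision rule mentioned in the abstract can be illustrated with a simplified sketch of baseline PIBT (priority inheritance with backtracking). This is not the enhanced multi-action version from the paper: it ignores orientation and rotations, assumes a connected undirected graph given as an adjacency dict, and all names (`pibt_step`, `bfs_distances`) are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every vertex of a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pibt_step(graph, positions, goals, priorities):
    """Choose the next vertex for every agent, looking one timestep ahead."""
    dist_to_goal = {a: bfs_distances(graph, goals[a]) for a in positions}
    occupant = {v: a for a, v in positions.items()}
    nxt, reserved = {}, {}

    def plan(agent, pusher):
        here = positions[agent]
        # Prefer vertices closer to the agent's goal; staying put is allowed.
        candidates = sorted(list(graph[here]) + [here],
                            key=lambda v: dist_to_goal[agent][v])
        for v in candidates:
            if v in reserved:
                continue  # vertex conflict with an already planned agent
            if pusher is not None and v == positions[pusher]:
                continue  # would swap places with the agent that pushed us
            nxt[agent] = v
            reserved[v] = agent
            other = occupant.get(v)
            if other is not None and other != agent and other not in nxt:
                # Priority inheritance: the occupant must move out first.
                if not plan(other, agent):
                    continue  # occupant is stuck at v, try the next candidate
            return True
        # Backtracking: no candidate worked, so the agent stays where it is.
        nxt[agent] = here
        reserved[here] = agent
        return False

    for agent in sorted(positions, key=lambda a: -priorities[a]):
        if agent not in nxt:
            plan(agent, None)
    return nxt

# T-shaped graph: path 0-1-2 with an extra vertex 3 attached to vertex 1.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
moves = pibt_step(graph,
                  positions={"A": 0, "B": 1},
                  goals={"A": 2, "B": 0},
                  priorities={"A": 2.0, "B": 1.0})
```

Here the higher-priority agent A pushes B out of vertex 1; B cannot swap with A, so it steps aside to a free neighbor.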


IROS Conference 2025 Conference Paper

Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

Anton Andreychuk
Konstantin S. Yakovlev
Aleksandr Panov 0001
Alexey Skrynnik

Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, where multiple homogeneous robots simultaneously move in a shared environment. While solving MAPF optimally has been proven to be NP-hard, scalable and efficient solvers are vital for real-world applications like logistics, search-and-rescue, etc. To this end, decentralized suboptimal MAPF solvers that leverage machine learning have emerged. Building on the success of the recently introduced MAPF-GPT, a pure imitation learning solver, we introduce MAPF-GPT-DDG. This novel approach effectively fine-tunes the pre-trained MAPF model using centralized expert data. Leveraging a novel delta-data generation mechanism, MAPF-GPT-DDG accelerates training while significantly improving performance at test time. Our experiments demonstrate that MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, in solution quality across many testing scenarios. Remarkably, it can work with MAPF instances involving up to 1 million agents in a single environment, setting a new milestone for scalability in MAPF domains.


AIJ Journal 2025 Journal Article

Generative models for grid-based and image-based pathfinding

Daniil Kirilenko
Anton Andreychuk
Aleksandr I. Panov
Konstantin Yakovlev

Pathfinding is a challenging problem which generally asks to find a sequence of valid moves for an agent provided with a representation of the environment, i.e., a map, in which it operates. In this work, we consider pathfinding on binary grids and on image representations of digital elevation models. In the former case, the transition costs are known, while in the latter scenario, they are not. A widespread method to solve the first problem is to utilize a search algorithm that systematically explores the search space to obtain a solution. Ideally, the search should also be complemented with an informative heuristic to focus on the goal and prune the unpromising regions of the search space, thus decreasing the number of search iterations. Unfortunately, the widespread heuristic functions for grid-based pathfinding, such as Manhattan distance or Chebyshev distance, do not take the obstacles into account and perform poorly in obstacle-rich environments. As for pathfinding with image inputs, heuristic search cannot be applied straightforwardly as the transition costs, i.e., the costs of moving from one pixel to another, are not known. To tackle both challenges, we suggest utilizing modern deep neural networks to infer instance-dependent heuristic functions at the pre-processing step and further use them for pathfinding with standard heuristic search algorithms. The principal heuristic function that we suggest learning is the path probability, which indicates how likely a grid cell (pixel) is to lie on the shortest path (for binary grids with known transition costs, we also suggest another variant of the heuristic function that can speed up the search). Learning is performed in a supervised fashion (we have also explored the possibilities of end-to-end learning that includes a planner in the learning pipeline). At test time, path probability is used as the secondary heuristic for Focal Search, a specific heuristic search algorithm that provides theoretical guarantees on the cost bound of the resultant solution. Empirically, we show that the suggested approach significantly outperforms state-of-the-art competitors in a variety of different tasks (including out-of-distribution instances).
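
As an illustration of how a secondary heuristic plugs into Focal Search, here is a minimal sketch on a 4-connected unit-cost grid. It is an assumption-laden toy, not the authors' implementation: the Manhattan distance serves as the admissible primary heuristic, `path_prob` is a hand-written stand-in for a network's predicted path probabilities, and the function name `focal_search` is hypothetical.

```python
def focal_search(grid, start, goal, w, secondary):
    """Return a path cost that is at most w times the optimal cost (w >= 1).

    grid[r][c] == 1 marks an obstacle. `secondary` maps a cell to a score;
    among the nodes in FOCAL, the one with the lowest score is expanded.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Admissible primary heuristic: Manhattan distance to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}
    open_set = {start}
    while open_set:
        f_min = min(g[n] + h(n) for n in open_set)
        # FOCAL holds every open node whose f-value is within the bound.
        focal = [n for n in open_set if g[n] + h(n) <= w * f_min]
        current = min(focal, key=lambda n: (secondary(n), g[n] + h(n)))
        if current == goal:
            return g[current]
        open_set.remove(current)
        r, c = current
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            if g[current] + 1 < g.get((nr, nc), float("inf")):
                # Found a cheaper way to reach the cell: (re)insert it into OPEN.
                g[(nr, nc)] = g[current] + 1
                open_set.add((nr, nc))
    return None  # goal is unreachable

grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical path probabilities; cells not listed default to 0.
path_prob = {(1, 0): 0.9, (2, 0): 0.9, (2, 1): 0.8, (2, 2): 0.8, (2, 3): 1.0}
cost = focal_search(grid, (0, 0), (2, 3), w=1.2,
                    secondary=lambda cell: 1.0 - path_prob.get(cell, 0.0))
```

The primary heuristic alone fixes the suboptimality bound, so even a poorly trained secondary heuristic cannot push the solution cost above `w` times the optimum.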


AAAI Conference 2025 Conference Paper

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

Anton Andreychuk
Konstantin Yakovlev
Aleksandr Panov
Alexey Skrynnik

Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Typically, such learning-based MAPF solvers are augmented with additional components like single-agent planning or communication. Orthogonally, in this work we rely solely on imitation learning that leverages a large dataset of expert MAPF solutions and a transformer-based neural network to create a foundation model for MAPF called MAPF-GPT. The latter is capable of generating actions without additional heuristics or communication. MAPF-GPT demonstrates zero-shot learning abilities when solving MAPF problems that are not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient during inference.


ICLR Conference 2025 Conference Paper

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Alexey Skrynnik
Anton Andreychuk
Anatolii Borzilov
Alexander Chernyavskiy
Konstantin S. Yakovlev
Aleksandr Panov 0001

Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now being suggested for solution using learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of predefined problem instances, a visualization toolkit, and a benchmarking tool for automated evaluation. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators (such as success rate and path length), enabling a fair multi-fold comparison. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.


AAAI Conference 2024 Conference Paper

Decentralized Monte Carlo Tree Search for Partially Observable Multi-Agent Pathfinding

Alexey Skrynnik
Anton Andreychuk
Konstantin Yakovlev
Aleksandr Panov

The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents' starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally and are restricted in communications with each other. Specifically, we investigate the lifelong variant of MAPF, where new goals are continually assigned to the agents upon completion of previous ones. Drawing inspiration from the successful AlphaZero approach, we propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks. Our approach utilizes the agent's observations to recreate the intrinsic Markov decision process, which is then used for planning with a version of neural MCTS tailored for multi-agent tasks. The experimental results show that our approach outperforms state-of-the-art learnable MAPF solvers. The source code is available at https://github.com/AIRI-Institute/mats-lp.


AAAI Conference 2024 Conference Paper

Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning

Alexey Skrynnik
Anton Andreychuk
Maria Nesterova
Konstantin Yakovlev
Aleksandr Panov

The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, in which the central controller that possesses all the information on the agents' locations and goals is absent and the agents have to sequentially decide the actions on their own without having access to the full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival at the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups, comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to maps that were unseen at the training stage. Moreover, our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver. The code is available at https://github.com/AIRI-Institute/learn-to-follow.


IROS Conference 2024 Conference Paper

Optimal and Bounded Suboptimal Any-Angle Multi-agent Pathfinding

Konstantin S. Yakovlev
Anton Andreychuk
Roni Stern

Multi-agent pathfinding (MAPF) is the problem of finding a set of conflict-free paths for a set of agents. Typically, the agents' moves are limited to a pre-defined graph of possible locations and allowed transitions between them, e.g., a 4-neighborhood grid. We explore how to solve MAPF problems when each agent can move between any pair of possible locations as long as traversing the line segment connecting them does not lead to a collision with the obstacles. This is known as any-angle pathfinding. We present the first optimal any-angle multi-agent pathfinding algorithm. Our planner is based on the Continuous Conflict-based Search (CCBS) algorithm and an optimal any-angle variant of the Safe Interval Path Planning (TO-AA-SIPP). The straightforward combination of those, however, scales poorly since any-angle path finding induces search trees with a very large branching factor. To mitigate this, we adapt two techniques from classical MAPF to the any-angle setting, namely Disjoint Splitting and Multi-Constraints. Experimental results on different combinations of these techniques show they enable solving over 30% more problems than the vanilla combination of CCBS and TO-AA-SIPP. In addition, we present a bounded-suboptimal variant of our algorithm that enables trading runtime for solution cost in a controlled manner.


SoCS Conference 2024 Conference Paper

Optimal and Bounded Suboptimal Any-Angle Multi-agent Pathfinding (Extended Abstract)

Konstantin S. Yakovlev
Anton Andreychuk
Roni Stern

Multi-agent pathfinding (MAPF) is the problem of finding a set of conflict-free paths for a set of agents. We explore how to solve MAPF problems when each agent can move between any pair of possible locations as long as traversing the line segment connecting them does not lead to a collision with the obstacles. This is known as any-angle pathfinding. We present the first optimal any-angle multi-agent pathfinding algorithm. Our planner is based on the Continuous Conflict-based Search (CCBS) algorithm and an optimal any-angle variant of the Safe Interval Path Planning (TO-AA-SIPP). The straightforward combination of those, however, scales poorly. To mitigate this, we adapt two techniques from classical MAPF to the any-angle setting, namely Disjoint Splitting and Multi-Constraints. Experimental results on different combinations of these techniques show they enable solving over 30% more problems than the vanilla combination of CCBS and TO-AA-SIPP. In addition, we present a bounded-suboptimal variant of our algorithm that enables trading runtime for solution cost in a controlled manner.


PRL Workshop 2023 Workshop Paper

Learn to Follow: Lifelong Multi-agent Pathfinding with Decentralized Replanning

Alexey Skrynnik
Anton Andreychuk
Maria Nesterova
Konstantin Yakovlev
Aleksandr Panov

The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph. In conventional MAPF scenarios, the graph and the agents' start and goal locations are known in advance. Thus, a centralized planning algorithm can be utilized to generate a solution. In this work, we investigate the decentralized MAPF setting, in which the agents cannot share information and must independently navigate toward their goals without knowing the other agents' goals or paths. We focus on the lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival at the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning (RL) through policy optimization. Planning is utilized to maintain an individual path, while RL is employed to discover the collision avoidance policies that effectively guide an agent along the path. This decomposition and an intrinsic motivation specific to multi-agent scenarios allow leveraging replanning with learnable policies. We evaluate our method on a wide range of setups and compare it to the state-of-the-art competitors (both learnable and search-based). The results show that our method consistently outperforms the competitors in challenging setups when the number of agents is high.


AAAI Conference 2023 Conference Paper

TransPath: Learning Heuristics for Grid-Based Pathfinding via Transformers

Daniil Kirilenko
Anton Andreychuk
Aleksandr Panov
Konstantin Yakovlev

Heuristic search algorithms, e.g. A*, are the commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games, etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account, and thus the search led by such heuristics performs poorly in obstacle-rich environments. To this end, we suggest learning instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which was known before, learning the correction factor utilizes the knowledge of the instance-independent heuristic. The second heuristic proxy is the path probability, which indicates how likely a grid cell is to lie on the shortest path. This heuristic can be employed in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of A* by up to a factor of 4 while producing solutions whose costs exceed those of the optimal solutions by less than 0.3% on average; ii) outperform the competitors, which include the conventional techniques from heuristic search, i.e. weighted A*, as well as the state-of-the-art learnable planners. The project web-page is: https://airi-institute.github.io/TransPath/.
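
To make the correction factor concrete, the sketch below computes it for every free cell of a small 4-connected grid as the ratio of the Manhattan estimate to the true cost-to-go obtained by breadth-first search. This only illustrates how such training targets could be produced under simple assumptions (unit move costs, a value of 1.0 at the goal cell); the function names are hypothetical and the learning step itself is not shown.

```python
from collections import deque

def true_cost_to_go(grid, goal):
    """BFS distances to the goal on a 4-connected grid (1 marks an obstacle)."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def correction_factors(grid, goal):
    """Manhattan estimate divided by the true cost-to-go, per reachable cell."""
    factors = {}
    for (r, c), d in true_cost_to_go(grid, goal).items():
        manhattan = abs(r - goal[0]) + abs(c - goal[1])
        # Assumption: the goal cell itself gets a factor of 1.0.
        factors[(r, c)] = 1.0 if d == 0 else manhattan / d
    return factors

grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
cf = correction_factors(grid, goal=(2, 0))
```

In this toy map the wall forces a detour from the top-left cell, so its factor drops well below 1, while cells with an unobstructed shortest path keep a factor of exactly 1.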


SoCS Conference 2022 Conference Paper

Lower and Upper Bounds for Multi-Agent Multi-Item Pickup and Delivery: When a Decoupled Approach is Good Enough (Extended Abstract)

David Zahrádka
Anton Andreychuk
Miroslav Kulich
Konstantin S. Yakovlev

The Multi-agent Multi-item Pickup and Delivery problem (MAMPD) stands for a problem of finding collision-free trajectories for a fleet of mobile agents transporting a set of items from their initial positions to specified locations. Each agent can carry multiple items up to a given capacity. We study the solution quality of the naive decoupled approach, which decouples the problem into task assignment (TA) and Multi-Agent Pathfinding (MAPF). By computing the gap between the lower bound of the MAMPD cost, estimated using the TA cost, and the upper bound, given by the final MAMPD cost, we show that the decoupled approach is able to obtain near-optimal solutions in a wide range of cases.
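
The bound comparison described above amounts to simple arithmetic. The sketch below assumes the gap is reported relative to the lower bound, which is one common convention and may differ from the normalization used in the paper; the numbers are made up for illustration.

```python
def relative_gap(lower_bound, upper_bound):
    """Gap between the bounds as a fraction of the lower bound."""
    if lower_bound <= 0:
        raise ValueError("lower bound must be positive")
    if upper_bound < lower_bound:
        raise ValueError("upper bound cannot be below the lower bound")
    return (upper_bound - lower_bound) / lower_bound

# Hypothetical costs: task-assignment estimate (lower bound) and the cost
# of the final collision-free solution (upper bound).
gap = relative_gap(lower_bound=100.0, upper_bound=104.0)
```

A small gap certifies that the decoupled solution is close to optimal, since the optimal cost lies between the two bounds.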


AIJ Journal 2022 Journal Article

Multi-agent pathfinding with continuous time

Anton Andreychuk
Konstantin Yakovlev
Pavel Surynek
Dor Atzmon
Roni Stern

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that each agent reaches its goal and the agents do not collide. In recent years, variants of MAPF have arisen in a wide range of real-world applications such as warehouse management and autonomous vehicles. Optimizing common MAPF objectives, such as minimizing sum-of-costs or makespan, is computationally intractable, but state-of-the-art algorithms are able to optimally solve problems with dozens of agents. However, most MAPF algorithms assume that (1) time is discretized into time steps and (2) the duration of every action is one time step. These simplifying assumptions limit the applicability of MAPF algorithms in real-world applications and raise non-trivial questions such as how to discretize time in an effective manner. We propose two novel MAPF algorithms for finding optimal solutions that do not rely on any time discretization. In particular, our algorithms do not require quantization of wait and move actions' durations, allowing these durations to take any value required to find optimal solutions. The first algorithm we propose, called Continuous-time Conflict-Based Search (CCBS), draws on ideas from Safe Interval Path Planning (SIPP), a single-agent pathfinding algorithm designed to cope with dynamic obstacles, and Conflict-Based Search (CBS), a state-of-the-art search-based MAPF algorithm. The second algorithm, SMT-CCBS, builds on similar ideas, but is based on a different state-of-the-art MAPF algorithm called SMT-CBS, which applies a SAT Modulo Theory (SMT) problem-solving procedure. CCBS guarantees to return solutions that have minimal sum-of-costs, while SMT-CCBS guarantees to return solutions that have minimal makespan. We implemented CCBS and SMT-CCBS and evaluated them on grid-based MAPF problems and general graphs (roadmaps). The results show that both algorithms can efficiently and optimally solve non-trivial MAPF problems.


SoCS Conference 2021 Conference Paper

Improving Continuous-time Conflict Based Search

Anton Andreychuk
Konstantin S. Yakovlev
Eli Boyarski
Roni Stern

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for n agents in a graph such that each agent reaches its goal vertex and the agents do not collide with each other while moving along these paths. While different problem statements of MAPF exist, we are focused on MAPFr (Walker, Sturtevant, and Felner 2018), in which actions’ durations can be non-uniform, agents have geometric shapes, and time is continuous. Continuous-time conflict-based search (CCBS) (Andreychuk et al. 2019) is a recently proposed algorithm for finding optimal solutions to MAPFr problems. In this work, we propose several improvements to CCBS based on known improvements to the Conflict-based search (CBS) algorithm (Sharon et al. 2015) for classical MAPF, namely Disjoint Splitting (DS), Prioritizing Conflicts (PC), and high-level heuristics. We evaluate the impact of these improvements experimentally on both roadmaps and grids. Our results show that CCBS with these improvements is able to solve significantly more problems.


AAAI Conference 2021 Conference Paper

Improving Continuous-time Conflict Based Search

Anton Andreychuk
Konstantin Yakovlev
Eli Boyarski
Roni Stern

Conflict-Based Search (CBS) is a powerful algorithmic framework for optimally solving classical multi-agent path finding (MAPF) problems, where time is discretized into time steps. Continuous-time CBS (CCBS) is a recently proposed version of CBS that guarantees optimal solutions without the need to discretize time. However, the scalability of CCBS is limited because it does not include any known improvements of CBS. In this paper, we begin to close this gap and explore how to adapt successful CBS improvements, namely, prioritizing conflicts (PC), disjoint splitting (DS), and high-level heuristics, to the continuous time setting of CCBS. These adaptions are not trivial, and require careful handling of different types of constraints, applying a generalized version of the Safe interval path planning (SIPP) algorithm, and extending the notion of cardinal conflicts. We evaluate the effect of the suggested enhancements by running experiments both on general graphs and 2^k-neighborhood grids. CCBS with these improvements significantly outperforms vanilla CCBS, solving problems with almost twice as many agents in some cases and pushing the limits of multiagent path finding in continuous-time domains.
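
For readers unfamiliar with 2^k-neighborhood grids, the move sets can be generated by starting from the four cardinal moves and, for each increment of k, inserting the sum of every pair of angularly adjacent moves. The sketch below follows that construction; the function name is hypothetical and the code is not taken from the paper.

```python
def neighborhood_moves(k):
    """Return the 2**k move vectors (dx, dy) of the 2^k grid neighborhood."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # k = 2: the four cardinal moves, listed in circular order.
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    for _ in range(k - 2):
        refined = []
        for i, (dx, dy) in enumerate(moves):
            nx, ny = moves[(i + 1) % len(moves)]
            refined.append((dx, dy))
            # Insert the vector sum between each pair of adjacent moves.
            refined.append((dx + nx, dy + ny))
        moves = refined
    return moves

moves_k4 = neighborhood_moves(4)
```

With k = 3 this yields the familiar 8-connected grid, and k = 4 adds knight-like moves such as (1, 2) and (2, 1), whose durations are non-uniform, which is why such grids suit continuous-time planners.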


SoCS Conference 2021 Conference Paper

Towards Narrowing the Search in Bounded-Suboptimal Safe Interval Path Planning

Tomás Rybecký
Miroslav Kulich
Anton Andreychuk
Konstantin S. Yakovlev

Path planning in the presence of dynamic obstacles is challenging as the time dimension has to be considered. A prominent approach to this problem, known to be complete and optimal, is the A*-based Safe-interval Path Planning (SIPP). Bounded-suboptimal variants of SIPP employing the ideas of Weighted A* (WSIPP) and Focal Search (FocalSIPP) have been introduced recently, trading off optimality for decreased planning time. In this paper, we revisit FocalSIPP and design several secondary heuristics for Focal Search with the intention to narrow the search in the direction of a preplanned optimal single-agent path not considering dynamic obstacles. The experimental results on various maps show that the designed heuristics generally outperform the hops-to-the-goal heuristic used in the original FocalSIPP and successfully compete with WSIPP as well.


ICAPS Conference 2021 Conference Paper

Towards Time-Optimal Any-Angle Path Planning With Dynamic Obstacles

Konstantin S. Yakovlev
Anton Andreychuk

Path finding is a well-studied problem in AI, which is often framed as graph search. Any-angle path finding is a technique that augments the initial graph with additional edges to build shorter paths to the goal. Indeed, optimal algorithms for any-angle path finding in static environments exist. However, when dynamic obstacles are present and time is the objective to be minimized, these algorithms can no longer guarantee optimality. In this work, we elaborate on why this is the case and what techniques can be used to solve the problem optimally. We present two algorithms, grounded in the same idea, that can obtain provably optimal solutions to the considered problem. One of them is a naive algorithm and the other one is much more involved. We conduct a thorough empirical evaluation showing that, in certain setups, the latter algorithm might be as fast as the previously-known greedy non-optimal solver while providing solutions of better quality. In some (rare) cases, the difference in cost is up to 76%, while on average it is lower than one percent (the same cost difference is typically observed between optimal and greedy any-angle solvers in static environments).


ICAPS Conference 2020 Conference Paper

Revisiting Bounded-Suboptimal Safe Interval Path Planning

Konstantin S. Yakovlev
Anton Andreychuk
Roni Stern

Safe-interval path planning (SIPP) is a powerful algorithm for finding a path in the presence of dynamic obstacles. SIPP returns provably optimal solutions. However, in many practical applications of SIPP such as path planning for robots, one would like to trade off optimality for shorter planning time. In this paper we explore different ways to build a bounded-suboptimal SIPP and discuss their pros and cons. We compare the different bounded-suboptimal versions of SIPP experimentally. While there is no universal winner, the results provide insights into when each method should be used.
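
SIPP searches over safe intervals, i.e. maximal contiguous periods during which a location is free of dynamic obstacles, rather than over individual time steps. The sketch below shows only that preprocessing step for a single location, assuming the blocked periods are given as half-open `(start, end)` pairs; the function name is hypothetical.

```python
import math

def safe_intervals(blocked, horizon=math.inf):
    """Maximal collision-free intervals within [0, horizon) for one location."""
    safe = []
    t = 0.0  # earliest time not yet covered by a processed blocked interval
    for start, end in sorted(blocked):
        if start > t:
            safe.append((t, start))
        t = max(t, end)
    if t < horizon:
        safe.append((t, horizon))
    return safe

print(safe_intervals([(2.0, 3.5), (6.0, 7.0)]))
# → [(0.0, 2.0), (3.5, 6.0), (7.0, inf)]
```

Each (location, safe interval) pair then acts as one search state, which keeps the state space small even when time is continuous.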


IJCAI Conference 2019 Conference Paper

Multi-Agent Pathfinding with Continuous Time

Anton Andreychuk
Konstantin Yakovlev
Dor Atzmon
Roni Stern

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple agents such that every agent reaches its goal and the agents do not collide. Most prior work on MAPF was on grids and assumed that agents' actions have uniform duration and that time is discretized into timesteps. In this work, we propose a MAPF algorithm that does not make any of these assumptions, is complete, and provides provably optimal solutions. This algorithm is based on a novel combination of Safe Interval Path Planning (SIPP), a continuous-time single-agent planning algorithm, and Conflict-Based Search (CBS). We analyze this algorithm, discuss its pros and cons, and evaluate it experimentally on several standard benchmarks.


AAMAS Conference 2018 Conference Paper

Two Techniques That Enhance the Performance of Multi-robot Prioritized Path Planning

Anton Andreychuk
Konstantin Yakovlev

We introduce and empirically evaluate two techniques aimed at enhancing the performance of multi-robot prioritized path planning. The first technique is a deterministic procedure for re-scheduling (as opposed to the well-known approach based on random restarts); the second is a heuristic procedure that modifies the search space of the individual planner involved in prioritized path finding.


ICAPS Conference 2017 Conference Paper

Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm

Konstantin S. Yakovlev
Anton Andreychuk

The problem of finding conflict-free trajectories for multiple agents of identical circular shape, operating in a shared 2D workspace, is addressed in the paper, and a decoupled, e.g., prioritized, approach is used to solve this problem. The agents' workspace is tessellated into a square grid on which any-angle moves are allowed, i.e., each agent can move in an arbitrary direction as long as this move follows a straight line segment whose endpoints are tied to distinct grid elements. A novel any-angle planner based on the Safe Interval Path Planning (SIPP) algorithm is proposed to find trajectories for an agent moving amidst dynamic obstacles (other agents) on a grid. This algorithm is then used as part of a prioritized multi-agent planner AA-SIPP(m). On the theoretical side, we show that AA-SIPP(m) is complete under well-defined conditions. On the experimental side, in simulation tests with up to 250 agents involved, we show that our planner finds much better solutions in terms of cost (up to 20%) compared to the planners relying on cardinal moves only.
