Arrow Research search

Author name cluster

Michael Buro

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

ICML Conference 2025 Conference Paper

Subgoal-Guided Policy Heuristic Search with Learned Subgoals

  • Jake Tuero
  • Michael Buro
  • Levi H. S. Lelis

Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees on the number of expansions required to solve a given problem that are based on the quality of the policy. While these algorithms have shown promising results, training the policy requires complete solution trajectories. Search trajectories are obtained during a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy. As a result, search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals and policies conditioned on subgoals are learned from the trees that the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and heuristic function in this online setting.
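The search strategy the abstract describes can be sketched as a best-first search whose priority combines depth with the policy's path probability (in the spirit of Levin tree search; `policy` and `successors` below are toy stand-ins, not the paper's learned models):

```python
import heapq
import itertools

def policy_guided_search(start, goal_test, successors, policy, budget=10_000):
    """Best-first search guided by a policy: a node at depth d reached with
    path probability p is expanded in order of (d + 1) / p, so paths the
    policy considers likely are explored first, and expansion guarantees
    degrade gracefully with policy quality."""
    tie = itertools.count()                      # FIFO tie-breaking
    frontier = [(1.0, next(tie), start, 1.0, [start])]
    expansions = 0
    while frontier and expansions < budget:
        _, _, state, prob, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        expansions += 1
        for action, child in successors(state):
            p = prob * policy(state, action)     # probability of the extended path
            if p > 0.0:
                heapq.heappush(
                    frontier,
                    ((len(path) + 1) / p, next(tie), child, p, path + [child]),
                )
    return None                                  # budget exhausted
```

With a uniform toy policy this reduces to breadth-first-like behaviour; a trained policy concentrates expansions on promising paths.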

NeurIPS Conference 2023 Conference Paper

History Filtering in Imperfect Information Games: Algorithms and Complexity

  • Christopher Solinas
  • Doug Rebstock
  • Nathan Sturtevant
  • Michael Buro

Historically applied exclusively to perfect information games, depth-limited search with value functions has been key to recent advances in AI for imperfect information games. Most prominent approaches with strong theoretical guarantees require subgame decomposition - a process in which a subgame is computed from public information and player beliefs. However, subgame decomposition can itself require non-trivial computations, and its tractability depends on the existence of efficient algorithms for either full enumeration or generation of the histories that form the root of the subgame. Despite this, no formal analysis of the tractability of such computations has been established in prior work, and application domains have often consisted of games, such as poker, for which enumeration is trivial on modern hardware. Applying these ideas to more complex domains requires understanding their cost. In this work, we introduce and analyze the computational aspects and tractability of filtering histories for subgame decomposition. We show that constructing a single history from the root of the subgame is generally intractable, and then provide a necessary and sufficient condition for efficient enumeration. We also introduce a novel Markov Chain Monte Carlo-based generation algorithm for trick-taking card games - a domain where enumeration is often prohibitively expensive. Our experiments demonstrate its improved scalability in the trick-taking card game Oh Hell. These contributions clarify when and how depth-limited search via subgame decomposition can be an effective tool for sequential decision-making in imperfect information settings.
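The Markov Chain Monte Carlo generation idea can be sketched generically, assuming a user-supplied symmetric `propose` move (e.g. swapping hidden cards between hands) and a `weight` function that is zero for histories inconsistent with the public information (both are illustrative stand-ins, not the paper's algorithm):

```python
import random

def mcmc_histories(initial, propose, weight, steps, rng=None):
    """Metropolis-style sampler over histories consistent with public
    information: propose a local modification of the current history and
    accept it with the usual ratio, generating root histories without
    full enumeration. Assumes `propose` is symmetric."""
    rng = rng or random.Random()
    h, samples = initial, []
    for _ in range(steps):
        cand = propose(h, rng)
        w = weight(cand)
        if w > 0 and rng.random() < min(1.0, w / weight(h)):
            h = cand                     # accept the proposed history
        samples.append(h)                # rejected steps repeat the current one
    return samples
```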

AAAI Conference 2021 Conference Paper

Bayes DistNet – A Robust Neural Network for Algorithm Runtime Distribution Predictions

  • Jake Tuero
  • Michael Buro

Randomized algorithms are used in many state-of-the-art solvers for constraint satisfaction problems (CSP) and Boolean satisfiability (SAT) problems. For many of these problems, there is no single solver which will dominate others. Having access to the underlying runtime distributions (RTD) of these solvers can allow for better use of algorithm selection, algorithm portfolios, and restart strategies. Previous state-of-the-art methods directly try to predict a fixed parametric distribution that the input instance follows. In this paper, we extend RTD prediction models into the Bayesian setting for the first time. This new model achieves robust predictive performance in the low-observation setting while also handling censored observations. This technique also allows for richer representations which cannot be achieved by the classical models which restrict their output representations. Our model outperforms the previous state-of-the-art model in settings in which data is scarce, and can make use of censored data such as lower bound time estimates, where that type of data would otherwise be discarded. It can also quantify its uncertainty in its predictions, allowing for algorithm portfolio models to make better informed decisions about which algorithm to run on a particular instance.
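The censored-data idea can be illustrated in the classical parametric setting the paper extends: a lognormal runtime likelihood in which timed-out runs contribute a survival term instead of a density, so lower-bound observations are used rather than discarded (a sketch, not the paper's Bayesian model):

```python
import math

def lognormal_loglik(mu, sigma, runtimes, censored):
    """Log-likelihood of observed runtimes under LogNormal(mu, sigma).
    censored[i] marks a right-censored run cut off at runtimes[i]: it
    contributes log P(T > t) (the survival function) instead of the
    log-density, since only a lower bound on the true runtime is known."""
    ll = 0.0
    for t, c in zip(runtimes, censored):
        z = (math.log(t) - mu) / sigma
        if c:
            # survival of the lognormal via the complementary error function
            ll += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
        else:
            # completed run: lognormal density term
            ll += -math.log(t * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return ll
```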

AAAI Conference 2021 Conference Paper

Inference-Based Deterministic Messaging For Multi-Agent Communication

  • Varun Bhatt
  • Michael Buro

Communication is essential for coordination among humans and animals. Therefore, with the introduction of intelligent agents into the world, agent-to-agent and agent-to-human communication becomes necessary. In this paper, we first study learning in matrix-based signaling games to empirically show that decentralized methods can converge to a suboptimal policy. We then propose a modification to the messaging policy, in which the sender deterministically chooses the best message that helps the receiver to infer the sender’s observation. Using this modification, we see, empirically, that the agents converge to the optimal policy in nearly all the runs. We then apply this method to a partially observable gridworld environment which requires cooperation between two agents and show that, with appropriate approximation methods, the proposed sender modification can enhance existing decentralized training methods for more complex domains as well.
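The proposed sender modification, reduced to a toy matrix signaling game: the sender deterministically picks the message whose induced receiver behaviour is best for its observed state (`receiver_policy` and `payoff` here are hypothetical stand-ins for the learned quantities):

```python
def best_message(state, messages, receiver_policy, payoff):
    """Deterministic sender rule: choose the message that maximizes the
    expected payoff under the receiver's current action distribution,
    i.e. the message that best lets the receiver infer the observation.
    receiver_policy(m) returns {action: probability}."""
    def value(m):
        return sum(p * payoff[state][a] for a, p in receiver_policy(m).items())
    return max(messages, key=value)
```

In a 2-state game where the receiver already maps each message to a distinct action, the rule immediately pairs each state with its informative message.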

AAAI Conference 2019 Conference Paper

Improving Search with Supervised Learning in Trick-Based Card Games

  • Christopher Solinas
  • Douglas Rebstock
  • Michael Buro

In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds to the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given move history. In particular, we use predictions about locations of individual cards made by a deep neural network — trained on data from human gameplay — in order to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.
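The sampling step can be sketched as dealing each unseen card according to a predicted location distribution (`card_probs` stands in for the paper's neural network; a real dealer must also respect hand sizes and voids revealed by play):

```python
import random

def sample_world(unseen_cards, opponents, card_probs, rng=None):
    """Deal each unseen card to an opponent in proportion to a predicted
    per-card location distribution, yielding one sampled 'world' that can
    be handed to a perfect-information evaluator such as PIMC search."""
    rng = rng or random.Random()
    hands = {o: [] for o in opponents}
    for card in unseen_cards:
        weights = [card_probs(card, o) for o in opponents]
        owner = rng.choices(opponents, weights=weights, k=1)[0]
        hands[owner].append(card)
    return hands
```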

IJCAI Conference 2015 Conference Paper

Adversarial Hierarchical-Task Network Planning for Complex Real-Time Games

  • Santiago Ontanon
  • Michael Buro

Real-time strategy (RTS) games are hard from an AI point of view because they have enormous state spaces, combinatorial branching factors, allow simultaneous and durative actions, and players have very little time to choose actions. For these reasons, standard game tree search methods such as alpha-beta search or Monte Carlo Tree Search (MCTS) are not sufficient by themselves to handle these games. This paper presents an alternative approach called Adversarial Hierarchical Task Network (AHTN) planning that combines ideas from game tree search with HTN planning. We present the basic algorithm, relate it to existing adversarial hierarchical planning methods, and present new extensions for simultaneous and durative actions to handle RTS games. We also present empirical results for the µRTS game, comparing it to other state-of-the-art search algorithms for RTS games.

ECAI Conference 2014 Conference Paper

Introducing Hierarchical Adversarial Search, a Scalable Search Procedure for Real-Time Strategy Games

  • Marius Stanescu
  • Nicolas A. Barriga
  • Michael Buro

Real-Time Strategy (RTS) video games have proven to be a very challenging application area for Artificial Intelligence research. Existing AI solutions are limited by vast state and action spaces and real-time constraints. Most implementations efficiently tackle various tactical or strategic sub-problems, but there is no single algorithm fast enough to be successfully applied to full RTS games. This paper introduces a hierarchical adversarial search framework which implements a different abstraction at each level - from deciding how to win the game at the top of the hierarchy to individual unit orders at the bottom.

AAAI Conference 2012 Conference Paper

Alpha-Beta Pruning for Games with Simultaneous Moves

  • Abdallah Saffidine
  • Hilmar Finnsson
  • Michael Buro

Alpha-Beta pruning is one of the most powerful and fundamental MiniMax search improvements. It was designed for sequential two-player zero-sum perfect information games. In this paper we introduce an Alpha-Beta-like sound pruning method for the more general class of “stacked matrix games” that allow for simultaneous moves by both players. This is accomplished by maintaining upper and lower bounds for achievable payoffs in states with simultaneous actions and dominated action pruning based on the feasibility of certain linear programs. Empirical data shows considerable savings in terms of expanded nodes compared to naive depth-first move computation without pruning.
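Dominated-action pruning against pure strategies, the simplest case of what the abstract describes, can be sketched directly on a payoff matrix (the paper additionally prunes actions dominated by mixed strategies, which requires solving small linear programs):

```python
def prune_dominated_rows(matrix):
    """Remove row-player actions that are weakly dominated by another pure
    action: payoffs <= elementwise, with at least one strict inequality.
    Such actions can never be part of an optimal strategy and need not
    be searched."""
    keep = []
    for i, row in enumerate(matrix):
        dominated = any(
            j != i
            and all(a <= b for a, b in zip(row, other))
            and any(a < b for a, b in zip(row, other))
            for j, other in enumerate(matrix)
        )
        if not dominated:
            keep.append(row)
    return keep
```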

IJCAI Conference 2011 Conference Paper

Real-Time Opponent Modeling in Trick-Taking Card Games

  • Jeffrey Long
  • Michael Buro

As adversarial environments become more complex, it is increasingly crucial for agents to exploit the mistakes of weaker opponents, particularly in the context of winning tournaments and competitions. In this work, we present a simple post-processing technique, which we call Perfect Information Post-Mortem Analysis (PIPMA), that can quickly assess the playing strength of an opponent in certain classes of game environments. We apply this technique to Skat, a popular German card game, and show that we can achieve substantial performance gains against not only players weaker than our program, but against stronger players as well. Most importantly, PIPMA can model the opponent after only a handful of games. To our knowledge, this makes our work the first successful example of an opponent modelling technique that can adapt its play to a particular opponent in real time in a complex game setting.

IJCAI Conference 2011 Conference Paper

Using Payoff-Similarity to Speed Up Search

  • Timothy Furtak
  • Michael Buro

Transposition tables are a powerful tool in search domains for avoiding duplicate effort and for guiding node expansions. Traditionally, however, they have only been applicable when the current state is exactly the same as a previously explored state. We consider a generalized transposition table, whereby a similarity metric that exploits local structure is used to compare the current state with a neighbourhood of previously seen states. We illustrate this concept and forward pruning based on function approximation in the domain of Skat, and show that we can achieve speedups of 16+ over standard methods.
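A generalized transposition table of this flavour might look as follows, with `signature` bucketing states and `similar` standing in for the payoff-similarity metric (both callbacks are illustrative, not the paper's definitions):

```python
class SimilarityTT:
    """Transposition table that can return the value of a 'nearby'
    previously seen state: lookups scan a coarse bucket and accept any
    stored state whose similarity to the query exceeds a threshold,
    generalizing exact-match transposition tables."""
    def __init__(self, signature, similar, threshold):
        self.signature = signature   # maps a state to a coarse bucket key
        self.similar = similar       # similarity score between two states
        self.threshold = threshold
        self.buckets = {}            # bucket key -> list of (state, value)

    def store(self, state, value):
        self.buckets.setdefault(self.signature(state), []).append((state, value))

    def lookup(self, state):
        for seen, value in self.buckets.get(self.signature(state), []):
            if self.similar(state, seen) >= self.threshold:
                return value
        return None
```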

AAAI Conference 2010 Conference Paper

Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search

  • Jeffrey Long
  • Nathan Sturtevant
  • Michael Buro
  • Timothy Furtak

Perfect Information Monte Carlo (PIMC) search is a practical technique for playing imperfect information games that are too large to be optimally solved. Although PIMC search has been criticized in the past for its theoretical deficiencies, in practice it has often produced strong results in a variety of domains. In this paper, we set out to resolve this discrepancy. The contributions of the paper are twofold. First, we use synthetic game trees to identify game properties that result in strong or weak performance for PIMC search as compared to an optimal player. Second, we show how these properties can be detected in real games, and demonstrate that they do indeed appear to be good predictors of the strength of PIMC search. Thus, using the tools established in this paper, it should be possible to decide a priori whether PIMC search will be an effective approach to new and unexplored games.
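Plain PIMC search, the algorithm under study, can be sketched in a few lines (all callbacks are game-specific stand-ins):

```python
import random

def pimc_move(infoset, sample_world, legal_moves, perfect_info_value, n=100,
              rng=None):
    """Perfect Information Monte Carlo: sample n full deals ('worlds')
    consistent with the player's information set, evaluate every legal
    move in each world with a perfect-information solver, and return the
    move with the best average value."""
    rng = rng or random.Random()
    totals = {m: 0.0 for m in legal_moves(infoset)}
    for _ in range(n):
        world = sample_world(infoset, rng)
        for m in totals:
            totals[m] += perfect_info_value(world, m)
    return max(totals, key=totals.get)
```

The theoretical criticism mentioned above stems from this averaging: PIMC plays each sampled world as if it were fully observable, ignoring the value of information.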

IJCAI Conference 2009 Conference Paper

Minimum Proof Graphs and Fastest-Cut-First Search Heuristics

  • Timothy Furtak
  • Michael Buro

Alpha-Beta is the most common game tree search algorithm, due to its high performance and straightforward implementation. In practice one must find the best trade-off between heuristic evaluation time and bringing the subset of nodes explored closer to a minimum proof graph. In this paper we present a series of structural properties of minimum proof graphs that help us to prove that finding such graphs is NP-hard for arbitrary DAG inputs, but can be done in linear time for trees. We then introduce the class of fastest-cut-first search heuristics that aim to approximate minimum proof graphs by sorting moves based on approximations of sub-DAG values and sizes. To explore how various aspects of the game tree (such as branching factor and distribution of move values) affect the performance of Alpha-Beta we introduce the class of “Prefix Value Game Trees” that allows us to label interior nodes with true minimax values on the fly without search. Using these trees we show that by explicitly attempting to approximate a minimum game tree we are able to achieve performance gains over Alpha-Beta with common extensions.

IJCAI Conference 2009 Conference Paper

Improving State Evaluation, Inference, and Search in Trick-Based Card Games

  • Michael Buro
  • Jeffrey R. Long
  • Timothy Furtak
  • Nathan Sturtevant

Skat is Germany’s national card game played by millions of players around the world. In this paper, we present the world’s first computer skat player that plays at the level of human experts. This performance is achieved by improving state evaluations using game data produced by human players and by using these state evaluations to perform inference on the unobserved hands of opposing players. Our results demonstrate the gains from adding inference to an imperfect information game player and show that training on data from average human players can result in expert-level playing strength.

AAAI Conference 2007 Conference Paper

Concurrent Action Execution with Shared Fluents

  • Michael Buro

Concurrent action execution is important for plan-length minimization. However, action specifications are often limited to avoid conflicts arising from precondition/effect interactions. PDDL — the planning domain definition language — for example, implements the “no moving targets” rule, which means that no two actions can simultaneously make use of a value if one of the two is updating the value. This rule poses problems for resource allocation planning in which resource values are accessed in preconditions and effects. A simple example is construction actions that consume certain amounts of a resource. For speeding up plan execution, we would like to be able to dispatch several construction actions simultaneously. Because action preconditions depend on resource values and action effects change them, the “no moving targets” rule does not allow concurrent execution. However, if sufficient resources are available, executing actions simultaneously poses no problems. This paper addresses the problem of deciding whether a set of actions produced by a planning system can be executed concurrently in the presence of fluent variables that occur in both action preconditions and effects. We first motivate the concurrent action execution problem by introducing a fair action scheduling algorithm for real-time strategy (RTS) games. Then we prove that the general decision problem, when restricting effects and preconditions to polynomial time computations, is co-NP complete. Thereafter, we focus on problem restrictions based on commutative operators which allow us to specify sufficient conditions for concurrent executability that can be checked quickly if the number of shared fluents is small. Finally, we apply these findings to action execution with shared resources in RTS games.