Author name cluster

Tom Silver

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers

ICLR Conference 2025 Conference Paper

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

  • Yichao Liang
  • Nishanth Kumar
  • Hao Tang 0008
  • Adrian Weller
  • Joshua B. Tenenbaum
  • Tom Silver
  • João F. Henriques
  • Kevin Ellis

Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. We outline an online algorithm for inventing such predicates and learning abstract world models. We compare our approach to hierarchical reinforcement learning, vision-language model planning, and symbolic predicate invention approaches, on both in- and out-of-distribution tasks across five simulated robotic domains. Results show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.

AAAI Conference 2024 Conference Paper

Generalized Planning in PDDL Domains with Pretrained Large Language Models

  • Tom Silver
  • Soham Dan
  • Kavitha Srinivas
  • Joshua B. Tenenbaum
  • Leslie Kaelbling
  • Michael Katz

Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.
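
The automated-debugging step described above can be sketched as a validate-and-re-prompt loop. This is a minimal illustration, not the authors' implementation: the `propose` callable stands in for the LLM, and the `get_plan` entry point, the task format, and the feedback strings are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Task = dict        # hypothetical task representation
Plan = List[str]   # hypothetical plan representation: a list of action names


def check_program(source: str, tasks: List[Task],
                  is_valid_plan: Callable[[Task, Plan], bool]) -> Optional[str]:
    """Run a candidate program on every training task.

    Returns None if all plans are valid, otherwise a feedback string.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)              # define get_plan in a fresh namespace
        get_plan = namespace["get_plan"]     # hypothetical entry-point name
    except Exception as exc:                 # syntax error or missing function
        return f"Program failed to load: {exc!r}"
    for i, task in enumerate(tasks):
        try:
            plan = get_plan(task)
        except Exception as exc:
            return f"Task {i}: program raised {exc!r}"
        if not is_valid_plan(task, plan):
            return f"Task {i}: plan {plan!r} does not reach the goal"
    return None


def synthesize(propose: Callable[[Optional[str]], str], tasks: List[Task],
               is_valid_plan: Callable[[Task, Plan], bool],
               max_attempts: int = 4) -> Tuple[Optional[str], int]:
    """Ask for a program, validate it, and re-prompt with feedback on failure."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = propose(feedback)           # stand-in for the LLM call
        feedback = check_program(source, tasks, is_valid_plan)
        if feedback is None:
            return source, attempt
    return None, max_attempts
```

Because `exec` runs arbitrary code, a real system would presumably sandbox the candidate program and enforce a timeout; both are omitted here for brevity.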

PRL Workshop 2023 Workshop Paper

Generalized Planning in PDDL Domains with Pretrained Large Language Models

  • Tom Silver
  • Soham Dan
  • Kavitha Srinivas
  • Joshua B. Tenenbaum
  • Leslie Pack Kaelbling
  • Michael Katz

Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.

AAAI Conference 2023 Conference Paper

Predicate Invention for Bilevel Planning

  • Tom Silver
  • Rohan Chitnis
  • Nishanth Kumar
  • Willie McClinton
  • Tomás Lozano-Pérez
  • Leslie Kaelbling
  • Joshua B. Tenenbaum

Efficient planning in continuous state and action spaces is fundamentally hard, even when the transition model is deterministic and known. One way to alleviate this challenge is to perform bilevel planning with abstractions, where a high-level search for abstract plans is used to guide planning in the original transition space. Previous work has shown that when state abstractions in the form of symbolic predicates are hand-designed, operators and samplers for bilevel planning can be learned from demonstrations. In this work, we propose an algorithm for learning predicates from demonstrations, eliminating the need for manually specified state abstractions. Our key idea is to learn predicates by optimizing a surrogate objective that is tractable but faithful to our real efficient-planning objective. We use this surrogate objective in a hill-climbing search over predicate sets drawn from a grammar. Experimentally, we show across four robotic planning environments that our learned abstractions are able to quickly solve held-out tasks, outperforming six baselines.
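
As a rough illustration of hill-climbing over predicate sets, the sketch below greedily adds whichever candidate most lowers a score. The scoring function and the predicate names are hypothetical stand-ins; the paper's surrogate objective and grammar are not reproduced here.

```python
from typing import Callable, FrozenSet, Iterable


def hill_climb(candidates: Iterable[str],
               score: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Greedy search: repeatedly add the one candidate that most lowers the score."""
    pool = list(candidates)
    current: FrozenSet[str] = frozenset()
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for predicate in pool:
            if predicate in current:
                continue
            s = score(current | {predicate})
            if s < best_score:               # strict improvement only
                best, best_score = predicate, s
        if best is None:                     # local optimum: no addition helps
            return current
        current, current_score = current | {best}, best_score


# Toy surrogate (an assumption): missing a useful predicate is costly,
# and every predicate kept adds a small complexity penalty.
USEFUL = {"Holding", "On"}


def toy_score(predicates: FrozenSet[str]) -> float:
    return 10.0 * len(USEFUL - predicates) + 1.0 * len(predicates)


chosen = hill_climb(["IsRed", "Holding", "On"], toy_score)
```

With this toy score the search keeps `Holding` and `On` and rejects `IsRed`, since adding it only increases the penalty term.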

AAAI Conference 2022 Conference Paper

Discovering State and Action Abstractions for Generalized Task and Motion Planning

  • Aidan Curtis
  • Tom Silver
  • Joshua B. Tenenbaum
  • Tomás Lozano-Pérez
  • Leslie Kaelbling

Generalized planning accelerates classical planning by finding an algorithm-like policy that solves multiple instances of a task. A generalized plan can be learned from a few training examples and applied to an entire domain of problems. Generalized planning approaches perform well in discrete AI planning problems that involve large numbers of objects and extended action sequences to achieve the goal. In this paper, we propose an algorithm for learning features, abstractions, and generalized plans for continuous robotic task and motion planning (TAMP) and examine the unique difficulties that arise when forced to consider geometric and physical constraints as a part of the generalized plan. Additionally, we show that these simple generalized plans learned from only a handful of examples can be used to improve the search efficiency of TAMP solvers.

IROS Conference 2022 Conference Paper

Learning Neuro-Symbolic Relational Transition Models for Bilevel Planning

  • Rohan Chitnis
  • Tom Silver
  • Joshua B. Tenenbaum
  • Tomás Lozano-Pérez
  • Leslie Pack Kaelbling

In robotic domains, learning and planning are complicated by continuous state spaces, continuous action spaces, and long task horizons. In this work, we address these challenges with Neuro-Symbolic Relational Transition Models (NSRTs), a novel class of models that are data-efficient to learn, compatible with powerful robotic planning methods, and generalizable over objects. NSRTs have both symbolic and neural components, enabling a bilevel planning scheme where symbolic AI planning in an outer loop guides continuous planning with neural models in an inner loop. Experiments in four robotic planning domains show that NSRTs can be learned very data-efficiently, and then used for fast planning in new tasks that require up to 60 actions and involve many more objects than were seen during training.

IJCAI Conference 2022 Conference Paper

PG3: Policy-Guided Planning for Generalized Policy Generation

  • Ryan Yang
  • Tom Silver
  • Aidan Curtis
  • Tomas Lozano-Perez
  • Leslie Kaelbling

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions (policy evaluation and plan comparison) and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.

PRL Workshop 2022 Workshop Paper

PG3: Policy-Guided Planning for Generalized Policy Generation

  • Ryan Yang
  • Tom Silver
  • Aidan Curtis
  • Tomas Lozano-Perez
  • Leslie Kaelbling

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions (policy evaluation and plan comparison) and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.

ICAPS Conference 2022 Conference Paper

Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

  • Clement Gehring
  • Masataro Asai
  • Rohan Chitnis
  • Tom Silver
  • Leslie Pack Kaelbling
  • Shirin Sohrabi
  • Michael Katz 0001

Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires consolidating the discounted metric used in RL and the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain. The source code and the appendix are available at github.com/ibm/pddlrl and arxiv.org/abs/2109.14830.
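
One standard way to turn a heuristic into a dense reward is potential-based reward shaping, with the potential set to the negated heuristic value. The sketch below assumes that formulation; whether it matches the paper's exact construction is an assumption, and it does not attempt the reconciliation of discounted and non-discounted metrics that the abstract mentions.

```python
def shaped_reward(reward: float, h_s: float, h_next: float,
                  gamma: float = 0.99) -> float:
    """Potential-based shaping with potential phi(s) = -h(s).

    h_s and h_next are heuristic cost-to-go estimates for the current
    and next state (smaller means closer to the goal).
    """
    phi_s, phi_next = -h_s, -h_next
    return reward + gamma * phi_next - phi_s


def value_estimate(h_s: float, residual: float) -> float:
    """Value as the heuristic-based guess plus a learned correction."""
    return -h_s + residual


# Toy chain where the heuristic is the exact distance to the goal.
heuristics = [3.0, 2.0, 1.0, 0.0]
dense = [shaped_reward(0.0, heuristics[i], heuristics[i + 1], gamma=1.0)
         for i in range(len(heuristics) - 1)]
```

In the toy chain the environment reward is zero on every step, yet each step that moves closer to the goal receives a shaped reward of 1.0, which is the sense in which the heuristic makes the signal dense.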

AAAI Conference 2021 Conference Paper

GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling

  • Rohan Chitnis
  • Tom Silver
  • Joshua B. Tenenbaum
  • Leslie Pack Kaelbling
  • Tomás Lozano-Pérez

We address the problem of efficient exploration for transition model learning in the relational model-based reinforcement learning setting without extrinsic goals or rewards. Inspired by human curiosity, we propose goal-literal babbling (GLIB), a simple and general method for exploration in such problems. GLIB samples relational conjunctive goals that can be understood as specific, targeted effects that the agent would like to achieve in the world, and plans to achieve these goals using the transition model being learned. We provide theoretical guarantees showing that exploration with GLIB will converge almost surely to the ground truth model. Experimentally, we find GLIB to strongly outperform existing methods in both prediction and planning on a range of tasks, encompassing standard PDDL and PPDDL planning benchmarks and a robotic manipulation task implemented in the PyBullet physics simulator. Video: https://youtu.be/F6lmrPT6TOY Code: https://git.io/JIsTB

PRL Workshop 2021 Workshop Paper

Learning Search Guidance from Failures with Eliminable Edge Sets

  • Catherine Zeng
  • Tom Silver

What can be learned from previous planning experience when none of it was successful in finding any plans? We study this question in the planning-as-graph-search setting. Our main insight is that certain eliminable edge sets can be identified from failed graph searches. These edge sets can then be used to train a generalized predictor of eliminable edges, which in turn can be used to guide search on new planning problems from the same domain. Our preliminary experimental findings across four visual navigation domains suggest that this technique of learning from failed search attempts can result in substantially improved planning in terms of the number of nodes expanded before finding a plan. Epigraph: "I have not failed. I’ve just found 10,000 ways that won’t work." (Thomas Edison)

IROS Conference 2021 Conference Paper

Learning Symbolic Operators for Task and Motion Planning

  • Tom Silver
  • Rohan Chitnis
  • Joshua B. Tenenbaum
  • Leslie Pack Kaelbling
  • Tomás Lozano-Pérez

Robotic planning problems in hybrid state and action spaces can be solved by integrated task and motion planners (TAMP) that handle the complex interaction between motion-level decisions and task-level plan feasibility. TAMP approaches rely on domain-specific symbolic operators to guide the task-level search, making planning efficient. In this work, we formalize and study the problem of operator learning for TAMP. Central to this study is the view that operators define a lossy abstraction of the transition model of a domain. We then propose a bottom-up relational learning method for operator learning and show how the learned operators can be used for planning in a TAMP system. Experimentally, we provide results in three domains, including long-horizon robotic planning tasks. We find our approach to substantially outperform several baselines, including three graph neural network-based model-free approaches from the recent literature. Video: https://youtu.be/iVfpX9BpBRo Code: https://git.io/JCT0g

AAAI Conference 2021 Conference Paper

Planning with Learned Object Importance in Large Problem Instances using Graph Neural Networks

  • Tom Silver
  • Rohan Chitnis
  • Aidan Curtis
  • Joshua B. Tenenbaum
  • Tomás Lozano-Pérez
  • Leslie Pack Kaelbling

Real-world planning problems often involve hundreds or even thousands of objects, straining the limits of modern planners. In this work, we address this challenge by learning to predict a small set of objects that, taken together, would be sufficient for finding a plan. We propose a graph neural network architecture for predicting object importance in a single inference pass, thus incurring little overhead while greatly reducing the number of objects that must be considered by the planner. Our approach treats the planner and transition model as black boxes, and can be used with any off-the-shelf planner. Empirically, across classical planning, probabilistic planning, and robotic task and motion planning, we find that our method results in planning that is significantly faster than several baselines, including other partial grounding strategies and lifted planners. We conclude that learning to predict a sufficient set of objects for a planning problem is a simple, powerful, and general mechanism for planning in large instances. Video: https://youtu.be/FWsVJc2fvCE Code: https://git.io/JIsqX
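
One way to use per-object importance scores with a black-box planner is to plan over only the high-scoring objects and relax the cutoff whenever planning fails. The sketch below assumes that scheme; the threshold schedule, the fallback to the full object set, and the planner signature are hypothetical and not taken from the abstract.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical planner: takes the objects to consider, returns a plan or None.
Planner = Callable[[Set[str]], Optional[List[str]]]


def plan_with_importance(scores: Dict[str, float], planner: Planner,
                         threshold: float = 0.9, decay: float = 0.5,
                         min_threshold: float = 1e-3
                         ) -> Tuple[Optional[List[str]], Set[str]]:
    """Plan with objects scoring at least `threshold`, lowering it on failure."""
    while threshold >= min_threshold:
        objects = {name for name, s in scores.items() if s >= threshold}
        plan = planner(objects)              # planner is treated as a black box
        if plan is not None:
            return plan, objects
        threshold *= decay                   # admit more objects and retry
    everything = set(scores)                 # last resort: the full problem
    return planner(everything), everything


# Toy planner (an assumption): succeeds only if the needed objects are present.
NEEDED = {"robot", "key", "door"}


def toy_planner(objects: Set[str]) -> Optional[List[str]]:
    if NEEDED <= objects:
        return ["pick key", "open door"]
    return None


toy_scores = {"robot": 0.95, "key": 0.6, "door": 0.3, "lamp": 0.01}
plan, used = plan_with_importance(toy_scores, toy_planner)
```

In the toy run the first two thresholds (0.9 and 0.45) admit too few objects, and the third (0.225) yields a plan without ever grounding the irrelevant `lamp`.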

PRL Workshop 2021 Workshop Paper

Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

  • Clement Gehring
  • Masataro Asai
  • Rohan Chitnis
  • Tom Silver
  • Leslie Kaelbling
  • Shirin Sohrabi
  • Michael Katz

Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains and vice versa. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue, and our RL agent learns domain-specific value functions as residuals on these heuristics, making learning easier. Proper application of this technique requires consolidating the discounted metric in RL and the non-discounted metric in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain.

AAAI Conference 2020 Conference Paper

Few-Shot Bayesian Imitation Learning with Logical Program Policies

  • Tom Silver
  • Kelsey R. Allen
  • Alex K. Lew
  • Leslie Pack Kaelbling
  • Josh Tenenbaum

Humans can learn many novel tasks from a very small number (1–5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study six strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20–1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.

IROS Conference 2020 Conference Paper

Learning constraint-based planning models from demonstrations

  • João Loula
  • Kelsey R. Allen
  • Tom Silver
  • Joshua B. Tenenbaum

How can we learn representations for planning that are both efficient and flexible? Task and motion planning models are a good candidate, having been very successful in long-horizon planning tasks; however, they have proved challenging for learning, relying mostly on hand-coded representations. We present a framework for learning constraint-based task and motion planning models using gradient descent. Our model observes expert demonstrations of a task and decomposes them into modes: segments which specify a set of constraints on a trajectory optimization problem. We show that our model learns these modes from few demonstrations, that modes can be used to plan flexibly in different environments and to achieve different types of goals, and that the model can recombine these modes in novel ways.

NeurIPS Conference 2020 Conference Paper

Online Bayesian Goal Inference for Boundedly Rational Planning Agents

  • Tan Zhi-Xuan
  • Jordyn Mann
  • Tom Silver
  • Josh Tenenbaum
  • Vikash Mansinghka

People routinely infer the goals of others by observing their actions over time. Remarkably, we can do so even when those actions lead to failure, enabling us to assist others when we detect that they might not achieve their goals. How might we endow machines with similar capabilities? Here we present an architecture capable of inferring an agent’s goals online from both optimal and non-optimal sequences of actions. Our architecture models agents as boundedly-rational planners that interleave search with execution by replanning, thereby accounting for sub-optimal behavior. These models are specified as probabilistic programs, allowing us to represent and perform efficient Bayesian inference over an agent's goals and internal planning processes. To perform such inference, we develop Sequential Inverse Plan Search (SIPS), a sequential Monte Carlo algorithm that exploits the online replanning assumption of these models, limiting computation by incrementally extending inferred plans as new actions are observed. We present experiments showing that this modeling and inference architecture outperforms Bayesian inverse reinforcement learning baselines, accurately inferring goals from both optimal and non-optimal trajectories involving failure and back-tracking, while generalizing across domains with compositional structure and sparse rewards.

PRL Workshop 2020 Workshop Paper

PDDLGym: Gym Environments from PDDL Problems

  • Tom Silver
  • Rohan Chitnis

We present PDDLGym, a framework that automatically constructs OpenAI Gym environments from PDDL domains and problems. Observations and actions in PDDLGym are relational, making the framework particularly well-suited for research in relational reinforcement learning and relational sequential decision-making. PDDLGym is also useful as a generic framework for rapidly building numerous, diverse benchmarks from a concise and familiar specification language. We discuss design decisions and implementation details, and also illustrate empirical variations between the 20 built-in environments in terms of planning and model-learning difficulty. We hope that PDDLGym will facilitate bridge-building between the reinforcement learning community (from which Gym emerged) and the AI planning community (which produced PDDL). We look forward to gathering feedback from all those interested and expanding the set of available environments and features accordingly.

RLDM Conference 2019 Conference Abstract

Few-Shot Imitation Learning with Disjunctions of Conjunctions of Programs

  • Tom Silver
  • Kelsey Allen
  • Leslie Kaelbling
  • Joshua Tenenbaum

We describe an expressive class of policies that can be efficiently learned from a few demonstrations. Policies are represented as disjunctions (logical or’s) of conjunctions (logical and’s) of programs from a small domain-specific language (DSL). We define a prior over policies with a probabilistic grammar and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few (at most eight) demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. We also find that policies inferred from single demonstrations can be used for efficient exploration to dramatically reduce RL sample complexity.
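
The policy class described above, a disjunction of conjunctions of programs, can be sketched as follows. The grid encoding, the two example programs, and the choice of a clickable cell as the action are hypothetical illustrations and not the paper's DSL.

```python
from typing import Callable, List, Tuple

State = Tuple[Tuple[str, ...], ...]   # hypothetical: a 2D grid of cell labels
Action = Tuple[int, int]              # hypothetical: a (row, col) cell to select
Program = Callable[[State, Action], bool]


def cell_is(value: str) -> Program:
    """Program: the selected cell holds `value`."""
    return lambda s, a: s[a[0]][a[1]] == value


def left_of_is(value: str) -> Program:
    """Program: the cell immediately left of the selected cell holds `value`."""
    return lambda s, a: a[1] > 0 and s[a[0]][a[1] - 1] == value


def dnf_policy(clauses: List[List[Program]]) -> Program:
    """True if any clause (a conjunction of programs) holds for (state, action)."""
    return lambda s, a: any(all(p(s, a) for p in clause) for clause in clauses)


def allowed_actions(policy: Program, state: State) -> List[Action]:
    """Enumerate every cell the policy accepts, in row-major order."""
    return [(r, c) for r, row in enumerate(state)
            for c in range(len(row)) if policy(state, (r, c))]


grid: State = (("empty", "token", "empty"),
               ("wall", "empty", "empty"))
# (empty AND left neighbour is a token) OR (wall)
policy = dnf_policy([[cell_is("empty"), left_of_is("token")],
                     [cell_is("wall")]])
actions = allowed_actions(policy, grid)
```

Here `actions` contains the empty cell to the right of the token and the wall cell, one per clause, which shows how each conjunction picks out a distinct situation in which the policy acts.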

AAAI Conference 2018 Conference Paper

Behavior Is Everything: Towards Representing Concepts with Sensorimotor Contingencies

  • Nicholas Hay
  • Michael Stark
  • Alexander Schlegel
  • Carter Wendelken
  • Dennis Park
  • Eric Purdy
  • Tom Silver
  • D. Scott Phoenix

AI has seen remarkable progress in recent years, due to a switch from hand-designed shallow representations, to learned deep representations. While these methods excel with plentiful training data, they are still far from the human ability to learn concepts from just a few examples by reusing previously learned conceptual knowledge in new contexts. We argue that this gap might come from a fundamental misalignment between human and typical AI representations: while the former are grounded in rich sensorimotor experience, the latter are typically passive and limited to a few modalities such as vision and text. We take a step towards closing this gap by proposing an interactive, behavior-based model that represents concepts using sensorimotor contingencies grounded in an agent’s experience. On a novel conceptual learning and benchmark suite, we demonstrate that conceptually meaningful behaviors can be learned, given supervision via training curricula.

ICML Conference 2017 Conference Paper

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics

  • Ken Kansky
  • Tom Silver
  • David A. Mély
  • Mohamed Eldawy
  • Miguel Lázaro-Gredilla
  • Xinghua Lou
  • Nimrod Dorfman
  • Szymon Sidor

The recent adaptation of deep neural network-based methods to reinforcement learning and planning domains has yielded remarkable progress on individual tasks. Nonetheless, progress on task-to-task transfer remains limited. In pursuit of efficient and robust generalization, we introduce the Schema Network, an object-oriented generative physics simulator capable of disentangling multiple causes of events and reasoning backward through causes to achieve goals. The richly structured architecture of the Schema Network can learn the dynamics of an environment directly from data. We compare Schema Networks with Asynchronous Advantage Actor-Critic and Progressive Networks on a suite of Breakout variations, reporting results on training efficiency and zero-shot generalization, consistently demonstrating faster, more robust learning and better transfer. We argue that generalizing from limited data and learning causal relationships are essential abilities on the path toward generally intelligent systems.