Author name cluster

Timo P. Gros

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

PRL Workshop 2025 Workshop Paper

Learning Per-Domain Generalizing Policies Using Offline Reinforcement Learning

Nicola J. Müller
Moritz Oster
Timo P. Gros

Learned per-domain generalizing policies are gaining popularity in classical planning, as they can solve arbitrary instances of a specific domain. They are typically trained using supervised learning (SL), where we learn to generalize beyond a training set, or reinforcement learning (RL), where we learn from scratch through trial-and-error. We argue that SL and RL should not be seen as contrasting approaches, and propose a training framework where a policy is first trained offline using SL, and then finetuned online using RL. The key method enabling this framework is offline RL. Preliminary experiments show that offline RL can indeed learn per-domain generalizing policies effectively.


PRL Workshop 2024 Workshop Paper

Comparing State-of-the-art Graph Neural Networks and Transformers for General Policy Learning

Nicola J. Müller
Pablo Sanchez Martin
Jörg Hoffmann
Verena Wolf
Timo P. Gros

Graph Neural Networks (GNNs) have recently emerged as a powerful mechanism within the Artificial Intelligence (AI) research community, proving especially effective in a variety of applications from molecular structure prediction to enhancing recommender systems. In the realm of AI planning, the concept of general policy learning — which aims at creating agents capable of solving any instance within a specific domain — has gained significant attention. So far, the pursuit of general policies has often involved the use of custom-built GNN architectures tailored to unique graph representations of planning problems. These custom approaches, while effective, are heavily dependent on the construction of their underlying graph representation, which can limit their applicability and scalability. In this paper, we explore the feasibility of achieving similar successes in general policy learning using standard GNNs and Transformers, which have been extensively tested and researched; the latter are additionally not constrained by specific graph representations. Our findings indicate that while state-of-the-art GNNs and Transformers are generally suitable for general policy learning, their performance does not yet match that of the more specialized, custom-built GNN architectures previously developed in the field.


ICAPS Conference 2022 Conference Paper

Debugging a Policy: Automatic Action-Policy Testing in AI Planning

Marcel Steinmetz
Daniel Fiser
Hasan Ferit Eniser
Patrick Ferber
Timo P. Gros
Philippe Heim
Daniel Höller
Xandra Schuler

Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem, deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods generating test states biased to poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.


PRL Workshop 2021 Workshop Paper

Debugging a Policy: A Framework for Automatic Action Policy Testing

Marcel Steinmetz
Timo P. Gros
Philippe Heim
Daniel Höller
Joerg Hoffmann

Neural network (NN) action policies are an attractive option for real-time action decisions in dynamic environments. Yet this requires a high degree of trust in the NN. How to gain such trust? Systematic testing certainly is one possible answer, in analogy to program testing. The input to the program becomes the start state for the policy; and erroneous program behaviors – “bugs” – become bad policy behavior, e.g., not reaching the goal from a solvable state. We introduce a framework spelling out this concept. The framework is generic and in principle applicable to arbitrary planning models. We discuss how this form of testing can be operationalized, i.e., how to confirm a bug has been found, and how potential bugs might be identified in the first place. This essentially involves seeing standard planning concepts through the new lens of policy testing. The implementation and practical exploration of this framework remains open for future work. We believe that action policy testing is an important topic for ICAPS, and we hope that our framework will serve to start its discussion.
