PRL Workshop 2025 Workshop Paper
Learning Per-Domain Generalizing Policies Using Offline Reinforcement Learning
- Nicola J. Müller
- Moritz Oster
- Timo P. Gros
Learned per-domain generalizing policies are gaining popularity in classical planning, as they can solve arbitrary instances of a specific domain. They are typically trained using supervised learning (SL), where we learn to generalize beyond a training set, or reinforcement learning (RL), where we learn from scratch through trial and error. We argue that SL and RL should not be seen as contrasting approaches, and propose a training framework where a policy is first trained offline using SL, and then fine-tuned online using RL. The key method enabling this framework is offline RL. Preliminary experiments show that offline RL can indeed learn per-domain generalizing policies effectively.