PRL Workshop 2025 (Workshop Paper)
Controller Synthesis from Deep Reinforcement Learning Policies
- Florent Delgrange
- Guy Avni
- Anna Lukina
- Christian Schilling
- Ann Nowé
- Guillermo Pérez
We propose a novel framework for controller design in environments with a two-level structure: a known high-level graph (“map”) in which each vertex is populated by a Markov decision process, called a “room”. The framework “separates concerns” by using different design techniques for low- and high-level tasks. We apply reactive synthesis for high-level tasks: given a specification as a logical formula over the high-level graph and a collection of low-level policies defined on “concise” latent structures, we construct a “planner” that selects which low-level policy to apply in each room. We develop a reinforcement learning procedure to train low-level policies on latent structures which, unlike previous approaches, circumvents a model-distillation step. It pairs the policy with probably approximately correct (PAC) guarantees on its performance and abstraction quality, which are lifted to guarantees on the high-level task. These formal guarantees are the main advantage of the framework. Other advantages include scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies. We demonstrate feasibility in challenging case studies involving agent navigation in environments with moving obstacles and visual inputs.
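To make the “separation of concerns” concrete, the minimal Python sketch below illustrates the two-level control loop described above: a planner over the high-level map dispatches a low-level policy inside each room, and room exits induce transitions on the map. This is an illustration of the architecture only, not the authors' implementation; all names (`Room`, `HierarchicalController`, `planner`, `go_right`) are hypothetical, and the trained latent-space policies and the PAC machinery are abstracted away behind plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

Vertex = Hashable   # vertex of the known high-level graph ("map")
Obs = tuple         # low-level observation inside a room
Action = int

@dataclass
class Room:
    """A room: an MDP with unknown dynamics, sitting at a map vertex."""
    step: Callable[[Obs, Action], Tuple[Obs, bool]]  # -> (next obs, room exited?)
    exit_vertex: Callable[[Obs], Vertex]             # vertex the taken exit leads to

@dataclass
class HierarchicalController:
    """High-level planner selects which low-level policy runs in each room."""
    rooms: Dict[Vertex, Room]
    policies: Dict[str, Callable[[Obs], Action]]     # reusable low-level policies
    planner: Callable[[Vertex], str]                 # synthesized from the spec

    def run(self, vertex: Vertex, obs: Obs, goal: Vertex,
            max_steps: int = 1000) -> bool:
        for _ in range(max_steps):
            if vertex == goal:                       # high-level task achieved
                return True
            act = self.policies[self.planner(vertex)](obs)   # delegated control
            obs, exited = self.rooms[vertex].step(obs, act)
            if exited:                               # exit triggers a map transition
                vertex = self.rooms[vertex].exit_vertex(obs)
        return vertex == goal

if __name__ == "__main__":
    # Toy map: one corridor room whose exit leads to a "goal" vertex.
    corridor = Room(
        step=lambda obs, a: ((obs[0] + a,), obs[0] + a >= 3),  # exit at position 3
        exit_vertex=lambda obs: "goal",
    )
    ctrl = HierarchicalController(
        rooms={"start": corridor},
        policies={"go_right": lambda obs: 1},
        planner=lambda v: "go_right",        # trivial plan: always head right
    )
    print(ctrl.run("start", (0,), goal="goal", max_steps=10))  # True
```

In the actual framework, the low-level callables would be the RL-trained latent policies with their PAC performance and abstraction-quality guarantees, and `planner` would be the product of reactive synthesis against the logical specification over the map; the sketch only fixes the interfaces between the two levels.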