PRL Workshop 2021 Workshop Paper
- Forest Agostinelli
- Stephen McAleer
- Alexander Shmakov
- Roy Fox
- Marco Valtorta
- Biplav Srivastava
- Pierre Baldi
real world applications would ensure that artificial intelligence agents can solve problems in the most efficient way Deep reinforcement learning has been shown to be able to possible, or close to the most efficient way possible, which train deep neural networks to implement effective heuristic could significantly reduce the resources consumed by such functions that can be used with A* search to solve probagents. lems with large state spaces. However, these learned heuristic Obtaining an admissible heuristic function often requires functions are not guaranteed to be admissible. We introduce domain-specific knowledge. For example, pattern databases approximately admissible conversion, an algorithm that can convert any inadmissible heuristic function into a heuristic (PDBs) (Culberson and Schaeffer 1998) have been sucfunction that is admissible in the vast majority of cases with cessful at finding optimal solutions to puzzles such as the no domain-specific heuristic information. We apply approxiRubik’s cube (Korf 1997), 15-puzzle, and 24-puzzle (Korf mately admissible conversion to heuristic functions parameand Felner 2002; Felner, Korf, and Hanan 2004). Howterized by deep neural networks and show that these heuristic ever, ensuring that these PDBs produce admissible heurisfunctions can be used to find optimal solutions, or bounded tics requires knowledge about how the puzzle pieces intersuboptimal solutions, even when doing a batched version of act. There has been previous research on using deep neuA* search. We test our method on the 15-puzzle and 24ral networks to learn heuristic functions (Chen and Wei puzzle and obtain a heuristic function that is empirically ad2011; Wang et al. 2019; Ferber, Helmert, and Hoffmann missible over 99. 99% of the time and that finds optimal so2020) including the DeepCubeA algorithm (McAleer et al. lutions for 100% of all test configurations. To the best of our knowledge, this is the first demonstration that approximately 2019; Agostinelli et al. 2019) which used deep reinforceadmissible heuristics can be obtained using deep neural netment learning and weighted A* search (Pohl 1970) to solve works in a domain independent fashion. the aforementioned puzzles. However, the heuristic functions produced by DeepCubeA are not admissible. In this paper, we define an approximately admissible