Performance Metrics for Reinforcement Learning Algorithms

William Dabney; Philip Thomas; Andrew Barto

RLDM 2013

Conference Abstract Accepted abstract Artificial Intelligence · Decision Making · Machine Learning · Reinforcement Learning

Abstract

Due to the continued growth of the field of reinforcement learning (RL), the number of RL algorithms has increased to the point where an individual researcher cannot experiment with all of them. To facilitate the decision of which algorithms to invest time in, we, as a field, need methods that thoroughly and accurately describe the performance of algorithms. Two approaches for evaluating RL algorithms are commonly used, neither of which fully accomplishes these goals. The first approach is to manually test each algorithm with a collection of parameter values and report the best results found. This can introduce an unintended bias in an algorithm's favor because researchers have more insight when selecting parameter values for methods with which they are familiar. The second approach is to perform a large parameter optimization for each algorithm and to report the best results found. This approach does not accurately capture the difficulty of finding good parameter values. The fundamental problem with both approaches is that the robustness of the algorithm to its parameter values is ignored. In the first approach, this results in biased evaluations. In the second approach, it causes only the combination of the RL algorithm and the parameter optimization to be evaluated, which allows the parameter optimization to compensate for weaknesses in the RL algorithm. By this standard, directly searching for fixed policies in the parameter optimization will yield the best algorithm possible. We propose a performance metric for RL algorithms that tells a much larger part of the story of an algorithm's performance and robustness to its parameter values. It allows us as RL researchers to be better informed about the performance of our algorithms, and to report results that are also more informative to our audiences. The key insight is to measure performance in terms of the expected percentage of fixed, deterministic policies that the algorithm outperforms.
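The abstract does not give a formal definition of the metric, so the following is only a minimal sketch of one way the key insight might be read. It assumes that the returns of a sample of fixed, deterministic policies are already available, and that the expectation is taken over the algorithm's scores at several sampled parameter settings. The function names, the strict-inequality tie rule, and all numbers are hypothetical illustrations, not the authors' method.

```python
from bisect import bisect_left
from statistics import mean


def percent_outperformed(score, sorted_policy_returns):
    """Percentage of sampled fixed policies whose return is strictly below `score`.

    Assumption: ties do not count as outperforming.
    """
    below = bisect_left(sorted_policy_returns, score)
    return 100.0 * below / len(sorted_policy_returns)


def expected_percent_outperformed(algorithm_scores, policy_returns):
    """Average the percentage over the algorithm's scores.

    Assumption: `algorithm_scores` holds one score per sampled parameter
    setting, so the mean approximates an expectation over parameter values.
    """
    ordered = sorted(policy_returns)
    return mean(percent_outperformed(s, ordered) for s in algorithm_scores)


# Hypothetical returns of four sampled fixed, deterministic policies.
policy_returns = [0.0, 1.0, 2.0, 3.0]

# Hypothetical scores of one RL algorithm under two parameter settings.
algorithm_scores = [2.5, 0.5]

print(expected_percent_outperformed(algorithm_scores, policy_returns))  # 50.0
```

Under this reading, an algorithm that scores well only at one carefully tuned parameter setting receives a lower value than one that scores well across most settings, which is the robustness the abstract says the two common evaluation approaches ignore.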

Context

Venue: Multidisciplinary Conference on Reinforcement Learning and Decision Making