Arrow Research Search

Author name cluster

Theresa Eimer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

EWRL 2025 · Workshop Paper

Mighty: A Comprehensive Tool for Studying Generalization, Meta-RL and AutoRL

  • Aditya Mohan
  • Theresa Eimer
  • Carolin Benjamins
  • Marius Lindauer
  • André Biedenkapp

Robust generalization, rapid adaptation, and automated tuning are essential for deploying reinforcement learning in real-world settings. However, research on these aspects remains scattered across non-standard codebases and custom orchestration scripts. We introduce Mighty, an open-source library that unifies Contextual Generalization, Meta-RL, and AutoRL under a single modular interface. Mighty cleanly separates a configurable Agent (specified by its learning algorithm, model architecture, replay buffer, exploration strategy, and hyperparameters) from a configurable environment modeled as a Contextual MDP in which transitions, rewards, and initial states are governed by context parameters. This design decouples inner-loop weight updates from outer-loop adaptations, enabling users to compose, within one framework, (i) contextual generalization and curriculum methods (e.g., Unsupervised Environment Design), (ii) bi-level meta-learning (e.g., MAML, black-box strategies), and (iii) automated hyperparameter and architecture search (e.g., Bayesian optimization, evolutionary strategies, population-based training). We present Mighty’s design philosophy and core features and validate the ongoing base implementations on classic control and continuous control tasks. We hope that by providing a unified, modular interface, Mighty will simplify experimentation and inspire further advances in robust, adaptable reinforcement learning.
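
As a rough illustration of the Agent/environment split described in the abstract, the sketch below implements a toy contextual MDP whose transitions, rewards, and initial state are driven by a context dictionary, with a placeholder inner loop (weight updates) and outer loop (context adaptation). All names here are hypothetical; this is a conceptual sketch, not Mighty's actual API.

```python
import random

class ContextualMDP:
    """Toy environment governed by a context dict (not Mighty's interface).

    The context scales the transition noise and sets the reward target.
    """

    def __init__(self, context):
        self.context = context  # e.g. {"noise": 0.1, "goal": 5.0}
        self.state = 0.0

    def reset(self):
        self.state = self.context.get("start", 0.0)  # context-governed initial state
        return self.state

    def step(self, action):
        # Context-governed transition and reward.
        self.state += action + random.gauss(0.0, self.context["noise"])
        reward = -abs(self.state - self.context["goal"])
        return self.state, reward

def inner_loop(env, weights, episodes=10):
    """Inner loop: ordinary weight updates on a fixed context (placeholder rule)."""
    for _ in range(episodes):
        s = env.reset()
        for _ in range(20):
            action = weights * s       # trivial linear policy
            s, r = env.step(action)
            weights += 0.01 * r * s    # placeholder update
    return weights

# Outer loop: adapt the training context (a curriculum / AutoRL knob)
# while leaving the inner-loop update rule untouched.
weights = 0.0
for noise in (0.0, 0.1, 0.3):  # outer-loop adaptation of the context
    env = ContextualMDP({"noise": noise, "goal": 5.0})
    weights = inner_loop(env, weights)
```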

EWRL 2025 · Workshop Paper

Performance Prediction in Reinforcement Learning: The Bad and the Ugly

  • Julian Dierkes
  • Theresa Eimer
  • Marius Lindauer
  • Holger Hoos

Reinforcement learning (RL) methods are known to be highly sensitive to their hyperparameter settings and costly to evaluate. In light of this, surrogate models that predict an algorithm's performance for a given hyperparameter configuration seem an attractive solution for understanding and optimising computationally expensive tasks. In this work, we study such surrogates for RL and find that RL methods present a significant challenge to current performance prediction approaches. Specifically, RL landscapes appear to be rugged and noisy, which is reflected in the poor performance of surrogate models. Even if surrogate models are used only for gaining insights into the hyperparameter landscapes, rather than as replacements for algorithm evaluations in benchmarking, we find that they deviate significantly from the ground truth. Our evaluation highlights the limits of surrogate modelling for RL and cautions against blindly trusting surrogate-based methods in this domain. This calls for more sophisticated solutions for effectively using surrogate models in sequential model-based optimisation of RL hyperparameters.
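
For readers unfamiliar with performance prediction, the snippet below sketches the general idea on synthetic data: fit a regression surrogate on (configuration, return) pairs and check how well it extrapolates to held-out configurations. The data and the random-forest surrogate are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (hyperparameter configuration -> final return) data;
# a real study would use returns from actual RL training runs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))            # e.g. learning rate, gamma, clip range (normalized)
noise = rng.normal(scale=50.0, size=200)  # heavy noise mimics rugged RL landscapes
y = 100.0 * X[:, 0] - 30.0 * X[:, 1] ** 2 + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# A low R^2 on held-out configurations is the kind of deviation the paper warns about.
print("held-out R^2:", surrogate.score(X_test, y_test))
```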

EWRL 2024 · Workshop Paper

ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

  • Jannis Becktepe
  • Julian Dierkes
  • Carolin Benjamins
  • Aditya Mohan
  • David Salinas
  • Raghu Rajan
  • Frank Hutter
  • Holger Hoos

Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.
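
One generic way to pick a representative subset of benchmark tasks is to greedily select tasks whose induced method ranking correlates best with the ranking on the full task set. The sketch below illustrates that idea on synthetic data; it is not ARLBench's actual selection procedure, and all names in it are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical performance matrix: rows = HPO methods, columns = (algorithm,
# environment) tasks. In ARLBench this would come from landscape data.
rng = np.random.default_rng(1)
perf = rng.normal(size=(6, 20))  # 6 methods evaluated on 20 tasks

full_ranking = perf.mean(axis=1)  # method ranking on the full task set

def greedy_subset(perf, full_ranking, k):
    """Greedily pick tasks whose mean performance best preserves the full ranking."""
    chosen, remaining = [], list(range(perf.shape[1]))
    for _ in range(k):
        best_task, best_corr = None, -np.inf
        for t in remaining:
            subset_ranking = perf[:, chosen + [t]].mean(axis=1)
            corr, _ = spearmanr(full_ranking, subset_ranking)
            if corr > best_corr:
                best_task, best_corr = t, corr
        chosen.append(best_task)
        remaining.remove(best_task)
    return chosen

print("representative tasks:", greedy_subset(perf, full_ranking, k=5))
```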

TMLR 2024 · Journal Article

AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks

  • Alexander Tornede
  • Difan Deng
  • Theresa Eimer
  • Joseph Giovanelli
  • Aditya Mohan
  • Tim Ruhkopf
  • Sarah Segel
  • Daphne Theodorakopoulos

The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. In NLP especially, Large Language Models (LLMs) have recently seen a rapid series of breakthroughs. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work and critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both NLP and AutoML. By highlighting conceivable synergies, but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.

TMLR 2023 · Journal Article

Contextualize Me – The Case for Context in Reinforcement Learning

  • Carolin Benjamins
  • Theresa Eimer
  • Frederik Schubert
  • Aditya Mohan
  • Sebastian Döhler
  • André Biedenkapp
  • Bodo Rosenhahn
  • Frank Hutter

While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability. To empirically validate this in the cRL framework, we provide various context-extended versions of common RL environments. They are part of the first benchmark library, CARL, designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed to further study general agents. We show that in the contextual setting, even simple RL environments become challenging, and that naive solutions are not enough to generalize across complex context spaces.
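
The core idea of a context-extended environment can be sketched in a few lines: constants that are normally hard-coded become explicit context features, making train and test distributions easy to design precisely. The class below is a toy stand-in, not CARL's actual API.

```python
import random

class ContextualCartPoleLike:
    """Toy stand-in for a context-extended control task (not CARL's API).

    Physical constants that are usually hard-coded become explicit context
    features, so train and test distributions can be designed precisely.
    """

    def __init__(self, gravity=9.8, pole_length=0.5):
        self.gravity, self.pole_length = gravity, pole_length

    def difficulty(self):
        # Placeholder proxy: heavier gravity and longer poles are harder to balance.
        return self.gravity * self.pole_length

# Train on a narrow context distribution, then test zero-shot on a wider one.
random.seed(0)
train_contexts = [ContextualCartPoleLike(gravity=random.uniform(9.0, 10.5)) for _ in range(5)]
test_contexts = [ContextualCartPoleLike(gravity=random.uniform(4.0, 20.0)) for _ in range(5)]
print("train:", [round(c.difficulty(), 2) for c in train_contexts])
print("test: ", [round(c.difficulty(), 2) for c in test_contexts])
```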

EWRL 2023 · Workshop Paper

Contextualize Me – The Case for Context in Reinforcement Learning

  • Carolin Benjamins
  • Theresa Eimer
  • Frederik Schubert
  • Aditya Mohan
  • Sebastian Döhler
  • André Biedenkapp
  • Bodo Rosenhahn
  • Frank Hutter

While Reinforcement Learning (RL) has shown successes in a variety of domains, including game playing, robot manipulation and nuclear fusion, modern RL algorithms are not designed with generalization in mind, making them brittle when faced with even slight variations of their environment. To address this limitation, recent research has increasingly focused on the generalization capabilities of RL agents. Ideally, general agents should be capable of zero-shot transfer to previously unseen environments and robust to changes in the problem setting while interacting with an environment. Steps in this direction have been taken by proposing new problem settings where agents can test their transfer performance, e.g., the Arcade Learning Environment's flavors, or benchmarks utilizing Procedural Content Generation (PCG) to increase task variation, e.g., ProcGen, NetHack or Alchemy. While these extended problem settings in RL have expanded the possibilities for benchmarking agents in diverse environments, the degree of task variation is often either unknown or cannot be controlled precisely. We believe that generalization in RL is held back by these factors, stemming in part from a lack of problem formalization. In order to facilitate generalization in RL, contextual RL (cRL) proposes to explicitly take environment characteristics, the so-called context, into account. This inclusion enables precise design of train and test distributions with respect to this context. Thus, cRL allows us to reason about the generalization capabilities of RL agents and to quantify their generalization performance. Overall, cRL provides a framework for both theoretical analysis and practical improvements. In order to empirically study cRL, we introduce our benchmark library CARL, short for Context-Adaptive Reinforcement Learning. CARL collects well-established environments from the RL community and extends them with the notion of context. We use our benchmark library to empirically show how different context variations can drastically increase the difficulty of training RL agents, even in simple environments. We further verify the intuition that allowing RL agents access to context information is beneficial for generalization tasks in theory and practice.

EWRL 2023 · Workshop Paper

Hyperparameters in Reinforcement Learning and How To Tune Them

  • Theresa Eimer
  • Marius Lindauer
  • Roberta Raileanu

Deep Reinforcement Learning (RL) has been adopting better scientific practices in order to improve reproducibility, such as standardized evaluation metrics and reporting, as well as greater attention to implementation details and design decisions. However, the process of hyperparameter optimization still varies widely across papers, with inefficient grid searches being most commonly used. This makes fair comparisons between RL algorithms challenging. In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed, which might lead to overfitting to single seeds. We therefore propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds, as well as principled hyperparameter optimization (HPO) across a broad search space. We support this by comparing multiple state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts, demonstrating that HPO approaches often have higher performance and lower compute overhead. As a result of our findings, we recommend a set of best practices for the RL community going forward, which should result in stronger empirical results with fewer computational costs, better reproducibility, and thus faster progress in RL. In order to encourage the adoption of these practices, we provide plug-and-play implementations of the tuning algorithms used in this paper at https://anonymous.4open.science/r/how-to-autorl-DE67/README.md.
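
The seed-separation practice advocated here is straightforward to implement. The sketch below runs a random search in which configurations are tuned on one set of seeds and the chosen configuration is reported only on disjoint test seeds; train_and_eval is a hypothetical stand-in for an actual RL training run.

```python
import numpy as np

def train_and_eval(learning_rate, seed):
    """Hypothetical stand-in for training an RL agent and returning its mean return."""
    rng = np.random.default_rng(seed)
    # Seed-dependent noise mimics how RL hyperparameter landscapes shift across seeds.
    return -(np.log10(learning_rate) + 3.0) ** 2 + rng.normal(scale=0.5)

TUNING_SEEDS, TEST_SEEDS = range(5), range(5, 10)  # strictly disjoint seed sets

rng = np.random.default_rng(42)
candidates = 10 ** rng.uniform(-5, -1, size=20)  # random search over a broad LR space

# Tune only on the tuning seeds...
scores = [np.mean([train_and_eval(lr, s) for s in TUNING_SEEDS]) for lr in candidates]
best_lr = candidates[int(np.argmax(scores))]

# ...and report performance exclusively on held-out test seeds.
test_score = np.mean([train_and_eval(best_lr, s) for s in TEST_SEEDS])
print(f"best lr: {best_lr:.1e}, held-out score: {test_score:.2f}")
```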

ICML 2023 · Conference Paper

Hyperparameters in Reinforcement Learning and How To Tune Them

  • Theresa Eimer
  • Marius Lindauer
  • Roberta Raileanu

In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies widely across papers, which makes it challenging to compare RL algorithms fairly. In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting. We therefore propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds, as well as principled hyperparameter optimization (HPO) across a broad search space. We support this by comparing multiple state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts, demonstrating that HPO approaches often have higher performance and lower compute overhead. As a result of our findings, we recommend a set of best practices for the RL community, which should result in stronger empirical results with fewer computational costs, better reproducibility, and thus faster progress. In order to encourage the adoption of these practices, we provide plug-and-play implementations of the tuning algorithms used in this paper at https://github.com/facebookresearch/how-to-autorl.

JAIR 2022 · Journal Article

Automated Dynamic Algorithm Configuration

  • Steven Adriaensen
  • André Biedenkapp
  • Gresa Shala
  • Noor Awad
  • Theresa Eimer
  • Marius Lindauer
  • Frank Hutter

The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior art to tackle this problem; and (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning.
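
A minimal example of the static-versus-dynamic distinction described above: the sketch below runs a simple (1+1)-ES-style search on the sphere function, once with a fixed step size and once with a step size that decays over the run. It is a generic illustration of dynamic configuration, not the article's formalism.

```python
import random

def sphere(x):
    """Simple quadratic objective: sum of squares."""
    return sum(v * v for v in x)

def one_plus_one_es(steps, step_size_policy):
    """(1+1)-ES on the sphere function; the step size may depend on run state."""
    x, successes = [5.0, 5.0], 0
    for t in range(steps):
        sigma = step_size_policy(t, successes)  # the dynamically configured parameter
        y = [v + random.gauss(0.0, sigma) for v in x]
        if sphere(y) < sphere(x):
            x, successes = y, successes + 1
    return sphere(x)

random.seed(0)
static = one_plus_one_es(500, lambda t, s: 0.5)                # static configuration
dynamic = one_plus_one_es(500, lambda t, s: 0.5 * 0.995 ** t)  # decayed over the run
print(f"final objective -- static: {static:.4f}, dynamic: {dynamic:.4f}")
```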

JAIR 2022 · Journal Article

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

  • Jack Parker-Holder
  • Raghu Rajan
  • Xingyou Song
  • André Biedenkapp
  • Yingjie Miao
  • Theresa Eimer
  • Baohe Zhang
  • Vu Nguyen

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and AutoML has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games, such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.

IJCAI 2021 · Conference Paper

DACBench: A Benchmark Library for Dynamic Algorithm Configuration

  • Theresa Eimer
  • André Biedenkapp
  • Maximilian Reimer
  • Steven Adriaensen
  • Frank Hutter
  • Marius Lindauer

Dynamic Algorithm Configuration (DAC) aims to dynamically control a target algorithm's hyperparameters in order to improve its performance. Several theoretical and empirical results have demonstrated the benefits of dynamically controlling hyperparameters in domains like evolutionary computation, AI planning or deep learning. Replicating these results, as well as studying new methods for DAC, however, is difficult since existing benchmarks are often specialized and do not share common interfaces. To facilitate benchmarking and thus research on DAC, we propose DACBench, a benchmark library that seeks to collect and standardize existing DAC benchmarks from different AI domains, as well as provide a template for new ones. For the design of DACBench, we focused on important desiderata, such as (i) flexibility, (ii) reproducibility, (iii) extensibility and (iv) automatic documentation and visualization. To show the potential, broad applicability and challenges of DAC, we explore how a set of six initial benchmarks compares across several dimensions of difficulty.
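
To make the benchmark idea concrete, the toy class below mimics a white-box DAC benchmark with a reset/step loop in which the action is a parameter value and the reward tracks a shifting target, mimicking how a good parameter value changes over an algorithm's run. It is illustrative only and not code from DACBench.

```python
import math

class ToySigmoidBenchmark:
    """Toy white-box DAC benchmark (illustrative only, not DACBench's code)."""

    def __init__(self, horizon=10, shift=5.0, slope=1.5):
        self.horizon, self.shift, self.slope = horizon, shift, slope

    def reset(self):
        self.t = 0
        return self.t  # observation: current step of the target algorithm

    def step(self, action):
        # The ideal parameter value follows a sigmoid over the run.
        target = 1.0 / (1.0 + math.exp(-self.slope * (self.t - self.shift)))
        reward = 1.0 - abs(action - target)  # closer choice -> higher reward
        self.t += 1
        return self.t, reward, self.t >= self.horizon

env = ToySigmoidBenchmark()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(0.5)  # trivial static policy as a baseline
    total += reward
print("episode return:", round(total, 3))
```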

ICML 2021 · Conference Paper

Self-Paced Context Evaluation for Contextual Reinforcement Learning

  • Theresa Eimer
  • André Biedenkapp
  • Frank Hutter
  • Marius Lindauer

Reinforcement learning (RL) has made great advances in solving individual problems in a given environment, but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency for learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates instance curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization capabilities to new tasks from the same problem domain. Nevertheless, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE’s ability to speed up learning of different value-based RL agents on two environments, showing better generalization capabilities and up to 10x faster learning compared to naive approaches such as round robin, as well as to SPDRL, the closest state-of-the-art approach.
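
A much-simplified sketch of the self-paced idea: rank training instances by the agent's current value estimates and train on the highest-rated subset first, growing it as the agent improves. The value_estimate function here is a hypothetical stand-in for a learned state-value function.

```python
def value_estimate(instance, agent_skill):
    """Hypothetical stand-in for the agent's value estimate on an instance's start state."""
    return agent_skill - instance["difficulty"]

instances = [{"id": i, "difficulty": d} for i, d in enumerate([0.2, 1.5, 0.7, 3.0, 1.0])]
agent_skill = 1.0

# Self-paced idea (simplified): train first on the instances the current value
# function already rates highest; the subset would expand as skill grows.
ranked = sorted(instances, key=lambda inst: value_estimate(inst, agent_skill), reverse=True)
curriculum = ranked[:3]  # current training subset
print("train on instances:", [inst["id"] for inst in curriculum])
```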

ECAI 2020 · Conference Paper

Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework

  • André Biedenkapp
  • H. Furkan Bozkurt
  • Theresa Eimer
  • Frank Hutter
  • Marius Lindauer

The performance of many algorithms in the fields of hard combinatorial problem solving, machine learning or AI in general depends on parameter tuning. Automated methods have been proposed to relieve users of the tedious and error-prone task of manually searching for performance-optimized configurations across a set of problem instances. However, there is still a lot of untapped potential in adjusting an algorithm's parameters online, since different parameter values can be optimal at different stages of the algorithm. Prior work showed that reinforcement learning is an effective approach to learn policies for online adjustments of algorithm parameters in a data-driven way. We extend that approach by formulating the resulting dynamic algorithm configuration as a contextual MDP, such that RL not only learns a policy for a single instance, but across a set of instances. To lay the foundation for studying dynamic algorithm configuration with RL in a controlled setting, we propose white-box benchmarks covering major aspects that make dynamic algorithm configuration a hard problem in practice, and study the performance of various types of configuration strategies for them. On these white-box benchmarks, we show that (i) RL is a robust candidate for learning configuration policies, outperforming standard parameter optimization approaches such as classical algorithm configuration; (ii) based on function approximation, RL agents can learn to generalize to new types of instances; and (iii) self-paced learning can substantially improve performance by automatically selecting a useful sequence of training instances.