Arrow Research search

Author name cluster

Frank Hutter

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

114 papers
2 author rows

Possible papers

114

ICML Conference 2025 Conference Paper

Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

  • Dongwoo Lee
  • Dong Bok Lee
  • Steven Adriaensen
  • Juho Lee
  • Sung Ju Hwang
  • Frank Hutter
  • Seon Joo Kim
  • Hae Beom Lee

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both the existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.

ICLR Conference 2025 Conference Paper

Beyond Random Augmentations: Pretraining with Hard Views

  • Fabio Ferreira
  • Ivo Rapant
  • Jörg K. H. Franke
  • Frank Hutter

Self-Supervised Learning (SSL) methods typically rely on random image augmentations, or views, to make models invariant to different transformations. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple yet effective approach is to select hard views that yield a higher loss. In this paper, we propose Hard View Pretraining (HVP), a learning-free strategy that extends random view generation by exposing models to more challenging samples during SSL pretraining. HVP encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss according to the current model state, and 4) perform a backward pass with the selected pair. In contrast to existing hard view literature, we are the first to demonstrate hard view pretraining's effectiveness at scale, particularly training on the full ImageNet-1k dataset, and evaluating across multiple SSL methods, Convolutional Networks, and Vision Transformers. As a result, HVP sets a new state-of-the-art on DINO ViT-B/16, reaching 78.8% linear evaluation accuracy (a 0.6% improvement) and consistent gains of 1% for both 100 and 300 epoch pretraining, with similar improvements across transfer tasks in DINO, SimSiam, iBOT, and SimCLR.

JMLR Journal 2025 Journal Article

DeepCAVE: A Visualization and Analysis Tool for Automated Machine Learning

  • Sarah Segel
  • Helena Graf
  • Edward Bergman
  • Kristina Thieme
  • Marcel Wever
  • Alexander Tornede
  • Frank Hutter
  • Marius Lindauer

Hyperparameter optimization (HPO), as a central paradigm of AutoML, is crucial for leveraging the full potential of machine learning (ML) models; yet its complexity poses challenges in understanding and debugging the optimization process. We present DeepCAVE, a tool for interactive visualization and analysis, providing insights into HPO. Through an interactive dashboard, researchers, data scientists, and ML engineers can explore various aspects of the HPO process and identify issues, untouched potentials, and new insights about the ML model being tuned. By empowering users with actionable insights, DeepCAVE contributes to the interpretability of HPO and ML on a design level and aims to foster the development of more robust and efficient methodologies in the future. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2025. ( edit, beta )

NeurIPS Conference 2025 Conference Paper

DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

  • Julien Siems
  • Timur Carstensen
  • Arber Zela
  • Frank Hutter
  • Massimiliano Pontil
  • Riccardo Grazzi

Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or mLSTM, yield fast runtime but have limited expressivity. To address this, recent architectures such as DeltaNet and RWKV-7 adopted a diagonal plus rank-1 structure, which allows simultaneous token and channel mixing, improving associative recall and, as recently shown, state-tracking when allowing negative eigenvalues in the state-transition matrices. Building on the interpretation of DeltaNet's recurrence as performing one step of online gradient descent per token on an associative recall loss, we introduce DeltaProduct, which instead takes multiple ($n_h$) steps per token. This naturally leads to diagonal plus rank-$n_h$ state-transition matrices, formed as products of $n_h$ generalized Householder transformations, providing a tunable mechanism to balance expressivity and efficiency. We provide a detailed theoretical characterization of the state-tracking capability of DeltaProduct in finite precision and how it improves by increasing $n_h$. Our extensive experiments demonstrate that DeltaProduct outperforms DeltaNet in both state-tracking and language modeling, while also showing significantly improved length extrapolation capabilities.

ICLR Conference 2025 Conference Paper

Diffusion-based Neural Network Weights Generation

  • Bedionita Soro
  • Bruno Andreis
  • Hayeon Lee
  • Wonyong Jeong
  • Song Chong
  • Frank Hutter
  • Sung Ju Hwang

Transfer learning is a cornerstone of modern deep learning, yet it remains constrained by challenges in model selection and the overhead of extensive model storage. In this work, we present Diffusion-based Neural Network Weights Generation, D2NWG, a novel framework that leverages diffusion processes to synthesize task-specific network weights. By modeling the distribution of weights from a diverse ensemble of pretrained models and conditioning the generation process on dataset characteristics, task descriptions, and architectural specifications, D2NWG circumvents the need for storing and searching through massive model repositories. We evaluate D2NWG across multiple experimental settings. On in-distribution tasks, our framework achieves performance that is on par with or superior to conventional pretrained models, while also serving as an effective initialization strategy for novel domains, resulting in faster convergence and a 6\% improvement in few-shot learning scenarios. Extensive ablation studies further indicate that our approach scales robustly with increased diversity and volume of pretrained models. Moreover, D2NWG demonstrates significant promise for large language model applications. In evaluations on the OpenLM leaderboard, our method improved LLaMA-3-2-1B-Instruct performance by 3\% on challenging mathematical reasoning tasks, with a consistent gain of 0.36\% across a range of benchmarks. These findings establish D2NWG as a versatile and powerful framework for neural network weight generation, offering a scalable solution to the limitations of traditional transfer learning.

NeurIPS Conference 2025 Conference Paper

Do-PFN: In-Context Learning for Causal Effect Estimation

  • Jake Robertson
  • Arik Reuter
  • Siyuan Guo
  • Noah Hollmann
  • Frank Hutter
  • Bernhard Schölkopf

Causal effect estimation is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground-truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, Prior-data fitted networks (PFNs) have achieved state-of-the-art predictive performance, having been pre-trained on synthetic causal data to solve tabular prediction problems via in-context learning. To assess whether this can be transferred to the problem of causal effect estimation, we pre-train PFNs on synthetic data drawn from a wide variety of causal structures, including interventions, to predict interventional outcomes given observational data. Through extensive experiments in synthetic and semi-synthetic settings, we show that our approach allows for the accurate estimation of causal effects without knowledge of the underlying causal graph.

NeurIPS Conference 2025 Conference Paper

EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Network

  • Michael Arbel
  • David Salinas
  • Frank Hutter

Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning but remain constrained to a fixed, pre-defined number of target dimensions—often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target-equivariance, so that permuting target-dimension orderings alters their predictions. This deficiency gives rise to an irreducible “equivariance gap, ” an error term that introduces instability in predictions. We eliminate this gap by designing a fully target-equivariant architecture—ensuring permutation invariance via equivariant encoders, decoders, and a bi-attention mechanism. Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.

ICML Conference 2025 Conference Paper

FairPFN: A Tabular Foundation Model for Causal Fairness

  • Jake Robertson
  • Noah Hollmann
  • Samuel Müller 0005
  • Noor H. Awad
  • Frank Hutter

Machine learning (ML) systems are utilized in critical sectors such as healthcare, law enforcement, and finance, but often rely on historical data that contains demographic biases, leading to decisions that perpetuate or intensify existing inequalities. Causal and counterfactual fairness provide a transparent, human-in-the-loop framework to mitigate algorithmic discrimination, aligning closely with legal doctrines of direct and indirect discrimination. However, current causal fairness frameworks hold a key limitation in that they assume prior knowledge of the correct causal model, restricting their applicability in complex fairness scenarios where causal models are unknown or difficult to identify. To bridge this gap, we propose FairPFN, a tabular foundation model pre-trained on synthetic causal fairness data to identify and mitigate the causal effects of protected attributes in its predictions. FairPFN’s key contribution is that it requires no knowledge of the causal model and demonstrates strong performance across a diverse set of hand-crafted and real-world causal scenarios relative to robust baseline methods. FairPFN paves the way for a promising direction for future research, making causal fairness more accessible to a wider variety of complex fairness problems.

NeurIPS Conference 2025 Conference Paper

Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics

  • Indrashis Das
  • Mahmoud Safari
  • Steven Adriaensen
  • Frank Hutter

Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as LeakyReLU, PReLU, and ELU that better handle negative neuron outputs. Recently, self-gated activations like GELU and Swish have emerged as state-of-the-art alternatives, leveraging their smoothness to ensure stable gradient flow and prevent neuron inactivity. In this work, we introduce the Gompertz Linear Unit (GoLU), a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \\, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$. The GoLU activation leverages the right-skewed asymmetry in the Gompertz function to reduce variance in the latent space more effectively compared to GELU and Swish, while preserving robust gradient flow. Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activation functions, establishing GoLU as a robust alternative to existing activation functions.

ICLR Conference 2025 Conference Paper

KinPFN: Bayesian Approximation of RNA Folding Kinetics using Prior-Data Fitted Networks

  • Dominik Scheuer
  • Frederic Runge
  • Jörg K. H. Franke
  • Michael T. Wolfinger
  • Christoph Flamm
  • Frank Hutter

RNA is a dynamic biomolecule crucial for cellular regulation, with its function largely determined by its folding into complex structures, while misfolding can lead to multifaceted biological sequelae. During the folding process, RNA traverses through a series of intermediate structural states, with each transition occurring at variable rates that collectively influence the time required to reach the functional form. Understanding these folding kinetics is vital for predicting RNA behavior and optimizing applications in synthetic biology and drug discovery. While in silico kinetic RNA folding simulators are often computationally intensive and time-consuming, accurate approximations of the folding times can already be very informative to assess the efficiency of the folding process. In this work, we present KinPFN, a novel approach that leverages prior-data fitted networks to directly model the posterior predictive distribution of RNA folding times. By training on synthetic data representing arbitrary prior folding times, KinPFN efficiently approximates the cumulative distribution function of RNA folding times in a single forward pass, given only a few initial folding time examples. Our method offers a modular extension to existing RNA kinetics algorithms, promising significant computational speed-ups orders of magnitude faster, while achieving comparable results. We showcase the effectiveness of KinPFN through extensive evaluations and real-world case studies, demonstrating its potential for RNA folding kinetics analysis, its practical relevance, and generalization to other biological data.

NeurIPS Conference 2025 Conference Paper

Learning in Compact Spaces with Approximately Normalized Transformer

  • Jörg Franke
  • Urs Spiegelhalter
  • Marianna Nezhurina
  • Jenia Jitsev
  • Frank Hutter
  • Michael Hefenbrock

The successful training of deep neural networks requires addressing challenges such as overfitting, numerical instabilities leading to divergence, and increasing variance in the residual stream. A common solution is to apply regularization and normalization techniques that usually require tuning additional hyperparameters. An alternative is to force all parameters and representations to lie on a hypersphere. This removes the need for regularization and increases convergence speed, but comes with additional costs. In this work, we propose a more holistic, approximate normalization via simple scalar multiplications motivated by the tight concentration of the norms of high-dimensional random vectors. Additionally, instead of applying strict normalization for the parameters, we constrain their norms. These modifications remove the need for weight decay and learning rate warm-up as well, but do not increase the total number of normalization layers. Our experiments with transformer architectures show up to 40% faster convergence compared to GPT models with QK normalization, with only 3% additional runtime cost. When deriving scaling laws, we found that our method enables training with larger batch sizes while preserving the favorable scaling characteristics of classic GPT architectures.

TMLR Journal 2025 Journal Article

Meta-learning Population-based Methods for Reinforcement Learning

  • Johannes Hog
  • Raghu Rajan
  • André Biedenkapp
  • Noor Awad
  • Frank Hutter
  • Vu Nguyen

Reinforcement learning (RL) algorithms are highly sensitive to their hyperparameter settings. Recently, numerous methods have been proposed to dynamically optimize these hyperparameters. One prominent approach is Population-Based Bandits (PB2), which uses time-varying Gaussian processes (GP) to dynamically optimize hyperparameters with a population of parallel agents. Despite its strong overall performance, PB2 experiences slow starts due to the GP initially lacking sufficient information. To mitigate this issue, we propose four different methods that utilize meta-data from various environments. These approaches are novel in that they adapt meta-learning methods to accommodate the time-varying setting. Among these approaches, MultiTaskPB2, which uses meta-learning for the surrogate model, stands out as the most promising approach. It outperforms PB2 and other baselines in both anytime and final performance across two RL environment families.

ICLR Conference 2025 Conference Paper

Multi-objective Differentiable Neural Architecture Search

  • Rhea Sanjay Sukthanker
  • Arber Zela
  • Benedikt Staffler
  • Samuel Dooley
  • Josif Grabocka
  • Frank Hutter

Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade-off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in just a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling.

NeurIPS Conference 2025 Conference Paper

TabArena: A Living Benchmark for Machine Learning on Tabular Data

  • Nick Erickson
  • Lennart Purucker
  • Andrej Tschalzev
  • David Holzmüller
  • Prateek Desai
  • David Salinas
  • Frank Hutter

With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning. We observe that some deep learning models are overrepresented in cross-model ensembles due to validation set overfitting, and we encourage model developers to address this issue. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https: //tabarena. ai.

ICML Conference 2025 Conference Paper

Tuning LLM Judge Design Decisions for 1/1000 of the Cost

  • David Salinas
  • Omar Swelam
  • Frank Hutter

Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed, which compare the outputs of two LLMs enabling the ranking of models without human intervention. While several approaches have been proposed, many confounding factors are present between different papers. For instance the model, the prompt and other hyperparameters are typically changed at the same time making apple-to-apple comparisons challenging. In this paper, we propose to systematically analyze and tune the hyperparameters of LLM judges. To alleviate the high cost of evaluating a judge, we propose to leverage multi-objective multi-fidelity which allows to find judges that trades accuracy for cost and also reduce significantly the cost of the search. Our method identifies judges that not only outperform existing benchmarks in accuracy and cost-efficiency but also utilize open-weight models, ensuring greater accessibility and reproducibility.

ICLR Conference 2025 Conference Paper

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

  • Riccardo Grazzi
  • Julien Siems
  • Arber Zela
  • Jörg K. H. Franke
  • Frank Hutter
  • Massimiliano Pontil

Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers for long sequences. However, both Transformers and LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation. In one forward pass, current architectures are unable to solve even parity, the simplest state-tracking task, which non-linear RNNs can handle effectively. Recently, Sarrof et al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity stems from restricting the value range of their diagonal state-transition matrices to $[0, 1]$ and that incorporating negative values can resolve this issue. We extend this result to non-diagonal LRNNs such as DeltaNet. We prove that finite precision LRNNs with state-transition matrices having only positive eigenvalues cannot solve parity, while non-triangular matrices are needed to count modulo $3$. Notably, we also prove that LRNNs can learn any regular language when their state-transition matrices are products of identity minus vector outer product matrices, each with eigenvalues in the range $[-1, 1]$. Our experiments confirm that extending the eigenvalue range of Mamba and DeltaNet to include negative values not only enables them to solve parity but consistently improves their performance on state-tracking tasks. We also show that state-tracking enabled LRNNs can be pretrained stably and efficiently at scale (1.3B parameters), achieving competitive performance on language modeling and showing promise on code and math tasks.

ICLR Conference 2024 Conference Paper

A General Framework for User-Guided Bayesian Optimization

  • Carl Hvarfner
  • Frank Hutter
  • Luigi Nardi

The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.

EWRL Workshop 2024 Workshop Paper

ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

  • Jannis Becktepe
  • Julian Dierkes
  • Carolin Benjamins
  • Aditya Mohan
  • David Salinas
  • Raghu Rajan
  • Frank Hutter
  • Holger Hoos

Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https: //github. com/automl/arlbench.

JAIR Journal 2024 Journal Article

Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

  • Hilde Weerts
  • Florian Pfisterer
  • Matthias Feurer
  • Katharina Eggensperger
  • Edward Bergman
  • Noor Awad
  • Joaquin Vanschoren
  • Mykola Pechenizkiy

The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to propose AutoML systems that jointly optimize fairness and predictive performance to mitigate fairness-related harm. However, fairness is a complex and inherently interdisciplinary subject, and solely posing it as an optimization problem can have adverse side effects. With this work, we aim to raise awareness among developers of AutoML systems about such limitations of fairness-aware AutoML, while also calling attention to the potential of AutoML as a tool for fairness research. We present a comprehensive overview of different ways in which fairness-related harm can arise and the ensuing implications for the design of fairness-aware AutoML. We conclude that while fairness cannot be automated, fairness-aware AutoML can play an important role in the toolbox of ML practitioners. We highlight several open technical challenges for future work in this direction. Additionally, we advocate for the creation of more user-centered assistive systems designed to tackle challenges encountered in fairness work. This article appears in the AI & Society track.

NeurIPS Conference 2024 Conference Paper

Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

  • Kai Helli
  • David Schnurr
  • Noah Hollmann
  • Samuel Müller
  • Frank Hutter

While most ML models expect independent and identically distributed data, this assumption is often violated in real-world scenarios due to distribution shifts, resulting in the degradation of machine learning model performance. Until now, no tabular method has consistently outperformed classical supervised learning, which ignores these shifts. To address temporal distribution shifts, we present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network that learns the learning algorithm itself: it accepts the entire training dataset as input and makes predictions on the test set in a single forward pass. Specifically, it learns to approximate Bayesian inference on synthetic datasets drawn from a prior that specifies the model's inductive bias. This prior is based on structural causal models (SCM), which gradually shift over time. To model shifts of these causal models, we use a secondary SCM, that specifies changes in the primary model parameters. The resulting Drift-Resilient TabPFN can be applied to unseen data, runs in seconds on small to moderately sized datasets and needs no hyperparameter tuning. Comprehensive evaluations across 18 synthetic and real-world datasets demonstrate large performance improvements over a wide range of baselines, such as XGB, CatBoost, TabPFN, and applicable methods featured in the Wild-Time benchmark. Compared to the strongest baselines, it improves accuracy from 0. 688 to 0. 744 and ROC AUC from 0. 786 to 0. 832 while maintaining stronger calibration. This approach could serve as significant groundwork for further research on out-of-distribution prediction.

NeurIPS Conference 2024 Conference Paper

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

  • Rhea S. Sukthanker
  • Arber Zela
  • Benedikt Staffler
  • Aaron Klein
  • Lennart Purucker
  • Jörg K. Franke
  • Frank Hutter

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that utilizes surrogate predictions to approximate various hardware metrics across 13 devices of architectures in the GPT-2 family, with architectures containing up to 1. 55B parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in the energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.

NeurIPS Conference 2024 Conference Paper

Improving Deep Learning Optimization through Constrained Parameter Regularization

  • Jörg K. Franke
  • Michael Hefenbrock
  • Gregor Koehler
  • Frank Hutter

Regularization is a critical component in deep learning. The most commonly used approach, weight decay, applies a constant penalty coefficient uniformly across all parameters. This may be overly restrictive for some parameters, while insufficient for others. To address this, we present Constrained Parameter Regularization (CPR) as an alternative to traditional weight decay. Unlike the uniform application of a single penalty, CPR enforces an upper bound on a statistical measure, such as the L$_2$-norm, of individual parameter matrices. Consequently, learning becomes a constraint optimization problem, which we tackle using an adaptation of the augmented Lagrangian method. CPR introduces only a minor runtime overhead and only requires setting an upper bound. We propose simple yet efficient mechanisms for initializing this bound, making CPR rely on no hyperparameter or one, akin to weight decay. Our empirical studies on computer vision and language modeling tasks demonstrate CPR's effectiveness. The results show that CPR can outperform traditional weight decay and increase performance in pre-training and fine-tuning.

ICML Conference 2024 Conference Paper

In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization

  • Herilalaina Rakotoarison
  • Steven Adriaensen
  • Neeratyoy Mallik
  • Samir Garibov
  • Eddie Bergman
  • Frank Hutter

With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for Freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages the transformers’ in-context learning ability to efficiently and reliably do Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. Furthermore, we show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO), yields new state-of-the-art performance in the same three families of deep learning HPO benchmarks considered in prior work.

ICML Conference 2024 Conference Paper

Position: A Call to Action for a Human-Centered AutoML Paradigm

  • Marius Lindauer
  • Florian Karl
  • Anne Klier
  • Julia Moosbauer
  • Alexander Tornede
  • Andreas Müller 0024
  • Frank Hutter
  • Matthias Feurer 0001

Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML’s full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies.

ICLR Conference 2024 Conference Paper

Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

  • Sebastian Pineda-Arango
  • Fabio Ferreira
  • Arlind Kadra
  • Frank Hutter
  • Josif Grabocka

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with which pretrained model to use, and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations on a series of datasets. To this aim, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a gray-box performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.

ICML Conference 2024 Conference Paper

Surprisingly Strong Performance Prediction with Neural Graph Features

  • Gabriela Kadlecová
  • Jovita Lukasik
  • Martin Pilát
  • Petra Vidnerová
  • Mahmoud Safari
  • Roman Neruda
  • Frank Hutter

Performance prediction has been a key part of the neural architecture search (NAS) process, allowing to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimate network performance without any training. However, they are still poorly understood, exhibit biases with network properties, and their performance is limited. Inspired by the drawbacks of zero-cost proxies, we propose neural graph features (GRAF), simple to compute properties of architectural graphs. GRAF offers fast and interpretable performance prediction while outperforming zero-cost proxies and other common encodings. In combination with other zero-cost proxies, GRAF outperforms most existing performance predictors at a fraction of the cost.

NeurIPS Conference 2024 Conference Paper

TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks

  • Benjamin Feuer
  • Robin T. Schirrmeister
  • Valeriia Cherepanova
  • Chinmay Hegde
  • Frank Hutter
  • Micah Goldblum
  • Niv Cohen
  • Colin White

While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization. We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context. We conduct extensive experiments on nineteen algorithms over 98 datasets and find that TuneTables achieves the best performance on average, outperforming boosted trees such as CatBoost, while optimizing fewer than 5\% of TabPFN's parameters. Furthermore, we show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.

IJCAI Conference 2023 Conference Paper

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

  • Shuhei Watanabe
  • Frank Hutter

Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as memory usage, or latency on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on 81 expensive HPO with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. See https: //arxiv. org/abs/2211. 14411 for the latest version with Appendix.

NeurIPS Conference 2023 Conference Paper

Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

  • Simon Schrodi
  • Danny Stoll
  • Binxin Ru
  • Rhea Sukthanker
  • Thomas Brox
  • Frank Hutter

The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-free grammars that can naturally and compactly generate expressive hierarchical search spaces that are 100s of orders of magnitude larger than common spaces from the literature. By enhancing and using their properties, we effectively enable search over the complete architecture and can foster regularity. Further, we propose an efficient hierarchical kernel design for a Bayesian Optimization search strategy to efficiently search over such huge spaces. We demonstrate the versatility of our search space design framework and show that our search strategy can be superior to existing NAS approaches. Code is available at https: //github. com/automl/hierarchical nas construction.

TMLR Journal 2023 Journal Article

Contextualize Me – The Case for Context in Reinforcement Learning

  • Carolin Benjamins
  • Theresa Eimer
  • Frederik Schubert
  • Aditya Mohan
  • Sebastian Döhler
  • André Biedenkapp
  • Bodo Rosenhahn
  • Frank Hutter

While Reinforcement Learning ( RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how the framework of cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks. We confirm the insight that optimal behavior in cRL requires context information, as in other related areas of partial observability. To empirically validate this in the cRL framework, we provide various context-extended versions of common RL environments. They are part of the first benchmark library, CARL, designed for generalization based on cRL extensions of popular benchmarks, which we propose as a testbed to further study general agents. We show that in the contextual setting, even simple RL environments become challenging - and that naive solutions are not enough to generalize across complex context spaces.

EWRL Workshop 2023 Workshop Paper

Contextualize Me – The Case for Context in Reinforcement Learning

  • Carolin Benjamins
  • Theresa Eimer
  • Frederik Schubert
  • Aditya Mohan
  • Sebastian Döhler
  • André Biedenkapp
  • Bodo Rosenhahn
  • Frank Hutter

While Reinforcement Learning (RL) has shown successes in a variety of domains, including game playing, robot manipulation and nuclear fusion, modern RL algorithms are not designed with generalization in mind, making them brittle when faced with even slight variations of their environment. To address this limitation, recent research has increasingly focused on the generalization capabilities of RL agents. Ideally, general agents should be capable of zero-shot transfer to previously unseen environments and robust to changes in the problem setting while interacting with an environment. Steps in this direction have been taken by proposing new problem settings where agents can test their transfer performance, e. g. ~the Arcade Learning Environment's flavors or benchmarks utilizing Procedural Content Generation (PCG) to increase task variation, e. g. ProcGen, NetHack or Alchemy. While these extended problem settings in RL have expanded the possibilities for benchmarking agents in diverse environments, the degree of task variation is often either unknown or cannot be controlled precisely. We believe that generalization in RL is held back by these factors, stemming in part from a lack of problem formalization. In order to facilitate generalization in RL, contextual RL (cRL) proposes to explicitly take environment characteristics, the so-called context into account. This inclusion enables precise design of train and test distributions with respect to this context. Thus, cRL allows us to reason about the generalization capabilities of RL agents and to quantify their generalization performance. Overall, cRL provides a framework for both theoretical analysis and practical improvements. In order to empirically study cRL, we introduce our benchmark library CARL, short for Context-Adaptive Reinforcement Learning. CARL collects well-established environments from the RL community and extends them with the notion of context. We use our benchmark library to empirically show how different context variations can drastically increase the difficulty of training RL agents, even in simple environments. We further verify the intuition that allowing RL agents access to context information is beneficial for generalization tasks in theory and practice.

NeurIPS Conference 2023 Conference Paper

Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

  • Steven Adriaensen
  • Herilalaina Rakotoarison
  • Samuel Müller
  • Frank Hutter

Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10 000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20 000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN based predictive early stopping criterion obtains 2 - 6x speed-ups on 45 of these datasets, at virtually no overhead.

ICLR Conference 2023 Conference Paper

Gray-Box Gaussian Processes for Automated Reinforcement Learning

  • Gresa Shala
  • André Biedenkapp
  • Frank Hutter
  • Josif Grabocka

Despite having achieved spectacular milestones in an array of important real-world applications, most Reinforcement Learning (RL) methods are very brittle concerning their hyperparameters. Notwithstanding the crucial importance of setting the hyperparameters in training state-of-the-art agents, the task of hyperparameter optimization (HPO) in RL is understudied. In this paper, we propose a novel gray-box Bayesian Optimization technique for HPO in RL, that enriches Gaussian Processes with reward curve estimations based on generalized logistic functions. In a very large-scale experimental protocol, comprising 5 popular RL methods (DDPG, A2C, PPO, SAC, TD3), dozens of environments (Atari, Mujoco), and 7 HPO baselines, we demonstrate that our method significantly outperforms current HPO practices in RL.

NeurIPS Conference 2023 Conference Paper

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

  • Noah Hollmann
  • Samuel Müller
  • Frank Hutter

As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that utilizes an LLM to iteratively generate additional semantically meaningful features for tabular datasets based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets -- boosting mean ROC AUC performance from 0. 798 to 0. 822 across all dataset - similar to the improvement achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable by providing a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our code, a simple demo and a python package.

TMLR Journal 2023 Journal Article

MASIF: Meta-learned Algorithm Selection using Implicit Fidelity Information

  • Tim Ruhkopf
  • Aditya Mohan
  • Difan Deng
  • Alexander Tornede
  • Frank Hutter
  • Marius Lindauer

Selecting a well-performing algorithm for a given task or dataset can be time-consuming and tedious, but is crucial for the successful day-to-day business of developing new AI & ML applications. Algorithm Selection (AS) mitigates this through a meta-model leveraging meta-information about previous tasks. However, most of the available AS methods are error-prone because they characterize a task by either cheap-to-compute properties of the dataset or evaluations of cheap proxy algorithms, called landmarks. In this work, we extend the classical AS data setup to include multi-fidelity information and empirically demonstrate how meta-learning on algorithms’ learning behaviour allows us to exploit cheap test-time evidence effectively and combat myopia significantly. We further postulate a budget-regret trade-off w.r.t. the selection process. Our new selector MASIF is able to jointly interpret online evidence on a task in form of varying-length learning curves without any parametric assumption by leveraging a transformer-based encoder. This opens up new possibilities for guided rapid prototyping in data science on cheaply observed partial learning curves.

JAIR Journal 2023 Journal Article

MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning

  • Raghu Rajan
  • Jessica Lizeth Borja Diaz
  • Suresh Guttikonda
  • Fabio Ferreira
  • André Biedenkapp
  • Jan Ole von Hartz
  • Frank Hutter

We present MDP Playground, a testbed for Reinforcement Learning (RL) agents with dimensions of hardness that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in toy and complex RL environments. We consider and allow control over a wide variety of dimensions, including delayed rewards, sequence lengths, reward density, stochasticity, image representations, irrelevant features, time unit, action range and more. We define a parameterised collection of fast-to-run toy environments in OpenAI Gym by varying these dimensions and propose to use these to understand agents better. We then show how to design experiments using MDP Playground to gain insights on the toy environments. We also provide wrappers that can inject many of these dimensions into any Gym environment. We experiment with these wrappers on Atari and Mujoco to allow for understanding the effects of these dimensions on environments that are more complex than the toy environments. We also compare the effect of the dimensions on the toy and complex environments. Finally, we show how to use MDP Playground to debug agents, to study the interaction of multiple dimensions and describe further use-cases.

IJCAI Conference 2023 Conference Paper

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

  • Shuhei Watanabe
  • Archit Bansal
  • Frank Hutter

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e. g. , with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient. See https: //arxiv. org/abs/2304. 10255 for the latest version with Appendix.

ICML Conference 2023 Conference Paper

PFNs4BO: In-Context Learning for Bayesian Optimization

  • Samuel Müller 0005
  • Matthias Feurer 0001
  • Noah Hollmann
  • Frank Hutter

In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at https: //github. com/automl/PFNs4BO.

NeurIPS Conference 2023 Conference Paper

PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

  • Neeratyoy Mallik
  • Edward Bergman
  • Carl Hvarfner
  • Danny Stoll
  • Maciej Janowski
  • Marius Lindauer
  • Luigi Nardi
  • Frank Hutter

Hyperparameters of Deep Learning (DL) pipelines are crucial for their downstream performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, their incurred costs are often untenable for modern DL. Consequently, manual experimentation is still the most prevalent approach to optimize hyperparameters, relying on the researcher's intuition, domain knowledge, and cheap preliminary explorations. To resolve this misalignment between HPO algorithms and DL researchers, we propose PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency across a range of DL benchmarks and show its gains under informative expert input and robustness against poor expert beliefs.

NeurIPS Conference 2023 Conference Paper

Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

  • Samuel Dooley
  • Rhea Sukthanker
  • John Dickerson
  • Colin White
  • Frank Hutter
  • Micah Goldblum

Face recognition systems are widely deployed in safety-critical applications, including law enforcement, yet they exhibit bias across a range of socio-demographic dimensions, such as gender and race. Conventional wisdom dictates that model biases arise from biased training data. As a consequence, previous works on bias mitigation largely focused on pre-processing the training data, adding penalties to prevent bias from effecting the model during training, or post-processing predictions to debias them, yet these approaches have shown limited success on hard problems such as face recognition. In our work, we discover that biases are actually inherent to neural network architectures themselves. Following this reframing, we conduct the first neural architecture search for fairness, jointly with a search for hyperparameters. Our search outputs a suite of models which Pareto-dominate all other high-performance architectures and existing bias mitigation methods in terms of accuracy and fairness, often by large margins, on the two most widely used datasets for face identification, CelebA and VGGFace2. Furthermore, these models generalize to other datasets and sensitive attributes. We release our code, models and raw data files at https: //github. com/dooleys/FR-NAS.

NeurIPS Conference 2023 Conference Paper

Self-Correcting Bayesian Optimization through Bayesian Active Learning

  • Carl Hvarfner
  • Erik Hellsten
  • Frank Hutter
  • Luigi Nardi

Gaussian processes are the model of choice in Bayesian optimization and active learning. Yet, they are highly dependent on cleverly chosen hyperparameters to reach their full potential, and little effort is devoted to finding good hyperparameters in the literature. We demonstrate the impact of selecting good hyperparameters for GPs and present two acquisition functions that explicitly prioritize hyperparameter learning. Statistical distance-based Active Learning (SAL) considers the average disagreement between samples from the posterior, as measured by a statistical distance. SAL outperforms the state-of-the-art in Bayesian active learning on several test functions. We then introduce Self-Correcting Bayesian Optimization (SCoreBO), which extends SAL to perform Bayesian optimization and active learning simultaneously. SCoreBO learns the model hyperparameters at improved rates compared to vanilla BO, while outperforming the latest Bayesian optimization methods on traditional benchmarks. Moreover, we demonstrate the importance of self-correction on atypical Bayesian optimization tasks.

IJCAI Conference 2023 Conference Paper

Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

  • Shuhei Watanabe
  • Noor Awad
  • Masaki Onishi
  • Frank Hutter

Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE’s acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on “Multiobjective Hyperparameter Optimization for Transformers”. See https: //arxiv. org/abs/2212. 06751 for the latest version with Appendix.

ICLR Conference 2023 Conference Paper

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

  • Noah Hollmann
  • Samuel Müller 0005
  • Katharina Eggensperger
  • Frank Hutter

We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: It entails a large space of structural causal models with a preference for simple structures. On the $18$ datasets in the OpenML-CC18 suite that contain up to 1000 training data points, up to 100 purely numerical features without missing values, and up to 10 classes, we show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to $230\times$ speedup. This increases to a $5\,700\times$ speedup when using a GPU. We also validate these results on an additional 67 small numerical datasets from OpenML. We provide all our code, the trained TabPFN, an interactive browser demo and a Colab notebook at https://github.com/automl/TabPFN.

ICLR Conference 2023 Conference Paper

Transfer NAS with Meta-learned Bayesian Surrogates

  • Gresa Shala
  • Thomas Elsken
  • Frank Hutter
  • Josif Grabocka

While neural architecture search (NAS) is an intensely-researched area, approaches typically still suffer from either (i) high computational costs or (ii) lack of robustness across datasets and experiments. Furthermore, most methods start searching for an optimal architecture from scratch, ignoring prior knowledge. This is in contrast to the manual design process by researchers and engineers that leverage previous deep learning experiences by, e.g., transferring architectures from previously solved, related problems. We propose to adopt this human design strategy and introduce a novel surrogate for NAS, that is meta-learned across prior architecture evaluations across different datasets. We utilizes Bayesian Optimization (BO) with deep-kernel Gaussian Processes, graph neural networks for the architecture embeddings and a transformer-based set encoder of datasets. As a result, our method consistently achieves state-of-the-art results on six computer vision datasets, while being as fast as one-shot NAS methods.

ICLR Conference 2022 Conference Paper

$\pi$BO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization

  • Carl Hvarfner
  • Danny Stoll
  • Artur L. F. Souza
  • Marius Lindauer
  • Frank Hutter
  • Luigi Nardi

Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) algorithms. While known for its sample-efficiency, vanilla BO can not utilize readily available prior beliefs the practitioner has on the potential location of the optimum. Thus, BO disregards a valuable source of information, reducing its appeal to ML practitioners. To address this issue, we propose $\pi$BO, an acquisition function generalization which incorporates prior beliefs about the location of the optimum in the form of a probability distribution, provided by the user. In contrast to previous approaches, $\pi$BO is conceptually simple and can easily be integrated with existing libraries and many acquisition functions. We provide regret bounds when $\pi$BO is applied to the common Expected Improvement acquisition function and prove convergence at regular rates independently of the prior. Further, our experiments show that $\pi$BO outperforms competing approaches across a wide suite of benchmarks and prior characteristics. We also demonstrate that $\pi$BO improves on the state-of-the-art performance for a popular deep learning task, with a $12.5\times$ time-to-accuracy speedup over prominent BO approaches.

JMLR Journal 2022 Journal Article

Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning

  • Matthias Feurer
  • Katharina Eggensperger
  • Stefan Falkner
  • Marius Lindauer
  • Frank Hutter

Automated Machine Learning (AutoML) supports practitioners and researchers with the tedious task of designing machine learning pipelines and has recently achieved substantial success. In this paper, we introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge. We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits by using a new, simple and meta-feature-free meta-learning technique and by employing a successful bandit strategy for budget allocation. However, PoSH Auto-sklearn introduces even more ways of running AutoML and might make it harder for users to set it up correctly. Therefore, we also go one step further and study the design space of AutoML itself, proposing a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn 2.0. We verify the improvements by these additions in an extensive experimental study on 39 AutoML benchmark datasets. We conclude the paper by comparing to other popular AutoML frameworks and Auto-sklearn 1.0, reducing the relative error by up to a factor of 4.5, and yielding a performance in 10 minutes that is substantially better than what Auto-sklearn 1.0 achieves within an hour. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

JAIR Journal 2022 Journal Article

Automated Dynamic Algorithm Configuration

  • Steven Adriaensen
  • André Biedenkapp
  • Gresa Shala
  • Noor Awad
  • Theresa Eimer
  • Marius Lindauer
  • Frank Hutter

The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior art to tackle this problem; and (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning.

JAIR Journal 2022 Journal Article

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

  • Jack Parker-Holder
  • Raghu Rajan
  • Xingyou Song
  • André Biedenkapp
  • Yingjie Miao
  • Theresa Eimer
  • Baohe Zhang
  • Vu Nguyen

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and AutoML has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games, such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems of interest to researchers going forward.

NeurIPS Conference 2022 Conference Paper

JAHS-Bench-201: A Foundation For Research On Joint Architecture And Hyperparameter Search

  • Archit Bansal
  • Danny Stoll
  • Maciej Janowski
  • Arber Zela
  • Frank Hutter

The past few years have seen the development of many benchmarks for Neural Architecture Search (NAS), fueling rapid progress in NAS research. However, recent work, which shows that good hyperparameter settings can be more important than using the best architecture, calls for a shift in focus towards Joint Architecture and Hyperparameter Search (JAHS). Therefore, we present JAHS-Bench-201, the first collection of surrogate benchmarks for JAHS, built to also facilitate research on multi-objective, cost-aware and (multi) multi-fidelity optimization algorithms. To the best of our knowledge, JAHS-Bench-201 is based on the most extensive dataset of neural network performance data in the public domain. It is composed of approximately 161 million data points and 20 performance metrics for three deep learning tasks, while featuring a 14-dimensional search and fidelity space that extends the popular NAS-Bench-201 space. With JAHS-Bench-201, we hope to democratize research on JAHS and lower the barrier to entry of an extremely compute intensive field, e. g. , by reducing the compute time to run a JAHS algorithm from 5 days to only a few seconds.

NeurIPS Conference 2022 Conference Paper

Joint Entropy Search For Maximally-Informed Bayesian Optimization

  • Carl Hvarfner
  • Frank Hutter
  • Luigi Nardi

Information-theoretic Bayesian optimization techniques have become popular for optimizing expensive-to-evaluate black-box functions due to their non-myopic qualities. Entropy Search and Predictive Entropy Search both consider the entropy over the optimum in the input space, while the recent Max-value Entropy Search considers the entropy over the optimal value in the output space. We propose Joint Entropy Search (JES), a novel information-theoretic acquisition function that considers an entirely new quantity, namely the entropy over the joint optimal probability density over both input and output space. To incorporate this information, we consider the reduction in entropy from conditioning on fantasized optimal input/output pairs. The resulting approach primarily relies on standard GP machinery and removes complex approximations typically associated with information-theoretic methods. With minimal computational overhead, JES shows superior decision-making, and yields state-of-the-art performance for information-theoretic approaches across a wide suite of tasks. As a light-weight approach with superior results, JES provides a new go-to acquisition function for Bayesian optimization.

ICLR Conference 2022 Conference Paper

Learning Synthetic Environments and Reward Networks for Reinforcement Learning

  • Fabio Ferreira
  • Thomas Nierhoff
  • Andreas Sälinger
  • Frank Hutter

We introduce Synthetic Environments (SEs) and Reward Networks (RNs), represented by neural networks, as proxy environment models for training Reinforcement Learning (RL) agents. We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment. While an SE acts as a full proxy to a real environment by learning about its state dynamics and rewards, an RN is a partial proxy that learns to augment or replace rewards. We use bi-level optimization to evolve SEs and RNs: the inner loop trains the RL agent, and the outer loop trains the parameters of the SE / RN via an evolution strategy. We evaluate our proposed new concept on a broad range of RL algorithms and classic control environments. In a one-to-one comparison, learning an SE proxy requires more interactions with the real environment than training agents only on the real environment. However, once such an SE has been learned, we do not need any interactions with the real environment to train new agents. Moreover, the learned SE proxies allow us to train agents with fewer interactions while maintaining the original task performance. Our empirical results suggest that SEs achieve this result by learning informed representations that bias the agents towards relevant states. Moreover, we find that these proxies are robust against hyperparameter variation and can also transfer to unseen agents.

NeurIPS Conference 2022 Conference Paper

NAS-Bench-Suite-Zero: Accelerating Research on Zero Cost Proxies

  • Arjun Krishnakumar
  • Colin White
  • Arber Zela
  • Renbo Tu
  • Mahmoud Safari
  • Frank Hutter

Zero-cost proxies (ZC proxies) are a recent architecture performance prediction technique aiming to significantly speed up algorithms for neural architecture search (NAS). Recent work has shown that these techniques show great promise, but certain aspects, such as evaluating and exploiting their complementary strengths, are under-studied. In this work, we create NAS-Bench-Suite: we evaluate 13 ZC proxies across 28 tasks, creating by far the largest dataset (and unified codebase) for ZC proxies, enabling orders-of-magnitude faster experiments on ZC proxies, while avoiding confounding factors stemming from different implementations. To demonstrate the usefulness of NAS-Bench-Suite, we run a large-scale analysis of ZC proxies, including a bias analysis, and the first information-theoretic analysis which concludes that ZC proxies capture substantial complementary information. Motivated by these findings, we present a procedure to improve the performance of ZC proxies by reducing biases such as cell size, and we also show that incorporating all 13 ZC proxies into the surrogate models used by NAS algorithms can improve their predictive performance by up to 42%. Our code and datasets are available at https: //github. com/automl/naslib/tree/zerocost.

ICLR Conference 2022 Conference Paper

NAS-Bench-Suite: NAS Evaluation is (Now) Surprisingly Easy

  • Yash Mehta
  • Colin White
  • Arber Zela
  • Arjun Krishnakumar
  • Guri Zabergja
  • Shakiba Moradian
  • Mahmoud Safari
  • Kaicheng Yu

The release of tabular benchmarks, such as NAS-Bench-101 and NAS-Bench-201, has significantly lowered the computational overhead for conducting scientific research in neural architecture search (NAS). Although they have been widely adopted and used to tune real-world NAS algorithms, these benchmarks are limited to small search spaces and focus solely on image classification. Recently, several new NAS benchmarks have been introduced that cover significantly larger search spaces over a wide range of tasks, including object detection, speech recognition, and natural language processing. However, substantial differences among these NAS benchmarks have so far prevented their widespread adoption, limiting researchers to using just a few benchmarks. In this work, we present an in-depth analysis of popular NAS algorithms and performance prediction methods across 25 different combinations of search spaces and datasets, finding that many conclusions drawn from a few NAS benchmarks do \emph{not} generalize to other benchmarks. To help remedy this problem, we introduce \nasbs, a comprehensive and extensible collection of NAS benchmarks, accessible through a unified interface, created with the aim to facilitate reproducible, generalizable, and rapid NAS research. Our code is available at https://github.com/automl/naslib.

NeurIPS Conference 2022 Conference Paper

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

  • Jörg Franke
  • Frederic Runge
  • Frank Hutter

Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions. We show the benefits of our approach (1) on a synthetic task that captures the ability to learn a hidden data distribution, (2) with state-of-the-art results in RNA folding that reveal advantages on highly ambiguous data, and (3) demonstrating its generative capabilities on property-based molecule design by implicitly learning the underlying distributions and outperforming existing work.

JMLR Journal 2022 Journal Article

SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization

  • Marius Lindauer
  • Katharina Eggensperger
  • Matthias Feurer
  • André Biedenkapp
  • Difan Deng
  • Carolin Benjamins
  • Tim Ruhkopf
  • René Sass

Algorithm parameters, in particular hyperparameters of machine learning algorithms, can substantially impact their performance. To support users in determining well-performing hyperparameter configurations for their algorithms, datasets and applications at hand, SMAC3 offers a robust and flexible framework for Bayesian Optimization, which can improve performance within a few evaluations. It offers several facades and pre-sets for typical use cases, such as optimizing hyperparameters, solving low dimensional continuous (artificial) global optimization problems and configuring algorithms to perform well across multiple problem instances. The SMAC3 package is available under a permissive BSD-license at https://github.com/automl/SMAC3. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

ICLR Conference 2022 Conference Paper

Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks

  • Arber Zela
  • Julien Siems
  • Lucas Zimmer
  • Jovita Lukasik
  • Margret Keuper
  • Frank Hutter

The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tabular NAS benchmarks has been a focus on extremely small architectural search spaces since their construction relies on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger spaces. To overcome this fundamental limitation, we propose a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. We exemplify this approach by creating surrogate NAS benchmarks on the existing tabular NAS-Bench-101 and on two widely used NAS search spaces with up to $10^{21}$ architectures ($10^{13}$ times larger than any previous tabular NAS benchmark). We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight. We open-source all our code and believe that surrogate NAS benchmarks are an indispensable tool to extend scientifically sound work on NAS to large and exciting search spaces.

IROS Conference 2022 Conference Paper

T3VIP: Transformation-based 3D Video Prediction

  • Iman Nematollahi
  • Erick Rosete-Beas
  • Seyed Mahdi B. Azad
  • Raghu Rajan
  • Frank Hutter
  • Wolfram Burgard

For autonomous skill acquisition, robots have to learn about the physical rules governing the 3D world dynamics from their own past experience to predict and reason about plausible future outcomes. To this end, we propose a transformation-based 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts and predicting their corresponding rigid transformations. Our model is fully unsupervised, captures the stochastic nature of the real world, and the observational cues in image and point cloud domains constitute its learning signals. To fully leverage all the 2D and 3D observational signals, we equip our model with automatic hyperparameter optimization (HPO) to interpret the best way of learning from them. To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera. Our extensive evaluation with simulated and real-world datasets demonstrates that our formulation leads to interpretable 3D models that predict future depth videos while achieving on-par performance with 2D models on RGB video prediction. Moreover, we demonstrate that our model outperforms 2D baselines on visuomotor control. Videos, code, dataset, and pre-trained models are available at http://t3vip.cs.uni-freiburg.de.

ICLR Conference 2022 Conference Paper

Transformers Can Do Bayesian Inference

  • Samuel Müller 0005
  • Noah Hollmann
  • Sebastian Pineda-Arango
  • Josif Grabocka
  • Frank Hutter

Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.

ICML Conference 2022 Conference Paper

Zero-shot AutoML with Pretrained Models

  • Ekrem Öztürk
  • Fabio Ferreira
  • Hadi S. Jomaa
  • Lars Schmidt-Thieme
  • Josif Grabocka
  • Frank Hutter

Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model which, at test time, allows to select the right deep learning (DL) pipeline (including the pre-trained model and fine-tuning hyperparameters) for a new dataset D given only trivial meta-features describing D such as image resolution or the number of classes. To train this zero-shot model, we collect performance data for many DL pipelines on a large collection of datasets and meta-train on this data to minimize a pairwise ranking objective. We evaluate our approach under the strict time limit of the vision track of the ChaLearn AutoDL challenge benchmark, clearly outperforming all challenge contenders.

IJCAI Conference 2021 Conference Paper

DACBench: A Benchmark Library for Dynamic Algorithm Configuration

  • Theresa Eimer
  • André Biedenkapp
  • Maximilian Reimer
  • Steven Adriansen
  • Frank Hutter
  • Marius Lindauer

Dynamic Algorithm Configuration (DAC) aims to dynamically control a target algorithm's hyperparameters in order to improve its performance. Several theoretical and empirical results have demonstrated the benefits of dynamically controlling hyperparameters in domains like evolutionary computation, AI Planning or deep learning. Replicating these results, as well as studying new methods for DAC, however, is difficult since existing benchmarks are often specialized and incompatible with the same interfaces. To facilitate benchmarking and thus research on DAC, we propose DACBench, a benchmark library that seeks to collect and standardize existing DAC benchmarks from different AI domains, as well as provide a template for new ones. For the design of DACBench, we focused on important desiderata, such as (i) flexibility, (ii) reproducibility, (iii) extensibility and (iv) automatic documentation and visualization. To show the potential, broad applicability and challenges of DAC, we explore how a set of six initial benchmarks compare in several dimensions of difficulty.

IJCAI Conference 2021 Conference Paper

DEHB: Evolutionary Hyberband for Scalable, Robust and Efficient Hyperparameter Optimization

  • Noor Awad
  • Neeratyoy Mallik
  • Frank Hutter

Modern machine learning algorithms crucially rely on several design decisions to achieve strong performance, making the problem of Hyperparameter Optimization (HPO) more important than ever. Here, we combine the advantages of the popular bandit-based HPO method Hyperband (HB) and the evolutionary search approach of Differential Evolution (DE) to yield a new HPO method which we call DEHB. Comprehensive results on a very broad range of HPO problems, as well as a wide range of tabular benchmarks from neural architecture search, demonstrate that DEHB achieves strong performance far more robustly than all previous HPO methods we are aware of, especially for high-dimensional problems with discrete input dimensions. For example, DEHB is up to 1000x faster than random search. It is also efficient in computational time, conceptually simple and easy to implement, positioning it well to become a new default HPO method.

NeurIPS Conference 2021 Conference Paper

How Powerful are Performance Predictors in Neural Architecture Search?

  • Colin White
  • Arber Zela
  • Robin Ru
  • Yang Liu
  • Frank Hutter

Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks. To reduce this extreme computational cost, dozens of techniques have since been proposed to predict the final performance of neural architectures. Despite the success of such performance prediction methods, it is not well-understood how different families of techniques compare to one another, due to the lack of an agreed-upon evaluation metric and optimization for different constraints on the initialization time and query time. In this work, we give the first large-scale study of performance predictors by analyzing 31 techniques ranging from learning curve extrapolation, to weight-sharing, to supervised learning, to zero-cost proxies. We test a number of correlation- and rank-based performance measures in a variety of settings, as well as the ability of each technique to speed up predictor-based NAS frameworks. Our results act as recommendations for the best predictors to use in different settings, and we show that certain families of predictors can be combined to achieve even better predictive power, opening up promising research directions. We release our code, featuring a library of 31 performance predictors.

NeurIPS Conference 2021 Conference Paper

HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO

  • Katharina Eggensperger
  • Philipp Müller
  • Neeratyoy Mallik
  • Matthias Feurer
  • Rene Sass
  • Aaron Klein
  • Noor Awad
  • Marius Lindauer

To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over the last years, the number of efficient algorithms and tools for HPO grew substantially. At the same time, the community is still lacking realistic, diverse, computationally cheap, and standardized benchmarks. This is especially the case for multi-fidelity HPO methods. To close this gap, we propose HPOBench, which includes 7 existing and 5 new benchmark families, with a total of more than 100 multi-fidelity benchmark problems. HPOBench allows to run this extendable set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers. It also provides surrogate and tabular benchmarks for computationally affordable yet statistically sound evaluations. To demonstrate HPOBench’s broad compatibility with various optimization tools, as well as its usefulness, we conduct an exemplary large-scale study evaluating 13 optimizers from 6 optimization tools. We provide HPOBench here: https: //github. com/automl/HPOBench.

ICAPS Conference 2021 Conference Paper

Learning Heuristic Selection with Dynamic Algorithm Configuration

  • David Speck 0001
  • André Biedenkapp
  • Frank Hutter
  • Robert Mattmüller
  • Marius Lindauer

A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most useful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection which takes into account the internal search dynamics of a planning system. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches.

NeurIPS Conference 2021 Conference Paper

NAS-Bench-x11 and the Power of Learning Curves

  • Shen Yan
  • Colin White
  • Yash Savani
  • Frank Hutter

While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research. However, two of the most popular benchmarks do not provide the full training information for each architecture. As a result, on these benchmarks it is not possible to evaluate many types of multi-fidelity algorithms, such as learning curve extrapolation, that require evaluating architectures at arbitrary epochs. In this work, we present a method using singular value decomposition and noise modeling to create surrogate benchmarks, NAS-Bench-111, NAS-Bench-311, and NAS-Bench-NLP11, that output the full training information for each architecture, rather than just the final validation accuracy. We demonstrate the power of using the full training information by introducing a learning curve extrapolation framework to modify single-fidelity algorithms, showing that it leads to improvements over popular single-fidelity algorithms which claimed to be state-of-the-art upon release.

NeurIPS Conference 2021 Conference Paper

Neural Ensemble Search for Uncertainty Estimation and Dataset Shift

  • Sheheryar Zaidi
  • Arber Zela
  • Thomas Elsken
  • Chris C Holmes
  • Frank Hutter
  • Yee Teh

Ensembles of neural networks achieve superior performance compared to standalone networks in terms of accuracy, uncertainty calibration and robustness to dataset shift. Deep ensembles, a state-of-the-art method for uncertainty estimation, only ensemble random initializations of a fixed architecture. Instead, we propose two methods for automatically constructing ensembles with varying architectures, which implicitly trade-off individual architectures’ strengths against the ensemble’s diversity and exploit architectural variation as a source of diversity. On a variety of classification tasks and modern architecture search spaces, we show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift. Our further analysis and ablation studies provide evidence of higher ensemble diversity due to architectural variation, resulting in ensembles that can outperform deep ensembles, even when having weaker average base learners. To foster reproducibility, our code is available: https: //github. com/automl/nes

NeurIPS Conference 2021 Conference Paper

OpenML Benchmarking Suites

  • Bernd Bischl
  • Giuseppe Casalicchio
  • Matthias Feurer
  • Pieter Gijsbers
  • Frank Hutter
  • Michel Lang
  • Rafael Gomes Mantovani
  • Jan van Rijn

Machine learning research depends on objectively interpretable, comparable, and reproducible algorithm benchmarks. We advocate the use of curated, comprehensive suites of machine learning tasks to standardize the setup, execution, and reporting of benchmarks. We enable this through software tools that help to create and leverage these benchmarking suites. These are seamlessly integrated into the OpenML platform, and accessible through interfaces in Python, Java, and R. OpenML benchmarking suites (a) are easy to use through standardized data formats, APIs, and client libraries; (b) come with extensive meta-information on the included datasets; and (c) allow benchmarks to be shared and reused in future studies. We then present a first, carefully curated and practical benchmarking suite for classification: the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18). Finally, we discuss use cases and applications which demonstrate the usefulness of OpenML benchmarking suites and the OpenML-CC18 in particular.

JMLR Journal 2021 Journal Article

OpenML-Python: an extensible Python API for OpenML

  • Matthias Feurer
  • Jan N. van Rijn
  • Arlind Kadra
  • Pieter Gijsbers
  • Neeratyoy Mallik
  • Sahithya Ravi
  • Andreas Müller
  • Joaquin Vanschoren

OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation are available at https://github.com/openml/openml-python/. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2021. ( edit, beta )

ICLR Conference 2021 Conference Paper

Sample-Efficient Automated Deep Reinforcement Learning

  • Jörg K. H. Franke
  • Gregor Köhler
  • André Biedenkapp
  • Frank Hutter

Despite significant progress in challenging problems across various domains, applying state-of-the-art deep reinforcement learning (RL) algorithms remains challenging due to their sensitivity to the choice of hyperparameters. This sensitivity can partly be attributed to the non-stationarity of the RL problem, potentially requiring different hyperparameter settings at various stages of the learning process. Additionally, in the RL setting, hyperparameter optimization (HPO) requires a large number of environment interactions, hindering the transfer of the successes in RL to real-world applications. In this work, we tackle the issues of sample-efficient and dynamic HPO in RL. We propose a population-based automated RL (AutoRL) framework to meta-optimize arbitrary off-policy RL algorithms. In this framework, we optimize the hyperparameters and also the neural architecture while simultaneously training the agent. By sharing the collected experience across the population, we substantially increase the sample efficiency of the meta-optimization. We demonstrate the capabilities of our sample-efficient AutoRL approach in a case study with the popular TD3 algorithm in the MuJoCo benchmark suite, where we reduce the number of environment interactions needed for meta-optimization by up to an order of magnitude compared to population-based training.

ICML Conference 2021 Conference Paper

Self-Paced Context Evaluation for Contextual Reinforcement Learning

  • Theresa Eimer
  • André Biedenkapp
  • Frank Hutter
  • Marius Lindauer

Reinforcement learning (RL) has made a lot of advances for solving a single problem in a given environment; but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency for learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates instance curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization capabilities to new \tasks from the same problem domain. Nevertheless, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE’s ability to speed up learning of different value-based RL agents on two environments, showing better generalization capabilities and up to 10x faster learning compared to naive approaches such as round robin or SPDRL, as the closest state-of-the-art approach.

ICML Conference 2021 Conference Paper

TempoRL: Learning When to Act

  • André Biedenkapp
  • Raghu Rajan
  • Frank Hutter
  • Marius Lindauer

Reinforcement learning is a powerful approach to learn behaviour through interactions with an environment. However, behaviours are usually learned in a purely reactive fashion, where an appropriate action is selected based on an observation. In this form, it is challenging to learn when it is necessary to execute new decisions. This makes learning inefficient especially in environments that need various degrees of fine and coarse control. To address this, we propose a proactive setting in which the agent not only selects an action in a state but also for how long to commit to that action. Our TempoRL approach introduces skip connections between states and learns a skip-policy for repeating the same action along these skips. We demonstrate the effectiveness of TempoRL on a variety of traditional and deep RL environments, showing that our approach is capable of learning successful policies up to an order of magnitude faster than vanilla Q-learning.

NeurIPS Conference 2021 Conference Paper

Well-tuned Simple Nets Excel on Tabular Datasets

  • Arlind Kadra
  • Marius Lindauer
  • Frank Hutter
  • Josif Grabocka

Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern regularization techniques. As a result, we propose regularizing plain Multilayer Perceptron (MLP) networks by searching for the optimal combination/cocktail of 13 regularization techniques for each dataset using a joint optimization over the decision on which regularizers to apply and their subsidiary hyperparameters. We empirically assess the impact of these regularization cocktails for MLPs in a large-scale empirical study comprising 40 tabular datasets and demonstrate that (i) well-regularized plain MLPs significantly outperform recent state-of-the-art specialized neural network architectures, and (ii) they even outperform strong traditional ML methods, such as XGBoost.

JMLR Journal 2020 Journal Article

Best Practices for Scientific Research on Neural Architecture Search

  • Marius Lindauer
  • Frank Hutter

Finding a well-performing architecture is often tedious for both deep learning practitioners and researchers, leading to tremendous interest in the automation of this task by means of neural architecture search (NAS). Although the community has made major strides in developing better NAS methods, the quality of scientific empirical evaluations in the young field of NAS is still lacking behind that of other areas of machine learning. To address this issue, we describe a set of possible issues and ways to avoid them, leading to the NAS best practices checklist available at http://automl.org/nas_checklist.pdf. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2020. ( edit, beta )

ECAI Conference 2020 Conference Paper

Dynamic Algorithm Configuration: Foundation of a New Meta-Algorithmic Framework

  • André Biedenkapp
  • H. Furkan Bozkurt
  • Theresa Eimer
  • Frank Hutter
  • Marius Lindauer

The performance of many algorithms in the fields of hard combinatorial problem solving, machine learning or AI in general depends on parameter tuning. Automated methods have been proposed to alleviate users from the tedious and error-prone task of manually searching for performance-optimized configurations across a set of problem instances. However, there is still a lot of untapped potential through adjusting an algorithm’s parameters online since different parameter values can be optimal at different stages of the algorithm. Prior work showed that reinforcement learning is an effective approach to learn policies for online adjustments of algorithm parameters in a data-driven way. We extend that approach by formulating the resulting dynamic algorithm configuration as a contextual MDP, such that RL not only learns a policy for a single instance, but across a set of instances. To lay the foundation for studying dynamic algorithm configuration with RL in a controlled setting, we propose white-box benchmarks covering major aspects that make dynamic algorithm configuration a hard problem in practice and study the performance of various types of configuration strategies for them. On these white-box benchmarks, we show that (i) RL is a robust candidate for learning configuration policies, outperforming standard parameter optimization approaches, such as classical algorithm configuration; (ii) based on function approximation, RL agents can learn to generalize to new types of instances; and (iii) self-paced learning can substantially improve the performance by selecting a useful sequence of training instances automatically.

PRL Workshop 2020 Workshop Paper

Learning Heuristic Selection with Dynamic Algorithm Configuration

  • David Speck
  • André Biedenkapp
  • Frank Hutter
  • Robert Mattmüller
  • Marius Lindauer

A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most helpful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection which takes into account the internal search dynamics of a planning system. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches in terms of coverage.

ICLR Conference 2020 Conference Paper

Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization

  • Michael Volpp
  • Lukas P. Fröhlich
  • Kirsten Fischer
  • Andreas Doerr
  • Stefan Falkner
  • Frank Hutter
  • Christian G. Daniel

Transferring knowledge across tasks to improve data-efficiency is one of the open key challenges in the field of global black-box optimization. Readily available algorithms are typically designed to be universal optimizers and, therefore, often suboptimal for specific tasks. We propose a novel transfer learning method to obtain customized optimizers within the well-established framework of Bayesian optimization, allowing our algorithm to utilize the proven generalization capabilities of Gaussian processes. Using reinforcement learning to meta-train an acquisition function (AF) on a set of related tasks, the proposed method learns to extract implicit structural information and to exploit it for improved data-efficiency. We present experiments on a simulation-to-real transfer task as well as on several synthetic functions and on two hyperparameter search problems. The results show that our algorithm (1) automatically identifies structural properties of objective functions from available source tasks or simulations, (2) performs favourably in settings with both scarse and abundant source data, and (3) falls back to the performance level of general AFs if no particular structure is present.

ICLR Conference 2020 Conference Paper

NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search

  • Arber Zela
  • Julien Siems
  • Frank Hutter

One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding on how these weight-sharing algorithms exactly work due to the many factors controlling the dynamics of the process. In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently-introduced variants and introduce a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. To showcase the framework, we compare several state-of-the-art one-shot NAS methods, examine how sensitive they are to their hyperparameters and how they can be improved by tuning their hyperparameters, and compare their performance to that of blackbox optimizers for NAS-Bench-101.

ICLR Conference 2020 Conference Paper

Transferring Optimality Across Data Distributions via Homotopy Methods

  • Matilde Gargiani
  • Andrea Zanelli
  • Quoc Tran-Dinh
  • Moritz Diehl
  • Frank Hutter

Homotopy methods, also known as continuation methods, are a powerful mathematical tool to efficiently solve various problems in numerical analysis, including complex non-convex optimization problems where no or only little prior knowledge regarding the localization of the solutions is available. In this work, we propose a novel homotopy-based numerical method that can be used to transfer knowledge regarding the localization of an optimum across different task distributions in deep learning applications. We validate the proposed methodology with some empirical evaluations in the regression and classification scenarios, where it shows that superior numerical performance can be achieved in popular deep learning benchmarks, i.e. FashionMNIST, CIFAR-10, and draw connections with the widely used fine-tuning heuristic. In addition, we give more insights on the properties of a general homotopy method when used in combination with Stochastic Gradient Descent by conducting a general local theoretical analysis in a simplified setting.

ICLR Conference 2020 Conference Paper

Understanding and Robustifying Differentiable Architecture Search

  • Arber Zela
  • Thomas Elsken
  • Tonmoy Saikia
  • Yassine Marrakchi
  • Thomas Brox
  • Frank Hutter

Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem. However, DARTS does not work robustly for new problems: we identify a wide range of search spaces for which DARTS yields degenerate architectures with very poor test performance. We study this failure mode and show that, while DARTS successfully minimizes validation loss, the found solutions generalize poorly when they coincide with high validation loss curvature in the architecture space. We show that by adding one of various types of regularization we can robustify DARTS to find solutions with less curvature and better generalization properties. Based on these observations, we propose several simple variations of DARTS that perform substantially more robustly in practice. Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling.

IJCAI Conference 2019 Conference Paper

An Evolution Strategy with Progressive Episode Lengths for Playing Games

  • Lior Fuks
  • Noor Awad
  • Frank Hutter
  • Marius Lindauer

Recently, Evolution Strategies (ES) have been successfully applied to solve problems commonly addressed by reinforcement learning (RL). Due to the simplicity of ES approaches, their runtime is often dominated by the RL-task at hand (e. g. , playing a game). In this work, we introduce Progressive Episode Lengths (PEL) as a new technique and incorporate it with ES. The main objective is to allow the agent to play short and easy tasks with limited lengths, and then use the gained knowledge to further solve long and hard tasks with progressive lengths. Hence allowing the agent to perform many function evaluations and find a good solution for short time horizons before adapting the strategy to tackle larger time horizons. We evaluated PEL on a subset of Atari games from OpenAI Gym, showing that it can substantially improve the optimization speed, stability and final score of canonical ES. Specifically, we show average improvements of 80% (32%) after 2 hours (10 hours) compared to canonical ES.

NeurIPS Conference 2019 Conference Paper

Meta-Surrogate Benchmarking for Hyperparameter Optimization

  • Aaron Klein
  • Zhenwen Dai
  • Frank Hutter
  • Neil Lawrence
  • Javier Gonzalez

Despite the recent progress in hyperparameter optimization (HPO), available benchmarks that resemble real-world scenarios consist of a few and very large problem instances that are expensive to solve. This blocks researchers and practitioners no only from systematically running large-scale comparisons that are needed to draw statistically significant results but also from reproducing experiments that were conducted before. This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on off-line generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems.

ICML Conference 2019 Conference Paper

NAS-Bench-101: Towards Reproducible Neural Architecture Search

  • Chris Ying
  • Aaron Klein
  • Eric Christiansen
  • Esteban Real
  • Kevin Murphy 0002
  • Frank Hutter

Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the pre-computed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.

JMLR Journal 2019 Journal Article

Neural Architecture Search: A Survey

  • Thomas Elsken
  • Jan Hendrik Metzen
  • Frank Hutter

Deep Learning has enabled remarkable progress over the last years on a variety of tasks, such as image recognition, speech recognition, and machine translation. One crucial aspect for this progress are novel neural architectures. Currently employed architectures have mostly been developed manually by human experts, which is a time-consuming and error-prone process. Because of this, there is growing interest in automated \emph{neural architecture search} methods. We provide an overview of existing work in this field of research and categorize them according to three dimensions: search space, search strategy, and performance estimation strategy. [abs] [ pdf ][ bib ] &copy JMLR 2019. ( edit, beta )

JAIR Journal 2019 Journal Article

Pitfalls and Best Practices in Algorithm Configuration

  • Katharina Eggensperger
  • Marius Lindauer
  • Frank Hutter

Good parameter settings are crucial to achieve high performance in many areas of artificial intelligence (AI), such as propositional satisfiability solving, AI planning, scheduling, and machine learning (in particular deep learning). Automated algorithm configuration methods have recently received much attention in the AI community since they replace tedious, irreproducible and error-prone manual parameter tuning and can lead to new state-of-the-art performance. However, practical applications of algorithm configuration are prone to several (often subtle) pitfalls in the experimental design that can render the procedure ineffective. We identify several common issues and propose best practices for avoiding them. As one possibility for automatically handling as many of these as possible, we also propose a tool called GenericWrapper4AC.

IJCAI Conference 2018 Conference Paper

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

  • Patryk Chrabąszcz
  • Ilya Loshchilov
  • Frank Hutter

Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. We also demonstrate that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state-of-the-art for solving RL problems.

ICML Conference 2018 Conference Paper

BOHB: Robust and Efficient Hyperparameter Optimization at Scale

  • Stefan Falkner
  • Aaron Klein
  • Frank Hutter

Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations as quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.

NeurIPS Conference 2018 Conference Paper

Maximizing acquisition functions for Bayesian optimization

  • James Wilson
  • Frank Hutter
  • Marc Deisenroth

Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when evaluating queries in parallel, where acquisition functions are routinely non-convex, high-dimensional, and intractable. We first show that acquisition functions estimated via Monte Carlo integration are consistently amenable to gradient-based optimization. Subsequently, we identify a common family of acquisition functions, including EI and UCB, whose characteristics not only facilitate but justify use of greedy approaches for their maximization.

IJCAI Conference 2018 Conference Paper

Neural Networks for Predicting Algorithm Runtime Distributions

  • Katharina Eggensperger
  • Marius Lindauer
  • Frank Hutter

Many state-of-the-art algorithms for solving hard combinatorial problems in artificial intelligence (AI) include elements of stochasticity that lead to high variations in runtime, even for a fixed problem instance. Knowledge about the resulting runtime distributions (RTDs) of algorithms on given problem instances can be exploited in various meta-algorithmic procedures, such as algorithm selection, portfolios, and randomized restarts. Previous work has shown that machine learning can be used to individually predict mean, median and variance of RTDs. To establish a new state-of-the-art in predicting RTDs, we demonstrate that the parameters of an RTD should be learned jointly and that neural networks can do this well by directly optimizing the likelihood of an RTD given runtime observations. In an empirical study involving five algorithms for SAT solving and AI planning, we show that neural networks predict the true RTDs of unseen instances better than previous methods, and can even do so when only few runtime observations are available per training instance.

AAAI Conference 2018 Conference Paper

Warmstarting of Model-Based Algorithm Configuration

  • Marius Lindauer
  • Frank Hutter

The performance of many hard combinatorial problem solvers depends strongly on their parameter settings, and since manual parameter tuning is both tedious and suboptimal the AI community has recently developed several algorithm configuration (AC) methods to automatically address this problem. While all existing AC methods start the configuration process of an algorithm A from scratch for each new type of benchmark instances, here we propose to exploit information about A’s performance on previous benchmarks in order to warmstart its configuration on new types of benchmarks. We introduce two complementary ways in which we can exploit this information to warmstart AC methods based on a predictive model. Experiments for optimizing a flexible modern SAT solver on twelve different instance sets show that our methods often yield substantial speedups over existing AC methods (up to 165-fold) and can also find substantially better configurations given the same compute budget.

JMLR Journal 2017 Journal Article

Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA

  • Lars Kotthoff
  • Chris Thornton
  • Holger H. Hoos
  • Frank Hutter
  • Kevin Leyton-Brown

WEKA is a widely used, open-source machine learning platform. Due to its intuitive interface, it is particularly popular with novice users. However, such users often find it hard to identify the best approach for their particular dataset among the many available. We describe the new version of Auto-WEKA, a system designed to help such users by automatically searching through the joint space of WEKA's learning algorithms and their respective hyperparameter settings to maximize performance, using a state-of-the-art Bayesian optimization method. Our new package is tightly integrated with WEKA, making it just as accessible to end users as any other learning algorithm. [abs] [ pdf ][ bib ] [ code ] [ webpage ] &copy JMLR 2017. ( edit, beta )

IJCAI Conference 2017 Conference Paper

AutoFolio: An Automatically Configured Algorithm Selector (Extended Abstract)

  • Marius Lindauer
  • Frank Hutter
  • Holger H. Hoos
  • Torsten Schaub

Algorithm selection (AS) techniques -- which involve choosing from a set of algorithms the one expected to solve a given problem instance most efficiently -- have substantially improved the state of the art in solving many prominent AI problems, such as SAT, CSP, ASP, MAXSAT and QBF. Although several AS procedures have been introduced, not too surprisingly, none of them dominates all others across all AS scenarios. Furthermore, these procedures have parameters whose optimal values vary across AS scenarios. In this extended abstract of our 2015 JAIR article of the same title, we summarize AutoFolio, which uses an algorithm configuration procedure to automatically select an AS approach and optimize its parameters for a given AS scenario. AutoFolio allows researchers and practitioners across a broad range of applications to exploit the combined power of many different AS methods and to automatically construct high-performance algorithm selectors. We demonstrate that AutoFolio was able to produce new state-of-the-art algorithm selectors for 7 well-studied AS scenarios and matches state-of-the-art performance statistically on all other scenarios. Compared to the best single algorithm for each AS scenario, AutoFolio achieved average speedup factors between 1. 3 and 15. 4.

AAAI Conference 2017 Conference Paper

Efficient Parameter Importance Analysis via Ablation with Surrogates

  • Andre Biedenkapp
  • Marius Lindauer
  • Katharina Eggensperger
  • Frank Hutter
  • Chris Fawcett
  • Holger Hoos

To achieve peak performance, it is often necessary to adjust the parameters of a given algorithm to the class of problem instances to be solved; this is known to be the case for popular solvers for a broad range of AI problems, including AI planning, propositional satisfiability (SAT) and answer set programming (ASP). To avoid tedious and often highly sub-optimal manual tuning of such parameters by means of ad-hoc methods, general-purpose algorithm configuration procedures can be used to automatically find performance-optimizing parameter settings. While impressive performance gains are often achieved in this manner, additional, potentially costly parameter importance analysis is required to gain insights into what parameter changes are most responsible for those improvements. Here, we show how the running time cost of ablation analysis, a wellknown general-purpose approach for assessing parameter importance, can be reduced substantially by using regression models of algorithm performance constructed from data collected during the configuration process. In our experiments, we demonstrate speed-up factors between 33 and 14 727 for ablation analysis on various configuration scenarios from AI planning, SAT, ASP and mixed integer programming (MIP).

ICRA Conference 2016 Conference Paper

Automatic bone parameter estimation for skeleton tracking in optical motion capture

  • Tobias Schubert 0002
  • Katharina Eggensperger
  • Alexis Gkogkidis
  • Frank Hutter
  • Tonio Ball
  • Wolfram Burgard

Motion analysis is important in a broad range of contexts, including animation, bio-mechanics, robotics and experiments investigating animal behavior. For applications, in which tracking accuracy is one of the main requirements, passive optical motion capture systems are widely used. Many skeleton tracking methods based on such systems use a predefined skeleton model, which is scaled once in the initialization step to the individual size of the character to be tracked. However, there are remarkable differences in the bone length relations across gender and even more across mammal races. In practice, the optimal skeleton model has to be determined in a manual and time-consuming process. In this paper, we reformulate this task as an optimization problem aiming to rescale a rough hierarchical skeleton structure to optimize probabilistic skeleton tracking performance. We solve this optimization problem by means of state-of-the-art blackbox optimization methods based on sequential model-based Bayesian optimization (SMBO). We compare different SMBO methods on three real-world datasets with an animal and humans, demonstrating that we can automatically find skeleton structures for previously unseen mammals. The same methods also allow an automated choice of a suitable starting frame for initializing tracking.

JAIR Journal 2016 Journal Article

Bayesian Optimization in a Billion Dimensions via Random Embeddings

  • Ziyu Wang
  • Frank Hutter
  • Masrour Zoghi
  • David Matheson
  • Nando de Feitas

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.

NeurIPS Conference 2016 Conference Paper

Bayesian Optimization with Robust Bayesian Neural Networks

  • Jost Tobias Springenberg
  • Aaron Klein
  • Stefan Falkner
  • Frank Hutter

Bayesian optimization is a prominent method for optimizing expensive to evaluate black-box functions that is prominently applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach - using Gaussian process models - does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.

IJCAI Conference 2015 Conference Paper

Algorithm Runtime Prediction: Methods and Evaluation (Extended Abstract)

  • Frank Hutter
  • Lin Xu
  • Holger Hoos
  • Kevin Leyton-Brown

Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm’s runtime as a function of problem-specific instance features. Such models have many important applications and over the past decade, a wide variety of techniques have been studied for building such models. In this extended abstract of our 2014 AI Journal article of the same title, we summarize existing models and describe new model families and various extensions. In a comprehensive empirical analyis using 11 algorithms and 35 instance distributions spanning a wide range of hard combinatorial problems, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.

JAIR Journal 2015 Journal Article

AutoFolio: An Automatically Configured Algorithm Selector

  • Marius Lindauer
  • Holger H. Hoos
  • Frank Hutter
  • Torsten Schaub

Algorithm selection (AS) techniques -- which involve choosing from a set of algorithms the one expected to solve a given problem instance most efficiently -- have substantially improved the state of the art in solving many prominent AI problems, such as SAT, CSP, ASP, MAXSAT and QBF. Although several AS procedures have been introduced, not too surprisingly, none of them dominates all others across all AS scenarios. Furthermore, these procedures have parameters whose optimal values vary across AS scenarios. This holds specifically for the machine learning techniques that form the core of current AS procedures, and for their hyperparameters. Therefore, to successfully apply AS to new problems, algorithms and benchmark sets, two questions need to be answered: (i) how to select an AS approach and (ii) how to set its parameters effectively. We address both of these problems simultaneously by using automated algorithm configuration. Specifically, we demonstrate that we can automatically configure claspfolio 2, which implements a large variety of different AS approaches and their respective parameters in a single, highly-parameterized algorithm framework. Our approach, dubbed AutoFolio, allows researchers and practitioners across a broad range of applications to exploit the combined power of many different AS methods. We demonstrate AutoFolio can significantly improve the performance of claspfolio 2 on 8 out of the 13 scenarios from the Algorithm Selection Library, leads to new state-of-the-art algorithm selectors for 7 of these scenarios, and matches state-of-the-art performance (statistically) on all other scenarios. Compared to the best single algorithm for each AS scenario, AutoFolio achieves average speedup factors between 1.3 and 15.4.

AAAI Conference 2015 Conference Paper

Automatic Configuration of Sequential Planning Portfolios

  • Jendrik Seipp
  • Silvan Sievers
  • Malte Helmert
  • Frank Hutter

Sequential planning portfolios exploit the complementary strengths of different planners. Similarly, automated algorithm configuration tools can customize parameterized planning algorithms for a given type of tasks. Although some work has been done towards combining portfolios and algorithm configuration, the problem of automatically generating a sequential planning portfolio from a parameterized planner for a given type of tasks is still largely unsolved. Here, we present Cedalion, a conceptually simple approach for this problem that greedily searches for the hparameter configuration, runtimei pair which, when appended to the current portfolio, maximizes portfolio improvement per additional runtime spent. We show theoretically that Cedalion yields portfolios provably within a constant factor of optimal for the training set distribution. We evaluate Cedalion empirically by applying it to construct sequential planning portfolios based on component planners from the highly parameterized Fast Downward (FD) framework. Results for a broad range of planning settings demonstrate that – without any knowledge of planning or FD – Cedalion constructs sequential FD portfolios that rival, and in some cases substantially outperform, manually-built FD portfolios.

NeurIPS Conference 2015 Conference Paper

Efficient and Robust Automated Machine Learning

  • Matthias Feurer
  • Aaron Klein
  • Katharina Eggensperger
  • Jost Springenberg
  • Manuel Blum
  • Frank Hutter

The success of machine learning in a broad range of applications has led to an ever-growing demand for machine learning systems that can be used off the shelf by non-experts. To be effective in practice, such systems need to automatically choose a good algorithm and feature preprocessing steps for a new dataset at hand, and also set their respective hyperparameters. Recent work has started to tackle this automated machine learning (AutoML) problem with the help of efficient Bayesian optimization methods. In this work we introduce a robust new AutoML system based on scikit-learn (using 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, giving rise to a structured hypothesis space with 110 hyperparameters). This system, which we dub auto-sklearn, improves on existing AutoML methods by automatically taking into account past performance on similar datasets, and by constructing ensembles from the models evaluated during the optimization. Our system won the first phase of the ongoing ChaLearn AutoML challenge, and our comprehensive analysis on over 100 diverse datasets shows that it substantially outperforms the previous state of the art in AutoML. We also demonstrate the performance gains due to each of our contributions and derive insights into the effectiveness of the individual components of auto-sklearn.

AAAI Conference 2015 Conference Paper

Efficient Benchmarking of Hyperparameter Optimizers via Surrogates

  • Katharina Eggensperger
  • Frank Hutter
  • Holger Hoos
  • Kevin Leyton-Brown

Hyperparameter optimization is crucial for achieving peak performance with many machine learning algorithms; however, the evaluation of new optimization techniques on real-world hyperparameter optimization problems can be very expensive. Therefore, experiments are often performed using cheap synthetic test functions with characteristics rather different from those of real benchmarks of interest. In this work, we introduce another option: cheap-to-evaluate surrogates of real hyperparameter optimization benchmarks that share the same hyperparameter spaces and feature similar response surfaces. Specifically, we train regression models on data describing a machine learning algorithm’s performance depending on its hyperparameter setting, and then cheaply evaluate hyperparameter optimization methods using the model’s performance predictions in lieu of running the real algorithm. We evaluated a wide range of regression techniques, both in terms of how well they predict the performance of new hyperparameter settings and in terms of the quality of surrogate benchmarks obtained. We found that tree-based models capture the performance of several machine learning algorithms well and yield surrogate benchmarks that closely resemble real-world benchmarks, while being much easier to use and orders of magnitude cheaper to evaluate.

AAAI Conference 2015 Conference Paper

Initializing Bayesian Hyperparameter Optimization via Meta-Learning

  • Matthias Feurer
  • Jost Springenberg
  • Frank Hutter

Model selection and hyperparameter optimization is crucial in applying machine learning to a novel dataset. Recently, a subcommunity of machine learning has focused on solving this problem with Sequential Model-based Bayesian Optimization (SMBO), demonstrating substantial successes in many applications. However, for computationally expensive algorithms the overhead of hyperparameter optimization can still be prohibitive. In this paper we mimic a strategy human domain experts use: speed up optimization by starting from promising configurations that performed well on similar datasets. The resulting initialization technique integrates naturally into the generic SMBO framework and can be trivially applied to any SMBO method. To validate our approach, we perform extensive experiments with two established SMBO frameworks (Spearmint and SMAC) with complementary strengths; optimizing two machine learning frameworks on 57 datasets. Our initialization procedure yields mild improvements for lowdimensional hyperparameter optimization and substantially improves the state of the art for the more complex combined algorithm selection and hyperparameter optimization problem.

IJCAI Conference 2015 Conference Paper

On the Effective Configuration of Planning Domain Models

  • Mauro Vallati
  • Frank Hutter
  • Lukas Chrpa
  • Thomas Leo McCluskey

The development of domain-independent planners within the AI Planning community is leading to “off the shelf” technology that can be used in a wide range of applications. Moreover, it allows a modular approach – in which planners and domain knowledge are modules of larger software applications – that facilitates substitutions or improvements of individual modules without changing the rest of the system. This approach also supports the use of reformulation and configuration techniques, which transform how a model is represented in order to improve the efficiency of plan generation. In this paper, we investigate how the performance of planners is affected by domain model configuration. We introduce a fully automated method for this configuration task, and show in an extensive experimental analysis with six planners and seven domains that this process (which can, in principle, be combined with other forms of reformulation and configuration) can have a remarkable impact on performance across planners. Furthermore, studying the obtained domain model configurations can provide useful information to effectively engineer planning domain models.

IJCAI Conference 2015 Conference Paper

Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

  • Tobias Domhan
  • Jost Tobias Springenberg
  • Frank Hutter

Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more computational resources than human experts. Humans have one advantage: when they evaluate a poor hyperparameter setting they can quickly detect (after a few steps of stochastic gradient descent) that the resulting network performs poorly and terminate the corresponding evaluation to save time. In this paper, we mimic the early termination of bad runs using a probabilistic model that extrapolates the performance from the first part of a learning curve. Experiments with a broad range of neural network architectures on various prominent object recognition benchmarks show that our resulting approach speeds up state-of-the-art hyperparameter optimization methods for DNNs roughly twofold, enabling them to find DNN settings that yield better performance than those chosen by human experts.

SAT Conference 2015 Conference Paper

SpySMAC: Automated Configuration and Performance Analysis of SAT Solvers

  • Stefan Falkner
  • Marius Lindauer
  • Frank Hutter

Abstract Most modern SAT solvers expose a range of parameters to allow some customization for improving performance on specific types of instances. Performing this customization manually can be challenging and time-consuming, and as a consequence several automated algorithm configuration methods have been developed for this purpose. Although automatic algorithm configuration has already been applied successfully to many different SAT solvers, a comprehensive analysis of the configuration process is usually not readily available to users. Here, we present SpySMAC to address this gap by providing a lightweight and easy-to-use toolbox for (i) automatic configuration of SAT solvers in different settings, (ii) a thorough performance analysis comparing the best found configuration to the default one, and (iii) an assessment of each parameter’s importance using the fANOVA framework. To showcase our tool, we apply it to Lingeling and probSAT, two state-of-the-art solvers with very different characteristics.

ICML Conference 2014 Conference Paper

An Efficient Approach for Assessing Hyperparameter Importance

  • Frank Hutter
  • Holger H. Hoos
  • Kevin Leyton-Brown

The performance of many machine learning methods depends critically on hyperparameter settings. Sophisticated Bayesian optimization methods have recently achieved considerable successes in optimizing these hyperparameters, in several cases surpassing the performance of human experts. However, blind reliance on such methods can leave end users without insight into the relative importance of different hyperparameters and their interactions. This paper describes efficient methods that can be used to gain such insight, leveraging random forest models fit on the data already gathered by Bayesian optimization. We first introduce a novel, linear-time algorithm for computing marginals of random forest predictions and then show how to leverage these predictions within a functional ANOVA framework, to quantify the importance of both single hyperparameters and of interactions between hyperparameters. We conducted experiments with prominent machine learning frameworks and state-of-the-art solvers for combinatorial problems. We show that our methods provide insight into the relationship between hyperparameter settings and performance, and demonstrate that—even in very high-dimensional cases—most performance variation is attributable to just a few hyperparameters.

ICAPS Conference 2014 Conference Paper

Improved Features for Runtime Prediction of Domain-Independent Planners

  • Chris Fawcett
  • Mauro Vallati
  • Frank Hutter
  • Jörg Hoffmann 0001
  • Holger H. Hoos
  • Kevin Leyton-Brown

State-of-the-art planners often exhibit substantial runtime variation, making it useful to be able to efficiently predict how long a given planner will take to run on a given instance. In other areas of AI, such needs are met by building so-called empirical performance models (EPMs), statistical models derived from sets of problem instances and performance observations. Historically, such models have been less accurate for predicting the running times of planners. A key hurdle has been a relative weakness in instance features for characterizing instances: mappings from problem instances to real numbers that serve as the starting point for learning an EPM. We propose a new, extensive set of instance features for planning, and investigate its effectiveness across a range of model families. We built EPMs for various prominent planning systems on several thousand benchmark problems from the planning literature and from IPC benchmark sets, and conclude that our models predict runtime much more accurately than the previous state of the art. We also study the relative importance of these features.

IJCAI Conference 2013 Conference Paper

Bayesian Optimization in High Dimensions via Random Embeddings

  • Ziyu Wang
  • Masrour Zoghi
  • Frank Hutter
  • David Matheson
  • Nando de Freitas

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve high-dimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.

SAT Conference 2012 Conference Paper

Evaluating Component Solver Contributions to Portfolio-Based Algorithm Selectors

  • Lin Xu
  • Frank Hutter
  • Holger H. Hoos
  • Kevin Leyton-Brown

Abstract Portfolio-based methods exploit the complementary strengths of a set of algorithms and—as evidenced in recent competitions—represent the state of the art for solving many NP-hard problems, including SAT. In this work, we argue that a state-of-the-art method for constructing portfolio-based algorithm selectors, \(\texttt{SATzilla}\), also gives rise to an automated method for quantifying the importance of each of a set of available solvers. We entered a substantially improved version of \(\texttt{SATzilla}\) to the inaugural “analysis track” of the 2011 SAT competition, and draw two main conclusions from the results that we obtained. First, automatically-constructed portfolios of sequential, non-portfolio competition entries perform substantially better than the winners of all three sequential categories. Second, and more importantly, a detailed analysis of these portfolios yields valuable insights into the nature of successful solver designs in the different categories. For example, we show that the solvers contributing most to \(\texttt{SATzilla}\) were often not the overall best-performing solvers, but instead solvers that exploit novel solution strategies to solve instances that would remain unsolved without them.

AAAI Conference 2007 Conference Paper

Automatic Algorithm Configuration Based on Local Search

  • Frank Hutter

The determination of appropriate values for free algorithm parameters is a challenging and tedious task in the design of effective algorithms for hard problems. Such parameters include categorical choices (e. g. , neighborhood structure in local search or variable/value ordering heuristics in tree search), as well as numerical parameters (e. g. , noise or restart timing). In practice, tuning of these parameters is largely carried out manually by applying rules of thumb and crude heuristics, while more principled approaches are only rarely used. In this paper, we present a local search approach for algorithm configuration and prove its convergence to the globally optimal parameter configuration. Our approach is very versatile: it can, e. g. , be used for minimising run-time in decision problems or for maximising solution quality in optimisation problems. It further applies to arbitrary algorithms, including heuristic tree search and local search algorithms, with no limitation on the number of parameters. Experiments in four algorithm configuration scenarios demonstrate that our automatically determined parameter settings always outperform the algorithm defaults, sometimes by several orders of magnitude. Our approach also shows better performance and greater flexibility than the recent CALIBRA system. Our ParamILS code, along with instructions on how to use it for tuning your own algorithms, is available on-line at http: //www. cs. ubc. ca/labs/beta/Projects/ParamILS.

IJCAI Conference 2005 Conference Paper

Efficient Stochastic Local Search for MPE Solving

  • Frank Hutter
  • Holger H. Hoos
  • Thomas Stützle

Finding most probable explanations (MPEs) in graphical models, such as Bayesian belief networks, is a fundamental problem in reasoning under uncertainty, and much effort has been spent on developing effective algorithms for this NP-hard problem. Stochastic local search (SLS) approaches to MPE solving have previously been explored, but were found to be not competitive with state-of-theart branch & bound methods. In this work, we identify the shortcomings of earlier SLS algorithms for the MPE problem and demonstrate how these can be overcome, leading to an SLS algorithm that substantially improves the state-of-the-art in solving hard networks with many variables, large domain sizes, high degree, and, most importantly, networks with high induced width.