Arrow Research search

Author name cluster

Martin Wistuba

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

ICML Conference 2025 Conference Paper

Hyperband-based Bayesian Optimization for Black-box Prompt Selection

  • Lennart Schneider
  • Martin Wistuba
  • Aaron Klein
  • Jacek Golebiowski
  • Giovanni Zappella
  • Felice Antonio Merra

Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to potentially large, combinatorial search spaces, absence of gradient information, and high evaluation cost of prompts on a validation set. We propose HbBoPs, a novel method that combines a structural-aware deep kernel Gaussian Process with Hyperband as a multi-fidelity scheduler to efficiently select prompts. HbBoPs uses embeddings of instructions and few-shot exemplars, treating them as modular components within prompts. This enhances the surrogate model’s ability to predict which prompt to evaluate next in a sample-efficient manner. Hyperband improves query-efficiency by adaptively allocating resources across different fidelity levels, reducing the number of validation instances required for evaluating prompts. Extensive experiments across ten diverse benchmarks and three LLMs demonstrate that HbBoPs outperforms state-of-the-art methods in both performance and efficiency.
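
The multi-fidelity idea in this abstract can be illustrated with successive halving, the subroutine that Hyperband repeats under different starting budgets. The following is a minimal sketch and not the HbBoPs implementation: the function name `successive_halving`, the `evaluate(candidate, budget)` callback, and the reduction factor `eta` are assumptions made for illustration, with `budget` standing in for the number of validation instances used to score a prompt.

```python
def successive_halving(candidates, evaluate, min_budget=1, eta=3):
    """Return the best candidate found by successive halving.

    candidates: list of items to compare (e.g. prompts); hypothetical input.
    evaluate:   callable (candidate, budget) -> score, higher is better;
                the budget could be the number of validation instances used.
    eta:        reduction factor; the top 1/eta candidates survive each round.
    """
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving candidate at the current (cheap) fidelity.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top fraction, but always at least one candidate.
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        # Survivors earn a larger budget in the next round.
        budget *= eta
    return survivors[0]
```

Most candidates are only ever scored at the smallest budget, which is where the savings in validation queries come from. Full Hyperband runs several such brackets with different trade-offs between the number of candidates and the starting budget.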

ICLR Conference 2023 Conference Paper

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

  • Ondrej Bohdal
  • Lukas Balles
  • Martin Wistuba
  • Beyza Ermis
  • Cédric Archambeau
  • Giovanni Zappella

Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, extends ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA.

NeurIPS Conference 2023 Conference Paper

Scaling Laws for Hyperparameter Optimization

  • Arlind Kadra
  • Maciej Janowski
  • Martin Wistuba
  • Josif Grabocka

Hyperparameter optimization is an important subfield of machine learning that focuses on tuning the hyperparameters of a chosen algorithm to achieve peak performance. Recently, there has been a stream of methods that tackle the issue of hyperparameter optimization; however, most of these methods do not exploit the dominant power-law nature of learning curves for Bayesian optimization. In this work, we propose Deep Power Laws (DPL), an ensemble of neural network models conditioned to yield predictions that follow a power-law scaling pattern. Our method dynamically decides which configurations to pause and train incrementally by making use of gray-box evaluations. We compare our method against 7 state-of-the-art competitors on 3 benchmarks related to tabular, image, and NLP datasets covering 59 diverse tasks. Our method achieves the best results across all benchmarks by obtaining the best any-time results compared to all competitors.
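
To make the power-law assumption concrete, the sketch below fits a single learning curve with a common three-parameter form, loss(t) = alpha + beta * t^(-gamma). This is an illustrative assumption, not the DPL model: the abstract does not give the exact parameterization, DPL predicts such curves with an ensemble of neural networks, and the names `power_law` and `fit_beta_gamma` are hypothetical. The asymptote `alpha` is assumed known so that the fit reduces to linear regression in log space.

```python
import math

def power_law(budget, alpha, beta, gamma):
    """Assumed learning-curve form: loss = alpha + beta * budget ** (-gamma)."""
    return alpha + beta * budget ** (-gamma)

def fit_beta_gamma(budgets, losses, alpha=0.0):
    """Least-squares fit of beta and gamma with a known asymptote alpha.

    Uses log(loss - alpha) = log(beta) - gamma * log(budget),
    so every loss must be strictly greater than alpha.
    """
    xs = [math.log(b) for b in budgets]
    ys = [math.log(l - alpha) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Given the fitted parameters, `power_law` can extrapolate the curve to a larger budget, which is the kind of prediction a gray-box method needs when deciding which configuration to train further.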

AAAI Conference 2022 Conference Paper

Bandit Limited Discrepancy Search and Application to Machine Learning Pipeline Optimization

  • Akihiro Kishimoto
  • Djallel Bouneffouf
  • Radu Marinescu
  • Parikshit Ram
  • Ambrish Rawat
  • Martin Wistuba
  • Paulito Palmes
  • Adi Botea

Optimizing a machine learning (ML) pipeline has been an important topic of AI and ML. Despite recent progress, pipeline optimization remains a challenging problem, due to potentially many combinations to consider as well as slow training and validation. We present the BLDS algorithm for optimized algorithm selection in a fixed ML pipeline structure. BLDS performs multi-fidelity optimization for selecting ML algorithms trained with smaller computational overhead, while controlling its pipeline search based on multi-armed bandit and limited discrepancy search. Our experiments on classification benchmarks show that BLDS is superior to competing algorithms. We also combine BLDS with hyperparameter optimization, empirically showing the advantage of BLDS.

NeurIPS Conference 2022 Conference Paper

Memory Efficient Continual Learning with Transformers

  • Beyza Ermis
  • Giovanni Zappella
  • Martin Wistuba
  • Aditya Rawal
  • Cédric Archambeau

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computational resources that can be used might be limited. Moreover, applications increasingly rely on large pre-trained neural networks, such as pre-trained Transformers, since compute or data might not be available in sufficiently large quantities to practitioners to train from scratch. In this paper, we devise a method to incrementally train a model on a sequence of tasks using pre-trained Transformers and extending them with Adapters. Unlike existing approaches, our method is able to scale to a large number of tasks without significant overhead and allows sharing information across tasks. On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Supervising the Multi-Fidelity Race of Hyperparameter Configurations

  • Martin Wistuba
  • Arlind Kadra
  • Josif Grabocka

Multi-fidelity (gray-box) hyperparameter optimization (HPO) techniques have recently emerged as a promising direction for tuning Deep Learning methods. However, existing methods suffer from a sub-optimal allocation of the HPO budget to the hyperparameter configurations. In this work, we introduce DyHPO, a Bayesian Optimization method that learns to decide which hyperparameter configuration to train further in a dynamic race among all feasible configurations. We propose a new deep kernel for Gaussian Processes that embeds the learning curve dynamics, and an acquisition function that incorporates multi-budget information. We demonstrate the significant superiority of DyHPO over state-of-the-art hyperparameter optimization methods through large-scale experiments comprising 50 datasets (Tabular, Image, NLP) and diverse architectures (MLP, CNN/NAS, RNN).

AAAI Conference 2021 System Paper

AutoText: An End-to-End AutoAI Framework for Text

  • Arunima Chaudhary
  • Alayt Issak
  • Kiran Kate
  • Yannis Katsis
  • Abel Valente
  • Dakuo Wang
  • Alexandre Evfimievski
  • Sairam Gurajada

Building models for natural language processing (NLP) tasks remains a daunting task for many, requiring significant technical expertise, efforts, and resources. In this demonstration, we present AutoText, an end-to-end AutoAI framework for text, to lower the barrier of entry in building NLP models. AutoText combines state-of-the-art AutoAI optimization techniques and learning algorithms for NLP tasks into a single extensible framework. Through its simple, yet powerful UI, non-AI experts (e.g., domain experts) can quickly generate performant NLP models with support to both control (e.g., via specifying constraints) and understand learned models.

ICLR Conference 2021 Conference Paper

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

  • Martin Wistuba
  • Josif Grabocka

Hyperparameter optimization (HPO) is a central pillar in the automation of machine learning solutions and is mainly performed via Bayesian optimization, where a parametric surrogate is learned to approximate the black box response function (e.g. validation error). Unfortunately, evaluating the response function is computationally intensive. As a remedy, earlier work emphasizes the need for transfer learning surrogates which learn to optimize hyperparameters for an algorithm from other tasks. In contrast to previous work, we propose to rethink HPO as a few-shot learning problem in which we train a shared deep surrogate model to quickly adapt (with few response evaluations) to the response function of a new task. We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion in order to jointly approximate the response functions of a collection of training data sets. As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results at HPO compared to several recent methods on diverse metadata sets.

IJCAI Conference 2021 Conference Paper

Hardware-Aware Neural Architecture Search: Survey and Taxonomy

  • Hadjer Benmeziane
  • Kaoutar El Maghraoui
  • Hamza Ouarnoughi
  • Smail Niar
  • Martin Wistuba
  • Naigang Wang

There is no doubt that making AI mainstream by bringing powerful, yet power hungry deep neural networks (DNNs) to resource-constrained devices would require an efficient co-design of algorithms, hardware and software. The increased popularity of DNN applications deployed on a wide variety of platforms, from tiny microcontrollers to data-centers, has resulted in multiple questions and challenges related to constraints introduced by the hardware. In this survey on hardware-aware neural architecture search (HW-NAS), we present some of the existing answers proposed in the literature for the following questions: "Is it possible to build an efficient DL model that meets the latency and energy constraints of tiny edge devices?", "How can we reduce the trade-off between the accuracy of a DL model and its ability to be deployed in a variety of platforms?". The survey provides a new taxonomy of HW-NAS and assesses the hardware cost estimation strategies. We also highlight the challenges and limitations of existing approaches and potential future directions. We hope that this survey will help to fuel the research towards efficient deep learning.

NeurIPS Conference 2021 Conference Paper

HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML

  • Sebastian Pineda Arango
  • Hadi Jomaa
  • Martin Wistuba
  • Josif Grabocka

Hyperparameter optimization (HPO) is a core problem for the machine learning community and remains largely unsolved due to the significant computational resources required to evaluate hyperparameter configurations. As a result, a series of recent related works have focused on the direction of transfer learning for quickly fine-tuning hyperparameters on a dataset. Unfortunately, the community does not have a common large-scale benchmark for comparing HPO algorithms. Instead, the de facto practice consists of empirical protocols on arbitrary small-scale meta-datasets that vary inconsistently across publications, making reproducibility a challenge. To resolve this major bottleneck and enable a fair and fast comparison of black-box HPO methods on a level playing field, we propose HPO-B, a new large-scale benchmark in the form of a collection of meta-datasets. Our benchmark is assembled and preprocessed from the OpenML repository and consists of 176 search spaces (algorithms) evaluated sparsely on 196 datasets with a total of 6.4 million hyperparameter evaluations. For ensuring reproducibility on our benchmark, we detail explicit experimental protocols, splits, and evaluation measures for comparing methods for both non-transfer and transfer learning HPO.

AAAI Conference 2021 Conference Paper

Searching for Machine Learning Pipelines Using a Context-Free Grammar

  • Radu Marinescu
  • Akihiro Kishimoto
  • Parikshit Ram
  • Ambrish Rawat
  • Martin Wistuba
  • Paulito P. Palmes
  • Adi Botea

AutoML automatically selects, composes and parameterizes machine learning algorithms into a workflow or pipeline of operations that aims at maximizing performance on a given dataset. Although current methods for AutoML have achieved impressive results, they mostly concentrate on optimizing fixed linear workflows. In this paper, we take a different approach and focus on generating and optimizing pipelines of complex directed acyclic graph shapes. These complex pipeline structures may lead to discovering new synthetic features and thus boost performance considerably. We explore the power of heuristic search and context-free grammars to search and optimize these kinds of pipelines. Experiments on various benchmark datasets show that our approach is highly competitive and often outperforms existing AutoML systems.

ICML Conference 2020 Conference Paper

Learning to Rank Learning Curves

  • Martin Wistuba
  • Tejaswini Pedapati

Many automated machine learning methods, such as those for hyperparameter and neural architecture optimization, are computationally expensive because they involve training many different model configurations. In this work, we present a new method that saves computational budget by terminating poor configurations early on in the training. In contrast to existing methods, we consider this task as a ranking and transfer learning problem. We qualitatively show that by optimizing a pairwise ranking loss and leveraging learning curves from other data sets, our model is able to effectively rank learning curves without having to observe many or very long learning curves. We further demonstrate that our method can be used to accelerate a neural architecture search by a factor of up to 100 without a significant performance degradation of the discovered architecture. In further experiments we analyze the quality of ranking, the influence of different model components as well as the predictive behavior of the model.
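
The pairwise ranking objective mentioned in this abstract can be sketched with a standard logistic pairwise loss. This is a generic formulation under stated assumptions, not the paper's model or exact loss: `scores` stands for hypothetical model outputs computed from partial learning curves, `final_values` for the final performance of each configuration (higher is better), and the function name is illustrative.

```python
import math

def pairwise_logistic_loss(scores, final_values):
    """Average logistic loss over all ordered pairs (i, j) where
    configuration i ends up strictly better than configuration j.

    The loss for a pair is log(1 + exp(-(scores[i] - scores[j]))),
    which is small when the better configuration gets the higher score.
    """
    total = 0.0
    pairs = 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if final_values[i] > final_values[j]:
                margin = scores[i] - scores[j]
                total += math.log1p(math.exp(-margin))
                pairs += 1
    # With no comparable pairs there is nothing to rank.
    return total / pairs if pairs else 0.0
```

Because the loss depends only on score differences, a model trained this way needs to order the curves correctly rather than predict their final values exactly, which is all that early termination requires.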

IJCAI Conference 2019 Conference Paper

Optimal Exploitation of Clustering and History Information in Multi-armed Bandit

  • Djallel Bouneffouf
  • Srinivasan Parthasarathy
  • Horst Samulowitz
  • Martin Wistuba

We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice-versa. Extensive experiments on synthetic and real-world datasets on Warfarin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.