Arrow Research search

Author name cluster

Eyke Hüllermeier

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

89 papers
2 author rows

Possible papers (89)

AAAI Conference 2026 Conference Paper

Fine-grained Uncertainty Decomposition in Large Language Models: A Spectral Approach

  • Nassim Walha
  • Sebastian G. Gruber
  • Thomas Decker
  • Yinchong Yang
  • Alireza Javanmardi
  • Eyke Hüllermeier
  • Florian Buettner

As Large Language Models (LLMs) are increasingly integrated in diverse applications, obtaining reliable measures of their predictive uncertainty has become critically important. A precise distinction between aleatoric uncertainty, arising from inherent ambiguities within input data, and epistemic uncertainty, originating exclusively from model limitations, is essential to effectively address each uncertainty source. In this paper, we introduce Spectral Uncertainty, a novel approach to quantifying and decomposing uncertainties in LLMs. Leveraging the Von Neumann entropy from quantum information theory, Spectral Uncertainty provides a rigorous theoretical foundation for separating total uncertainty into distinct aleatoric and epistemic components. Unlike existing baseline methods, our approach incorporates a fine-grained representation of semantic similarity, enabling nuanced differentiation among various semantic interpretations in model responses. Empirical evaluations demonstrate that Spectral Uncertainty outperforms state-of-the-art methods in estimating both aleatoric and total uncertainty across diverse models and benchmark datasets.
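
The spectral quantity at the heart of this abstract, the Von Neumann entropy, is easy to illustrate in a few lines. This is a generic sketch, not the paper's estimator: the toy density matrix built from unit-norm "response embedding" vectors is purely illustrative.

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """Von Neumann entropy S(rho) = -Tr(rho log rho) of a density matrix."""
    eigvals = np.linalg.eigvalsh(rho)    # rho is Hermitian and PSD
    eigvals = eigvals[eigvals > 1e-12]   # convention: 0 * log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))

# Toy density matrix rho = (1/n) * sum_i v_i v_i^T for unit vectors v_i,
# standing in for normalized embeddings of sampled model responses.
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))
V /= np.linalg.norm(V, axis=1, keepdims=True)
rho = V.T @ V / len(V)                   # trace is exactly 1

entropy = von_neumann_entropy(rho)       # non-negative; 0 iff rho is pure
```

The entropy vanishes for a pure state (all responses semantically identical) and grows as the spectrum of the density matrix spreads out, which is what makes it usable as an uncertainty measure.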

AAAI Conference 2026 Conference Paper

Shapley Value Approximation Based on k-Additive Games

  • Guilherme Dean Pelegrina
  • Patrick Kolpaczki
  • Eyke Hüllermeier

The Shapley value is the prevalent solution for fair division problems in which a payout is to be divided among multiple agents. By adopting a game-theoretic view, the idea of fair division and the Shapley value can also be used in machine learning to quantify the individual contribution of features or data points to the performance of a predictive model. Despite its popularity and axiomatic justification, the Shapley value suffers from a computational complexity that scales exponentially with the number of entities involved, and hence requires approximation methods for its reliable estimation. We propose SVAkADD, a novel approximation method that fits a k-additive surrogate game. By taking advantage of k-additivity, we are able to elicit the exact Shapley values of the surrogate game and then use these values as estimates for the original fair division problem. The efficacy of our method is evaluated empirically and compared to competing methods.
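
For context, the exponential cost the abstract refers to is visible directly in the brute-force definition of the Shapley value, which enumerates every coalition. This is a generic illustration, not SVAkADD; the size-squared toy game is made up.

```python
import math
from itertools import combinations

def exact_shapley(n, value):
    """Exact Shapley values by enumerating all 2^(n-1) coalitions per player.

    `value` maps a frozenset of players to a real payoff. The nested loops
    make the exponential scaling in n explicit, which is exactly why
    approximation methods are needed in practice.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += weight * (value(S | {i}) - value(S))
    return phi

# Hypothetical symmetric 3-player game: a coalition is worth its size squared.
sv = exact_shapley(3, lambda S: len(S) ** 2)  # each player gets v(N)/3 = 3.0
```

By efficiency, the values always sum to `v(N) - v(∅)`, and by symmetry every player in this toy game receives the same share.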

AAAI Conference 2026 Conference Paper

Uncertainty Quantification for Machine Learning: One Size Does Not Fit All

  • Paul Hofman
  • Yusuf Sale
  • Eyke Hüllermeier

Proper quantification of predictive uncertainty is essential for the use of machine learning in safety-critical applications. Various uncertainty measures have been proposed for this purpose, typically claiming superiority over other measures. In this paper, we argue that there is no single best measure. Instead, uncertainty quantification should be tailored to the specific application. To this end, we use a flexible family of uncertainty measures that distinguishes between total, aleatoric, and epistemic uncertainty of second-order distributions. These measures can be instantiated with specific loss functions, so-called proper scoring rules, to control their characteristics, and we show that different characteristics are useful for different tasks. In particular, we show that, for the task of selective prediction, the scoring rule should ideally match the task loss. On the other hand, for out-of-distribution detection, our results confirm that mutual information, a widely used measure of epistemic uncertainty, performs best. Furthermore, in an active learning setting, epistemic uncertainty based on zero-one loss is shown to consistently outperform other uncertainty measures.
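
The total/aleatoric/epistemic split mentioned here is commonly approximated with an ensemble as a proxy for the second-order distribution. The classic entropy-based instantiation below is one member of the measure family the paper studies, not its full framework.

```python
import numpy as np

def entropy(p, axis=-1):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def decompose(member_probs: np.ndarray):
    """Entropy-based uncertainty decomposition for an ensemble of predictors.

    `member_probs` has shape (members, classes). Total uncertainty is the
    entropy of the averaged prediction; aleatoric is the average member
    entropy; their difference (mutual information) is epistemic.
    """
    total = entropy(member_probs.mean(axis=0))
    aleatoric = entropy(member_probs, axis=-1).mean()
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Members that disagree while being individually confident:
# uncertainty is mostly epistemic, not aleatoric.
probs = np.array([[0.95, 0.05], [0.05, 0.95]])
total, ale, epi = decompose(probs)
```

Swapping the log loss in `entropy` for another proper scoring rule yields other members of the family, which is precisely the knob the paper argues should be tuned per task.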

TMLR Journal 2025 Journal Article

A Survey of Reinforcement Learning from Human Feedback

  • Timo Kaufmann
  • Paul Weng
  • Viktor Bengs
  • Eyke Hüllermeier

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning provides a promising approach to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The success in training large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF has played a decisive role in directing the model's capabilities towards human objectives. This article provides an overview of the fundamentals of RLHF, exploring how RL agents interact with human feedback. While recent focus has been on RLHF for LLMs, our survey covers the technique across multiple domains. We provide our most comprehensive coverage in control and robotics, where many fundamental techniques originate, alongside a dedicated LLM section. We examine the core principles that underpin RLHF, how algorithms and human feedback work together, and discuss the main research trends in the field. Our goal is to give researchers and practitioners a clear understanding of this rapidly growing field.

NeurIPS Conference 2025 Conference Paper

Adjusted Count Quantification Learning on Graphs

  • Clemens Damke
  • Eyke Hüllermeier

Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not applicable to graph quantification problems. To address this issue, we propose structural importance sampling (SIS), the first graph quantification method that is applicable under (structural) covariate shift. Additionally, we propose Neighborhood-aware ACC, which improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.
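
The Adjusted Classify & Count baseline that this paper extends fits in one formula: invert the relation between observed positive-prediction rate and true prevalence using the classifier's true/false positive rates. The numbers below are illustrative, binary case only.

```python
def adjusted_classify_count(pred_pos_rate: float, tpr: float, fpr: float) -> float:
    """ACC prevalence estimate: invert p_hat = p * tpr + (1 - p) * fpr.

    Relies on the prior probability shift assumption, i.e. that the
    class-conditional rates tpr/fpr carry over to the target data.
    """
    p = (pred_pos_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion

# Classifier flags 40% of the target set; tpr/fpr come from held-out data.
prevalence = adjusted_classify_count(pred_pos_rate=0.40, tpr=0.80, fpr=0.10)
# (0.40 - 0.10) / (0.80 - 0.10) ≈ 0.4286
```

The abstract's point is that on graphs this prior-shift assumption often fails, motivating the structural importance sampling correction.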

ICML Conference 2025 Conference Paper

Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries

  • Xuening Feng
  • Zhao-Hui Jiang
  • Timo Kaufmann
  • Eyke Hüllermeier
  • Paul Weng
  • Yifei Zhu

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains where objectives are hard to formalize. However, traditional methods based on pairwise trajectory comparisons face notable challenges, including the difficulty in comparing trajectories with subtle differences and the limitation of conveying only ordinal information, limiting direct inference of preference strength. In this paper, we introduce a novel distinguishability query, enabling humans to express preference strength by comparing two pairs of trajectories. Labelers first indicate which of two pairs is easier to distinguish, then provide preference feedback only on the easier pair. Our proposed query type directly captures preference strength and is expected to reduce the cognitive load on the labeler. We further connect this query to cardinal utility and difference relations and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and easiness. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness in RLHF benchmarks, particularly in classical control settings where preference strength is critical for expected utility maximization.

UAI Conference 2025 Conference Paper

Conformal Prediction without Nonconformity Scores

  • Jonas Hanselle
  • Alireza Javanmardi
  • Tobias Florin Oberkofler
  • Yusuf Sale
  • Eyke Hüllermeier

Conformal prediction (CP) is an uncertainty quantification framework that allows for constructing statistically valid prediction sets. Key to the construction of these sets is the notion of a nonconformity function, which assigns a real-valued score to individual data points: only those (hypothetical) data points contribute to a prediction set that sufficiently conform to the data. The point of departure of this work is the observation that CP predictions are invariant against (strictly) monotone transformations of the nonconformity function. In other words, it is only the ordering of the scores that matters, not their quantitative values. Consequently, instead of scoring individual data points, a conformal predictor only needs to be able to compare pairs of data points, deciding which of them is the more conforming one. This suggests an interesting connection between CP and preference learning, in particular learning-to-rank methods, and makes CP amenable to training data in the form of (qualitative) preferences. Elaborating on this connection, we propose methods for preference-based CP and show their usefulness in real-world classification tasks.
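
The invariance observation that motivates this paper is easy to verify with a standard split-conformal construction. This is a textbook sketch, not the paper's preference-based method; the scores are synthetic.

```python
import numpy as np

def conformal_prediction_set(cal_scores, test_scores, alpha=0.1):
    """Split conformal prediction: include label y iff its nonconformity score
    does not exceed the ceil((n+1)(1-alpha))/n empirical quantile of the
    calibration scores. Only the *ordering* of scores matters."""
    n = len(cal_scores)
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(cal_scores, min(q_level, 1.0), method="higher")
    return [y for y, s in enumerate(test_scores) if s <= q_hat]

rng = np.random.default_rng(1)
cal = rng.uniform(size=100)         # calibration nonconformity scores
test = np.array([0.05, 0.5, 0.99])  # per-label scores for one test input
pred_set = conformal_prediction_set(cal, test, alpha=0.1)

# A strictly monotone transform of all scores yields the same set,
# which is the invariance the abstract builds on.
same_set = conformal_prediction_set(np.exp(cal), np.exp(test), alpha=0.1)
```

Because the quantile with `method="higher"` is itself an order statistic, any strictly increasing transform moves the threshold and the scores in lockstep, leaving the prediction set unchanged.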

NeurIPS Conference 2025 Conference Paper

Credal Prediction based on Relative Likelihood

  • Timo Löhr
  • Paul Hofman
  • Felix Mohr
  • Eyke Hüllermeier

Predictions in the form of sets of probability distributions, so-called credal sets, provide a suitable means to represent a learner's epistemic uncertainty. In this paper, we propose a theoretically grounded approach to credal prediction based on the statistical notion of relative likelihood: The target of prediction is the set of all (conditional) probability distributions produced by the collection of plausible models, namely those models whose relative likelihood exceeds a specified threshold. This threshold has an intuitive interpretation and allows for controlling the trade-off between correctness and precision of credal predictions. We tackle the problem of approximating credal sets defined in this way by means of suitably modified ensemble learning techniques. To validate our approach, we illustrate its effectiveness by experiments on benchmark datasets demonstrating superior uncertainty representation without compromising predictive performance. We also compare our method against several state-of-the-art baselines in credal prediction.
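
A minimal sketch of the relative-likelihood idea, assuming an ensemble whose members come with data log-likelihoods: keep the models whose likelihood ratio to the best model exceeds a threshold, and summarize their predictions. The lower/upper envelope below is a crude stand-in for a proper credal set, not the paper's estimator.

```python
import numpy as np

def relative_likelihood_credal(log_liks, member_probs, alpha=0.15):
    """Keep models with relative likelihood L(m)/max_m' L(m') >= alpha and
    return the element-wise lower/upper envelope of their predictions."""
    rel = np.exp(np.asarray(log_liks) - np.max(log_liks))  # values in (0, 1]
    plausible = member_probs[rel >= alpha]
    return plausible.min(axis=0), plausible.max(axis=0)

# Three hypothetical ensemble members; the third fits the data far worse
# and is excluded, so it does not widen the credal prediction.
log_liks = [-10.0, -10.5, -30.0]
probs = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
lower, upper = relative_likelihood_credal(log_liks, probs, alpha=0.15)
```

Raising `alpha` shrinks the plausible set and tightens the prediction, which is the correctness/precision trade-off the abstract describes.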

AAAI Conference 2025 Conference Paper

DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback

  • Xuening Feng
  • Zhaohui Jiang
  • Timo Kaufmann
  • Puchen Xu
  • Eyke Hüllermeier
  • Paul Weng
  • Yifei Zhu

Defining a reward function is usually a challenging but critical task for the system designer in reinforcement learning, especially when specifying complex behaviors. Reinforcement learning from human feedback (RLHF) emerges as a promising approach to circumvent this. In RLHF, the agent typically learns a reward function by querying a human teacher using pairwise comparisons of trajectory segments. A key question in this domain is how to reduce the number of queries necessary to learn an informative reward function since asking a human teacher too many queries is impractical and costly. To tackle this question, we propose DUO, a novel method for diverse, uncertain, on-policy query generation and selection in RLHF. Our method produces queries that are (1) more relevant for policy training (via an on-policy criterion), (2) more informative (via a principled measure of epistemic uncertainty), and (3) diverse (via a clustering-based filter). Experimental results on a variety of locomotion and robotic manipulation tasks demonstrate that our method can outperform state-of-the-art RLHF methods given the same total budget of queries, while being robust to possibly irrational teachers.

ICLR Conference 2025 Conference Paper

Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks

  • Maximilian Muschalik
  • Fabian Fumagalli
  • Paolo Frazzetto
  • Janine Strotherm
  • Luca Hermes
  • Alessandro Sperduti
  • Eyke Hüllermeier
  • Barbara Hammer

Despite the ubiquitous use of Graph Neural Networks (GNNs) in machine learning (ML) prediction tasks involving graph-structured data, their interpretability remains challenging. In explainable artificial intelligence (XAI), the Shapley Value (SV) is the predominant method to quantify contributions of individual features to an ML model’s output. Addressing the limitations of SVs in complex prediction models, Shapley Interactions (SIs) extend the SV to groups of features. In this work, we explain single graph predictions of GNNs with SIs that quantify node contributions and interactions among multiple nodes. By exploiting the GNN architecture, we show that the structure of interactions in node embeddings is preserved for graph prediction. As a result, the exponential complexity of SIs depends only on the receptive fields, i.e. the message-passing ranges determined by the connectivity of the graph and the number of convolutional layers. Based on our theoretical results, we introduce GraphSHAP-IQ, an efficient approach to compute any-order SIs exactly. GraphSHAP-IQ is applicable to popular message passing techniques in conjunction with a linear global pooling and output layer. We showcase that GraphSHAP-IQ substantially reduces the exponential complexity of computing exact SIs on multiple benchmark datasets. Beyond exact computation, we evaluate GraphSHAP-IQ’s approximation of SIs on popular GNN architectures and compare with existing baselines. Lastly, we visualize SIs of real-world water distribution networks and molecule structures using a SI-Graph.

NeurIPS Conference 2025 Conference Paper

Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions

  • Hubert Baniecki
  • Maximilian Muschalik
  • Fabian Fumagalli
  • Barbara Hammer
  • Eyke Hüllermeier
  • Przemyslaw Biecek

Language-image pre-training (LIP) enables the development of vision-language models capable of zero-shot classification, localization, multimodal retrieval, and semantic understanding. Various explanation methods have been proposed to visualize the importance of input image-text pairs on the model's similarity outputs. However, popular saliency maps are limited to capturing only first-order attributions, overlooking the complex cross-modal interactions intrinsic to such encoders. We introduce faithful interaction explanations of LIP models (FIxLIP) as a unified approach to decomposing the similarity in vision-language encoders. FIxLIP is rooted in game theory, where we analyze how using the weighted Banzhaf interaction index offers greater flexibility and improves computational efficiency over the Shapley interaction quantification framework. From a practical perspective, we propose how to naturally extend explanation evaluation metrics, such as the pointing game and area between the insertion/deletion curves, to second-order interaction explanations. Experiments on the MS COCO and ImageNet-1k benchmarks validate that second-order methods, such as FIxLIP, outperform first-order attribution methods. Beyond delivering high-quality explanations, we demonstrate the utility of FIxLIP in comparing different models, e.g., CLIP vs. SigLIP-2.

ICLR Conference 2025 Conference Paper

Inverse Constitutional AI: Compressing Preferences into Principles

  • Arduin Findeis
  • Timo Kaufmann
  • Eyke Hüllermeier
  • Samuel Albanie
  • Robert D. Mullins

Feedback data is widely used for fine-tuning and evaluating state-of-the-art AI models. Pairwise text preferences, where human or AI annotators select the “better” of two options, are particularly common. Such preferences are used to train (reward) models or to rank models with aggregate statistics. For many applications it is desirable to understand annotator preferences in addition to modelling them – not least because extensive prior work has shown various unintended biases in preference datasets. Yet, preference datasets remain challenging to interpret. Neither black-box reward models nor statistics can answer why one text is preferred over another. Manual interpretation of the numerous (long) response pairs is usually equally infeasible. In this paper, we introduce the Inverse Constitutional AI (ICAI) problem, formulating the interpretation of pairwise text preference data as a compression task. In constitutional AI, a set of principles (a constitution) is used to provide feedback and fine-tune AI models. ICAI inverts this process: given a feedback dataset, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding ICAI algorithm and validate its generated constitutions quantitatively based on annotation reconstruction accuracy on several datasets: (a) synthetic feedback data with known principles; (b) AlpacaEval cross-annotated human feedback data; (c) crowdsourced Chatbot Arena data; and (d) PRISM data from diverse demographic groups. As an example application, we further demonstrate the detection of biases in human feedback data. As a short and interpretable representation of the original dataset, generated constitutions have many potential use cases: they may help identify undesirable annotator biases, better understand model performance, scale feedback to unseen data, or assist with adapting AI models to individual user or group preferences. 
We release the source code for our algorithm and experiments at https://github.com/rdnfn/icai.

NeurIPS Conference 2025 Conference Paper

ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning

  • Timo Kaufmann
  • Yannick Metz
  • Daniel Keim
  • Eyke Hüllermeier

Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grapes, but which preference is stronger? Strength is crucial for decision-making under uncertainty and generalization of preference models, but hard to measure reliably. Metadata such as response times and inter-annotator agreement can serve as proxies for strength, but are often noisy and confounded. We propose ResponseRank to address the challenge of learning from noisy strength signals. Our method uses relative differences in proxy signals to rank responses to pairwise comparisons by their inferred preference strength. To control for systemic variation, we compare signals only locally within carefully constructed strata. This enables robust learning of utility differences consistent with strength-derived rankings while making minimal assumptions about the strength signal. Our contributions are threefold: (1) ResponseRank, a novel method that robustly learns preference strength by leveraging locally valid relative strength signals; (2) empirical evidence of improved sample efficiency and robustness across diverse tasks: synthetic preference learning (with simulated response times), language modeling (with annotator agreement), and RL control tasks (with simulated episode returns); and (3) the Pearson Distance Correlation (PDC), a novel metric that isolates cardinal utility learning from ordinal accuracy.

ICML Conference 2025 Conference Paper

X-Hacking: The Threat of Misguided AutoML

  • Rahul Sharma
  • Sumantrak Mukherjee
  • Andrea Sipka
  • Eyke Hüllermeier
  • Sebastian J. Vollmer
  • Sergey Redyuk
  • David Antony Selby

Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how easily an automated machine learning pipeline can be adapted to exploit model multiplicity at scale: searching a set of ‘defensible’ models with similar predictive performance to find a desired explanation. We formulate the trade-off between explanation and accuracy as a multi-objective optimisation problem, and illustrate empirically on familiar real-world datasets that, on average, Bayesian optimisation accelerates X-hacking 3-fold for features susceptible to it, versus random sampling. We show that the vulnerability of a dataset to X-hacking can be determined by information redundancy among features. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI.

AAAI Conference 2024 Conference Paper

Approximating the Shapley Value without Marginal Contributions

  • Patrick Kolpaczki
  • Viktor Bengs
  • Maximilian Muschalik
  • Eyke Hüllermeier

The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results on synthetic games as well as common explainability use cases, comparing against state-of-the-art methods.
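
For contrast, the marginal-contribution representation that SVARM deliberately departs from underlies the standard permutation-sampling estimator, sketched below as a baseline. This is not the paper's algorithm; the toy game is made up.

```python
import random

def shapley_permutation_sampling(n, value, num_permutations=200, seed=0):
    """Monte Carlo Shapley estimation via marginal contributions: for each
    sampled permutation, credit every player with its marginal gain when
    joining the coalition of its predecessors."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(players)
        coalition, prev = set(), value(frozenset())
        for p in players:
            coalition.add(p)
            cur = value(frozenset(coalition))
            phi[p] += cur - prev     # marginal contribution of p
            prev = cur
    return [x / num_permutations for x in phi]

# Symmetric 4-player toy game: each estimate concentrates around v(N)/4 = 4.
est = shapley_permutation_sampling(4, lambda S: len(S) ** 2, 500)
```

Each permutation's contributions telescope to `v(N) - v(∅)`, so the estimates are exactly efficient even before they converge individually.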

IJCAI Conference 2024 Conference Paper

Best Arm Identification with Retroactively Increased Sampling Budget for More Resource-Efficient HPO

  • Jasmin Brandt
  • Marcel Wever
  • Viktor Bengs
  • Eyke Hüllermeier

Hyperparameter optimization (HPO) is indispensable for achieving optimal performance in machine learning tasks. A popular class of methods in this regard is based on Successive Halving (SHA), which casts HPO into a pure-exploration multi-armed bandit problem under finite sampling budget constraints. This is accomplished by considering hyperparameter configurations as arms and rewards as the negative validation losses. While enjoying theoretical guarantees as well as working well in practice, SHA comes, however, with several hyperparameters itself, one of which is the maximum budget that can be allocated to evaluate a single arm (hyperparameter configuration). Although there are already solutions to this meta hyperparameter optimization problem, such as the doubling trick or asynchronous extensions of SHA, these are either practically inefficient or lack theoretical guarantees. In this paper, we propose incremental SHA (iSHA), a synchronous extension of SHA that allows the maximum budget to be increased a posteriori while still enjoying theoretical guarantees. Our empirical analysis of HPO problems corroborates our theoretical findings and shows that iSHA is more resource-efficient than existing SHA-based approaches.
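
The Successive Halving scheme that iSHA builds on can be sketched in a few lines. This is a generic sketch, not iSHA itself; the toy `evaluate` loss function is made up.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Successive Halving: evaluate all configs on a small budget, keep the
    best 1/eta fraction, multiply the budget by eta, and repeat until one
    config remains. `evaluate(config, budget)` returns a validation loss
    (lower is better)."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        losses = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=lambda c: losses[c])
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]

# Hypothetical setup: a "config" is a number and its loss shrinks with
# budget while preserving the ranking, so the best config survives.
best = successive_halving([0.9, 0.3, 0.5, 0.1], lambda c, b: c + 1.0 / b)
```

The fixed maximum budget implicit in this loop is exactly the meta-hyperparameter the abstract targets: iSHA lets it grow after the fact without restarting the race.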

AAAI Conference 2024 Conference Paper

Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles

  • Maximilian Muschalik
  • Fabian Fumagalli
  • Barbara Hammer
  • Eyke Hüllermeier

While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

NeurIPS Conference 2024 Conference Paper

Conformalized Credal Set Predictors

  • Alireza Javanmardi
  • David Stutz
  • Eyke Hüllermeier

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular, due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method on ambiguous classification tasks for uncertainty quantification.

TMLR Journal 2024 Journal Article

Continual Learning: Applications and the Road Forward

  • Eli Verwimp
  • Rahaf Aljundi
  • Shai Ben-David
  • Matthias Bethge
  • Andrea Cossu
  • Alexander Gepperth
  • Tyler L. Hayes
  • Eyke Hüllermeier

Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they might seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model editing, personalization and specialization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.

ICML Conference 2024 Conference Paper

Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?

  • Mira Jürgens
  • Nis Meinert
  • Viktor Bengs
  • Eyke Hüllermeier
  • Willem Waegeman

Trustworthy ML systems should not only return accurate predictions, but also a reliable representation of their uncertainty. Bayesian methods are commonly used to quantify both aleatoric and epistemic uncertainty, but alternative approaches, such as evidential deep learning methods, have become popular in recent years. The latter group of methods in essence extends empirical risk minimization (ERM) to predict second-order probability distributions over outcomes, from which measures of epistemic (and aleatoric) uncertainty can be extracted. This paper presents novel theoretical insights into evidential deep learning, highlighting the difficulties in optimizing second-order loss functions and interpreting the resulting epistemic uncertainty measures. With a systematic setup that covers a wide range of approaches for classification, regression and counts, it provides novel insights into issues of identifiability and convergence in second-order loss minimization, and the relative (rather than absolute) nature of epistemic uncertainty measures.

ICML Conference 2024 Conference Paper

KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

  • Fabian Fumagalli
  • Maximilian Muschalik
  • Patrick Kolpaczki
  • Eyke Hüllermeier
  • Barbara Hammer

The Shapley value (SV) is a prevalent approach to allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is indispensable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

UAI Conference 2024 Conference Paper

Label-wise Aleatoric and Epistemic Uncertainty Quantification

  • Yusuf Sale
  • Paul Hofman
  • Timo Löhr
  • Lisa Wimmer
  • Thomas Nagler
  • Eyke Hüllermeier

We present a novel approach to uncertainty quantification in classification tasks based on label-wise decomposition of uncertainty measures. This label-wise perspective allows uncertainty to be quantified at the individual class level, thereby improving cost-sensitive decision-making and helping understand the sources of uncertainty. Furthermore, it allows total, aleatoric, and epistemic uncertainty to be defined on the basis of non-categorical measures such as variance, going beyond common entropy-based measures. In particular, variance-based measures address some of the limitations associated with established methods that have recently been discussed in the literature. We show that our proposed measures adhere to a number of desirable properties. Through empirical evaluation on a variety of benchmark data sets – including applications in the medical domain where accurate uncertainty quantification is crucial – we establish the effectiveness of label-wise uncertainty quantification.

UAI Conference 2024 Conference Paper

Linear Opinion Pooling for Uncertainty Quantification on Graphs

  • Clemens Damke
  • Eyke Hüllermeier

We address the problem of uncertainty quantification for graph-structured data, or, more specifically, the problem of quantifying the predictive uncertainty in (semi-supervised) node classification. Key questions in this regard concern the distinction between two different types of uncertainty, aleatoric and epistemic, and how to support uncertainty quantification by leveraging the structural information provided by the graph topology. Challenging assumptions and postulates of state-of-the-art methods, we propose a novel approach that represents (epistemic) uncertainty in terms of mixtures of Dirichlet distributions and refers to the established principle of linear opinion pooling for propagating information between neighboring nodes in the graph. The effectiveness of this approach is demonstrated in a series of experiments on a variety of graph-structured datasets.
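
Linear opinion pooling itself is simply a convex combination of probability distributions, which is what makes it suitable for propagating beliefs along edges. The weights and neighbor distributions below are illustrative, not the paper's learned quantities.

```python
import numpy as np

def linear_opinion_pool(dists: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Linear opinion pooling: a convex combination of probability
    distributions is again a probability distribution."""
    weights = weights / weights.sum()
    return weights @ dists

# Pool a node's predictive distribution with its neighbors', weighting
# the node's own opinion twice as strongly (hypothetical weights).
neighbor_dists = np.array([[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]])
pooled = linear_opinion_pool(neighbor_dists, np.array([2.0, 1.0, 1.0]))
```

The same convexity argument is what lets the paper pool mixtures of Dirichlet distributions: a weighted mixture of mixtures is again a mixture.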

AAAI Conference 2024 Conference Paper

Mitigating Label Noise through Data Ambiguation

  • Julian Lienen
  • Eyke Hüllermeier

Label noise poses an important challenge in machine learning, especially in deep learning, in which large models with high expressive power dominate the field. Models of that kind are prone to memorizing incorrect labels, thereby harming generalization performance. Many methods have been proposed to address this problem, including robust loss functions and more complex label correction approaches. Robust loss functions are appealing due to their simplicity, but typically lack flexibility, while label correction usually adds substantial complexity to the training setup. In this paper, we suggest to address the shortcomings of both methodologies by "ambiguating" the target information, adding additional, complementary candidate labels in case the learner is not sufficiently convinced of the observed training label. More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold, which deliver imprecise yet more reliable beliefs about the ground-truth, effectively helping the learner to suppress the memorization effect. In an extensive empirical evaluation, our method demonstrates favorable learning behavior on synthetic and real-world noise, confirming the effectiveness in detecting and correcting erroneous training labels.

TMLR Journal 2024 Journal Article

Piecewise-Stationary Dueling Bandits

  • Patrick Kolpaczki
  • Eyke Hüllermeier
  • Viktor Bengs

We study the piecewise-stationary dueling bandits problem with $K$ arms, where the time horizon $T$ consists of $M$ stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the Beat the Winner Reset algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of $M$ or $T$. We further propose and analyze two meta-algorithms, DETECT for weak regret and Monitored Dueling Bandits for strong regret, both based on a detection-window approach that can incorporate any dueling bandit algorithm as a black-box algorithm. Finally, we prove a worst-case lower bound for expected weak regret in the non-stationary case.
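The Condorcet winner targeted by these regret bounds can be read off a preference matrix directly; a toy sketch with hypothetical win probabilities:

```python
import numpy as np

# Toy 3-arm preference matrix: P[i, j] is the probability that arm i beats arm j
# in a duel (illustrative values; rows/columns must satisfy P[j, i] = 1 - P[i, j]).
P = np.array([[0.5, 0.7, 0.6],
              [0.3, 0.5, 0.8],
              [0.4, 0.2, 0.5]])

def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2,
    or None if no such arm exists."""
    n = len(P)
    for i in range(n):
        if all(P[i, j] > 0.5 for j in range(n) if j != i):
            return i
    return None

assert condorcet_winner(P) == 0
```

In the piecewise-stationary setting, each stationary segment has its own such matrix, so the winner the learner must track can change at every breakpoint.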

ICML Conference 2024 Conference Paper

Position: Why We Must Rethink Empirical Research in Machine Learning

  • Moritz Herrmann
  • F. Julian D. Lange
  • Katharina Eggensperger
  • Giuseppe Casalicchio
  • Marcel Wever
  • Matthias Feurer 0001
  • David Rügamer
  • Eyke Hüllermeier

We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally, but also of some epistemic limitations. In particular, we argue that most current empirical machine learning research is fashioned as confirmatory research, while it should rather be considered exploratory.

ICLR Conference 2024 Conference Paper

Probabilistic Self-supervised Representation Learning via Scoring Rules Minimization

  • Amirhossein Vahidi
  • Simon Schoßer
  • Lisa Wimmer
  • Yawei Li
  • Bernd Bischl
  • Eyke Hüllermeier
  • Mina Rezaei

Self-supervised learning methods have shown promising results across a wide range of tasks in computer vision, natural language processing, and multimodal analysis. However, self-supervised approaches come with a notable limitation, dimensional collapse, where a model doesn't fully utilize its capacity to encode information optimally. Motivated by this, we propose ProSMin, a novel probabilistic self-supervised learning approach that leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks, the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through probabilistic knowledge distillation. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMin and demonstrate its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, outperforming the self-supervised baseline in a variety of experiments on large datasets such as ImageNet-O and ImageNet-C. ProSMin thus demonstrates its scalability and real-world applicability. Our code is publicly available: https://github.com/amirvhd/SSL-sore-rule.

ICML Conference 2024 Conference Paper

Second-Order Uncertainty Quantification: A Distance-Based Approach

  • Yusuf Sale
  • Viktor Bengs
  • Michele Caprio
  • Eyke Hüllermeier

In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. A completely conclusive solution has not yet been found, however, as shown by recent criticisms of commonly used uncertainty measures associated with second-order distributions, identifying undesirable theoretical properties of these measures. In light of these criticisms, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. Moreover, we provide a general framework for developing uncertainty measures to account for these criteria, and offer an instantiation based on the Wasserstein distance, for which we prove that all criteria are satisfied.

NeurIPS Conference 2024 Conference Paper

shapiq: Shapley Interactions for Machine Learning

  • Maximilian Muschalik
  • Hubert Baniecki
  • Fabian Fumagalli
  • Patrick Kolpaczki
  • Barbara Hammer
  • Eyke Hüllermeier

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research. The source code and documentation are available at https://github.com/mmschlk/shapiq.

IJCAI Conference 2023 Conference Paper

A Survey of Methods for Automated Algorithm Configuration (Extended Abstract)

  • Elias Schede
  • Jasmin Brandt
  • Alexander Tornede
  • Marcel Wever
  • Viktor Bengs
  • Eyke Hüllermeier
  • Kevin Tierney

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. Existing AC literature is classified and characterized by the provided taxonomies.

AAAI Conference 2023 Conference Paper

AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration

  • Jasmin Brandt
  • Elias Schede
  • Björn Haddenhorst
  • Viktor Bengs
  • Eyke Hüllermeier
  • Kevin Tierney

We study the algorithm configuration (AC) problem, in which one seeks to find an optimal parameter configuration of a given target algorithm in an automated way. Recently, there has been significant progress in designing AC approaches that satisfy strong theoretical guarantees. However, a significant gap still remains between the practical performance of these approaches and state-of-the-art heuristic methods. To this end, we introduce AC-Band, a general approach for the AC problem based on multi-armed bandits that provides theoretical guarantees while exhibiting strong practical performance. We show that AC-Band requires significantly less computation time than other AC approaches providing theoretical guarantees while still yielding high-quality configurations.

UAI Conference 2023 Conference Paper

Is the volume of a credal set a good measure for epistemic uncertainty?

  • Yusuf Sale
  • Michele Caprio
  • Eyke Hüllermeier

Adequate uncertainty representation and quantification have become imperative in various scientific disciplines, especially in machine learning and artificial intelligence. As an alternative to representing uncertainty via one single probability measure, we consider credal sets (convex sets of probability measures). The geometric representation of credal sets as d-dimensional polytopes implies a geometric intuition about (epistemic) uncertainty. In this paper, we show that the volume of the geometric representation of a credal set is a meaningful measure of epistemic uncertainty in the case of binary classification, but less so for multi-class classification. Our theoretical findings highlight the crucial role of specifying and employing uncertainty measures in machine learning in an appropriate way, and for being aware of possible pitfalls.

NeurIPS Conference 2023 Conference Paper

Koopman Kernel Regression

  • Petar Bevanda
  • Max Beier
  • Armin Lederer
  • Stefan Sosnowski
  • Eyke Hüllermeier
  • Sandra Hirche

Many machine learning approaches for decision making, such as reinforcement learning, rely on simulators or predictive models to forecast the time-evolution of quantities of interest, e.g., the state of an agent or the reward of a policy. Forecasts of such complex phenomena are commonly described by highly nonlinear dynamical systems, making their use in optimization-based decision-making challenging. Koopman operator theory offers a beneficial paradigm for addressing this problem by characterizing forecasts via linear time-invariant (LTI) ODEs, turning multi-step forecasts into sparse matrix multiplication. Though there exists a variety of learning approaches, they usually lack crucial learning-theoretic guarantees, making the behavior of the obtained models with increasing data and dimensionality unclear. We address the aforementioned by deriving a universal Koopman-invariant reproducing kernel Hilbert space (RKHS) that solely spans transformations into LTI dynamical systems. The resulting Koopman Kernel Regression (KKR) framework enables the use of statistical learning tools from function approximation for novel convergence results and generalization error bounds under weaker assumptions than existing work. Our experiments demonstrate superior forecasting performance compared to Koopman operator and sequential data predictors in RKHS.

ICLR Conference 2023 Conference Paper

Memorization-Dilation: Modeling Neural Collapse Under Noise

  • Duc Anh Nguyen
  • Ron Levie
  • Julian Lienen
  • Eyke Hüllermeier
  • Gitta Kutyniok

The notion of neural collapse refers to several emergent phenomena that have been empirically observed across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embedding of all examples of the same class tend to collapse to a single representation, and the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the layer-peeled model, in which the network is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work we study a more realistic variant of the layer-peeled model, which takes the positivity of the features into account. Furthermore, we extend this model to also incorporate the limited expressivity of the network. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse. Using a model of the memorization-dilation (M-D) phenomenon, we show one mechanism by which different losses lead to different performances of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.

ICML Conference 2023 Conference Paper

On Second-Order Scoring Rules for Epistemic Uncertainty Quantification

  • Viktor Bengs
  • Eyke Hüllermeier
  • Willem Waegeman

It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.

UAI Conference 2023 Conference Paper

Quantifying aleatoric and epistemic uncertainty in machine learning: Are conditional entropy and mutual information appropriate measures?

  • Lisa Wimmer
  • Yusuf Sale
  • Paul Hofman
  • Bernd Bischl
  • Eyke Hüllermeier

The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, has recently become quite common in machine learning. While the properties of these measures, which are rooted in information theory, seem appealing at first glance, we identify various incoherencies that call their appropriateness into question. In addition to the measures themselves, we critically discuss the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents. Experiments across different computer vision tasks support our theoretical findings and raise concerns about current practice in uncertainty quantification.
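The entropy-based decomposition criticized in this paper is easy to state: total uncertainty is the entropy of the mean prediction, aleatoric uncertainty is the mean entropy (conditional entropy), and epistemic uncertainty is their difference (mutual information). A minimal sketch with illustrative ensemble predictions:

```python
import numpy as np

def entropy(p, axis=-1):
    # Clip to avoid log(0); natural-log entropy in nats.
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

# Hypothetical predictions of an ensemble (e.g. MC-dropout samples) for one instance.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.6, 0.30, 0.10],
                  [0.7, 0.20, 0.10]])

total = entropy(probs.mean(axis=0))   # H(E[p])   total uncertainty
aleatoric = entropy(probs).mean()     # E[H(p)]   conditional entropy
epistemic = total - aleatoric         # mutual information

assert epistemic >= 0.0  # Jensen's inequality: entropy is concave
```

The additivity `total = aleatoric + epistemic` holds by construction for these measures; one of the paper's points is that this very additivity, and the measures themselves, can behave incoherently.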

NeurIPS Conference 2023 Conference Paper

SHAP-IQ: Unified Approximation of any-order Shapley Interactions

  • Fabian Fumagalli
  • Maximilian Muschalik
  • Patrick Kolpaczki
  • Eyke Hüllermeier
  • Barbara Hammer

Predominantly in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e., interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.
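For context, the order-1 special case of the interaction indices approximated here is the ordinary Shapley value. The sketch below shows the exact (exponential) computation and the standard permutation-sampling baseline on a toy additive game; it is not SHAP-IQ's estimator:

```python
import itertools
import math
import random

def shapley_exact(n, value):
    """Exact Shapley values by enumerating all subsets (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

def shapley_permutation(n, value, samples=200, seed=0):
    """Standard permutation-sampling estimator (a common baseline)."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(samples):
        perm = list(range(n))
        rng.shuffle(perm)
        S, prev = set(), value(set())
        for i in perm:
            S.add(i)
            cur = value(S)
            phi[i] += cur - prev  # marginal contribution of i in this order
            prev = cur
    return [p / samples for p in phi]

# Toy additive game: v(S) = sum of fixed weights, so Shapley values equal the weights.
weights = [1.0, 2.0, 3.0]
v = lambda S: sum(weights[i] for i in S)
assert all(abs(a - b) < 1e-9 for a, b in zip(shapley_exact(3, v), weights))
```

The exponential cost of `shapley_exact` is exactly what motivates sampling-based approximators such as SHAP-IQ.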

JAIR Journal 2023 Journal Article

Towards Green Automated Machine Learning: Status Quo and Future Directions

  • Tanja Tornede
  • Alexander Tornede
  • Jonas Hanselle
  • Felix Mohr
  • Marcel Wever
  • Eyke Hüllermeier

Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution — a machine learning pipeline — tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticized for its high resource consumption as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as the expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. Therefore, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool w.r.t. their “greenness”, i.e., sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community in a more sustainable AutoML research direction. As part of this, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.

JAIR Journal 2022 Journal Article

A Survey of Methods for Automated Algorithm Configuration

  • Elias Schede
  • Jasmin Brandt
  • Alexander Tornede
  • Marcel Wever
  • Viktor Bengs
  • Eyke Hüllermeier
  • Kevin Tierney

Algorithm configuration (AC) is concerned with the automated search of the most suitable parameter configuration of a parametrized algorithm. There is currently a wide variety of AC problem variants and methods proposed in the literature. Existing reviews do not take into account all derivatives of the AC problem, nor do they offer a complete classification scheme. To this end, we introduce taxonomies to describe the AC problem and features of configuration methods, respectively. We review existing AC literature within the lens of our taxonomies, outline relevant design choices of configuration approaches, contrast methods and problem variants against each other, and describe the state of AC in industry. Finally, our review provides researchers and practitioners with a look at future research directions in the field of AC.

NeurIPS Conference 2022 Conference Paper

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

  • Jasmin Brandt
  • Viktor Bengs
  • Björn Haddenhorst
  • Eyke Hüllermeier

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

AAAI Conference 2022 Conference Paper

Machine Learning for Online Algorithm Selection under Censored Feedback

  • Alexander Tornede
  • Viktor Bengs
  • Eyke Hüllermeier

In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm’s runtime. As the latter is known to exhibit a heavy-tail distribution, an algorithm is normally stopped when exceeding a predefined upper time limit. As a consequence, machine learning methods used to optimize an algorithm selection strategy in a data-driven manner need to deal with right-censored samples, a problem that has received little attention in the literature so far. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. Moreover, we adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon. In an extensive experimental evaluation on an adapted version of the ASlib benchmark, we demonstrate that theoretically well-founded methods based on Thompson sampling perform specifically strong and improve in comparison to existing methods.

NeurIPS Conference 2022 Conference Paper

Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation

  • Viktor Bengs
  • Eyke Hüllermeier
  • Willem Waegeman

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.

FormaliSE Conference 2022 Conference Paper

Property-Driven Testing of Black-Box Functions

  • Arnab Sharma
  • Vitalik Melnikov
  • Eyke Hüllermeier
  • Heike Wehrheim

Testing is one of the most frequent means of quality assurance for software. Property-based testing aims at generating test suites for checking code against user-defined properties. Test input generation is, however, most often independent of the property to be checked, and is instead based on random or user-defined data generation. In this paper, we present property-driven unit testing of functions with numerical inputs and outputs. Like property-based testing, it allows users to define the properties to be tested for. Contrary to property-based testing, it also uses the property for a targeted generation of test inputs. Our approach is a form of learning-based testing where we first of all learn a model of a given black-box function using standard machine learning algorithms, and in a second step use model and property for test input generation. This allows us to test both predefined functions as well as machine learned regression models. Our experimental evaluation shows that our property-driven approach is more effective than standard property-based testing techniques.

UAI Conference 2022 Conference Paper

Quantification of Credal Uncertainty in Machine Learning: A Critical Analysis and Empirical Comparison

  • Eyke Hüllermeier
  • Sébastien Destercke
  • Mohammad Hossein Shaker

The representation and quantification of uncertainty has received increasing attention in machine learning in the recent past. The formalism of credal sets provides an interesting alternative in this regard, especially as it combines the representation of epistemic (lack of knowledge) and aleatoric (statistical) uncertainty in a rather natural way. In this paper, we elaborate on uncertainty measures for credal sets from the perspective of machine learning. More specifically, we provide an overview of proposals, discuss existing measures in a critical way, and also propose a new measure that is more tailored to the machine learning setting. Based on an experimental study, we conclude that theoretically well-justified measures also lead to better performance in practice. Besides, we corroborate the difficulty of the disaggregation problem, that is, of decomposing the amount of total uncertainty into aleatoric and epistemic uncertainty in a sound manner, thereby complementing theoretical findings with empirical evidence.

UAI Conference 2022 Conference Paper

Set-valued prediction in hierarchical classification with constrained representation complexity

  • Thomas Mortier
  • Eyke Hüllermeier
  • Krzysztof Dembczynski
  • Willem Waegeman

Set-valued prediction is a well-known concept in multi-class classification. When a classifier is uncertain about the class label for a test instance, it can predict a set of classes instead of a single class. In this paper, we focus on hierarchical multi-class classification problems, where valid sets (typically) correspond to internal nodes of the hierarchy. We argue that this is a very strong restriction, and we propose a relaxation by introducing the notion of representation complexity for a predicted set. In combination with probabilistic classifiers, this leads to a challenging inference problem for which specific combinatorial optimization algorithms are needed. We propose three methods and evaluate them on benchmark datasets: a naïve approach that is based on matrix-vector multiplication, a reformulation as a knapsack problem with conflict graph, and a recursive tree search method. Experimental results demonstrate that the last method is computationally more efficient than the other two approaches, due to a hierarchical factorization of the conditional class distribution.

ICML Conference 2022 Conference Paper

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

  • Viktor Bengs
  • Aadirupa Saha
  • Eyke Hüllermeier

We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information. We assume that the feedback process is determined by a linear stochastic transitivity model with contextualized utilities (CoLST), and the learner’s task is to include the best arm (with highest latent context-dependent utility) in the duel. We propose a computationally efficient algorithm, CoLSTIM, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. If each arm is associated with a $d$-dimensional feature vector, we show that CoLSTIM achieves a regret of order $\tilde O(\sqrt{dT})$ after $T$ learning rounds. Additionally, we also establish the optimality of CoLSTIM by showing a lower bound for the weak regret that refines the existing average regret analysis. Our experiments demonstrate its superiority over state-of-art algorithms for special cases of CoLST models.

NeurIPS Conference 2021 Conference Paper

Credal Self-Supervised Learning

  • Julian Lienen
  • Eyke Hüllermeier

Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis. In combination with consistency regularization, pseudo-labeling has shown promising performance in various domains, for example in computer vision. To account for the hypothetical nature of the pseudo-labels, these are commonly provided in the form of probability distributions. Still, one may argue that even a probability distribution represents an excessive level of informedness, as it suggests that the learner precisely knows the ground-truth conditional probabilities. In our approach, we therefore allow the learner to label instances in the form of credal sets, that is, sets of (candidate) probability distributions. Thanks to this increased expressiveness, the learner is able to represent uncertainty and a lack of knowledge in a more flexible and more faithful manner. To learn from weakly labeled data of that kind, we leverage methods that have recently been proposed in the realm of so-called superset learning. In an exhaustive empirical evaluation, we compare our methodology to state-of-the-art self-supervision approaches, showing competitive to superior performance especially in low-label scenarios incorporating a high degree of uncertainty.

AAAI Conference 2021 Conference Paper

From Label Smoothing to Label Relaxation

  • Julian Lienen
  • Eyke Hüllermeier

Regularization of (deep) learning models can be realized at the model, loss, or data level. As a technique somewhere in-between loss and data, label smoothing turns deterministic class labels into probability distributions, for example by uniformly distributing a certain part of the probability mass over all classes. A predictive model is then trained on these distributions as targets, using cross-entropy as loss function. While this method has shown improved performance compared to non-smoothed cross-entropy, we argue that the use of a smoothed though still precise probability distribution as a target can be questioned from a theoretical perspective. As an alternative, we propose a generalized technique called label relaxation, in which the target is a set of probabilities represented in terms of an upper probability distribution. This leads to a genuine relaxation of the target instead of a distortion, thereby reducing the risk of incorporating an undesirable bias in the learning process. Methodically, label relaxation leads to the minimization of a novel type of loss function, for which we propose a suitable closed-form expression for model optimization. The effectiveness of the approach is demonstrated in an empirical study on image data.
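The smoothing step that label relaxation generalizes is simple to write down; a minimal sketch of standard label smoothing (illustrative only, not the label-relaxation loss itself):

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Standard label smoothing: move a fraction eps of the probability mass
    uniformly over all classes. Label relaxation, by contrast, replaces this
    single precise target distribution with a *set* of distributions bounded
    by an upper probability distribution."""
    onehot = np.eye(num_classes)[y]
    return (1.0 - eps) * onehot + eps / num_classes

t = smooth_labels(np.array([0, 2]), num_classes=3, eps=0.1)
assert np.allclose(t.sum(axis=1), 1.0)       # each target is still a distribution
assert np.isclose(t[0, 0], 0.9 + 0.1 / 3)    # true class keeps most of the mass
```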

NeurIPS Conference 2021 Conference Paper

Identification of the Generalized Condorcet Winner in Multi-dueling Bandits

  • Björn Haddenhorst
  • Viktor Bengs
  • Eyke Hüllermeier

The reliable identification of the “best” arm while keeping the sample complexity as low as possible is a common task in the field of multi-armed bandits. In the multi-dueling variant of multi-armed bandits, where feedback is provided in the form of a winning arm among a set of k chosen ones, a reasonable notion of best arm is the generalized Condorcet winner (GCW). The latter is the arm that has the greatest probability of being the winner in each subset containing it. In this paper, we derive lower bounds on the sample complexity for the task of identifying the GCW under various assumptions. As a by-product, our lower bound results provide new insights for the special case of dueling bandits (k = 2). We propose the Dvoretzky–Kiefer–Wolfowitz tournament (DKWT) algorithm, which we prove to be nearly optimal. In a numerical study, we show that DKWT empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.

JAIR Journal 2021 Journal Article

Multilabel Classification with Partial Abstention: Bayes-Optimal Prediction under Label Independence

  • Vu-Linh Nguyen
  • Eyke Hüllermeier

In contrast to conventional (single-label) classification, the setting of multilabel classification (MLC) allows an instance to belong to several classes simultaneously. Thus, instead of selecting a single class label, predictions take the form of a subset of all labels. In this paper, we study an extension of the setting of MLC, in which the learner is allowed to partially abstain from a prediction, that is, to deliver predictions on some but not necessarily all class labels. This option is useful in cases of uncertainty, where the learner does not feel confident enough on the entire label set. Adopting a decision-theoretic perspective, we propose a formal framework of MLC with partial abstention, which builds on two main building blocks: first, the extension of underlying MLC loss functions so as to accommodate abstention in a proper way, and second, the problem of optimal prediction, that is, finding the Bayes-optimal prediction minimizing this generalized loss in expectation. It is well known that different (generalized) loss functions may have different risk-minimizing predictions, and finding the Bayes predictor typically comes down to solving a computationally complex optimization problem. In the most general case, given a prediction of the (conditional) joint distribution of possible labelings, the minimizer of the expected loss needs to be found over a number of candidates which is exponential in the number of class labels. We elaborate on properties of risk minimizers for several commonly used (generalized) MLC loss functions, show them to have a specific structure, and leverage this structure to devise efficient methods for computing Bayes predictors. Experimentally, we show MLC with partial abstention to be effective in the sense of reducing loss when being allowed to abstain.

KR Conference 2021 Conference Paper

On the Identifiability of Hierarchical Decision Models

  • Roman Bresson
  • Johanne Cohen
  • Eyke Hüllermeier
  • Christophe Labreuche
  • Michèle Sebag

Interpretability is a desirable property for machine learning and decision models, particularly in the context of safety-critical applications. Another most desirable property of the sought model is to be unique or "identifiable" in the considered class of models: the fact that the same functional dependency can be represented by a number of syntactically different models adversely affects the model interpretability, and prevents the expert from easily checking their validity. This paper focuses on the Choquet integral (CI) models and their hierarchical extensions (HCI). HCIs aim to support expert decision making, by gradually aggregating preferences based on criteria; they are widely used in multi-criteria decision aiding and are receiving interest from the Machine Learning community, as they preserve the high readability of CIs while efficiently scaling up w.r.t. the number of criteria. The main contribution is to establish the identifiability property of HCI under mild conditions: two HCIs implementing the same aggregation function on the criteria space necessarily have the same hierarchical structure and aggregation parameters. The identifiability property holds even when the marginal utility functions are learned from the data. This makes the class of HCI models a most appropriate choice in domains where the model interpretability and reliability are of primary concern.

JMLR Journal 2021 Journal Article

Preference-based Online Learning with Dueling Bandits: A Survey

  • Viktor Bengs
  • Róbert Busa-Fekete
  • Adil El Mesaoudi-Paul
  • Eyke Hüllermeier

In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, referred to as preference-based multi-armed bandits or dueling bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our taxonomy is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

AAAI Conference 2021 Conference Paper

Single Player Monte-Carlo Tree Search Based on the Plackett-Luce Model

  • Felix Mohr
  • Viktor Bengs
  • Eyke Hüllermeier

The problem of minimal cost path search is especially difficult when no useful heuristics are available. A common solution is roll-out-based search like Monte Carlo Tree Search (MCTS). However, MCTS is mostly used in stochastic or adversarial environments, with the goal to identify an agent’s best next move. For this reason, even though single player versions of MCTS exist, most algorithms, including UCT, are not directly tailored to classical minimal cost path search. We present Plackett-Luce MCTS (PL-MCTS), a path search algorithm based on a probabilistic model over the qualities of successor nodes. We empirically show that PL-MCTS is competitive and often superior to the state of the art.

UAI Conference 2021 Conference Paper

Testification of Condorcet Winners in Dueling Bandits

  • Björn Haddenhorst
  • Viktor Bengs
  • Jasmin Brandt
  • Eyke Hüllermeier

Several algorithms for finding the best arm in the dueling bandits setting assume the existence of a Condorcet winner (CW), that is, an arm that uniformly dominates all other arms. Yet, by simply relying on this assumption but not verifying it, such algorithms may produce doubtful results in cases where it actually fails to hold. Even worse, the problem may not be noticed, and an alleged CW may still be produced. In this paper, we therefore address the problem as a "testification" task, by which we mean a combination of testing and identification: The online identification of the CW is combined with the statistical testing of the CW assumption. Thus, instead of returning a supposed CW at some point, the learner has the possibility to stop sampling and refuse an answer in case it feels confident that the CW assumption is violated. Analyzing the testification problem formally, we derive lower bounds on the expected sample complexity of any online algorithm solving it. Moreover, a concrete algorithm is proposed, which achieves the optimal sample complexity up to logarithmic terms.

IJCAI Conference 2020 Conference Paper

Neural Representation and Learning of Hierarchical 2-additive Choquet Integrals

  • Roman Bresson
  • Johanne Cohen
  • Eyke Hüllermeier
  • Christophe Labreuche
  • Michèle Sebag

Multi-Criteria Decision Making (MCDM) aims at modelling expert preferences and assisting decision makers in identifying options best accommodating expert criteria. An instance of MCDM model, the Choquet integral is widely used in real-world applications, due to its ability to capture interactions between criteria while retaining interpretability. Aimed at a better scalability and modularity, hierarchical Choquet integrals involve intermediate aggregations of the interacting criteria, at the cost of a more complex elicitation. The paper presents a machine learning-based approach for the automatic identification of hierarchical MCDM models, composed of 2-additive Choquet integral aggregators and of marginal utility functions on the raw features from data reflecting expert preferences. The proposed NEUR-HCI framework relies on a specific neural architecture, enforcing by design the Choquet model constraints and supporting its end-to-end training. The empirical validation of NEUR-HCI on real-world and artificial benchmarks demonstrates the merits of the approach compared to state-of-the-art baselines.

ICML Conference 2020 Conference Paper

Preselection Bandits

  • Viktor Bengs
  • Eyke Hüllermeier

In this paper, we introduce the Preselection Bandit problem, in which the learner preselects a subset of arms (choice alternatives) for a user, which then chooses the final arm from this subset. The learner is not aware of the user’s preferences, but can learn them from observed choices. In our concrete setting, we allow these choices to be stochastic and model the user’s actions by means of the Plackett-Luce model. The learner’s main task is to preselect subsets that eventually lead to highly preferred choices. To formalize this goal, we introduce a reasonable notion of regret and derive lower bounds on the expected regret. Moreover, we propose algorithms for which the upper bound on expected regret matches the lower bound up to a logarithmic term of the time horizon.
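The Plackett-Luce choice probabilities underlying this setting are easy to state: the chance that an arm is chosen from a preselected subset is its (latent) utility divided by the subset's total utility. A small sketch with made-up utilities; the names are illustrative, not from the paper:

```python
def pl_choice_prob(utilities, subset, chosen):
    # Under the Plackett-Luce model, the probability that `chosen` wins in
    # `subset` is its utility relative to the total utility of the subset.
    total = sum(utilities[j] for j in subset)
    return utilities[chosen] / total

# Hypothetical latent utilities of four arms.
v = {0: 4.0, 1: 2.0, 2: 1.0, 3: 1.0}
p = pl_choice_prob(v, {0, 1, 2}, 0)  # 4 / (4 + 2 + 1)
```

By construction the choice probabilities over any preselected subset sum to one, which is what makes observed choices informative about the user's utilities.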

AAAI Conference 2018 Conference Paper

Learning to Rank Based on Analogical Reasoning

  • Mohsen Ahmadi Fahandar
  • Eyke Hüllermeier

Object ranking or “learning to rank” is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects A, B, C, D, if object A is known to be preferred to B, and C relates to D as A relates to B, then C is (supposedly) preferred to D. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer.
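The inference pattern can be illustrated with the common arithmetic instantiation of analogical proportions (the paper's exact formalization may differ): "a is to b as c is to d" holds when the componentwise differences of the feature vectors agree.

```python
def analogical_proportion(a, b, c, d, tol=1e-9):
    # Arithmetic instantiation of "a is to b as c is to d":
    # the proportion holds when a - b equals c - d componentwise.
    return all(abs((ai - bi) - (ci - di)) <= tol
               for ai, bi, ci, di in zip(a, b, c, d))

# Inference pattern: A is known to be preferred to B, and C relates to D
# as A relates to B, so C is (supposedly) preferred to D.
a, b = [2.0, 5.0], [1.0, 3.0]
c, d = [4.0, 8.0], [3.0, 6.0]
transfer = analogical_proportion(a, b, c, d)
```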

ICML Conference 2018 Conference Paper

Ranking Distributions based on Noisy Sorting

  • Adil El Mesaoudi-Paul
  • Eyke Hüllermeier
  • Róbert Busa-Fekete

We propose a new statistical model for ranking data, i.e., a new family of probability distributions on permutations. Our model is inspired by the idea of a data-generating process in the form of a noisy sorting procedure, in which deterministic comparisons between pairs of items are replaced by Bernoulli trials. The probability of producing a certain ranking as a result then essentially depends on the Bernoulli parameters, which can be interpreted as pairwise preferences. We show that our model can be written in closed form if insertion or quick sort are used as sorting algorithms, and propose a maximum likelihood approach for parameter estimation. We also introduce a generalization of the model, in which the constraints on pairwise preferences are relaxed, and for which maximum likelihood estimation can be carried out based on a variation of the generalized iterative scaling algorithm. Experimentally, we show that the models perform very well in terms of goodness of fit, compared to existing models for ranking data.
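One way to picture the data-generating process is to run insertion sort with randomized comparisons. The sketch below is illustrative rather than the paper's exact procedure: each new item is placed above the first incumbent it beats, where "beats" is a Bernoulli trial with a pairwise-preference probability.

```python
import random

def noisy_insertion_sort(items, p_beats, rng):
    # Insertion sort in which every deterministic comparison is replaced by a
    # Bernoulli trial: the new item is placed above an incumbent with
    # probability p_beats(item, incumbent).
    ranking = []
    for item in items:
        pos = len(ranking)
        for i, incumbent in enumerate(ranking):
            if rng.random() < p_beats(item, incumbent):
                pos = i
                break
        ranking.insert(pos, item)
    return ranking

# Deterministic preferences (probabilities 0 or 1) recover ordinary sorting.
ranking = noisy_insertion_sort([3, 1, 2], lambda a, b: 1.0 if a < b else 0.0,
                               random.Random(0))
```

With preference probabilities strictly between 0 and 1, repeated runs produce different permutations, which is exactly the ranking distribution the model describes.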

IJCAI Conference 2018 Conference Paper

Reliable Multi-class Classification based on Pairwise Epistemic and Aleatoric Uncertainty

  • Vu-Linh Nguyen
  • Sébastien Destercke
  • Marie-Hélène Masson
  • Eyke Hüllermeier

We propose a method for reliable prediction in multi-class classification, where reliability refers to the possibility of partial abstention in cases of uncertainty. More specifically, we allow for predictions in the form of preorder relations on the set of classes, thereby generalizing the idea of set-valued predictions. Our approach relies on combining learning by pairwise comparison with a recent proposal for modeling uncertainty in classification, in which a distinction is made between reducible (a.k.a. epistemic) uncertainty caused by a lack of information and irreducible (a.k.a. aleatoric) uncertainty due to intrinsic randomness. The problem of combining uncertain pairwise predictions into a most plausible preorder is then formalized as an integer programming problem. Experimentally, we show that our method is able to appropriately balance reliability and precision of predictions.

ICML Conference 2017 Conference Paper

Statistical Inference for Incomplete Ranking Data: The Case of Rank-Dependent Coarsening

  • Mohsen Ahmadi Fahandar
  • Eyke Hüllermeier
  • Inés Couso

We consider the problem of statistical inference for ranking data, specifically rank aggregation, under the assumption that samples are incomplete in the sense of not comprising all choice alternatives. In contrast to most existing methods, we explicitly model the process of turning a full ranking into an incomplete one, which we call the coarsening process. To this end, we propose the concept of rank-dependent coarsening, which assumes that incomplete rankings are produced by projecting a full ranking to a random subset of ranks. For a concrete instantiation of our model, in which full rankings are drawn from a Plackett-Luce distribution and observations take the form of pairwise preferences, we study the performance of various rank aggregation methods. In addition to predictive accuracy in the finite sample setting, we address the theoretical question of consistency, by which we mean the ability to recover a target ranking when the sample size goes to infinity, despite a potential bias in the observations caused by the (unknown) coarsening.

ICML Conference 2016 Conference Paper

Extreme F-measure Maximization using Sparse Probability Estimates

  • Kalina Jasinska
  • Krzysztof Dembczynski
  • Róbert Busa-Fekete
  • Karlson Pfannschmidt
  • Timo Klerx
  • Eyke Hüllermeier

We consider the problem of (macro) F-measure maximization in the context of extreme multi-label classification (XMLC), i.e., multi-label classification with extremely large label spaces. We investigate several approaches based on recent results on the maximization of complex performance measures in binary classification. According to these results, the F-measure can be maximized by properly thresholding conditional class probability estimates. We show that a naive adaptation of this approach can be very costly for XMLC and propose to solve the problem by classifiers that efficiently deliver sparse probability estimates (SPEs), that is, probability estimates restricted to the most probable labels. Empirical results provide evidence for the strong practical performance of this approach.
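The thresholding idea the approach builds on can be sketched for a single instance: predict a label relevant iff its estimated probability reaches a cutoff tau, and note that only the estimated probabilities themselves need to be tried as cutoffs. Illustrative code, not the paper's algorithm:

```python
def f1_at_threshold(probs, labels, tau):
    # Predict a label relevant iff its estimated probability reaches tau,
    # then score the prediction with the F1-measure.
    pred = [p >= tau for p in probs]
    tp = sum(1 for pr, y in zip(pred, labels) if pr and y)
    fp = sum(1 for pr, y in zip(pred, labels) if pr and not y)
    fn = sum(1 for pr, y in zip(pred, labels) if not pr and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(probs, labels):
    # Only the estimated probabilities themselves need to be tried as cutoffs,
    # since F1 only changes when tau crosses one of them.
    return max(sorted(set(probs)), key=lambda tau: f1_at_threshold(probs, labels, tau))

probs = [0.9, 0.8, 0.4, 0.1]
labels = [1, 1, 0, 0]
```

In the XMLC setting, sparse probability estimates keep this search cheap by restricting `probs` to the few most probable labels rather than the full label space.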

NeurIPS Conference 2015 Conference Paper

Online F-Measure Optimization

  • Róbert Busa-Fekete
  • Balázs Szörényi
  • Krzysztof Dembczynski
  • Eyke Hüllermeier

The F-measure is an important and commonly used performance metric for binary prediction tasks. By combining precision and recall into a single score, it avoids disadvantages of simple metrics like the error rate, especially in cases of imbalanced class distributions. The problem of optimizing the F-measure, that is, of developing learning algorithms that perform optimally in the sense of this measure, has recently been tackled by several authors. In this paper, we study the problem of F-measure maximization in the setting of online learning. We propose an efficient online algorithm and provide a formal analysis of its convergence properties. Moreover, first experimental results are presented, showing that our method performs well in practice.

NeurIPS Conference 2015 Conference Paper

Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach

  • Balázs Szörényi
  • Róbert Busa-Fekete
  • Adil Paul
  • Eyke Hüllermeier

We study the problem of online rank elicitation, assuming that rankings of a set of alternatives obey the Plackett-Luce distribution. Following the setting of the dueling bandits problem, the learner is allowed to query pairwise comparisons between alternatives, i.e., to sample pairwise marginals of the distribution in an online fashion. Using this information, the learner seeks to reliably predict the most probable ranking (or top-alternative). Our approach is based on constructing a surrogate probability distribution over rankings based on a sorting procedure, for which the pairwise marginals provably coincide with the marginals of the Plackett-Luce distribution. In addition to a formal performance and complexity analysis, we present first experimental studies.

ICML Conference 2015 Conference Paper

Qualitative Multi-Armed Bandits: A Quantile-Based Approach

  • Balázs Szörényi
  • Róbert Busa-Fekete
  • Paul Weng
  • Eyke Hüllermeier

We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.
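The quantile-based representative value only requires that rewards can be ordered, not averaged. A minimal sketch on a made-up qualitative scale; the quantile convention below is one simple choice and not necessarily the paper's:

```python
def empirical_quantile(observations, tau, order):
    # Representative value of an arm: the tau-quantile of its observed
    # (qualitative) rewards, which only needs the ability to sort them
    # according to a given ordinal scale.
    ranked = sorted(observations, key=order.index)
    idx = min(int(tau * len(ranked)), len(ranked) - 1)
    return ranked[idx]

# Hypothetical ordinal reward scale and observed rewards for one arm.
scale = ["bad", "ok", "good", "excellent"]
obs = ["ok", "bad", "good", "ok", "excellent"]
med = empirical_quantile(obs, 0.5, scale)
```

Note that a mean of such values is undefined, while the median ("ok" here) is perfectly meaningful.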

JMLR Journal 2014 Journal Article

On the Bayes-Optimality of F-Measure Maximizers

  • Willem Waegeman
  • Krzysztof Dembczyński
  • Arkadiusz Jachnik
  • Weiwei Cheng
  • Eyke Hüllermeier

The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.

AAAI Conference 2014 Conference Paper

PAC Rank Elicitation through Adaptive Sampling of Stochastic Pairwise Preferences

  • Róbert Busa-Fekete
  • Balázs Szörényi
  • Eyke Hüllermeier

We introduce the problem of PAC rank elicitation, which consists of sorting a given set of options based on adaptive sampling of stochastic pairwise preferences. More specifically, we assume the existence of a ranking procedure, such as Copeland’s method, that determines an underlying target order of the options. The goal is to predict a ranking that is sufficiently close to this target order with high probability, where closeness is measured in terms of a suitable distance measure. We instantiate this setting with combinations of two different distance measures and ranking procedures. For these instantiations, we devise efficient strategies for sampling pairwise preferences and analyze the corresponding sample complexity. We also present first experiments to illustrate the practical performance of our methods.

ICML Conference 2014 Conference Paper

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows

  • Róbert Busa-Fekete
  • Eyke Hüllermeier
  • Balázs Szörényi

We address the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings (total orders) of a given set of items. Instead of asking for complete rankings, however, our learner is only allowed to query pairwise preferences. Using information of that kind, the goal of the learner is to reliably predict properties of the distribution, such as the most probable top-item, the most probable ranking, or the distribution itself. More specifically, learning is done in an online manner, and the goal is to minimize sample complexity while guaranteeing a certain level of confidence.

ICML Conference 2013 Conference Paper

Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization

  • Krzysztof Dembczynski
  • Arkadiusz Jachnik
  • Wojciech Kotlowski
  • Willem Waegeman
  • Eyke Hüllermeier

We compare the plug-in rule approach for optimizing the F-measure in multi-label classification with an approach based on structured loss minimization, such as the structured support vector machine (SSVM). Whereas the former derives an optimal prediction from a probabilistic model in a separate inference step, the latter seeks to optimize the F-measure directly during the training phase. We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency. As a main theoretical result, we show that our plug-in rule algorithm is consistent, whereas the SSVM approaches are not. Finally, we present results of a large experimental study showing the benefits of the introduced algorithm.

IJCAI Conference 2013 Conference Paper

Preference-Based CBR: General Ideas and Basic Principles

  • Eyke Hüllermeier
  • Weiwei Cheng

Building on recent research on preference handling in artificial intelligence and related fields, our goal is to develop a coherent and generic methodological framework for case-based reasoning (CBR) on the basis of formal concepts and methods for knowledge representation and reasoning with preferences. A preference-based approach to CBR appears to be appealing for several reasons, notably because case-based experiences naturally lend themselves to representations in terms of preference or order relations. Moreover, the flexibility and expressiveness of a preference-based formalism well accommodate the uncertain and approximate nature of case-based problem solving. In this paper, we outline the basic ideas of preference-based CBR and sketch a formal framework for realizing these ideas.

ICML Conference 2013 Conference Paper

Top-k Selection based on Adaptive Sampling of Noisy Preferences

  • Róbert Busa-Fekete
  • Balázs Szörényi
  • Weiwei Cheng
  • Paul Weng
  • Eyke Hüllermeier

We consider the problem of reliably selecting an optimal subset of fixed size from a given set of choice alternatives, based on noisy information about the quality of these alternatives. Problems of similar kind have been tackled by means of adaptive sampling schemes called racing algorithms. However, in contrast to existing approaches, we do not assume that each alternative is characterized by a real-valued random variable, and that samples are taken from the corresponding distributions. Instead, we only assume that alternatives can be compared in terms of pairwise preferences. We propose and formally analyze a general preference-based racing algorithm that we instantiate with three specific ranking procedures and corresponding sampling schemes. Experiments with real and synthetic data are presented to show the efficiency of our approach.

ECAI Conference 2012 Conference Paper

An Analysis of Chaining in Multi-Label Classification

  • Krzysztof Dembczynski
  • Willem Waegeman
  • Eyke Hüllermeier

The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minimization perspective, thereby helping to gain a better understanding of this approach. As a main result, we clarify that the original chaining method seeks to approximate the joint mode of the conditional distribution of label vectors in a greedy manner. As a result of a theoretical regret analysis, we conclude that this approach can perform quite poorly in terms of subset 0/1 loss. Therefore, we present an enhanced inference procedure for which the worst-case regret can be upper-bounded far more tightly. In addition, we show that a probabilistic variant of chaining, which can be utilized for any loss function, becomes tractable by using Monte Carlo sampling. Finally, we present experimental results confirming the validity of our theoretical findings.
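The greedy chaining scheme analyzed here predicts labels one at a time, each conditioned on the labels predicted so far. The toy example below (a hypothetical two-label joint distribution, not from the paper) also reproduces the point that the greedy prediction can miss the joint mode:

```python
def greedy_chain_predict(cond, m):
    # Greedy chaining: predict labels one at a time, each conditioned on the
    # labels already predicted, keeping the locally most probable value.
    # cond(i, prefix) returns P(y_i = 1 | x, y_0..y_{i-1} = prefix).
    prefix = []
    for i in range(m):
        p = cond(i, tuple(prefix))
        prefix.append(1 if p >= 0.5 else 0)
    return prefix

# Hypothetical joint distribution over two labels for a fixed instance x.
joint = {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.3}

def cond(i, prefix):
    num = sum(p for y, p in joint.items() if y[:i] == prefix and y[i] == 1)
    den = sum(p for y, p in joint.items() if y[:i] == prefix)
    return num / den

pred = greedy_chain_predict(cond, 2)
```

Here P(y_0 = 1) = 0.6, so the chain commits to y_0 = 1 and ends up at (1, 1) with joint probability 0.3, even though the joint mode is (0, 1) with probability 0.4, illustrating the regret under subset 0/1 loss.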

NeurIPS Conference 2012 Conference Paper

Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

  • Weiwei Cheng
  • Eyke Hüllermeier
  • Willem Waegeman
  • Volkmar Welker

Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach.

NeurIPS Conference 2011 Conference Paper

An Exact Algorithm for F-Measure Maximization

  • Krzysztof Dembczynski
  • Willem Waegeman
  • Weiwei Cheng
  • Eyke Hüllermeier

The F-measure, originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure remains a statistically and computationally challenging problem, since no closed-form maximizer exists. Current algorithms are approximate and typically rely on additional assumptions regarding the statistical distribution of the binary response variables. In this paper, we present an algorithm which is not only computationally efficient but also exact, regardless of the underlying distribution. The algorithm requires only a quadratic number of parameters of the joint distribution (with respect to the number of binary responses). We illustrate its practical performance by means of experimental results for multi-label classification.

ECAI Conference 2006 Conference Paper

A Unified Model for Multilabel Classification and Ranking

  • Klaus Brinker
  • Johannes Fürnkranz
  • Eyke Hüllermeier

Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning by pairwise comparison approach to the multilabel scenario, a setting previously not being amenable to the pairwise decomposition technique. We present empirical results in the area of text categorization and gene analysis, underscoring the merits of the calibrated model in comparison to state-of-the-art multilabel learning methods.

IJCAI Conference 2005 Conference Paper

Cho-k-NN: A Method for Combining Interacting Pieces of Evidence in Case-Based Learning

  • Eyke Hüllermeier

The case-based learning paradigm relies upon memorizing cases in the form of successful problem solving experience, such as, e.g., a pattern along with its classification in pattern recognition or a problem along with a solution in case-based reasoning. When it comes to solving a new problem, each of these cases serves as an individual piece of evidence that gives an indication of the solution to that problem. In this paper, we elaborate on issues concerning the proper combination (aggregation) of such pieces of evidence. Particularly, we argue that cases retrieved from a case library must not be considered as independent information sources, as implicitly done by most case-based learning methods. Focusing on the problem of prediction as a performance task, we propose a new inference principle that combines potentially interacting pieces of evidence by means of the so-called (discrete) Choquet-integral. Our method, called Cho-k-NN, takes interdependencies between the stored cases into account and can be seen as a generalization of weighted nearest neighbor estimation.
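The (discrete) Choquet integral at the core of the method can be computed by sorting the evidence values and weighting each increment by the capacity of the set of still-active indices. A minimal sketch; with an additive capacity it reduces to a plain weighted average, while a non-additive capacity can discount interacting (redundant) pieces of evidence:

```python
def choquet_integral(values, mu):
    # Discrete Choquet integral: sort the values ascending and weight each
    # increment by the capacity mu of the set of indices still "active"
    # (those whose value has not yet been fully accounted for).
    idx = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(idx):
        active = frozenset(idx[k:])
        total += (values[i] - prev) * mu(active)
        prev = values[i]
    return total
```

For the additive capacity mu(S) = |S| / n, the integral coincides with the arithmetic mean, which is the sense in which Cho-k-NN generalizes weighted nearest neighbor estimation.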

AAAI Conference 2000 Conference Paper

Change Detection in Heuristic Search

  • Eyke Hüllermeier

The order in which nodes are explored in a (depth-first) iterative deepening search strategy is principally determined by the condition under which a path of the search tree is cut off in each search phase. A corresponding criterion, which has a strong influence on the performance of the overall (heuristic) search procedure, is generally realized in the form of an upper cost bound. In this paper, we develop an effective and computationally efficient termination criterion based on statistical methods of change detection. The criterion is local in the sense that it depends on properties of a path itself, rather than on the comparison with other paths. Loosely speaking, the idea is to take a systematic change in the (heuristic) evaluation of nodes along a search path as an indication of suboptimality. An expected utility criterion which also takes the consequence of the suboptimal search decision on the solution quality into account is proposed as a generalization of this idea.