Arrow Research search

Author name cluster

Klemens Böhm

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

TMLR Journal 2025 Journal Article

Adversarial Subspace Generation for Outlier Detection in High-Dimensional Data

  • Jose Cribeiro-Ramallo
  • Federico Matteucci
  • Paul Enciu
  • Alexander Jenke
  • Vadim Arzamasov
  • Thorsten Strufe
  • Klemens Böhm

Outlier detection in high-dimensional tabular data is challenging since data is often distributed across multiple lower-dimensional subspaces—a phenomenon known as the Multiple Views effect (MV). This effect led to a large body of research focused on mining such subspaces, known as *subspace selection*. However, as the precise nature of the MV effect was not well understood, traditional methods had to rely on heuristic-driven search schemes that struggle to accurately capture the true structure of the data. Properly identifying these subspaces is critical for unsupervised tasks such as outlier detection or clustering, where misrepresenting the underlying data structure can hinder the performance. We introduce Myopic Subspace Theory (MST), a new theoretical framework that mathematically formulates the Multiple Views effect and writes subspace selection as a stochastic optimization problem. Based on MST, we introduce V-GAN, a generative method trained to solve such an optimization problem. This approach avoids any exhaustive search over the feature space while ensuring that the intrinsic data structure is preserved. Experiments on 42 real-world datasets show that using V-GAN subspaces to build ensemble methods leads to a significant increase in one-class classification performance—compared to existing subspace selection, feature selection, and embedding methods. Further experiments on synthetic data show that V-GAN identifies subspaces more accurately while scaling better than other relevant subspace selection methods. These results confirm the theoretical guarantees of our approach and also highlight its practical viability in high-dimensional settings.
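The subspace-ensemble idea the abstract builds on can be illustrated with a minimal sketch (this is not V-GAN itself, which is a generative model; the function names, the fixed subspaces, and the planted outlier are illustrative assumptions): score each point by its distance to its k-th nearest neighbor inside several candidate subspaces, then average the scores across subspaces.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    # Distance to the k-th nearest neighbor as an outlier score.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]  # column 0 is the point itself (distance 0)

def subspace_ensemble_scores(X, subspaces, k=5):
    # Average the kNN outlier scores over the given feature subspaces.
    scores = [knn_outlier_scores(X[:, list(s)], k) for s in subspaces]
    return np.mean(scores, axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[0, :2] += 8.0  # planted outlier, visible only in subspace (0, 1)
scores = subspace_ensemble_scores(X, subspaces=[(0, 1), (2, 3), (4, 5)])
```

The planted outlier is invisible to a full-space detector's averaged-out distances in other dimensions, yet dominates the score once the ensemble includes the subspace where it lives.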

EWRL Workshop 2025 Workshop Paper

Informed Asymmetric Actor-Critic: Theoretical Insights and Open Questions

  • Daniel Ebi
  • Gaspard Lambrechts
  • Damien Ernst
  • Klemens Böhm

Reinforcement learning in partially observable environments requires agents to make decisions under uncertainty, based on incomplete and noisy observations. Asymmetric actor-critic methods improve learning in these settings by exploiting privileged information available during training. Most existing approaches, however, assume full access to the true state. In this work, we present a novel asymmetric actor-critic formulation grounded in informed partially observable Markov decision processes, allowing the critic to leverage arbitrary privileged information without requiring full-state access. We show that the method preserves the policy gradient theorem and yields unbiased gradient estimates even when the critic conditions on privileged partial information. Furthermore, we provide a theoretical analysis of the informed asymmetric recurrent natural policy gradient algorithm derived from our informed asymmetric learning paradigm. Our findings challenge the assumption that full-state access is necessary for unbiased policy learning, motivating the need to develop well-defined criteria to quantify the informativeness of additional training signals and opening new directions for asymmetric reinforcement learning.
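The structural idea, a critic that conditions on privileged information while the actor sees only the observation, can be sketched with a generic asymmetric advantage actor-critic update (this is not the paper's informed asymmetric recurrent natural policy gradient; the linear function approximators, toy dimensions, and fixed reward are assumptions for illustration):

```python
import numpy as np

# Toy dimensions: the actor sees only the observation; the critic additionally
# receives a privileged signal that is available during training.
obs_dim, priv_dim, n_actions = 3, 2, 2
theta = np.zeros((n_actions, obs_dim))   # actor parameters (observation only)
w = np.zeros(obs_dim + priv_dim)         # critic parameters (obs + privileged)

def policy(obs):
    logits = theta @ obs
    p = np.exp(logits - logits.max())
    return p / p.sum()

def asymmetric_update(obs, priv, action, reward, alpha=0.1):
    global theta, w
    x = np.concatenate([obs, priv])
    advantage = reward - w @ x                    # critic conditions on privileged info
    w = w + alpha * advantage * x                 # critic update
    p = policy(obs)
    grad_log = -np.outer(p, obs)                  # d log pi(a|obs) / d theta
    grad_log[action] += obs
    theta = theta + alpha * advantage * grad_log  # actor uses the observation only

obs = np.array([1.0, 0.0, 0.0])
priv = np.array([1.0, 0.0])
for _ in range(30):
    asymmetric_update(obs, priv, action=0, reward=1.0)  # action 0 is rewarded
    asymmetric_update(obs, priv, action=1, reward=0.0)
p_final = policy(obs)
```

The actor's gradient never touches the privileged signal, which is exactly the asymmetry that makes the policy deployable without it at test time.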

TMLR Journal 2025 Journal Article

Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

  • Florian Kalinke
  • Marco Heyden
  • Georg Gntuni
  • Edouard Fouché
  • Klemens Böhm

Detecting changes is of fundamental importance when analyzing data streams and has many applications, e.g., in predictive maintenance, fraud detection, or medicine. A principled approach to detect changes is to compare the distributions of observations within the stream to each other via hypothesis testing. Maximum mean discrepancy (MMD), a (semi-)metric on the space of probability distributions, provides powerful non-parametric two-sample tests on kernel-enriched domains. In particular, MMD is able to detect any disparity between distributions under mild conditions. However, classical MMD estimators suffer from a quadratic runtime complexity, which renders their direct use for change detection in data streams impractical. In this article, we propose a new change detection algorithm, called Maximum Mean Discrepancy on Exponential Windows (MMDEW), that combines the benefits of MMD with an efficient computation based on exponential windows. We prove that MMDEW enjoys polylogarithmic runtime and logarithmic memory complexity and show empirically that it outperforms the state of the art on benchmark data streams.
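The quadratic-cost baseline the abstract refers to is the classical MMD estimator. A minimal sketch with a Gaussian kernel (the biased V-statistic variant; sample sizes, the kernel bandwidth, and function names are assumptions) shows the O((m + n)^2) kernel evaluations that MMDEW's exponential windows amortize:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased V-statistic estimator of squared MMD: quadratic in the sample size.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution: MMD near 0
Y_shift = rng.normal(3.0, 1.0, size=(200, 2))  # shifted distribution: MMD large
```

Recomputing this estimate for every window position in a stream is what becomes impractical, motivating the incremental exponential-window scheme.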

TMLR Journal 2025 Journal Article

Partial-Label Learning with a Reject Option

  • Tobias Fuchs
  • Florian Kalinke
  • Klemens Böhm

In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent nearest-neighbor-based partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions. When evaluated without the reject option, our nearest-neighbor-based approach also achieves competitive prediction performance.

NeurIPS Conference 2023 Conference Paper

A benchmark of categorical encoders for binary classification

  • Federico Matteucci
  • Vadim Arzamasov
  • Klemens Böhm

Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 36 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions, aspects disregarded in previous encoder benchmarks. Our code is available at https://github.com/DrCohomology/EncoderBenchmarking.
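Two encoder families the benchmark covers can be contrasted on a toy frame (the column names and data are illustrative assumptions; the benchmark evaluates 32 encoder configurations, not just these two): one-hot encoding adds an indicator column per category, while target (mean) encoding replaces each category by the mean of the binary target.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["a", "a", "b", "b", "b", "c"],
    "y":    [1,   0,   1,   1,   0,   1],
})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Target (mean) encoding: replace each category by the mean target value.
target_means = df.groupby("city")["y"].mean()
df["city_te"] = df["city"].map(target_means)
```

One-hot grows the feature space with the number of categories, whereas target encoding keeps a single column but leaks label information unless fitted with cross-validation, one reason experimental factors matter so much in such benchmarks.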

SAT Conference 2022 Conference Paper

A Comprehensive Study of k-Portfolios of Recent SAT Solvers

  • Jakob Bach
  • Ashlin Iser
  • Klemens Böhm

Hard combinatorial problems such as propositional satisfiability are ubiquitous. The holy grail is solution methods that show good performance on all problem instances. However, new approaches emerge regularly, some of which are complementary to existing solvers in that they run faster on some instances but not on many others. While portfolios, i.e., sets of solvers, have been touted as useful, putting together such portfolios also needs to be efficient. In particular, it remains an open question how well portfolios can exploit the complementarity of solvers. This paper features a comprehensive analysis of portfolios of recent SAT solvers, the ones from the SAT Competitions 2020 and 2021. We determine optimal portfolios with exact and approximate approaches and study the impact of portfolio size k on performance. We also investigate how effective off-the-shelf prediction models are for instance-specific solver recommendations. One result is that the portfolios found with an approximate approach are as good as the optimal solution in practice. We also observe that marginal returns decrease very quickly with larger k, and our prediction models do not lead to better performance beyond very small portfolio sizes.
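The k-portfolio objective can be sketched in a few lines (the runtime matrix is hypothetical; the exhaustive search stands in for the exact approach and is only feasible for small solver counts, which is why the paper also studies approximate methods): a portfolio's cost is its virtual-best-solver runtime, i.e., on each instance the fastest member solver runs.

```python
from itertools import combinations

# Hypothetical runtime matrix: runtimes[i][j] = runtime of solver j on instance i.
runtimes = [
    [10.0, 1.0, 50.0],
    [2.0, 30.0, 3.0],
    [40.0, 45.0, 5.0],
]

def portfolio_runtime(runtimes, solvers):
    # Virtual-best-solver cost: per instance, take the portfolio's fastest solver.
    return sum(min(row[j] for j in solvers) for row in runtimes)

def best_k_portfolio(runtimes, k):
    # Exhaustive search over all k-subsets of solvers.
    n = len(runtimes[0])
    return min(combinations(range(n), k), key=lambda s: portfolio_runtime(runtimes, s))
```

Even on this toy matrix the complementarity shows: the best single solver is not part of the best pair, and going from k=1 to k=2 cuts the total runtime sharply.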

ICML Conference 2014 Conference Paper

Multivariate Maximal Correlation Analysis

  • Hoang Vu Nguyen
  • Emmanuel Müller
  • Jilles Vreeken
  • Pavel Efros
  • Klemens Böhm

Correlation analysis is one of the key elements of statistics, and has various applications in data analysis. Whereas most existing measures can only detect pairwise correlations between two dimensions, modern analysis aims at detecting correlations in multi-dimensional spaces. We propose MAC, a novel multivariate correlation measure designed for discovering multi-dimensional patterns. It belongs to the powerful class of maximal correlation analysis, for which we propose a generalization to multivariate domains. We highlight the limitations of current methods in this class, and address these with MAC. Our experiments show that MAC outperforms existing solutions, is robust to noise, and discovers interesting and useful patterns.
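The limitation of pairwise measures that motivates multivariate analysis has a classic concrete instance, the XOR pattern (the sample size and seed are arbitrary; this illustrates the general gap, not MAC itself): every pairwise correlation is near zero even though the third variable is fully determined by the other two jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10_000)
y = rng.integers(0, 2, size=10_000)
z = x ^ y  # z is fully determined by the pair (x, y)

# Pairwise Pearson correlation misses the dependency entirely ...
r_xz = np.corrcoef(x, z)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]
# ... although knowing both x and y determines z exactly.
```

A multivariate measure that scores the triple (x, y, z) jointly is needed to expose such patterns, which is the gap MAC targets.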