Arrow Research search

Author name cluster

Marius Kloft

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers
2 author rows

Possible papers

43

AAAI Conference 2026 Conference Paper

Reimagining Anomalies: What If Anomalies Were Normal?

  • Philipp Liznerski
  • Saurabh Varshneya
  • Ece Calikus
  • Puyu Wang
  • Alexander Bartscher
  • Sebastian Josef Vollmer
  • Sophie Fellenz
  • Marius Kloft

Deep learning-based methods have achieved a breakthrough in image anomaly detection, but their complexity introduces a considerable challenge to understanding why an instance is predicted to be anomalous. We introduce a novel explanation method that generates multiple alternative modifications for each anomaly, capturing diverse concepts of anomalousness. Each modification is trained to be perceived as normal by the anomaly detector. The method provides a semantic explanation of the mechanism that triggered the detector, allowing users to explore "what-if scenarios." Qualitative and quantitative analyses across various image datasets demonstrate that applying this method to state-of-the-art detectors provides high-quality semantic explanations.

AAAI Conference 2026 Conference Paper

TORA: Train Once, Realign Anytime for Offline Multi-Objective Reinforcement Learning

  • Weichen Li
  • Waleed Mustafa
  • Marcio Monteiro
  • Puyu Wang
  • Marius Kloft
  • Sophie Fellenz

Intelligent agents in real-world applications must adapt their behavior to changing contexts and user preferences. For example, planning a road trip requires considering both travel time and cost. Multi-objective reinforcement learning (MORL) provides a principled approach to navigate such trade-offs. However, most existing approaches require predefined preference weights during training and jointly optimize the model for all objectives. In this paper, we introduce TORA (Train Once, Realign Anytime), a novel framework that defers preference integration to inference time, enabling flexible adaptation to user preferences without retraining. TORA independently trains diffusion planning models for each objective and combines them at inference time using user-specified preferences to generate behavior aligned with desired trade-offs. Furthermore, new objectives can be added seamlessly by training additional models without modifying existing ones. Empirical evaluations on standard offline MORL benchmarks demonstrate that TORA achieves competitive and consistent performance compared to methods that require fixed preference weights.

TMLR Journal 2025 Journal Article

Explaining Bayesian Neural Networks

  • Kirill Bykov
  • Marina MC Höhne
  • Adelaida Creosteanu
  • Klaus-Robert Müller
  • Frederick Klauschen
  • Shinichi Nakajima
  • Marius Kloft

To advance the transparency of learning machines such as Deep Neural Networks (DNNs), the field of Explainable AI (XAI) was established to provide interpretations of DNNs' predictions. While different explanation techniques exist, a popular approach is given in the form of attribution maps, which illustrate, given a particular data point, the relevant patterns the model has used for making its prediction. Although Bayesian models such as Bayesian Neural Networks (BNNs) have a limited form of transparency built-in through their prior weight distribution, they lack explanations of their predictions for given instances. In this work, we take a step toward combining these two perspectives by examining how local attributions can be extended to BNNs. Within the Bayesian framework, network weights follow a probability distribution; hence, the standard point explanation extends naturally to an explanation distribution. Viewing explanations probabilistically, we aggregate and analyze multiple local attributions drawn from an approximate posterior to explore variability in explanation patterns. The diversity of explanations offers a way to further explore how predictive rationales may vary across posterior samples. Quantitative and qualitative experiments on toy and benchmark data, as well as on a real-world pathology dataset, illustrate that our framework enriches standard explanations with uncertainty information and may support the visualization of explanation stability.

NeurIPS Conference 2025 Conference Paper

Mitigating Spurious Features in Contrastive Learning with Spectral Regularization

  • Naghmeh Ghanooni
  • Waleed Mustafa
  • Dennis Wagner
  • Sophie Fellenz
  • Anthony Lin
  • Marius Kloft

Neural networks generally prefer simple and easy-to-learn features. When these features are spuriously correlated with the labels, the network's performance can suffer, particularly for underrepresented classes or concepts. Self-supervised representation learning methods, such as contrastive learning, are especially prone to this issue, often resulting in worse performance on downstream tasks. We identify a key spectral signature of this failure: early reliance on dominant singular modes of the learned feature matrix. To mitigate this, we propose a novel framework that promotes a uniform eigenspectrum of the feature covariance matrix, encouraging diverse and semantically rich representations. Our method operates in a fully self-supervised setting, without relying on ground-truth labels or any additional information. Empirical results on SimCLR and SimSiam demonstrate consistent gains in robustness and transfer performance, suggesting broad applicability across self-supervised learning paradigms. Code: https://github.com/NaghmehGh/SpuriousCorrelation_SSRL

NeurIPS Conference 2025 Conference Paper

NoBOOM: Chemical Process Datasets for Industrial Anomaly Detection

  • Dennis Wagner
  • Fabian Hartung
  • Justus Arweiler
  • Aparna Muraleedharan
  • Indra Jungjohann
  • Arjun Nair
  • Steffen Reithermann
  • Ralf Schulz

Monitoring chemical processes is essential to prevent catastrophic failures, optimize costs and profits, and ensure the safety of employees and the environment. A key component of modern monitoring systems is the automated detection of anomalies in sensor data over time, called time series, enabling partial automation of plant operation and adding additional layers of supervision to crucial components. The development of anomaly detection methods in this domain is challenging, since real chemical process data are usually proprietary, and simulated data are generally not a sufficient replacement. In this paper, we present NoBOOM, the first collection of datasets for anomaly detection in real-world chemical process data, including labeled data from a running process at our industry partner BASF SE — one of the world's leading chemical companies — and several chemical processes run in laboratory-scale and pilot-scale plants. While we are not able to share every detail about the industrial process, for the laboratory- and pilot-scale plants, we provide comprehensive information on plant configuration, process operation, and, in particular, anomaly events, enabling a differentiated analysis of anomaly detection methods. To demonstrate the complexity of the benchmark, we analyze the data with regard to common issues of time-series anomaly detection (TSAD) benchmarks, including potential triviality and bias.

TMLR Journal 2025 Journal Article

On the Challenges and Opportunities in Generative AI

  • Laura Manduchi
  • Clara Meister
  • Kushagra Pandey
  • Robert Bamler
  • Ryan Cotterell
  • Sina Däubener
  • Sophie Fellenz
  • Asja Fischer

The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. In this work, our objective is to identify these issues and highlight key unresolved challenges in modern generative AI paradigms that should be addressed to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.

TMLR Journal 2025 Journal Article

Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings

  • Billy Joe Franks
  • Moshe Eliasof
  • Semih Cantürk
  • Guy Wolf
  • Carola-Bibiane Schönlieb
  • Sophie Fellenz
  • Marius Kloft

Recent advances in integrating positional and structural encodings (PSEs) into graph neural networks (GNNs) have significantly enhanced their performance across various graph learning tasks. However, the general applicability of these encodings and their potential to serve as foundational representations for graphs remain uncertain. This paper investigates the fine-tuning efficiency, scalability with sample size, and generalization capability of learnable PSEs across diverse graph datasets. Specifically, we evaluate their potential as universal pre-trained models that can be easily adapted to new tasks with minimal fine-tuning and limited data. Furthermore, we assess the expressivity of the learned representations, particularly when used to augment downstream GNNs. We demonstrate through extensive benchmarking and empirical analysis that PSEs generally enhance downstream models. However, some datasets may require specific PSE augmentations to achieve optimal performance. Nevertheless, our findings highlight their significant potential to become integral components of future graph foundation models. We provide new insights into the strengths and limitations of PSEs, contributing to the broader discourse on foundation models in graph learning.

TMLR Journal 2024 Journal Article

Generalization Bounds with Logarithmic Negative-Sample Dependence for Adversarial Contrastive Learning

  • Naghmeh Ghanooni
  • Waleed Mustafa
  • Yunwen Lei
  • Anthony Widjaja Lin
  • Marius Kloft

Contrastive learning has emerged as a powerful unsupervised learning technique for extracting meaningful representations from unlabeled data by pulling similar data points closer in the representation space and pushing dissimilar ones apart. However, its vulnerability to adversarial attacks remains a critical challenge. To address this, adversarial contrastive learning — incorporating adversarial training into contrastive loss — has emerged as a promising approach to achieving robust representations that can withstand various adversarial attacks. While empirical evidence highlights its effectiveness, a comprehensive theoretical framework has been lacking. In this paper, we fill this gap by introducing generalization bounds for adversarial contrastive learning, offering key theoretical insights. Leveraging the Lipschitz continuity of loss functions, we derive generalization bounds that scale logarithmically with the number of negative samples, $K$, and apply to both linear and non-linear representations, including those obtained from deep neural networks (DNNs). Our theoretical results are supported by experiments on real-world datasets.

IJCAI Conference 2024 Conference Paper

Interpretable Tensor Fusion

  • Saurabh Varshneya
  • Antoine Ledent
  • Philipp Liznerski
  • Andriy Balinskyy
  • Purvanshi Mehta
  • Waleed Mustafa
  • Marius Kloft

Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method training a neural network to simultaneously learn multiple data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of the data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations, respectively. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on four real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.

TMLR Journal 2023 Journal Article

A Systematic Approach to Universal Random Features in Graph Neural Networks

  • Billy Joe Franks
  • Markus Anders
  • Marius Kloft
  • Pascal Schweitzer

Universal random features (URF) are state of the art regarding practical graph neural networks that are provably universal. There is great diversity regarding terminology, methodology, benchmarks, and evaluation metrics used among existing URF. Not only does this make it increasingly difficult for practitioners to decide which technique to apply to a given problem, but it also stands in the way of systematic improvements. We propose a new comprehensive framework that captures all previous URF techniques. On the theoretical side, among other results, we formally prove that under natural conditions all instantiations of our framework are universal. The framework thus provides a new simple technique to prove universality results. On the practical side, we develop a method to systematically and automatically train URF. This in turn enables us to impartially and objectively compare all existing URF. New URF naturally emerge from our approach, and our experiments demonstrate that they improve the state of the art.

ICML Conference 2023 Conference Paper

Deep Anomaly Detection under Labeling Budget Constraints

  • Aodong Li
  • Chen Qiu 0001
  • Marius Kloft
  • Padhraic Smyth
  • Stephan Mandt
  • Maja Rudolph

Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art semi-supervised AD performance under labeling budget constraints.

AAAI Conference 2023 Conference Paper

Generalization Bounds for Inductive Matrix Completion in Low-Noise Settings

  • Antoine Ledent
  • Rodrigo Alves
  • Yunwen Lei
  • Yann Guermeur
  • Marius Kloft

We study inductive matrix completion (matrix completion with side information) under an i.i.d. subgaussian noise assumption at a low noise regime, with uniform sampling of the entries. We obtain for the first time generalization bounds with the following three properties: (1) they scale like the standard deviation of the noise and in particular approach zero in the exact recovery case; (2) even in the presence of noise, they converge to zero when the sample size approaches infinity; and (3) for a fixed dimension of the side information, they only have a logarithmic dependence on the size of the matrix. Differently from many works in approximate recovery, we present results both for bounded Lipschitz losses and for the absolute loss, with the latter relying on Talagrand-type inequalities. The proofs create a bridge between two approaches to the theoretical analysis of matrix completion, since they consist in a combination of techniques from both the exact recovery literature and the approximate recovery literature.

NeurIPS Conference 2023 Conference Paper

Labeling Neural Representations with Inverse Recognition

  • Kirill Bykov
  • Laura Kopf
  • Shinichi Nakajima
  • Marius Kloft
  • Marina Höhne

Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, exhibits less computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric assessing the alignment between the representation and its corresponding explanation and delivering a measure of statistical significance. We demonstrate the applicability of INVERT in various scenarios, including the identification of representations affected by spurious correlations, and the interpretation of the hierarchical structure of decision-making within the models.

TMLR Journal 2023 Journal Article

TimeSeAD: Benchmarking Deep Multivariate Time-Series Anomaly Detection

  • Dennis Wagner
  • Tobias Michels
  • Florian C.F. Schulz
  • Arjun Nair
  • Maja Rudolph
  • Marius Kloft

Developing new methods for detecting anomalies in time series is of great practical significance, but progress is hindered by the difficulty of assessing the benefit of new methods, for the following reasons. (1) Public benchmarks are flawed (e.g., due to potentially erroneous anomaly labels), (2) there is no widely accepted standard evaluation metric, and (3) evaluation protocols are mostly inconsistent. In this work, we address all three issues: (1) We critically analyze several of the most widely-used multivariate datasets, identify a number of significant issues, and select the best candidates for evaluation. (2) We introduce a new evaluation metric for time-series anomaly detection, which—in contrast to previous metrics—is recall consistent and takes temporal correlations into account. (3) We analyze and overhaul existing evaluation protocols and provide the largest benchmark of deep multivariate time-series anomaly detection methods to date. We focus on deep-learning based methods and multivariate data, a common setting in modern anomaly detection. We provide all implementations and analysis tools in a new comprehensive library for Time Series Anomaly Detection, called TimeSeAD.

ICML Conference 2023 Conference Paper

Training Normalizing Flows from Dependent Data

  • Matthias Kirchler
  • Christoph Lippert
  • Marius Kloft

Normalizing flows are powerful non-parametric statistical models that function as a hybrid between density estimators and generative models. Current learning algorithms for normalizing flows assume that data points are sampled independently, an assumption that is frequently violated in practice, which may lead to erroneous density estimation and data generation. We propose a likelihood objective of normalizing flows incorporating dependencies between the data points, for which we derive a flexible and efficient learning algorithm suitable for different dependency structures. We show that respecting dependencies between observations can improve empirical results on both synthetic and real-world data, and leads to higher statistical power in a downstream application to genome-wide association studies.

NeurIPS Conference 2023 Conference Paper

Zero-Shot Anomaly Detection via Batch Normalization

  • Aodong Li
  • Chen Qiu
  • Marius Kloft
  • Padhraic Smyth
  • Maja Rudolph
  • Stephan Mandt

Anomaly detection (AD) plays a crucial role in many safety-critical application domains. The challenge of adapting an anomaly detector to drift in the normal data distribution, especially when no training data is available for the "new normal," has led to the development of zero-shot AD techniques. In this paper, we propose a simple yet effective method called Adaptive Centered Representations (ACR) for zero-shot batch-level AD. Our approach trains off-the-shelf deep anomaly detectors (such as deep SVDD) to adapt to a set of inter-related training data distributions in combination with batch normalization, enabling automatic zero-shot generalization for unseen AD tasks. This simple recipe, batch normalization plus meta-training, is a highly effective and versatile tool. Our results demonstrate the first zero-shot AD results for tabular data and outperform existing methods in zero-shot anomaly detection and segmentation on image data from specialized domains.

TMLR Journal 2022 Journal Article

Exposing Outlier Exposure: What Can Be Learned From Few, One, and Zero Outlier Images

  • Philipp Liznerski
  • Lukas Ruff
  • Robert A. Vandermeulen
  • Billy Joe Franks
  • Klaus-Robert Müller
  • Marius Kloft

Due to the intractability of characterizing everything that looks unlike the normal data, anomaly detection (AD) is traditionally treated as an unsupervised problem utilizing only normal samples. However, it has recently been found that unsupervised image AD can be drastically improved through the utilization of huge corpora of random images to represent anomalousness, a technique known as Outlier Exposure. In this paper we show that specialized AD learning methods seem unnecessary for state-of-the-art performance, and furthermore one can achieve strong performance with just a small collection of Outlier Exposure data, contradicting common assumptions in the field of AD. We find that standard classifiers and semi-supervised one-class methods trained to discern between normal samples and relatively few random natural images are able to outperform the current state of the art on an established AD benchmark with ImageNet. Further experiments reveal that even one well-chosen outlier sample is sufficient to achieve decent performance on this benchmark (79.3% AUC). We investigate this phenomenon and find that one-class methods are more robust to the choice of training outliers, indicating that there are scenarios where these are still more useful than standard classifiers. Additionally, we include experiments that delineate the scenarios where our results hold. Lastly, no training samples are necessary when one uses the representations learned by CLIP, a recent foundation model, which achieves state-of-the-art AD results on CIFAR-10 and ImageNet in a zero-shot setting.

ICML Conference 2022 Conference Paper

Latent Outlier Exposure for Anomaly Detection with Contaminated Data

  • Chen Qiu 0001
  • Aodong Li
  • Marius Kloft
  • Maja Rudolph
  • Stephan Mandt

Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer binary labels for each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018), which considers synthetically created, labeled anomalies, we thereby use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular data sets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.

ICML Conference 2022 Conference Paper

On the Generalization Analysis of Adversarial Learning

  • Waleed Mustafa
  • Yunwen Lei
  • Marius Kloft

Many recent studies have highlighted the susceptibility of virtually all machine-learning models to adversarial attacks. Adversarial attacks are imperceptible changes to an input example of a given prediction model. Such changes are carefully designed to alter the otherwise correct prediction of the model. In this paper, we study the generalization properties of adversarial learning. In particular, we derive high-probability generalization bounds on the adversarial risk in terms of the empirical adversarial risk, the complexity of the function class, and the adversarial noise set. Our bounds are generally applicable to many models, losses, and adversaries. We showcase their applicability by deriving adversarial generalization bounds for the multi-class classification setting and various prediction models (including linear models and Deep Neural Networks). We also derive optimistic adversarial generalization bounds for the case of smooth losses. To the best of our knowledge, these are the first fast-rate bounds valid for adversarial deep learning.

IJCAI Conference 2022 Conference Paper

Raising the Bar in Graph-level Anomaly Detection

  • Chen Qiu
  • Marius Kloft
  • Stephan Mandt
  • Maja Rudolph

Graph-level anomaly detection has become a critical topic in diverse areas, such as financial fraud detection and detecting anomalous activities in social networks. While most research has focused on anomaly detection for visual data such as images, where high detection accuracies have been obtained, existing deep learning approaches for graphs currently show considerably worse performance. This paper raises the bar on graph-level anomaly detection, i.e., the task of detecting abnormal graphs in a set of graphs. By drawing on ideas from self-supervised learning and transformation learning, we present a new deep learning approach that significantly improves existing deep one-class approaches by fixing some of their known problems, including hypersphere collapse and performance flip. Experiments on nine real-world data sets involving nine techniques reveal that our method achieves an average performance improvement of 11.8% AUC compared to the best existing approach.

ICLR Conference 2021 Conference Paper

Explainable Deep One-Class Classification

  • Philipp Liznerski
  • Lukas Ruff
  • Robert A. Vandermeulen
  • Billy Joe Franks
  • Marius Kloft
  • Klaus-Robert Müller

Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD sets a new state of the art in the unsupervised setting. Our method can incorporate ground-truth anomaly maps during training and using even a few of these (~5) improves performance significantly. Finally, using FCDD's explanations we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.

NeurIPS Conference 2021 Conference Paper

Fine-grained Generalization Analysis of Inductive Matrix Completion

  • Antoine Ledent
  • Rodrigo Alves
  • Yunwen Lei
  • Marius Kloft

In this paper, we bridge the gap between the state-of-the-art theoretical results for matrix completion with the nuclear norm and their equivalent in inductive matrix completion: (1) In the distribution-free setting, we prove bounds improving the previously best scaling of $O(rd^2)$ to $\widetilde{O}(d^{3/2}\sqrt{r})$, where $d$ is the dimension of the side information and $r$ is the rank. (2) We introduce the (smoothed) adjusted trace-norm minimization strategy, an inductive analogue of the weighted trace norm, for which we show guarantees of the order $\widetilde{O}(dr)$ under arbitrary sampling. In the inductive case, a similar rate was previously achieved only under uniform sampling and for exact recovery. Both our results align with the state of the art in the particular case of standard (non-inductive) matrix completion, where they are known to be tight up to log terms. Experiments further confirm that our strategy outperforms standard inductive matrix completion on various synthetic datasets and real problems, justifying its place as an important tool in the arsenal of methods for matrix completion using side information.

IJCAI Conference 2021 Conference Paper

Fine-grained Generalization Analysis of Structured Output Prediction

  • Waleed Mustafa
  • Yunwen Lei
  • Antoine Ledent
  • Marius Kloft

In machine learning we often encounter structured output prediction problems (SOPPs), i.e., problems where the output space admits a rich internal structure. Application domains where SOPPs naturally occur include natural language processing, speech recognition, and computer vision. Typical SOPPs have an extremely large label set, which grows exponentially as a function of the size of the output. Existing generalization analysis implies generalization bounds with at least a square-root dependency on the cardinality d of the label set, which can be vacuous in practice. In this paper, we significantly improve the state of the art by developing novel high-probability bounds with a logarithmic dependency on d. Furthermore, we leverage the lens of algorithmic stability to develop generalization bounds in expectation without any dependency on d. Our results therefore build a solid theoretical foundation for learning in large-scale SOPPs. Furthermore, we extend our results to learning with weakly dependent data.

AAAI Conference 2021 Conference Paper

Fine-grained Generalization Analysis of Vector-Valued Learning

  • Liang Wu
  • Antoine Ledent
  • Yunwen Lei
  • Marius Kloft

Many fundamental machine learning tasks can be formulated as a problem of learning with vector-valued functions, where we learn multiple scalar-valued functions together. Although there is some generalization analysis on different specific algorithms under the empirical risk minimization principle, a unifying analysis of vector-valued learning under a regularization framework is still lacking. In this paper, we initiate the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size. Our discussions relax the existing assumptions on the restrictive constraint of hypothesis spaces, smoothness of loss functions and low-noise condition. To understand the interaction between optimization and learning, we further use our results to derive the first generalization bounds for stochastic gradient descent with vector-valued functions. We apply our general results to multi-class classification and multi-label classification, which yield the first bounds with a logarithmic dependency on the output dimension for extreme multi-label classification with the Frobenius regularization. As a byproduct, we derive a Rademacher complexity bound for loss function classes defined in terms of a general strongly convex function.

IJCAI Conference 2021 Conference Paper

Learning Interpretable Concept Groups in CNNs

  • Saurabh Varshneya
  • Antoine Ledent
  • Robert A. Vandermeulen
  • Yunwen Lei
  • Matthias Enders
  • Damian Borth
  • Marius Kloft

We propose a novel training methodology---Concept Group Learning (CGL)---that encourages training of interpretable CNN filters by partitioning filters in each layer into \emph{concept groups}, each of which is trained to learn a single visual concept. We achieve this through a novel regularization strategy that forces filters in the same group to be active in similar image regions for a given layer. We additionally use a regularizer to encourage a sparse weighting of the concept groups in each layer so that a few concept groups can have greater importance than others. We quantitatively evaluate CGL's model interpretability using standard interpretability evaluation techniques and find that our method increases interpretability scores in most cases. Qualitatively we compare the image regions which are most active under filters learned using CGL versus filters learned without CGL and find that CGL activation regions more strongly concentrate around semantically relevant features.
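
The sparse weighting of concept groups can be illustrated with a group-sparsity penalty. This is a sketch in the spirit of the described regularizer, not the paper's exact formulation; the function name and the l2,1 form are our assumptions:

```python
import math

def group_sparsity_penalty(groups, lam=1.0):
    # l2,1-style penalty: l2 norm within each group, summed (l1) across groups.
    # Whole groups are driven to zero, so only a few concept groups stay active.
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)

# an active 3-4 group contributes its l2 norm (5.0); an inactive group contributes 0
penalty = group_sparsity_penalty([[3.0, 4.0], [0.0, 0.0]])
```

Penalties of this shape zero out entire groups at once, which is what allows a few concept groups to carry greater importance than the rest.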

AAAI Conference 2021 Conference Paper

Model Uncertainty Guides Visual Object Tracking

  • Lijun Zhou
  • Antoine Ledent
  • Qintao Hu
  • Ting Liu
  • Jianlin Zhang
  • Marius Kloft

Model object trackers largely rely on the online learning of a discriminative classifier from potentially diverse sample frames. However, noisy or insufficient amounts of samples can deteriorate the classifiers’ performance and cause tracking drift. Furthermore, alterations such as occlusion and blurring can cause the target to be lost. In this paper, we make several improvements aimed at tackling uncertainty and improving robustness in object tracking. Our first and most important contribution is to propose a sampling method for the online learning of object trackers based on uncertainty adjustment: our method effectively selects representative sample frames to feed the discriminative branch of the tracker, while filtering out noise samples. Furthermore, to improve the robustness of the tracker to various challenging scenarios, we propose a novel data augmentation procedure, together with a specific improved backbone architecture. All our improvements fit together in one model, which we refer to as the Uncertainty Adjusted Tracker (UATracker), and can be trained in a joint and end-to-end fashion. Experiments on the LaSOT, UAV123, OTB100 and VOT2018 benchmarks demonstrate that our UATracker outperforms state-of-the-art real-time trackers by significant margins.

ICML Conference 2021 Conference Paper

Neural Transformation Learning for Deep Anomaly Detection Beyond Images

  • Chen Qiu 0001
  • Timo Pfrommer
  • Marius Kloft
  • Stephan Mandt
  • Maja Rudolph

Data transformations (e.g., rotations, reflections, and cropping) play an important role in self-supervised learning. Typically, images are transformed into different views, and neural networks trained on tasks involving these views produce useful feature representations for downstream tasks, including anomaly detection. However, for anomaly detection beyond image data, it is often unclear which transformations to use. Here we present a simple end-to-end procedure for anomaly detection with learnable transformations. The key idea is to embed the transformed data into a semantic space such that the transformed data still resemble their untransformed form, while different transformations are easily distinguishable. Extensive experiments on time series show that our proposed method outperforms existing approaches in the one-vs.-rest setting and is competitive in the more challenging n-vs.-rest anomaly-detection task. On medical and cyber-security tabular data, our method learns domain-specific transformations and detects anomalies more accurately than previous work.
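
The key idea (views close to the original, yet distinguishable from one another) can be sketched as a contrastive objective over embeddings. This is a minimal sketch only: the cosine similarity, the missing temperature parameter, and the names are our simplifications of the paper's loss:

```python
import math

def dcl_loss(z, views):
    # Each transformed view's embedding should be similar to the untransformed
    # embedding z (semantic) yet dissimilar to the other views (diverse):
    # a softmax-contrastive loss with the original as the positive.
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)
    total = 0.0
    for k, zk in enumerate(views):
        num = math.exp(cos(zk, z))
        den = num + sum(math.exp(cos(zk, zl))
                        for l, zl in enumerate(views) if l != k)
        total += -math.log(num / den)
    return total / len(views)

# diverse views aligned with z incur a lower loss than collapsed identical views
diverse = dcl_loss([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
collapsed = dcl_loss([1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]])
```

At test time, samples whose learned views fail to satisfy this objective receive a high loss, which serves as the anomaly score.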

AAAI Conference 2021 Conference Paper

Norm-Based Generalisation Bounds for Deep Multi-Class Convolutional Neural Networks

  • Antoine Ledent
  • Waleed Mustafa
  • Yunwen Lei
  • Marius Kloft

We show generalisation error bounds for deep learning with two main improvements over the state of the art. (1) Our bounds have no explicit dependence on the number of classes except for logarithmic factors. This holds even when formulating the bounds in terms of the Frobenius-norm of the weight matrices, where previous bounds exhibit at least a square-root dependence on the number of classes. (2) We adapt the classic Rademacher analysis of DNNs to incorporate weight sharing—a task of fundamental theoretical importance which was previously attempted only under very restrictive assumptions. In our results, each convolutional filter contributes only once to the bound, regardless of how many times it is applied. Further improvements exploiting pooling and sparse connections are provided. The presented bounds scale as the norms of the parameter matrices, rather than the number of parameters. In particular, contrary to bounds based on parameter counting, they are asymptotically tight (up to log factors) when the weights approach initialisation, making them suitable as a basic ingredient in bounds sensitive to the optimisation procedure. We also show how to adapt the recent technique of loss function augmentation to replace spectral norms by empirical analogues whilst maintaining the advantages of our approach.

ICLR Conference 2020 Conference Paper

Deep Semi-Supervised Anomaly Detection

  • Lukas Ruff
  • Robert A. Vandermeulen
  • Nico Görnitz
  • Alexander Binder
  • Emmanuel Müller
  • Klaus-Robert Müller
  • Marius Kloft

Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically anomaly detection is treated as an unsupervised learning problem. In practice however, one may have---in addition to a large set of unlabeled samples---access to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation for our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par with or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.
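
The entropy intuition translates into a simple per-sample objective: pull unlabeled and labeled-normal embeddings toward a center, and push labeled anomalies away. A sketch of the loss shape, assuming squared distances in the latent space; the parameter name eta and the exact form are our assumptions:

```python
def deep_sad_loss(dist_sq, label, eta=1.0):
    # dist_sq: squared distance of the sample's embedding to the center
    # label: 0 = unlabeled, +1 = labeled normal, -1 = labeled anomaly
    if label in (0, 1):
        # pull unlabeled and labeled-normal samples toward the center
        return eta * dist_sq if label == 1 else dist_sq
    # push labeled anomalies away: loss shrinks as distance grows
    return eta / (dist_sq + 1e-6)

# anomalies far from the center incur low loss; near anomalies are penalized
near_anom = deep_sad_loss(0.1, -1)
far_anom = deep_sad_loss(10.0, -1)
```

Under this shape, concentrating normal data (low latent entropy) and dispersing anomalies (high latent entropy) both reduce the training objective.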

NeurIPS Conference 2020 Conference Paper

Sharper Generalization Bounds for Pairwise Learning

  • Yunwen Lei
  • Antoine Ledent
  • Marius Kloft

Pairwise learning refers to learning tasks with loss functions depending on a pair of training examples, which includes ranking and metric learning as specific examples. Recently, there has been an increasing amount of attention on the generalization analysis of pairwise learning to understand its practical behavior. However, the existing stability analysis provides suboptimal high-probability generalization bounds. In this paper, we provide a refined stability analysis by developing generalization bounds which can be $\sqrt{n}$-times faster than the existing results, where $n$ is the sample size. This implies excess risk bounds of the order $O(n^{-1/2})$ (up to a logarithmic factor) for both regularized risk minimization and stochastic gradient descent. We also introduce a new on-average stability measure to develop optimistic bounds in a low noise setting. We apply our results to ranking and metric learning, and clearly show the advantage of our generalization bounds over the existing analysis.
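
For concreteness, a pairwise loss depends on pairs of training examples rather than single ones. A minimal margin-based sketch for the ranking case; the function name and the margin value of 1 are illustrative, not taken from the paper:

```python
def pairwise_hinge(scores, labels):
    # average hinge loss over all ordered pairs (i, j): penalize whenever an
    # item with higher relevance fails to outscore a lower-relevance one by
    # at least the margin 1
    loss, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += max(0.0, 1.0 - (scores[i] - scores[j]))
                count += 1
    return loss / max(count, 1)

perfect = pairwise_hinge([3.0, 2.0, 1.0], [3, 2, 1])   # correct order, wide margins
inverted = pairwise_hinge([1.0, 2.0, 3.0], [3, 2, 1])  # reversed order
```

The coupling between examples is exactly what makes the stability analysis of such losses harder than in the pointwise setting.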

NeurIPS Conference 2019 Conference Paper

Effective End-to-end Unsupervised Outlier Detection via Inlier Priority of Discriminative Network

  • Siqi Wang
  • Yijie Zeng
  • Xinwang Liu
  • En Zhu
  • Jianping Yin
  • Chuanfu Xu
  • Marius Kloft

Despite the wide success of deep neural networks (DNN), little progress has been made on end-to-end unsupervised outlier detection (UOD) from high dimensional data like raw images. In this paper, we propose a framework named E^3Outlier, which can perform UOD in a manner that is both effective and end-to-end: First, instead of the commonly-used autoencoders in previous end-to-end UOD methods, E^3Outlier for the first time leverages a discriminative DNN for better representation learning, by using surrogate supervision to create multiple pseudo classes from original unlabelled data. Next, unlike classic UOD that utilizes data characteristics like density or proximity, we exploit a novel property named inlier priority to enable end-to-end UOD by discriminative DNN. We demonstrate theoretically and empirically that the intrinsic class imbalance of inliers/outliers will make the network prioritize minimizing inliers' loss when inliers/outliers are indiscriminately fed into the network for training, which enables us to differentiate outliers directly from DNN's outputs. Finally, based on inlier priority, we propose the negative entropy based score as a simple and effective outlierness measure. Extensive evaluations show that E^3Outlier significantly advances UOD performance by up to 30% AUROC against state-of-the-art counterparts, especially on relatively difficult benchmarks.
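
The negative-entropy score can be computed directly from the network's softmax outputs. A sketch assuming one probability vector per sample; the function name is ours:

```python
import math

def negative_entropy(probs, eps=1e-12):
    # sum_k p_k * log(p_k): close to 0 for confident (inlier-like) predictions,
    # strongly negative for near-uniform (outlier-like) predictions
    return sum(p * math.log(p + eps) for p in probs)

# inliers get confident pseudo-class predictions, so they score higher
confident = negative_entropy([0.97, 0.01, 0.01, 0.01])
uniform = negative_entropy([0.25, 0.25, 0.25, 0.25])
```

Because inlier priority makes the network fit inliers first, confident (high negative-entropy) predictions indicate inliers, and low-scoring samples can be flagged as outliers.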

AAAI Conference 2019 Conference Paper

Efficient Gaussian Process Classification Using Pólya-Gamma Data Augmentation

  • Florian Wenzel
  • Théo Galy-Fajou
  • Christian Donner
  • Marius Kloft
  • Manfred Opper

We propose a scalable stochastic variational approach to GP classification building on Pólya-Gamma data augmentation and inducing points. Unlike former approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state-of-the-art while being competitive in terms of prediction performance.
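
For context, the augmentation rests on the Pólya-Gamma integral identity of Polson, Scott, and Windle, which renders the logistic likelihood conditionally Gaussian:

$$\frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}} \;=\; 2^{-b}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\omega\psi^{2}/2}\, p_{\mathrm{PG}}(\omega \mid b, 0)\, d\omega, \qquad \kappa = a - \tfrac{b}{2},$$

where $p_{\mathrm{PG}}(\omega \mid b, 0)$ is a Pólya-Gamma density. Conditioned on the auxiliary variable $\omega$, the likelihood is Gaussian in $\psi$, which is what makes the closed-form natural-gradient updates possible.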

ICML Conference 2018 Conference Paper

Deep One-Class Classification

  • Lukas Ruff
  • Nico Görnitz
  • Lucas Deecke
  • Shoaib Ahmed Siddiqui
  • Robert A. Vandermeulen
  • Alexander Binder
  • Emmanuel Müller
  • Marius Kloft

Despite the great advances made by deep learning in many machine learning problems, there is a relative dearth of deep learning approaches for anomaly detection. Those approaches which do exist involve networks trained to perform a task other than anomaly detection, namely generative models or compression, which are in turn adapted for use in anomaly detection; they are not trained on an anomaly detection based objective. In this paper we introduce a new anomaly detection method—Deep Support Vector Data Description—which is trained on an anomaly detection based objective. The adaptation to the deep regime necessitates that our neural network and training procedure satisfy certain properties, which we demonstrate theoretically. We show the effectiveness of our method on MNIST and CIFAR-10 image benchmark datasets as well as on the detection of adversarial examples of GTSRB stop signs.
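
The core of the Deep SVDD objective admits a compact sketch: map inputs to an embedding and penalize distance from a fixed center. A minimal one-sample version that ignores the network, weight decay, and the soft-boundary variant:

```python
def svdd_loss(z, c):
    # squared distance of the embedding z to the hypersphere center c;
    # at test time the same quantity serves as the anomaly score
    return sum((zi - ci) ** 2 for zi, ci in zip(z, c))

# embeddings of normal data cluster near the center; anomalies land far away
normal_score = svdd_loss([0.1, -0.1], [0.0, 0.0])
anomalous_score = svdd_loss([3.0, 4.0], [0.0, 0.0])
```

Minimizing this over training data contracts the network's outputs into a small hypersphere around c, so that anomalies, which the network was never encouraged to contract, fall outside it.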

JMLR Journal 2018 Journal Article

Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

  • Niloofar Yousefi
  • Yunwen Lei
  • Marius Kloft
  • Mansooreh Mollaghasemi
  • Georgios C. Anagnostopoulos

We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), with which we establish sharp excess risk bounds for MTL in terms of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for any norm-regularized hypothesis class, which applies not only to MTL but also to the standard Single-Task Learning (STL) setting. By combining both results, one can easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including, as we demonstrate, Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akin to a conservation law of asymptotic convergence rates. When compared to the rates obtained via a traditional, global Rademacher analysis, this very relationship allows for trading off slower rates with respect to the number of tasks for faster rates with respect to the number of available samples per task.

ICML Conference 2015 Conference Paper

Hidden Markov Anomaly Detection

  • Nico Görnitz
  • Mikio L. Braun
  • Marius Kloft

We introduce a new anomaly detection methodology for data with latent dependency structure. As a particular instantiation, we derive a hidden Markov anomaly detector that extends the regular one-class support vector machine. We optimize the approach, which is non-convex, via a DC (difference of convex functions) algorithm, and show that the parameter ν can be conveniently used to control the number of outliers in the model. The empirical evaluation on artificial and real data from the domains of computational biology and computational sustainability shows that the approach can achieve significantly higher anomaly detection performance than the regular one-class SVM.
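
The role of ν can be illustrated outside the SVM machinery: in ν-parameterized one-class methods, ν upper-bounds the fraction of training points treated as outliers. A minimal quantile-threshold analogue of that property (not the paper's algorithm; names are ours):

```python
def nu_outliers(scores, nu):
    # flag at most roughly a fraction nu of points as outliers by thresholding
    # anomaly scores at their (1 - nu) empirical quantile
    s = sorted(scores)
    cut = s[min(int(len(s) * (1 - nu)), len(s) - 1)]
    return [x > cut for x in scores]

# with nu = 0.1, no more than ~10% of 100 points are flagged
flags = nu_outliers(list(range(100)), nu=0.1)
```

In the actual one-class SVM, the same ν simultaneously lower-bounds the fraction of support vectors, which is what makes it a convenient single knob for the expected outlier rate.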

NeurIPS Conference 2015 Conference Paper

Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms

  • Yunwen Lei
  • Urun Dogan
  • Alexander Binder
  • Marius Kloft

This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the class size, substantially improving the state-of-the-art linear dependence in the existing data-dependent generalization analysis. The theoretical analysis motivates us to introduce a new multi-class classification machine based on lp-norm regularization, where the parameter p controls the complexity of the corresponding bounds. We derive an efficient optimization algorithm based on Fenchel duality theory. Benchmarks on several real-world datasets show that the proposed algorithm can achieve significant accuracy gains over the state of the art.

NeurIPS Conference 2013 Conference Paper

Learning Kernels Using Local Rademacher Complexity

  • Corinna Cortes
  • Marius Kloft
  • Mehryar Mohri

We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks.

JMLR Journal 2012 Journal Article

On the Convergence Rate of lp-Norm Multiple Kernel Learning

  • Marius Kloft
  • Gilles Blanchard

We derive an upper bound on the local Rademacher complexity of lp-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed the case p=1 only, while our analysis covers all cases 1≤p≤∞, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also prove a matching lower bound showing that our bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order $O(n^{-\frac{\alpha}{1+\alpha}})$, where $\alpha$ is the minimum eigenvalue decay rate of the individual kernels.

JMLR Journal 2012 Journal Article

Security Analysis of Online Centroid Anomaly Detection

  • Marius Kloft
  • Pavel Laskov

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). In such cases, learning algorithms may have to cope with manipulated data aimed at hampering decision making. Although some previous work addressed the issue of handling malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution, we analyze the performance of a particular method---online centroid anomaly detection---in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of attack efficiency and limitations. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: attacker's full or limited control over the traffic and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real traces of HTTP and exploit traffic, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.

JMLR Journal 2011 Journal Article

lp-Norm Multiple Kernel Learning

  • Marius Kloft
  • Ulf Brefeld
  • Sören Sonnenburg
  • Alexander Zien

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability and scalability. Unfortunately, this l1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we extend MKL to arbitrary norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary norms, that is lp-norms with p ≥ 1. This interleaved optimization is much faster than the commonly used wrapper approaches, as demonstrated on several data sets. A theoretical analysis and an experiment on controlled artificial data shed light on the appropriateness of sparse, non-sparse and l∞-norm MKL in various scenarios. Importantly, empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state-of-the-art. Data sets, source code to reproduce the experiments, implementations of the algorithms, and further information are available at http://doc.ml.tu-berlin.de/nonsparse_mkl/.
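
A non-sparse kernel mixture is obtained by constraining the kernel weights to the lp-ball. A minimal pure-Python sketch of forming the combined kernel from base kernel matrices; the normalization step here is illustrative, not the paper's optimization procedure:

```python
def combine_kernels(kernels, theta, p=2.0):
    # project the mixture weights onto the unit lp-sphere, then form
    # K = sum_m theta_m * K_m from the base kernel matrices (lists of lists)
    norm = sum(abs(t) ** p for t in theta) ** (1.0 / p)
    theta = [t / norm for t in theta]
    n = len(kernels[0])
    K = [[sum(t * Km[i][j] for t, Km in zip(theta, kernels)) for j in range(n)]
         for i in range(n)]
    return K, theta

K1 = [[1.0, 0.0], [0.0, 1.0]]
K2 = [[1.0, 1.0], [1.0, 1.0]]
K, weights = combine_kernels([K1, K2], [1.0, 1.0], p=2.0)
```

With p = 1 the constraint promotes sparsity (few kernels survive), whereas p > 1 keeps all kernels active with dampened weights, which is the non-sparse regime the abstract argues for.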

NeurIPS Conference 2011 Conference Paper

The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning

  • Marius Kloft
  • Gilles Blanchard

We derive an upper bound on the local Rademacher complexity of Lp-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed the case p=1 only, while our analysis covers all cases $1\leq p\leq\infty$, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also prove a matching lower bound showing that our bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order $O(n^{-\frac{\alpha}{1+\alpha}})$, where $\alpha$ is the minimum eigenvalue decay rate of the individual kernels.

NeurIPS Conference 2009 Conference Paper

Efficient and Accurate Lp-Norm Multiple Kernel Learning

  • Marius Kloft
  • Ulf Brefeld
  • Pavel Laskov
  • Klaus-Robert Müller
  • Alexander Zien
  • Sören Sonnenburg

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations and hence support interpretability. Unfortunately, L1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary Lp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p>1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply Lp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.