Author name cluster

Stefan Kramer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
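The grouping rule described above can be sketched as follows. This is a minimal illustration, not Arrow's actual implementation: the function name `cluster_by_exact_name`, the row layout, and the "S. Kramer" row are assumptions made for the example. It shows why a variant spelling stays in a separate cluster, which is what separates exact-name matching from full identity disambiguation.

```python
from collections import defaultdict


def cluster_by_exact_name(author_rows):
    """Group (author_name, paper_title) rows by case-insensitive exact name.

    Hypothetical helper: only letter case is normalized, so initials or
    alternative spellings are NOT merged into the same cluster.
    """
    clusters = defaultdict(list)
    for name, title in author_rows:
        key = name.strip().casefold()  # case-insensitive comparison key
        clusters[key].append(title)
    return dict(clusters)


rows = [
    ("Stefan Kramer", "Structural Regression Trees"),
    ("STEFAN KRAMER", "Peer Learning: Learning Complex Policies in Groups "
                      "from Scratch via Action Recommendations"),
    ("S. Kramer", "(hypothetical paper)"),  # invented row: different string, different cluster
]
clusters = cluster_by_exact_name(rows)
for key, titles in clusters.items():
    print(key, len(titles))
```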

11 papers

1 author row

TMLR Journal 2026 Journal Article

Autofocus Retrieval: An Effective Pipeline for Multi-Hop Question Answering With Semi-Structured Knowledge

Derian Boer
Stephen Linus Roth
Stefan Kramer

In many real-world settings, machine learning models and interactive systems have access to both structured knowledge, e.g., knowledge graphs or tables, and unstructured content, e.g., natural language documents. Yet, most rely on either. Semi-Structured Knowledge Bases (SKBs) bridge this gap by linking unstructured content to nodes within structured data. In this work, we present Autofocus-Retriever (AF-Retriever), a modular framework for SKB-based, multi-hop question answering. It combines structural and textual retrieval through novel integration steps and optimizations, achieving the best zero- and one-shot results across all three STaRK QA benchmarks, which span diverse domains and evaluation metrics. AF-Retriever’s average first-hit rate surpasses the second-best method by 32.1%. Its performance is driven by (1) leveraging exchangeable large language models (LLMs) to extract entity attributes and relational constraints for both parsing and reranking the top-$k$ answers, (2) vector similarity search for ranking both extracted entities and final answers, (3) a novel incremental scope expansion procedure that prepares for the reranking on a configurable amount of suitable candidates that fulfill the given constraints the most, and (4) a hybrid retrieval strategy that reduces error susceptibility. In summary, while constantly adjusting the focus like an optical autofocus, AF-Retriever delivers a configurable amount of answer candidates in four constraint-driven retrieval steps, which are then supplemented and ranked through four additional processing steps. An ablation study and a detailed error analysis, including a comparison of three different LLM reranking strategies, provide component-level insights that are valuable for advancing the model and for enabling researchers and users to adapt, optimize, or extend its parts. The source code is publicly available at https://github.com/kramerlab/AF-Retriever.


AAAI Conference 2026 Conference Paper

The Tatort Test of Intelligence: Towards Narrative Comprehension as a Benchmark for AI

Stefan Kramer
Lennart Baur
Lars Reinhardt

We propose—somewhat tongue-in-cheek, yet with serious implications—a new test for artificial intelligence: the ability to watch a 90-minute episode of the long-running German crime drama Tatort, and to explain every relevant detail. This involves reconstructing the evolving social network of characters, identifying their beliefs, desires, and intentions, and, crucially, determining who committed the crime. We argue that this task integrates narrative understanding, common-sense reasoning, social cognition, and theory of mind—and thus provides a uniquely challenging benchmark for AI.


TCS Journal 2025 Journal Article

Optimizing resource allocation: An active learning approach to iterative combinatorial auctions

Benjamin Estermann
Stefan Kramer
Roger Wattenhofer
Kanye Ye Wang

In deep learning-based iterative combinatorial auctions (DL-ICA), bidders are not required to report valuations for all bundles upfront. Instead, DL-ICA iteratively requests bidders to report their values for specific bundles and determines item allocation using a winner determination problem, with bidder profiles modeled by neural networks. However, due to the limited number of reported bundles, DL-ICA may not always achieve optimal winner allocation, leading to reduced economic efficiency. In this work, we enhance the economic efficiency, specifically the social welfare, of DL-ICA by optimizing the underlying machine learning-based elicitation algorithm. We introduce two novel active learning-based initial sampling strategies: GALI and GALO. GALI ensures optimal coverage of the entire bundle space during sampling, while GALO identifies bundles with high diversity in bidders' estimated values as determined by the neural network. This approach extends the application of active learning beyond small pool sizes. We demonstrate how linear programs can be utilized for active learning to manage pool sizes exceeding $10^{30}$ samples. Our approach is theoretically validated and experimentally verified, showcasing significant improvements in performance.


AAAI Conference 2024 Conference Paper

Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations

Cedric Derstroff
Mattia Cerrato
Jannis Brugger
Jan Peters
Stefan Kramer

Peer learning is a novel high-level reinforcement learning framework for agents learning in groups. While standard reinforcement learning trains an individual agent in trial-and-error fashion, all on its own, peer learning addresses a related setting in which a group of agents, i.e., peers, learns to master a task simultaneously together from scratch. Peers are allowed to communicate only about their own states and actions recommended by others: "What would you do in my situation?". Our motivation is to study the learning behavior of these agents. We formalize the teacher selection process in the action advice setting as a multi-armed bandit problem and therefore highlight the need for exploration. Eventually, we analyze the learning behavior of the peers and observe their ability to rank the agents' performance within the study group and understand which agents give reliable advice. Further, we compare peer learning with single agent learning and a state-of-the-art action advice baseline. We show that peer learning is able to outperform single-agent learning and the baseline in several challenging discrete and continuous OpenAI Gym domains. Doing so, we also show that within such a framework complex policies from action recommendations beyond discrete action spaces can evolve.
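To make the bandit framing of teacher selection concrete, here is a minimal sketch that assumes an epsilon-greedy strategy and Bernoulli rewards. The abstract does not say which bandit algorithm the paper uses, so the class `TeacherBandit`, the helper `simulate`, and the reliability values are illustrative assumptions, not the authors' method. Each arm is one peer, and the reward stands in for how useful that peer's advice turned out to be.

```python
import random


class TeacherBandit:
    """Epsilon-greedy choice of which peer to ask for advice (illustrative only)."""

    def __init__(self, n_peers, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_peers    # how often each peer was asked
        self.values = [0.0] * n_peers  # running mean reward per peer
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise ask the best-rated peer.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, peer, reward):
        # Incremental update of the mean reward observed for this peer.
        self.counts[peer] += 1
        self.values[peer] += (reward - self.values[peer]) / self.counts[peer]


def simulate(reliabilities, steps=5000, seed=0):
    """Assumed toy environment: advice from peer i pays off with probability reliabilities[i]."""
    rng = random.Random(seed)
    bandit = TeacherBandit(len(reliabilities), epsilon=0.1, seed=seed + 1)
    for _ in range(steps):
        peer = bandit.select()
        reward = 1.0 if rng.random() < reliabilities[peer] else 0.0
        bandit.update(peer, reward)
    return bandit


bandit = simulate([0.2, 0.5, 0.9])
print([round(v, 2) for v in bandit.values])
```

Without the exploration step, the agent would keep asking whichever peer happened to look best first, which is the need for exploration that the abstract highlights.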


AAMAS Conference 2023 Conference Paper

Deep Learning-Powered Iterative Combinatorial Auctions with Active Learning

Benjamin Estermann
Stefan Kramer
Roger Wattenhofer
Ye Wang

Deep learning-powered iterative combinatorial auctions (DL-ICA) are auctions that utilize machine learning techniques. Unlike traditional auctions, bidders in DL-ICA do not need to report the valuations for all bundles upfront. Instead, they report their value for certain bundles iteratively, and the allocation of the items is determined by solving a winner determination problem. During this process, the bidder profiles are modeled with neural networks. However, DL-ICA may not always achieve the optimal winner allocation due to the relatively low number of reported bundles, resulting in reduced economic efficiency. This paper proposes an algorithm that uses active learning for initial sampling strategies to improve the resulting economic efficiency (social welfare). The proposed algorithm outperforms previous studies in real-world combinatorial auction models across various domains while using fewer samples on average.


AAAI Conference 2023 Conference Paper

Invariant Representations with Stochastically Quantized Neural Networks

Mattia Cerrato
Marius Köppel
Roberto Esposito
Stefan Kramer

Representation learning algorithms offer the opportunity to learn invariant representations of the input data with regard to nuisance factors. Many authors have leveraged such strategies to learn fair representations, i.e., vectors where information about sensitive attributes is removed. These methods are attractive as they may be interpreted as minimizing the mutual information between a neural layer's activations and a sensitive attribute. However, the theoretical grounding of such methods relies either on the computation of infinitely accurate adversaries or on minimizing a variational upper bound of a mutual information estimate. In this paper, we propose a methodology for direct computation of the mutual information between neurons in a layer and a sensitive attribute. We employ stochastically-activated binary neural networks, which lets us treat neurons as random variables. Our method is therefore able to minimize an upper bound on the mutual information between the neural representations and a sensitive attribute. We show that this method compares favorably with the state of the art in fair representation learning and that the learned representations display a higher level of invariance compared to full-precision neural networks.


IJCAI Conference 2020 Conference Paper

A Brief History of Learning Symbolic Higher-Level Representations from Data (And a Curious Look Forward)

Stefan Kramer

Learning higher-level representations from data has been on the agenda of AI research for several decades. In the paper, I will give a survey of various approaches to learning symbolic higher-level representations: feature construction and constructive induction, predicate invention, propositionalization, pattern mining, and mining time series patterns. Finally, I will give an outlook on how approaches to learning higher-level representations, symbolic and neural, can benefit from each other to solve current issues in machine learning.


JMLR Journal 2019 Journal Article

Decoupling Sparsity and Smoothness in the Dirichlet Variational Autoencoder Topic Model

Sophie Burkhardt
Stefan Kramer

Recent work on variational autoencoders (VAEs) has enabled the development of generative topic models using neural networks. Topic models based on latent Dirichlet allocation (LDA) successfully use the Dirichlet distribution as a prior for the topic and word distributions to enforce sparseness. However, there is a trade-off between sparsity and smoothness in Dirichlet distributions. Sparsity is important for a low reconstruction error during training of the autoencoder, whereas smoothness enables generalization and leads to a better log-likelihood of the test data. Both of these properties are encoded in the Dirichlet parameter vector. By rewriting this parameter vector into a product of a sparse binary vector and a smoothness vector, we decouple the two properties, leading to a model that features both a competitive topic coherence and a high log-likelihood. Efficient training is enabled using rejection sampling variational inference for the reparameterization of the Dirichlet distribution. Our experiments show that our method is competitive with other recent VAE topic models.


AAAI Conference 2010 Conference Paper

Fast Conditional Density Estimation for Quantitative Structure-Activity Relationships

Fabian Buchwald
Tobias Girschick
Eibe Frank
Stefan Kramer

Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals of activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
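The step from a conditional density estimate to a prediction interval can be sketched as below. The sketch assumes the density is a simple histogram over discretized activity values whose bin probabilities come from some class probability estimator. The paper's kernel estimator is not reproduced here, and the function names and example numbers are illustrative assumptions.

```python
def quantile(bin_edges, bin_probs, q):
    """Quantile of a piecewise-constant (histogram) density."""
    total = sum(bin_probs)
    cum = 0.0
    for left, right, p in zip(bin_edges, bin_edges[1:], bin_probs):
        p = p / total  # normalize in case the probabilities do not sum to 1
        if p > 0 and cum + p >= q:
            # Linear interpolation inside the bin that contains the quantile.
            return left + (right - left) * (q - cum) / p
        cum += p
    return bin_edges[-1]


def prediction_interval(bin_edges, bin_probs, coverage=0.9):
    """Central prediction interval with the requested coverage."""
    tail = (1.0 - coverage) / 2.0
    return (quantile(bin_edges, bin_probs, tail),
            quantile(bin_edges, bin_probs, 1.0 - tail))


# Assumed example: four activity bins with estimated class probabilities.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.1, 0.4, 0.4, 0.1]
low, high = prediction_interval(edges, probs, coverage=0.8)
print(round(low, 3), round(high, 3))
```

A wider interval for one compound than for another then signals a less certain prediction, which a point estimate alone cannot convey.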


AIIM Journal 2002 Journal Article

Analysis of respiratory pressure–volume curves in intensive care medicine using inductive machine learning

Steven Ganzert
Josef Guttmann
Kristian Kersting
Ralf Kuhlen
Christian Putensen
Michael Sydow
Stefan Kramer

We present a case study of machine learning and data mining in intensive care medicine. In the study, we compared different methods of measuring pressure–volume curves in artificially ventilated patients suffering from the adult respiratory distress syndrome (ARDS). Our aim was to show that inductive machine learning can be used to gain insights into differences and similarities among these methods. We defined two tasks: the first one was to recognize the measurement method producing a given pressure–volume curve. This was defined as the task of classifying pressure–volume curves (the classes being the measurement methods). The second was to model the curves themselves, that is, to predict the volume given the pressure, the measurement method and the patient data. Clearly, this can be defined as a regression task. For these two tasks, we applied C5.0 and CUBIST, two inductive machine learning tools, respectively. Apart from medical findings regarding the characteristics of the measurement methods, we found some evidence showing the value of an abstract representation for classifying curves: normalization and high-level descriptors from curve fitting played a crucial role in obtaining reasonably accurate models. Another useful feature of algorithms for inductive machine learning is the possibility of incorporating background knowledge. In our study, the incorporation of patient data helped to improve regression results dramatically, which might open the door for the individual respiratory treatment of patients in the future.


AAAI Conference 1996 Conference Paper

Structural Regression Trees

Stefan Kramer

In many real-world domains the task of machine learning algorithms is to learn a theory for predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with nondeterminate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems. SRT integrates the statistical method of regression trees into ILP. It constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP systems cannot handle. Experiments in several real-world domains demonstrate that the approach is competitive with existing methods, indicating that the advantages are not at the expense of predictive accuracy.
