Author name cluster

Steffen Staab

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning

  • Hongkuan Zhou
  • Lavdim Halilaj
  • Sebastian Monka
  • Stefan Schmid
  • Yuqicheng Zhu
  • Jingcheng Wu
  • Nadeem Nazer
  • Steffen Staab

Open-domain visual entity recognition aims to identify and link entities depicted in images to a vast and evolving set of real-world concepts, such as those found in Wikidata. Unlike conventional classification tasks with fixed label sets, it operates under open-set conditions, where most target entities are unseen during training and exhibit long-tail distributions. This makes the task inherently challenging due to limited supervision, high visual ambiguity, and the need for semantic disambiguation. We propose a Knowledge-guided Contrastive Learning (KnowCoL) framework that combines both images and text descriptions into a shared semantic space grounded by structured information from Wikidata. By abstracting visual and textual inputs to a conceptual level, the model leverages entity descriptions, type hierarchies, and relational context to support zero-shot entity recognition. We evaluate our approach on the OVEN benchmark, a large-scale open-domain visual recognition dataset with Wikidata IDs as the label space. Our experiments show that using visual, textual, and structured knowledge greatly improves accuracy, especially for rare and unseen entities. Our smallest model improves the accuracy on unseen entities by 10.5% compared to the state-of-the-art, despite being 35 times smaller.

NeSy Conference 2025 Conference Paper

ArgRAG: Explainable Retrieval Augmented Generation using Quantitative Bipolar Argumentation

  • Yuqicheng Zhu
  • Nico Potyka
  • Daniel Hernández 0002
  • Yuan He 0008
  • Zifeng Ding
  • Bo Xiong 0001
  • Dongzhuoran Zhou
  • Evgeny Kharlamov

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge, yet suffers from critical limitations in high-stakes domains, namely sensitivity to noisy or contradictory evidence and opaque, stochastic decision-making. We propose ArgRAG, an explainable and contestable alternative that replaces black-box reasoning with structured inference using a Quantitative Bipolar Argumentation Framework (QBAF). ArgRAG constructs a QBAF from retrieved documents and performs deterministic reasoning under gradual semantics. This allows decisions to be faithfully explained and contested. Evaluated on two fact verification benchmarks, PubHealth and RAGuard, ArgRAG achieves strong accuracy while significantly improving transparency.
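To make "deterministic reasoning under gradual semantics" concrete, here is a minimal sketch that evaluates a small acyclic QBAF. The abstract does not say which gradual semantics the paper uses, so this sketch assumes DF-QuAD, one well-known option. The argument names, base scores, and the `evaluate` helper are hypothetical.

```python
from math import prod

def aggregate(strengths):
    """Probabilistic sum of strengths in [0, 1]; 0.0 for an empty list."""
    return 1.0 - prod(1.0 - s for s in strengths)

def dfquad_strength(base, attackers, supporters):
    """Combine a base score with attacker and supporter strengths (DF-QuAD rule)."""
    va, vs = aggregate(attackers), aggregate(supporters)
    if va >= vs:
        return base - base * (va - vs)       # attackers dominate: move toward 0
    return base + (1.0 - base) * (vs - va)   # supporters dominate: move toward 1

def evaluate(base_scores, attacks, supports, order):
    """Evaluate an acyclic QBAF.

    `order` must list every attacker or supporter before the argument it targets.
    `attacks` and `supports` are lists of (source, target) pairs.
    """
    strength = {}
    for arg in order:
        att = [strength[a] for a, target in attacks if target == arg]
        sup = [strength[s] for s, target in supports if target == arg]
        strength[arg] = dfquad_strength(base_scores[arg], att, sup)
    return strength

# Hypothetical example: one claim, one supporting and one attacking document.
base_scores = {"claim": 0.5, "doc_support": 0.8, "doc_attack": 0.6}
result = evaluate(
    base_scores,
    attacks=[("doc_attack", "claim")],
    supports=[("doc_support", "claim")],
    order=["doc_support", "doc_attack", "claim"],
)
print(round(result["claim"], 3))
```

The result is a fixed function of the graph and base scores, so the same retrieved evidence always gives the same verdict. Changing a document's base score changes the claim's strength in a traceable way.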

TMLR Journal 2025 Journal Article

Explanation Shift: How Did the Distribution Shift Impact the Model?

  • Carlos Mougan
  • Klaus Broelemann
  • Gjergji Kasneci
  • Thanassis Tiropanis
  • Steffen Staab

The performance of machine learning models on new data is critical for their success in real-world applications. Current methods to detect shifts in the input or output data distributions have limitations in identifying model behaviour changes when no labelled data is available. In this paper, we define explanation shift as the statistical comparison between how predictions from training data are explained and how predictions on new data are explained. We propose explanation shift as a key indicator to investigate the interaction between distribution shifts and learned models. We introduce an Explanation Shift Detector that operates on the explanation distributions, providing more sensitive and explainable changes in interactions between distribution shifts and learned models. We compare explanation shifts with other methods that are based on distribution shifts, showing that monitoring for explanation shifts results in more sensitive indicators for varying model behavior. We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data. Additionally, we release an open-source Python package, skshift, which implements our method and provides usage tutorials for further reproducibility.

ICLR Conference 2025 Conference Paper

From Tokens to Lattices: Emergent Lattice Structures in Language Models

  • Bo Xiong 0001
  • Steffen Staab

Pretrained masked language models (MLMs) have demonstrated an impressive capability to comprehend and encode conceptual knowledge, revealing a lattice structure among concepts. This raises a critical question: how does this conceptualization emerge from MLM pretraining? In this paper, we explore this problem from the perspective of Formal Concept Analysis (FCA), a mathematical framework that derives concept lattices from the observations of object-attribute relationships. We show that the MLM's objective implicitly learns a formal context that describes objects, attributes, and their dependencies, which enables the reconstruction of a concept lattice through FCA. We propose a novel framework for concept lattice construction from pretrained MLMs and investigate the origin of the inductive biases of MLMs in lattice structure learning. Our framework differs from previous work because it does not rely on human-defined concepts and allows for discovering "latent" concepts that extend beyond human definitions. We create three datasets for evaluation, and the empirical results verify our hypothesis.
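For readers unfamiliar with Formal Concept Analysis, the sketch below derives the formal concepts of a tiny hand-written object-attribute context. It shows only the classical FCA step. It is not the paper's framework, which obtains the formal context from a pretrained MLM. The toy context and function names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical toy formal context: object -> set of attributes.
context = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"has_feathers", "swims"},
    "salmon": {"swims"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    if not objs:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[o] for o in objs)))

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            objs = extent(set(subset))
            concepts.add((objs, intent(objs)))
    return concepts

concepts = formal_concepts()
for objs, attrs in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Ordering the resulting concepts by inclusion of their extents yields the concept lattice. The brute-force enumeration is exponential in the number of attributes and is meant only for small examples.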

ECAI Conference 2025 Conference Paper

Full-History Graphs with Edge-Type Decoupled Networks for Temporal Reasoning

  • Osama Mohammed
  • Jiaxin Pan 0003
  • Mojtaba Nayyeri
  • Daniel Hernández 0002
  • Steffen Staab

Modeling evolving interactions among entities is critical in many real-world tasks. For example, predicting driver maneuvers in traffic requires tracking how neighboring vehicles accelerate, brake, and change lanes relative to one another over consecutive frames. Similarly, detecting financial fraud hinges on following the flow of funds through successive transactions as they propagate across the network. Unlike classic time-series forecasting, these settings demand reasoning over who interacts with whom and when, calling for a temporal-graph representation that makes both the relations and their evolution explicit. Existing temporal-graph methods use snapshot graphs to represent temporal evolution. In this paper, we introduce a full-history graph that instantiates one node for every entity at every timestep and separates two edge sets: (i) intra-timestep edges that capture relations within a single frame, and (ii) inter-timestep edges that connect an entity to itself at consecutive steps. To learn on this graph we design an Edge-Type Decoupled Network (ETDNet) with parallel modules: a graph-attention module aggregates information along intra-timestep edges, a multi-head temporal-attention module attends over an entity’s inter-timestep history, and a fusion module combines the two messages after every layer. When evaluated on driver-intention prediction (Waymo) and Bitcoin fraud detection (Elliptic++), ETDNet consistently surpasses strong baselines, lifting Waymo joint accuracy to 75.6% (vs. 74.1%) and raising Elliptic++ illicit-class F1 to 88.1% (vs. 60.4%). These gains demonstrate the benefit of representing structural and temporal relations as distinct edges in a single graph.
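The full-history graph construction described above can be sketched in a few lines. This is a minimal illustration of the data structure only, not of ETDNet. The input format (a list of per-timestep interaction pairs) and the function name are assumptions.

```python
def build_full_history_graph(frames):
    """Build a full-history graph from per-timestep interaction lists.

    frames[t] is a list of (u, v) pairs of entities that interact at timestep t.
    Returns (nodes, intra_edges, inter_edges), where each node is (entity, t).
    """
    entities = sorted({e for frame in frames for pair in frame for e in pair})
    # One node for every entity at every timestep.
    nodes = [(e, t) for t in range(len(frames)) for e in entities]
    # Intra-timestep edges: relations within a single frame.
    intra_edges = [((u, t), (v, t)) for t, frame in enumerate(frames) for u, v in frame]
    # Inter-timestep edges: each entity linked to itself at the next step.
    inter_edges = [((e, t), (e, t + 1)) for t in range(len(frames) - 1) for e in entities]
    return nodes, intra_edges, inter_edges

# Hypothetical traffic scene with three vehicles over three frames.
frames = [
    [("car_a", "car_b")],
    [("car_b", "car_c")],
    [],
]
nodes, intra_edges, inter_edges = build_full_history_graph(frames)
print(len(nodes), len(intra_edges), len(inter_edges))  # 9 2 6
```

Keeping the two edge sets in separate lists is what lets a model apply different modules to structural and temporal edges.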

ICML Conference 2025 Conference Paper

Is Complex Query Answering Really Complex?

  • Cosimo Gregucci
  • Bo Xiong 0001
  • Daniel Hernández 0002
  • Lorenzo Loconte
  • Pasquale Minervini
  • Steffen Staab
  • Antonio Vergari

Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as complex as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current CQA methods leave much to be desired.
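One way to read "reduced to link prediction" is sketched below for a two-hop path query: count how many links on the best reasoning path are missing from the training graph. This is an illustrative interpretation of the abstract, not the paper's benchmark code. The triples and the function name are hypothetical.

```python
def min_missing_links(anchor, r1, r2, answer, train, test):
    """Return the fewest unobserved links on any path anchor -r1-> x -r2-> answer.

    0: the answer is fully derivable from training triples.
    1: the query reduces to predicting a single link.
    2: both hops must be predicted.
    None: `answer` is not an answer in the full graph.
    """
    full = train | test
    best = None
    for h, r, mid in full:
        if h != anchor or r != r1 or (mid, r2, answer) not in full:
            continue
        missing = ((h, r, mid) not in train) + ((mid, r2, answer) not in train)
        best = missing if best is None else min(best, missing)
    return best

# Hypothetical graph: training triples are observed, test triples are held out.
train = {("a", "r1", "b")}
test = {("b", "r2", "c"), ("a", "r1", "d"), ("d", "r2", "e")}

print(min_missing_links("a", "r1", "r2", "c", train, test))  # 1
print(min_missing_links("a", "r1", "r2", "e", train, test))  # 2
```

Under this reading, a benchmark dominated by answers scoring 1 mostly measures single-link prediction, even though the queries look multi-hop.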

IROS Conference 2025 Conference Paper

MapDiffusion: Generative Diffusion for Vectorized Online HD Map Construction and Uncertainty Estimation in Autonomous Driving

  • Thomas Monninger
  • Zihan Zhang
  • Zhipeng Mo
  • Md Zafar Anwar
  • Steffen Staab
  • Sihao Ding 0004

Autonomous driving requires an understanding of the static environment from sensor data. Learned Bird’s-Eye View (BEV) encoders are commonly used to fuse multiple inputs, and a vector decoder predicts a vectorized map representation from the latent BEV grid. However, traditional map construction models provide deterministic point estimates, failing to capture uncertainty and the inherent ambiguities of real-world environments, such as occlusions and missing lane markings. We propose MapDiffusion, a novel generative approach that leverages the diffusion paradigm to learn the full distribution of possible vectorized maps. Instead of predicting a single deterministic output from learned queries, MapDiffusion iteratively refines randomly initialized queries, conditioned on a BEV latent grid, to generate multiple plausible map samples. This allows aggregating samples to improve prediction accuracy and deriving uncertainty estimates that directly correlate with scene ambiguity. Extensive experiments on the nuScenes dataset demonstrate that MapDiffusion achieves state-of-the-art performance in online map construction, surpassing the baseline by 5% in single-sample performance. We further show that aggregating multiple samples consistently improves performance along the ROC curve, validating the benefit of distribution modeling. Additionally, our uncertainty estimates are significantly higher in occluded areas, reinforcing their value in identifying regions with ambiguous sensor input. By modeling the full map distribution, MapDiffusion enhances the robustness and reliability of online vectorized HD map construction, enabling uncertainty-aware decision-making for autonomous vehicles in complex environments.

ECAI Conference 2024 Conference Paper

Generating SROI− Ontologies via Knowledge Graph Query Embedding Learning

  • Yunjie He
  • Daniel Hernández 0002
  • Mojtaba Nayyeri
  • Bo Xiong 0001
  • Yuqicheng Zhu
  • Evgeny Kharlamov
  • Steffen Staab

Query embedding approaches answer complex logical queries over incomplete knowledge graphs (KGs) by computing and operating on low-dimensional vector representations of entities, relations, and queries. However, current query embedding models heavily rely on excessively parameterized neural networks and cannot explain the knowledge learned from the graph. We propose a novel query embedding method, AConE, which explains the knowledge learned from the graph in the form of SROI− description logic axioms while being more parameter-efficient than most existing approaches. AConE associates queries to SROI− description logic concepts. Every SROI− concept is embedded as a cone in complex vector space, and each SROI− relation is embedded as a transformation that rotates and scales cones. We show theoretically that AConE can learn SROI− axioms and that it defines an algebra whose operations correspond one-to-one to SROI− description logic concept constructs. Our empirical study on multiple query datasets shows that AConE achieves superior results over previous baselines with fewer parameters. Notably on the WN18RR dataset, AConE achieves significant improvement over baseline models. We provide comprehensive analyses showing that the capability to represent axioms positively impacts the results of query answering.

AAAI Conference 2024 Conference Paper

HGE: Embedding Temporal Knowledge Graphs in a Product Space of Heterogeneous Geometric Subspaces

  • Jiaxin Pan
  • Mojtaba Nayyeri
  • Yinan Li
  • Steffen Staab

Temporal knowledge graphs represent temporal facts (s,p,o,τ) relating a subject s and an object o via a relation label p at time τ, where τ could be a time point or time interval. Temporal knowledge graphs may exhibit static temporal patterns at distinct points in time and dynamic temporal patterns between different timestamps. In order to learn a rich set of static and dynamic temporal patterns and apply them for inference, several embedding approaches have been suggested in the literature. However, as most of them resort to single underlying embedding spaces, their capability to model all kinds of temporal patterns was severely limited by having to adhere to the geometric property of their one embedding space. We lift this limitation by an embedding approach that maps temporal facts into a product space of several heterogeneous geometric subspaces with distinct geometric properties, i.e., Complex, Dual, and Split-complex spaces. In addition, we propose a temporal-geometric attention mechanism to integrate information from different geometric subspaces conveniently according to the captured relational and temporal information. Experimental results on standard temporal benchmark datasets favorably evaluate our approach against state-of-the-art models.

AAAI Conference 2024 Conference Paper

NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning

  • Bo Xiong
  • Mojtaba Nayyeri
  • Linhao Luo
  • Zihao Wang
  • Shirui Pan
  • Steffen Staab

Reasoning with knowledge graphs (KGs) has primarily focused on triple-shaped facts. Recent advancements have been explored to enhance the semantics of these facts by incorporating more potent representations, such as hyper-relational facts. However, these approaches are limited to atomic facts, which describe a single piece of information. This paper extends beyond atomic facts and delves into nested facts, represented by quoted triples where subjects and objects are triples themselves (e.g., ((BarackObama, holds_position, President), succeed_by, (DonaldTrump, holds_position, President))). These nested facts enable the expression of complex semantics like situations over time and logical patterns over entities and relations. In response, we introduce NestE, a novel KG embedding approach that captures the semantics of both atomic and nested factual knowledge. NestE represents each atomic fact as a 1×3 matrix, and each nested relation is modeled as a 3×3 matrix that rotates the 1×3 atomic fact matrix through matrix multiplication. Each element of the matrix is represented as a complex number in the generalized 4D hypercomplex space, including (spherical) quaternions, hyperbolic quaternions, and split-quaternions. Through thorough analysis, we demonstrate the embedding's efficacy in capturing diverse logical patterns over nested facts, surpassing the confines of first-order logic-like expressions. Our experimental results showcase NestE's significant performance gains over current baselines in triple prediction and conditional link prediction. The code and pre-trained models are openly available at https://github.com/xiongbo010/NestE.

AAMAS Conference 2024 Conference Paper

Robust Knowledge Extraction from Large Language Models using Social Choice Theory

  • Nico Potyka
  • Yuqicheng Zhu
  • Yunjie He
  • Evgeny Kharlamov
  • Steffen Staab

Large language models (LLMs) can support a wide range of applications like conversational agents, creative writing or general query answering. However, they are ill-suited for query answering in high-stakes domains like medicine because they are typically not robust: even the same query can result in different answers when prompted multiple times. In order to improve the robustness of LLM queries, we propose using ranking queries repeatedly and aggregating the queries using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.
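The idea of merging repeated ranking queries can be illustrated with a simple Borda-style count. This is a simplified scoring rule chosen for illustration and should not be taken as the Partial Borda Choice function studied in the paper. The diagnoses and the function name are hypothetical.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate several (possibly partial) rankings with a Borda-style score.

    Each ranking lists candidates best first. A candidate at position i in a
    ranking of length n receives n - i points; candidates missing from a
    ranking receive no points from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Hypothetical answers from asking the same diagnostic ranking query three times.
runs = [
    ["flu", "cold", "allergy"],
    ["cold", "flu"],
    ["flu", "allergy", "cold"],
]
merged = borda_aggregate(runs)
print(merged)  # ['flu', 'cold', 'allergy']
```

Although the individual runs disagree, the aggregate ranking is stable against a single divergent answer. That stability is the kind of robustness the abstract is after.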

IROS Conference 2024 Conference Paper

TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation

  • Thomas Monninger
  • Vandana Dokkadi
  • Md Zafar Anwar
  • Steffen Staab

Autonomous driving requires an accurate representation of the environment. A strategy toward high accuracy is to fuse data from several sensors. Learned Bird’s-Eye View (BEV) encoders can achieve this by mapping data from individual sensors into one joint latent space. For cost-efficient camera-only systems, this provides an effective mechanism to fuse data from multiple cameras with different views. Accuracy can further be improved by aggregating sensor information over time. This is especially important in monocular camera systems to account for the lack of explicit depth and velocity measurements, such that decision-critical information about distances and motions of other objects is easily accessible. Thereby, the effectiveness of developed BEV encoders crucially depends on the operators used to aggregate temporal information and on the used latent representation spaces. We analyze BEV encoders proposed in the literature and compare their effectiveness, quantifying the effects of aggregation operators and latent representations. While most existing approaches aggregate temporal information either in image or in BEV latent space, our analyses and performance comparisons suggest that these latent representations exhibit complementary strengths. Therefore, we develop a novel temporal BEV encoder, TempBEV, which integrates aggregated temporal information from both latent spaces. We consider subsequent image frames as stereo through time and leverage methods from optical flow estimation for temporal stereo encoding. Empirical evaluation on the NuScenes dataset shows a significant improvement by TempBEV over the baseline for 3D object detection and BEV segmentation. The ablation uncovers a strong synergy of joint temporal aggregation in the image and BEV latent space. These results indicate the overall effectiveness of our approach and make a strong case for aggregating temporal information in both image and BEV latent spaces.

IJCAI Conference 2023 Conference Paper

ReLiNet: Stable and Explainable Multistep Prediction with Recurrent Linear Parameter Varying Networks

  • Alexandra Baier
  • Decky Aspandi
  • Steffen Staab

Multistep prediction models are essential for the simulation and model-predictive control of dynamical systems. Verifying the safety of such models is a multi-faceted problem requiring both system-theoretic guarantees as well as establishing trust with human users. In this work, we propose a novel approach, ReLiNet (Recurrent Linear Parameter Varying Network), to ensure safety for multistep prediction of dynamical systems. Our approach simplifies a recurrent neural network to a switched linear system that is constrained to guarantee exponential stability, which acts as a surrogate for safety from a system-theoretic perspective. Furthermore, ReLiNet's computation can be reduced to a single linear model for each time step, resulting in predictions that are explainable by definition, thereby establishing trust from a human-centric perspective. Our quantitative experiments show that ReLiNet achieves prediction accuracy comparable to that of state-of-the-art recurrent neural networks, while achieving more faithful and robust explanations compared to the model-agnostic explanation method of LIME.

NeurIPS Conference 2022 Conference Paper

Hyperbolic Embedding Inference for Structured Multi-Label Prediction

  • Bo Xiong
  • Michael Cochez
  • Mojtaba Nayyeri
  • Steffen Staab

We consider a structured multi-label prediction problem where the labels are organized under implication and mutual exclusion constraints. A major concern is to produce predictions that are logically consistent with these constraints. To do so, we formulate this problem as an embedding inference problem where the constraints are imposed onto the embeddings of labels by geometric construction. Particularly, we consider a hyperbolic Poincaré ball model in which we encode labels as Poincaré hyperplanes that work as linear decision boundaries. The hyperplanes are interpreted as convex regions such that the logical relationships (implication and exclusion) are geometrically encoded using the insideness and disjointedness of these regions, respectively. We show theoretical groundings of the method for preserving logical relationships in the embedding space. Extensive experiments on 12 datasets show 1) significant improvements in mean average precision; 2) lower number of constraint violations; 3) an order of magnitude fewer dimensions than baselines.

NeurIPS Conference 2022 Conference Paper

Pseudo-Riemannian Graph Convolutional Networks

  • Bo Xiong
  • Shichao Zhu
  • Nico Potyka
  • Shirui Pan
  • Chuan Zhou
  • Steffen Staab

Graph Convolutional Networks (GCNs) are powerful frameworks for learning embeddings of graph-structured data. GCNs are traditionally studied through the lens of Euclidean geometry. Recent works find that non-Euclidean Riemannian manifolds provide specific inductive biases for embedding hierarchical or spherical data. However, they cannot align well with data of mixed graph topologies. We consider a larger class of pseudo-Riemannian manifolds that generalize hyperboloid and sphere. We develop new geodesic tools that allow for extending neural network operations into geodesically disconnected pseudo-Riemannian manifolds. As a consequence, we derive a pseudo-Riemannian GCN that models data in pseudo-Riemannian manifolds of constant nonzero curvature in the context of graph neural networks. Our method provides a geometric inductive bias that is sufficiently flexible to model mixed heterogeneous topologies like hierarchical graphs with cycles. We demonstrate the representational capabilities of this method by applying it to the tasks of graph reconstruction, node classification, and link prediction on a series of standard graphs with mixed topologies. Empirical results demonstrate that our method outperforms Riemannian counterparts when embedding graphs of complex topologies.

KR Conference 2020 Conference Paper

Concept Contraction in the Description Logic EL

  • Tjitze Rienstra
  • Claudia Schon
  • Steffen Staab

In this paper we study the problem of concept contraction for the description logic EL. Concept contraction is concerned with the following question: Given two concepts C and D (with the interesting case being that D subsumes C) how can we find a generalisation of C that is not subsumed by D but is otherwise as similar as possible to C? We take an AGM-style approach and model this problem using the notion of a concept contraction operator. We consider constructive definitions as well as sets of postulates for concept contraction, and link the two by means of representation theorems.

IS Journal 2016 Journal Article

Data Mining and Automated Discrimination: A Mixed Legal/Technical Perspective

  • Laura Carmichael
  • Sophie Stalla-Bourdillon
  • Steffen Staab

Socially sensitive decisions about critical issues such as employment, credit scoring, or insurance premiums are increasingly automated based on big data mining. Although algorithms do not have personal preferences, they are not neutral, and the data itself can reflect various undesirable biases. The authors discuss how discrimination-aware data mining constitutes a crucial step to counter automated discrimination. They then explain why the complexity of legal and social norms requires a balanced interdisciplinary methodology and toolset comprising requirements relating to data accuracy, protection, and provenance, and legitimacy of targeted objectives.

IJCAI Conference 2009 Conference Paper

  • Philipp Cimiano
  • Antje Schultz
  • Sergej Sizov
  • Philipp Sorg
  • Steffen Staab

The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-words based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such as WordNet, latent topics derived from the data itself, as in Latent Semantic Indexing (LSI) or Latent Dirichlet Allocation (LDA), to Wikipedia articles as proxies for concepts, as in the recently proposed Explicit Semantic Analysis (ESA) model. A crucial question which has not been answered so far is whether models based on explicitly given concepts (as in the ESA model for instance) perform inherently better than retrieval models based on “latent” concepts (as in LSI and/or LDA). In this paper we investigate this question more closely in the context of a cross-language setting, which inherently requires concept-based retrieval bridging between different languages. In particular, we compare the recently proposed ESA model with two latent models (LSI and LDA) showing that the former is clearly superior to both. From a general perspective, our results contribute to clarifying the role of explicit vs. implicitly derived or latent concepts in (cross-language) information retrieval research.

KR Conference 2004 Conference Paper

Towards a Quantitative, Platform-Independent Analysis of Knowledge Systems

  • Noah S. Friedland
  • Paul G. Allen
  • Michael Witbrock
  • Gavin Matthews
  • Nancy Salay
  • Pierluigi Miraglia
  • Jurgen Angele
  • Steffen Staab

The Halo Pilot, a six-month effort to evaluate the state-of-the-art in applied Knowledge Representation and Reasoning (KRR) systems, collaboratively developed a taxonomy of failures with the goal of creating a common framework of metrics against which we could measure inter- and intra-system failure characteristics of each of the three Halo knowledge applications. This platform independent taxonomy was designed with the intent of maximizing its coverage of potential failure types; providing the necessary granularity and precision to enable clear categorization of failure types; and providing a productive framework for short and longer term corrective action. Examining the failure analysis and initial empirical use of the taxonomy provides quantitative insights into the strengths and weaknesses of individual systems and raises some issues shared by all three. These results are particularly interesting when considered against the long history of assumed reasons for knowledge system failure. Our study has also uncovered some shortcomings in the taxonomy itself, implying the need to improve both its granularity and precision. It is the hope of Project Halo to eventually produce a failure taxonomy and associated methodology that will be of general use in the fine-grained analysis of knowledge systems.

IJCAI Conference 1999 Conference Paper

Scalable Temporal Reasoning

  • Steffen Staab
  • Udo Hahn

We introduce two mechanisms for scaling computations in the framework of temporal reasoning. The first one addresses abstraction at the methodological level. Operators are defined that engender flexible switching between different granularities of temporal representation structures. The second one accounts for abstractions at the interface level of a temporal reasoning engine. Various generalizations of temporal relations are introduced that approximate more fine-grained representations by abstracting away irrelevant details.

IJCAI Conference 1997 Conference Paper

"Tall", "Good", "High"— Compared to What?

  • Steffen Staab
  • Udo Hahn

We specify a model for the conceptual interpretation of positive gradable adjectives. Building on a classification-based terminological reasoning approach we define comparison classes and class norms and specify how a degree is related to its corresponding class norm.