Arrow Research search

Author name cluster

Matthias Scheutz

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

68 papers
2 author rows

Possible papers

68

AAAI Conference 2026 System Paper

IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection

  • Kaveh Eskandari Miandoab
  • Katharine Kowalyshyn
  • Kabir Pamnani
  • Anesu Gavhera
  • Vasanth Sarathy
  • Matthias Scheutz

We present IntelliProof, an interactive system for analyzing argumentative essays through LLMs. IntelliProof structures an essay as an argumentation graph, where claims are represented as nodes, supporting evidence is attached as node properties, and edges encode supporting or attacking relations. Unlike existing automated essay scoring systems, IntelliProof emphasizes the user experience: each relation is initially classified and scored by an LLM, then visualized for enhanced understanding. The system provides justifications for classifications and produces quantitative measures for essay coherence. It enables rapid exploration of argumentative quality while retaining human oversight. In addition, IntelliProof provides a set of tools for a better understanding of an argumentative essay and its corresponding graph in natural language, bridging the gap between the structural semantics of argumentative essays and the user's understanding of a given text.

TIST Journal 2026 Journal Article

On Evaluating LLM Integration into Robotic Architectures

  • Vasanth Sarathy
  • Marlow Fawn
  • Matthew McWilliams
  • Matthias Scheutz
  • Bradley Oosterveld

LLMs are increasingly being integrated into embodied robotic systems. A useful capability that LLMs bring to robots is translating noisy spoken human natural language instructions into executable robot actions. However, these integrations are somewhat ad hoc and understudied, as they tend not to consider the full gamut of syntactic, semantic, and pragmatic aspects of embodied human communication. What is missing is a characterization of the different paradigms for integrating LLMs into robotic architectures, as well as a set of evaluation metrics that capture whether an LLM-equipped robot can correctly understand these different aspects of human instruction. In this article, we present a suite of evaluation metrics together with data augmentation techniques for evaluating these architectures, using concepts from the cognitive science and human communication literature. To illustrate an application of these metrics and augmentation techniques, we conduct experiments comparing two integration methods: LLMs as pre-processing components that map human instructions into more constrained versions to be processed by the architecture’s natural language understanding (NLU) subsystem, and LLMs as a wholesale replacement for the NLU’s parser. We provide experimental evaluations and a robotic implementation to show the inherent tradeoffs between the methods. Our results suggest that while they offer increased explainability, traditional parsing tools coupled with LLMs do not perform as well as an LLM that replaces the parser entirely. The proposed evaluation metrics, together with the characterization of different LLM integration approaches, offer the promise of systematically evaluating LLMs as natural language interfaces to robotic systems, as well as tackling the important tradeoff between explainability/verifiability/interpretability and robustness to noisy input and broad language understanding in an open-world embodied setting.

AAAI Conference 2026 Conference Paper

Where Norms and References Collide: Evaluating LLMs on Normative Reasoning

  • Mitchell Abrams
  • Kaveh Eskandari Miandoab
  • Felix Gervits
  • Vasanth Sarathy
  • Matthias Scheutz

Embodied agents, such as robots, will need to interact in situated environments where successful communication often depends on reasoning over social norms: shared expectations that constrain what actions are appropriate in context. A key capability in such settings is norm-based reference resolution (NBRR), where interpreting referential expressions requires inferring implicit normative expectations grounded in physical and social context. Yet it remains unclear whether Large Language Models (LLMs) can support this kind of reasoning. In this work, we introduce SNIC (Situated Norms in Context), a human-validated diagnostic testbed designed to probe how well state-of-the-art LLMs can extract and utilize normative principles relevant to NBRR. SNIC emphasizes physically grounded norms that arise in everyday tasks such as cleaning, tidying, and serving. Across a range of controlled evaluations, we find that even the strongest LLMs struggle to consistently identify and apply social norms—particularly when norms are implicit, underspecified, or in conflict. These findings reveal a blind spot in current LLMs and highlight a key challenge for deploying language-based systems in socially situated, embodied settings.

ICRA Conference 2025 Conference Paper

Curiosity-Driven Imagination: Discovering Plan Operators and Learning Associated Policies for Open-World Adaptation

  • Pierrick Lorang
  • Hong Lu
  • Matthias Scheutz

Adapting quickly to dynamic, uncertain environments—often called “open worlds”—remains a major challenge in robotics. Traditional Task and Motion Planning (TAMP) approaches struggle to cope with unforeseen changes, are data-inefficient when adapting, and do not leverage world models during learning. We address this issue with a hybrid planning and learning system that integrates two models: a low-level neural network-based model that learns stochastic transitions and drives exploration via an Intrinsic Curiosity Module (ICM), and a high-level symbolic planning model that captures abstract transitions using operators, enabling the agent to plan in an “imaginary” space and generate reward machines. Our evaluation in a robotic manipulation domain with sequential novelty injections demonstrates that our approach converges faster and outperforms state-of-the-art hybrid methods.

ICRA Conference 2025 Conference Paper

FLEX: A Framework for Learning Robot-Agnostic Force-Based Skills Involving Sustained Contact Object Manipulation

  • Shijie Fang
  • Wenchang Gao
  • Shivam Goel
  • Christopher Thierauf
  • Matthias Scheutz
  • Jivko Sinapov

Learning to manipulate objects efficiently, particularly those involving sustained contact (e.g., pushing, sliding) and articulated parts (e.g., drawers, doors), presents significant challenges. Traditional methods, such as robot-centric reinforcement learning (RL), imitation learning, and hybrid techniques, require massive training and often struggle to generalize across different objects and robot platforms. We propose a novel framework for learning object-centric manipulation policies in force space, decoupling the robot from the object. By directly applying forces to selected regions of the object, our method simplifies the action space, reduces unnecessary exploration, and decreases simulation overhead. This approach, trained in simulation on a small set of representative objects, captures object dynamics—such as joint configurations—allowing policies to generalize effectively to new, unseen objects. Decoupling these policies from robot-specific dynamics enables direct transfer to different robotic platforms (e.g., Kinova, Panda, UR5) without retraining. Our evaluations demonstrate that the method significantly outperforms baselines, achieving over an order of magnitude improvement in training efficiency compared to other state-of-the-art methods. Additionally, operating in force space enhances policy transferability across diverse robot platforms and object types. We further showcase the applicability of our method in a real-world robotic setting. Link: https://tufts-ai-robotics-group.github.io/FLEX/

IROS Conference 2025 Conference Paper

Incremental Language Understanding for Online Motion Planning of Robot Manipulators

  • Mitchell Abrams
  • Thies Oelerich
  • Christian Hartl-Nesic
  • Andreas Kugi
  • Matthias Scheutz

Human-robot interaction requires robots to process language incrementally, adapting their actions in real-time based on evolving speech input. Existing approaches to language-guided robot motion planning typically assume fully specified instructions, resulting in inefficient stop-and-replan behavior when corrections or clarifications occur. In this paper, we introduce a novel reasoning-based incremental parser which integrates an online motion planning algorithm within the cognitive architecture. Our approach enables continuous adaptation to dynamic linguistic input, allowing robots to update motion plans without restarting execution. The incremental parser maintains multiple candidate parses, leveraging reasoning mechanisms to resolve ambiguities and revise interpretations when needed. By combining symbolic reasoning with online motion planning, our system achieves greater flexibility in handling speech corrections and dynamically changing constraints. We evaluate our framework in real-world human-robot interaction scenarios, demonstrating online adaptations of goal poses, constraints, or task objectives. Our results highlight the advantages of integrating incremental language understanding with real-time motion planning for natural and fluid human-robot collaboration. The experiments are demonstrated in the accompanying video at www.acin.tuwien.ac.at/42d5.

IROS Conference 2024 Conference Paper

A Framework for Neurosymbolic Goal-Conditioned Continual Learning in Open World Environments

  • Pierrick Lorang
  • Shivam Goel
  • Yash Shukla
  • Patrik Zips
  • Matthias Scheutz

In dynamic open-world environments, agents continually face new challenges due to sudden and unpredictable novelties, hindering Task and Motion Planning (TAMP) in autonomous systems. We introduce a novel TAMP architecture that integrates symbolic planning with reinforcement learning to enable autonomous adaptation in such environments, operating without human guidance. Our approach employs symbolic goal representation within a goal-oriented learning framework, coupled with planner-guided goal identification, effectively managing abrupt changes where traditional reinforcement learning, re-planning, and hybrid methods fall short. Through sequential novelty injections in our experiments, we assess our method’s adaptability to continual learning scenarios. Extensive simulations conducted in a robotics domain corroborate the superiority of our approach, demonstrating faster convergence to higher performance compared to traditional methods. The success of our framework in navigating diverse novelty scenarios within a continuous domain underscores its potential for critical real-world applications.

AIJ Journal 2024 Journal Article

A neurosymbolic cognitive architecture framework for handling novelties in open worlds

  • Shivam Goel
  • Panagiotis Lymperopoulos
  • Ravenna Thielstrom
  • Evan Krause
  • Patrick Feeney
  • Pierrick Lorang
  • Sarah Schneider
  • Yichen Wei

“Open world” environments are those in which novel objects, agents, events, and more can appear and contradict previous understandings of the environment. This runs counter to the “closed world” assumption used in most AI research, where the environment is assumed to be fully understood and unchanging. The types of environments AI agents can be deployed in are limited by the inability to handle the novelties that occur in open world environments. This paper presents a novel cognitive architecture framework to handle open-world novelties. This framework combines symbolic planning, counterfactual reasoning, reinforcement learning, and deep computer vision to detect and accommodate novelties. We introduce general algorithms for exploring open worlds using inference and machine learning methodologies to facilitate novelty accommodation. The ability to detect and accommodate novelties allows agents built on this framework to successfully complete tasks despite a variety of novel changes to the world. Both the framework components and the entire system are evaluated in Minecraft-like simulated environments. Our results indicate that agents are able to efficiently complete tasks while accommodating “concealed novelties” not shared with the architecture development team.

KR Conference 2024 Conference Paper

Action Language mA* with Higher-Order Action Observability

  • David Buckingham
  • Matthias Scheutz
  • Tran Cao Son
  • Francesco Fabiano

This paper presents a novel semantics for the mA* epistemic action language that takes into consideration dynamic per-agent observability of events. Different from the original mA* semantics, the observability of events is defined locally at the level of possible worlds, giving a new method for compiling event models. Locally defined observability represents agents' uncertainty and false-beliefs about each others' ability to observe events. This allows for modeling second-order false-belief tasks where one agent does not know the truth about another agent's observations and resultant beliefs. The paper presents detailed constructions of event models for ontic, sensing, and truthful announcement action occurrences and proves various properties relating to agents' beliefs after the execution of an action. It also shows that the proposed approach can model second order false-belief tasks and satisfies the robustness and faithfulness criteria discussed by Bolander (2018, https://doi.org/10.1007/978-3-319-62864-6_8).

ICRA Conference 2024 Conference Paper

Adapting to the "Open World": The Utility of Hybrid Hierarchical Reinforcement Learning and Symbolic Planning

  • Pierrick Lorang
  • Helmut Horvath
  • Tobias Kietreiber
  • Patrik Zips
  • Clemens Heitzinger
  • Matthias Scheutz

Open-world robotic tasks such as autonomous driving pose significant challenges to robot control due to unknown and unpredictable events that disrupt task performance. Neural network-based reinforcement learning (RL) techniques (like DQN, PPO, SAC, etc.) struggle to adapt in large domains and suffer from catastrophic forgetting. Hybrid planning and RL approaches have shown some promise in handling environmental changes but lack efficiency in accommodation speed. To address this limitation, we propose an enhanced hybrid system with a nested hierarchical action abstraction that can utilize previously acquired skills to effectively tackle unexpected novelties. We show that it can adapt faster and generalize better compared to state-of-the-art RL and hybrid approaches, significantly improving robustness when multiple environmental changes occur at the same time.

IROS Conference 2024 Conference Paper

Fixing symbolic plans with reinforcement learning in object-based action spaces

  • Christopher Thierauf
  • Matthias Scheutz

Reinforcement learning techniques are widely used when robots have to learn new tasks, but they typically operate on action spaces defined by the joints of the robot. We present a contrasting approach where action spaces are the trajectories of objects in the environment, requiring robots to discover events such as object changes and behaviors that must occur to accomplish the task. We show that this allows robots to learn faster, to learn semantic representations that can be communicated to humans, and to learn in a manner that does not depend on the robot itself, enabling low-cost policy transfer between different types of robots. Our demonstrations can be replicated using the provided source code.

AAMAS Conference 2024 Conference Paper

NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds

  • Shivam Goel
  • Yichen Wei
  • Panagiotis Lymperopoulos
  • Klára Churá
  • Matthias Scheutz
  • Jivko Sinapov

As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the “open-world”. To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates rapid creation and modification of task environments, including multi-agent scenarios, with multiple environment transformations, thus providing a dynamic testbed for researchers to develop open-world AI agents.

AAMAS Conference 2024 Conference Paper

Oh, Now I See What You Want: Learning Agent Models with Internal States from Observations

  • Panagiotis Lymperopoulos
  • Matthias Scheutz

Learning behavior models of other agents from observations is challenging because agents typically do not act based on observable states alone; they also take into account internal states, such as desires, motivations, and preferences, that are unobservable to external agents. Consequently, methods that only use observable states for modeling other agents’ behaviors are insufficient for capturing and predicting agent behavior, especially for agents with rich internal processes. We propose a novel approach to online agent model learning that works incrementally with limited data, provides fine-grained and interpretable descriptions of the agent’s behavior, and, most importantly, is able to hypothesize agent-internal states to better explain observed behavioral trajectories. We show in various proof-of-concept experiments that our method avoids the pitfalls of common agent-modeling strategies when agent-internal states govern behavior and is able to build accurate and interpretable behavior models. We also discuss how the method can work in conjunction with existing approaches (e.g., for goal recognition) to facilitate better modeling of open-world agents.

AAMAS Conference 2023 Conference Paper

Improving Human-Robot Team Performance with Proactivity and Shared Mental Models

  • Gwendolyn Edgar
  • Matthew McWilliams
  • Matthias Scheutz

Recent work in human-robot teaming has demonstrated that when robots build and maintain “shared mental models”, the effectiveness of the whole human-robot team is overall better compared to a baseline with no shared mental models. In this work, we expand on this insight by introducing proactive behaviors to investigate potential further improvements of team performance and task efficiency. We hypothesize that, combined with shared mental models, robots with these more proactive behaviors become even more effective teammates. To this end, we developed a set of robot behaviors aligned with reactive, active and proactive team behaviors in human-human teams. We ran a human behavioral study to evaluate our system. The results show that proactive robot behavior improves task efficiency and performance over mere reactive behavior in high cognitive load environments.

UAI Conference 2023 Conference Paper

Investigating a Generalization of Probabilistic Material Implication and Bayesian Conditionals

  • Michael Jahn
  • Matthias Scheutz

Probabilistic "if A then B" rules are typically formalized as Bayesian conditionals P(B|A), as many (e.g., Pearl) have argued that Bayesian conditionals are the correct way to think about such rules. However, there are challenges with standard inferences such as modus ponens and modus tollens that might make probabilistic material implication a better candidate at times for rule-based systems employing forward-chaining; and arguably material implication is still suitable when information about prior or conditional probabilities is not available at all. We investigate a generalization of probabilistic material implication and Bayesian conditionals that combines the advantages of both formalisms in a systematic way and prove basic properties of the generalized rule, in particular, for inference chains in graphs.
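The contrast between the two formalisms is easy to see on a concrete joint distribution. The sketch below (with made-up example probabilities, not taken from the paper) compares the Bayesian conditional P(B|A) with the probabilistic material implication P(A→B) = P(¬A ∨ B):

```python
# Illustrative sketch: Bayesian conditional vs. probabilistic material
# implication on a small hypothetical joint distribution over (A, B).
# The joint probabilities are example numbers, not from the paper.

joint = {
    (True, True): 0.30,   # P(A, B)
    (True, False): 0.10,  # P(A, not B)
    (False, True): 0.20,  # P(not A, B)
    (False, False): 0.40, # P(not A, not B)
}

p_a = sum(p for (a, _), p in joint.items() if a)   # P(A) = 0.40
p_a_and_b = joint[(True, True)]                    # P(A, B) = 0.30

# Bayesian conditional: P(B|A) = P(A, B) / P(A)
bayesian_conditional = p_a_and_b / p_a             # 0.75

# Material implication: P(A -> B) = P(not A or B) = 1 - P(A, not B)
material_implication = 1 - joint[(True, False)]    # 0.90

print(bayesian_conditional, material_implication)  # prints 0.75 0.9
```

Note that the two values generally differ (material implication is never smaller, since it also counts all the ¬A mass), which is what makes the choice of formalism consequential for forward-chaining inference.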

AAMAS Conference 2023 Conference Paper

Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments

  • Tung Thai
  • Mudit Verma
  • Utkarsh Soni
  • Sriram Gopalakrishnan
  • Ming Shen
  • Mayank Garg
  • Ayush Kalani
  • Nakul Vaidya

Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to achieve satisfactory task performance. We sketch general methods for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods in stochastic partially observable multi-agent environments. We also briefly report results from evaluations of our algorithms in the game domain of Monopoly. The results show high novelty detection and accommodation rates.

TMLR Journal 2023 Journal Article

NovelCraft: A Dataset for Novelty Detection and Discovery in Open Worlds

  • Cynthia Feeney
  • Sarah Schneider
  • Panagiotis Lymperopoulos
  • Liping Liu
  • Matthias Scheutz
  • Michael C Hughes

In order for artificial agents to successfully perform tasks in changing environments, they must be able to both detect and adapt to novelty. However, visual novelty detection research often only evaluates on repurposed datasets such as CIFAR-10 originally intended for object classification, where images focus on one distinct, well-centered object. New benchmarks are needed to represent the challenges of navigating the complex scenes of an open world. Our new NovelCraft dataset contains multimodal episodic data of the images and symbolic world-states seen by an agent completing a pogo stick assembly task within a modified Minecraft environment. In some episodes, we insert novel objects of varying size within the complex 3D scene that may impact gameplay. Our visual novelty detection benchmark finds that methods that rank best on popular area-under-the-curve metrics may be outperformed by simpler alternatives when controlling false positives matters most. Further multimodal novelty detection experiments suggest that methods that fuse both visual and symbolic information can improve time until detection as well as overall discrimination. Finally, our evaluation of recent generalized category discovery methods suggests that adapting to new imbalanced categories in complex scenes remains an exciting open problem.

AAMAS Conference 2023 Conference Paper

The Resilience Game: A New Formalization of Resilience for Groups of Goal-Oriented Autonomous Agents

  • Michael A. Goodrich
  • Jennifer Leaf
  • Julie A. Adams
  • Matthias Scheutz

Groups of autonomous robots should be resilient. They should have the ability to cope with unknown events, long-lasting alterations to the environment, degradation of capacities, robot losses, and changes to communication networks. This paper presents a multiagent resilience formulation for goal-based agents. The formulation applies to mixed motive groups where agent goals have commonalities but are not perfectly aligned. Resilient groups must not only be resilient to chance exogenous perturbations but also intentional endogenous perturbations among the agents. Defining resilience using expected utilities leads to a new way of looking at multiagent resilience, namely the resilience game. The resilience game makes it possible to use the notion of equilibrium from game theory to evaluate how the intentional stances of agents determine when multiagent algorithms are resilient. A guided diffusion of innovations problem is used to demonstrate how the resilience game provides insight into the effectiveness of various joint algorithms.

PRL Workshop 2022 Workshop Paper

Speeding-up Continual Learning through Information Gains in Novel Experiences

  • Pierrick Lorang
  • Shivam Goel
  • Patrik Zips
  • Jivko Sinapov
  • Matthias Scheutz

Adapting to novelties in open-world environments is an important and difficult challenge, and it has recently been shown that hybrid planning and reinforcement learning approaches can lead to better adaptations. However, these approaches still face difficulties induced by their heavy dependence on training samples to overcome changes in the environment quickly. In this work, we propose an integrated planning and learning approach that utilizes learning from failures and transferring knowledge over time to overcome novelty scenarios. Our proposed approach is much more sample-efficient in adapting to sudden and unknown changes (i.e., novelties) than the existing hybrid approaches. We showcase our results on a Minecraft-inspired gridworld environment called NovelGridworlds by injecting three novelties into the agent’s environment at test time. We show that our approach can speed up continual learning through information gained in each novel experience and is thus more sample-efficient.

AAMAS Conference 2021 Conference Paper

A Novelty-Centric Agent Architecture for Changing Worlds

  • Faizan Muhammad
  • Vasanth Sarathy
  • Gyan Tatiya
  • Shivam Goel
  • Saurav Gyawali
  • Mateo Guaman
  • Jivko Sinapov
  • Matthias Scheutz

Open-world AI requires artificial agents to cope with novelties that arise during task performance, i.e., they must (1) detect novelties, (2) characterize them, in order to (3) accommodate them, especially in cases where sudden changes to the environment make task accomplishment impossible without utilizing the novelty. We present a formal framework and implementation thereof in a cognitive agent for novelty handling and demonstrate the efficacy of the proposed methods for detecting and handling a large set of novelties in a crafting task in a simulated environment. We discuss the success of the proposed knowledge-based methods and propose heuristic extensions that will further improve novelty handling in open-world tasks.

AAAI Conference 2021 Conference Paper

Enabling Fast Instruction-Based Modification of Learned Robot Skills

  • Tyler Frasca
  • Bradley Oosterveld
  • Meia Chita-Tegmark
  • Matthias Scheutz

Much research effort in HRI has focused on how to enable robots to learn new skills from observations, demonstrations, and instructions. Less work, however, has focused on how skills can be corrected if they were learned incorrectly, adapted to changing circumstances, or generalized/specialized to different contexts. In this paper, a skill modification framework is introduced that allows users to modify a robot’s stored skills quickly through instructions to (1) reduce inefficiencies, (2) fix errors, and (3) enable generalizations, all in a way for modified skills to be immediately available for task performance. A thorough evaluation of the implemented framework shows the operation of the algorithms integrated in a cognitive robotic architecture on different fully autonomous robots in various HRI case studies. An additional online HRI user study verifies that subjects prefer to quickly modify robot knowledge in the way we proposed in the framework.

ICRA Conference 2021 Conference Paper

Robot Development and Path Planning for Indoor Ultraviolet Light Disinfection

  • Jonathan Conroy
  • Christopher Thierauf
  • Parker Rule
  • Evan A. Krause
  • Hugo A. Akitaya
  • Andrei Gonczi
  • Matias Korman
  • Matthias Scheutz

Regular irradiation of indoor environments with ultraviolet C (UVC) light has become a routine task for many indoor settings as a result of COVID-19, but current robotic systems attempting to automate it suffer from high costs and inefficient irradiation. In this paper, we propose a purpose-made inexpensive robotic platform built from off-the-shelf components and standard navigation software that, with a novel algorithm for finding optimal irradiation locations, addresses both shortcomings to offer an affordable and efficient solution for UVC irradiation. We demonstrate the efficacy of the algorithm in simulations and show a prototypical run of the autonomous integrated robotic system in an indoor environment. In our sample instances, our proposed algorithm reduces the time needed by roughly 30% while increasing coverage by 35% (when compared to the best possible placement of a static light).

AAMAS Conference 2021 Conference Paper

SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning

  • Vasanth Sarathy
  • Daniel Kasenberg
  • Shivam Goel
  • Jivko Sinapov
  • Matthias Scheutz

Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies that are not easily transferable to other tasks. In this paper, we address one aspect of the open problem of integrating these approaches: how can decision-making agents resolve discrepancies in their symbolic planning models while attempting to accomplish goals? We propose an integrated framework named SPOTTER that uses RL to augment and support (“spot”) a planning agent by discovering new operators needed by the agent to accomplish goals that are initially unreachable for the agent. SPOTTER outperforms pure-RL approaches while also discovering transferable symbolic knowledge and does not require supervision, successful plan traces or any a priori knowledge about the missing planning operator.

ICAPS Conference 2020 Conference Paper

Generating Explanations for Temporal Logic Planner Decisions

  • Daniel Kasenberg
  • Ravenna Thielstrom
  • Matthias Scheutz

Although temporal logic has been touted as a fruitful language for specifying interpretable agent objectives, there has been little emphasis on generating explanations for agents with temporal logic objectives. In this paper, we develop an approach to generating explanations for the behavior of agents planning with several temporal logic objectives. We focus on agents operating in deterministic Markov decision processes (MDPs), and specify objectives using linear temporal logic (LTL). Given an agent planning to maximally satisfy some set of LTL objectives (with an associated preference structure) in a deterministic MDP, we introduce an algorithm for constructing explanations answering both factual and “why” queries, where the queries themselves are also specified in LTL.

IROS Conference 2020 Conference Paper

Going Cognitive: A Demonstration of the Utility of Task-General Cognitive Architectures for Adaptive Robotic Task Performance

  • Tyler M. Frasca
  • Zhao Han
  • Jordan Allspaw
  • Holly A. Yanco
  • Matthias Scheutz

It has been claimed that a main advantage of cognitive architectures (compared to other types of specialized robotic architectures) is that they are task-general and can thus learn to perform any task as long as they have the right perceptual and action primitives. In this paper, we provide empirical evidence for this claim by directly comparing a high-performing custom robotic architecture developed for the standardized robotic "FetchIt!" challenge task to a hybrid cognitive robotic architecture that allows for online one-shot task learning and task modifications through natural language instructions. The results show that there is no disadvantage of running the hybrid architecture (i.e., no significant difference in overall performance or computational overhead compared to the custom architecture) while adding the flexibility of online one-shot task instruction and modification not available in the custom architecture.

KR Conference 2020 Conference Paper

Simultaneous Representation of Knowledge and Belief for Epistemic Planning with Belief Revision

  • David Buckingham
  • Daniel Kasenberg
  • Matthias Scheutz

We propose a novel approach to the problem of false belief revision in epistemic planning. Our state representations are pointed Kripke models with two binary relations over possible worlds: one representing agents' necessarily true knowledge, and one representing agents' possibly false beliefs. State transition functions maintain S5n properties in the knowledge relation and KD45n properties in the belief relation. When new information contradicts an agent's beliefs, belief revision draws new possible worlds from the agent's knowledge relation. Our method also improves upon prior work by accommodating false announcements. We develop our system as an extension to the mA* action language, presenting transition functions for ontic, sensing, and announcement actions.
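The frame properties named in the abstract (S5n for knowledge, KD45n for belief) can be checked mechanically on a finite set of worlds. The following sketch is purely illustrative of those standard modal-logic conditions, not the paper's implementation; the world names and relations are made up.

```python
# Illustrative sketch: checking the frame properties mentioned in the
# abstract -- S5 (knowledge) and KD45 (belief) -- on a binary accessibility
# relation given as a set of (world, world) pairs. Not the paper's code.

def is_serial(rel, worlds):
    # every world sees at least one world
    return all(any((w, v) in rel for v in worlds) for w in worlds)

def is_reflexive(rel, worlds):
    return all((w, w) in rel for w in worlds)

def is_symmetric(rel, worlds):
    return all((v, w) in rel for (w, v) in rel)

def is_transitive(rel, worlds):
    return all((w, u) in rel
               for (w, v) in rel for (v2, u) in rel if v == v2)

def is_euclidean(rel, worlds):
    # if w sees v and w sees u, then v sees u
    return all((v, u) in rel
               for (w, v) in rel for (w2, u) in rel if w == w2)

def is_s5(rel, worlds):
    # knowledge: an equivalence relation (reflexive, symmetric, transitive)
    return (is_reflexive(rel, worlds) and is_symmetric(rel, worlds)
            and is_transitive(rel, worlds))

def is_kd45(rel, worlds):
    # belief: serial, transitive, Euclidean (beliefs may be false)
    return (is_serial(rel, worlds) and is_transitive(rel, worlds)
            and is_euclidean(rel, worlds))

worlds = {"w1", "w2"}
knowledge = {(w, v) for w in worlds for v in worlds}  # total relation: S5
belief = {("w1", "w2"), ("w2", "w2")}  # agent believes w2 is actual

print(is_s5(knowledge, worlds))  # True
print(is_kd45(belief, worlds))   # True
print(is_s5(belief, worlds))     # False: not reflexive at w1
```

The belief relation above is KD45 but not S5, which is exactly what allows an agent's beliefs to be false at the actual world while its knowledge remains true.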

ICRA Conference 2019 Conference Paper

Acquisition of Word-Object Associations from Human-Robot and Human-Human Dialogues

  • Sepideh Sadeghi
  • Bradley Oosterveld
  • Evan A. Krause
  • Matthias Scheutz

Past work on acquisition of word-object associations in robots has focused on either fast instruction-based methods which accept highly constrained input or gradual cross-situational learning methods, but not a mixture of both. In this paper, we present an integrated robotic system which allows for a combination of these methods to contribute to the task of learning the labels of objects in AI agents. We demonstrate the expanded word learning capabilities of the resulting system and how learning from both human-human and human-robot dialogues can be achieved in one integrated system.

AAAI Conference 2019 Conference Paper

On Resolving Ambiguous Anaphoric Expressions in Imperative Discourse

  • Vasanth Sarathy
  • Matthias Scheutz

Anaphora resolution is a central problem in natural language understanding. We study a subclass of this problem involving object pronouns when they are used in simple imperative sentences (e.g., “pick it up”). Specifically, we address cases where situational and contextual information is required to interpret these pronouns. Current state-of-the-art statistically driven coreference systems and knowledge-based reasoning systems are insufficient to address these cases. In this paper, we introduce, with examples, a general class of situated anaphora resolution problems, propose a proof-of-concept system for disambiguating situated pronouns, and discuss some general types of reasoning that might be needed.

AAAI Conference 2018 Conference Paper

Early Syntactic Bootstrapping in an Incremental Memory-Limited Word Learner

  • Sepideh Sadeghi
  • Matthias Scheutz

It has been suggested that early human word learning occurs across learning situations and is bootstrapped by syntactic regularities such as word order. Simulation results from ideal learners and models assuming prior access to structured syntactic and semantic representations suggest that it is possible to jointly acquire word order and meanings and that learning is improved as each language capability bootstraps the other. We first present a probabilistic framework for early syntactic bootstrapping in the absence of advanced structured representations, then we use our framework to study the utility of joint acquisition of word order and word referent, and its onset, in a memory-limited incremental model. Comparing learning results with and without joint acquisition of word order across contexts of varying ambiguity, we found that word order learning improved with an immediate onset, starting in early trials, while being affected by context ambiguity. Improvement in word learning, on the other hand, was hindered in early trials, where the acquired word order was still imperfect, but was facilitated by word order learning in later trials as the acquired word order improved. Furthermore, our results showed that joint acquisition of word order and word referent facilitates one-shot learning of new words as well as inferring intentions of the speaker in ambiguous contexts.

AAAI Conference 2018 Conference Paper

Norm Conflict Resolution in Stochastic Domains

  • Daniel Kasenberg
  • Matthias Scheutz

Artificial agents will need to be aware of human moral and social norms, and able to use them in decision-making. In particular, artificial agents will need a principled approach to managing conflicting norms, which are common in human social interactions. Existing logic-based approaches suffer from normative explosion and are typically designed for deterministic environments; reward-based approaches lack principled ways of determining which normative alternatives exist in a given environment. We propose a hybrid approach, using Linear Temporal Logic (LTL) representations in Markov Decision Processes (MDPs), that manages norm conflicts in a systematic manner while accommodating domain stochasticity. We provide a proof-of-concept implementation in a simulated vacuum cleaning domain.
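The LTL-over-MDP framing in the abstract rests on evaluating temporal formulas against agent behavior. The sketch below is an assumed toy illustration of finite-trace semantics for two common LTL operators, G ("always") and F ("eventually"); the vacuum-domain trace is invented and this is not the paper's method.

```python
# Illustrative sketch of LTL operators over a finite trace (assumed
# finite-trace semantics; not the paper's implementation). Each state in
# the trace is the set of atomic propositions that hold there.

def always(prop, trace):
    """G prop: prop holds in every state of the trace."""
    return all(prop in state for state in trace)

def eventually(prop, trace):
    """F prop: prop holds in at least one state of the trace."""
    return any(prop in state for state in trace)

# Hypothetical vacuum-cleaning trace: the rug starts clean, gets dirtied
# mid-task, and is clean again at the end.
trace = [{"rug_clean"}, set(), {"rug_clean"}]

print(always("rug_clean", trace))      # False: norm violated mid-trace
print(eventually("rug_clean", trace))  # True
```

Two conflicting norms could then be expressed as two such formulas; a trace that makes `always` false while keeping `eventually` true is one simple way a planner might trade a stricter norm off against a weaker one.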

IJCAI Conference 2018 Conference Paper

Recursive Spoken Instruction-Based One-Shot Object and Action Learning

  • Matthias Scheutz
  • Evan Krause
  • Bradley Oosterveld
  • Tyler Frasca
  • Robert Platt

Learning new knowledge from single instructions and being able to apply it immediately is highly desirable for artificial agents. We provide the first demonstration of spoken instruction-based one-shot object and action learning in a cognitive robotic architecture and briefly discuss the architectural modifications required to enable such fast learning, demonstrating the new capabilities on a fully autonomous robot.

AAMAS Conference 2017 Conference Paper

A Tale of Two Architectures: A Dual-Citizenship Integration of Natural Language and the Cognitive Map

  • Tom Williams
  • Collin Johnson
  • Matthias Scheutz
  • Benjamin Kuipers

Vulcan and DIARC are two robot architectures with very different capabilities: Vulcan uses rich spatial representations to facilitate navigation capabilities in real-world, campus-like environments, while DIARC uses high-level cognitive representations to facilitate human-like tasking through natural language. In this work, we show how the integration of Vulcan and DIARC enables not only the capabilities of the two individual architectures, but new synergistic capabilities as well, as each architecture leverages the strengths of the other. This integration presents interesting challenges, as DIARC and Vulcan are implemented in distinct multi-agent system middlewares. Accordingly, a second major contribution of this paper is the Vulcan-ADE Development Environment (VADE): a novel multi-agent system framework comprised of both (1) software agents belonging to a single robot architecture and implemented in a single multi-agent system middleware, and (2) “Dual-Citizen” agents that belong to both robot architectures and that use elements of both multi-agent system middlewares. As one example application, we demonstrate the implementation of the new joint architecture and novel multi-agent system framework on a robotic wheelchair, and show how this integration advances the state-of-the-art for NL-enabled wheelchairs.

IROS Conference 2017 Conference Paper

Differences in interaction patterns and perception for teleoperated and autonomous humanoid robots

  • Maxwell Bennett
  • Tom Williams 0001
  • Daria Thames
  • Matthias Scheutz

As the linguistic capabilities of interactive robots advance, it becomes increasingly important to understand how humans will instruct robots through natural language. What is more, with the increased use of teleoperated humanoid robots, it is important to recognize whether any differences between instructions given to humans and to robots are due to the physical embodiment or to the perceived autonomy of the instructee. In this paper, we present the results of a human-subject experiment in which participants interacted in a collaborative, task-based setting with both a human and a suit-based, teleoperated humanoid robot said to be either autonomous or teleoperated. Our results suggest that humans will use politeness strategies equally with human, autonomous robotic, and teleoperated robotic teammates, reinforcing recent findings that autonomous robots must comprehend and appropriately respond to human utterances that follow such strategies. Our results also suggest variations in how different teammates were perceived. Specifically, our results suggest that human-teleoperated robots were perceived as less intelligent than human teammates; a finding with serious implications for human-robot team dynamics.

IS Journal 2017 Journal Article

Interactive Task Learning

  • John E. Laird
  • Kevin Gluck
  • John Anderson
  • Kenneth D. Forbus
  • Odest Chadwicke Jenkins
  • Christian Lebiere
  • Dario Salvucci
  • Matthias Scheutz

This article presents a new research area called interactive task learning (ITL), in which an agent actively tries to learn not just how to perform a task better but the actual definition of a task through natural interaction with a human instructor while attempting to perform the task. The authors provide an analysis of desiderata for ITL systems, a review of related work, and a discussion of possible application areas for ITL systems.

AAMAS Conference 2017 Conference Paper

Spoken Instruction-Based One-Shot Object and Action Learning in a Cognitive Robotic Architecture

  • Matthias Scheutz
  • Evan Krause
  • Brad Oosterveld
  • Tyler Frasca
  • Robert Platt

Learning new knowledge from single instructions and being able to apply it immediately is a highly desirable capability for artificial agents. We provide the first demonstration of spoken instruction-based one-shot object and action learning in a cognitive robotic architecture and discuss the modifications to several architectural components required to enable such fast learning, demonstrating the new capabilities on two different fully autonomous robots.

AAAI Conference 2016 Conference Paper

A Framework for Resolving Open-World Referential Expressions in Distributed Heterogeneous Knowledge Bases

  • Tom Williams
  • Matthias Scheutz

We present a domain-independent approach to reference resolution that allows a robotic or virtual agent to resolve references to entities (e.g., objects and locations) found in open worlds when the information needed to resolve such references is distributed among multiple heterogeneous knowledge bases in its architecture. An agent using this approach can combine information from multiple sources without the computational bottleneck associated with centralized knowledge bases. The proposed approach also facilitates “lazy constraint evaluation”, i.e., verifying properties of the referent through different modalities only when the information is needed. After specifying the interfaces by which a reference resolution algorithm can request information from distributed knowledge bases, we present an algorithm for performing open-world reference resolution within that framework, analyze the algorithm’s performance, and demonstrate its behavior on a simulated robot.
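The "lazy constraint evaluation" idea can be pictured as filtering candidate referents against constraints, consulting each knowledge base only when its property is actually needed. All names below (the two toy knowledge bases, `lookup`, `resolve`) are hypothetical illustrations, not the paper's interfaces.

```python
# Illustrative sketch of lazy constraint evaluation across distributed
# knowledge bases (hypothetical names; not the paper's implementation).
# Each KB maps an entity to the set of properties it can verify.

vision_kb = {"mug1": {"red"}, "mug2": {"blue"}}     # perceptual properties
map_kb = {"mug1": {"kitchen"}, "mug2": {"lab"}}     # spatial properties

def lookup(entity, prop):
    # Consult the knowledge bases only when a property must be verified.
    return any(prop in kb.get(entity, set()) for kb in (vision_kb, map_kb))

def resolve(constraints, candidates):
    # Keep candidates satisfying all constraints; all() short-circuits, so
    # further KBs are never queried once one constraint fails (laziness).
    return [e for e in candidates if all(lookup(e, c) for c in constraints)]

print(resolve(["red", "kitchen"], ["mug1", "mug2"]))  # ['mug1']
```

Here "the red mug in the kitchen" resolves to `mug1` without ever asking the map KB about `mug2`, since `mug2` already fails the perceptual constraint.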

AAMAS Conference 2016 Conference Paper

Analogical Generalization of Actions from Single Exemplars in a Robotic Architecture

  • Jason R. Wilson
  • Evan Krause
  • Matthias Scheutz
  • Morgan Rivers

Humans are often able to generalize knowledge learned from a single exemplar. In this paper, we present a novel integration of mental simulation and analogical generalization algorithms into a cognitive robotic architecture that enables a similarly rudimentary generalization capability in robots. Specifically, we show how a robot can generate variations of a given scenario and then use the results of those new scenarios run in a physics simulator to generate generalized action scripts using analogical mappings. The generalized action scripts then allow the robot to perform the originally learned activity in a wider range of scenarios with different types of objects without the need for additional exploration or practice. In a proof-of-concept demonstration we show how the robot can generalize from a previously learned pick-and-place action performed with a single arm on an object with a handle to a pick-and-place action of a cylindrical object with no handle with two arms.

KR Conference 2016 Short Paper

Cognitive Affordance Representations in Uncertain Logic

  • Vasanth Sarathy
  • Matthias Scheutz

The concept of “affordance” represents the relationship between human perceivers and their environment. Affordance perception, representation, and inference are central to commonsense reasoning, tool-use and creative problem-solving in artificial agents. Existing approaches fail to provide flexibility with which to reason about affordances in the open world, where they are influenced by changing context, social norms, historical precedence, and uncertainty. We develop a formal rules-based logical representational format coupled with an uncertainty-processing framework to reason about cognitive affordances in a more general manner than shown in the existing literature. Our framework allows agents to make deductive and abductive inferences about functional and social affordances, collectively and dynamically, thereby allowing the agent to adapt to changing conditions. We demonstrate our approach with an example, and show that an agent can successfully reason through situations that involve a tight interplay between various social and functional norms.

AAAI Conference 2015 Conference Paper

Going Beyond Literal Command-Based Instructions: Extending Robotic Natural Language Interaction Capabilities

  • Tom Williams
  • Gordon Briggs
  • Bradley Oosterveld
  • Matthias Scheutz

The ultimate goal of human natural language interaction is to communicate intentions. However, these intentions are often not directly derivable from the semantics of an utterance (e.g., when linguistic modulations are employed to convey politeness, respect, and social standing). Robotic architectures with simple command-based natural language capabilities are thus not equipped to handle more liberal, yet natural uses of linguistic communicative exchanges. In this paper, we propose novel mechanisms for inferring intentions from utterances and generating clarification requests that will allow robots to cope with a much wider range of task-based natural language interactions. We demonstrate the potential of these inference algorithms for natural human-robot interactions by running them as part of an integrated cognitive robotic architecture on a mobile robot in a dialogue-based instruction task.

IROS Conference 2015 Conference Paper

Planning for serendipity

  • Tathagata Chakraborti
  • Gordon Briggs
  • Kartik Talamadupula
  • Yu Zhang 0055
  • Matthias Scheutz
  • David E. Smith 0001
  • Subbarao Kambhampati

Recently there has been a lot of focus on human-robot co-habitation issues that are often orthogonal to many aspects of human-robot teaming; e.g., on producing socially acceptable behaviors of robots and de-conflicting plans of robots and humans in shared environments. However, an interesting offshoot of these settings that has largely been overlooked is the problem of planning for serendipity, i.e., planning for stigmergic collaboration without explicit commitments on agents in co-habitation. In this paper we formalize this notion of planning for serendipity for the first time, and provide an Integer Programming based solution for this problem. Further, we illustrate the different modes of this planning technique on a typical Urban Search and Rescue scenario and show a real-life implementation of the ideas on the Nao Robot interacting with a human colleague.

IROS Conference 2015 Conference Paper

POWER: A domain-independent algorithm for Probabilistic, Open-World Entity Resolution

  • Tom Williams 0001
  • Matthias Scheutz

The problem of uniquely identifying an entity described in natural language, known as reference resolution, has become recognized as a critical problem for the field of robotics, as it is necessary in order for robots to be able to discuss, reason about, or perform actions involving any people, locations, or objects in their environments. However, most existing algorithms for reference resolution are domain-specific and limited to environments assumed to be known a priori. In this paper we present an algorithm for reference resolution which is both domain independent and designed to operate in an open world. We call this algorithm POWER: Probabilistic Open-World Entity Resolution. We then present the results of an empirical study demonstrating the success of POWER both in properly identifying the referents of referential expressions and in properly modifying the world model based on such expressions.

IROS Conference 2014 Conference Paper

Coordination in human-robot teams using mental modeling and plan recognition

  • Kartik Talamadupula
  • Gordon Briggs
  • Tathagata Chakraborti
  • Matthias Scheutz
  • Subbarao Kambhampati

Beliefs play an important role in human-robot teaming scenarios, where the robots must reason about other agents' intentions and beliefs in order to inform their own plan generation process, and to successfully coordinate plans with the other agents. In this paper, we cast the evolving and complex structure of beliefs, and inference over them, as a planning and plan recognition problem. We use agent beliefs and intentions modeled in terms of predicates in order to create an automated planning problem instance, which is then used along with a known and complete domain model in order to predict the plan of the agent whose beliefs are being modeled. Information extracted from this predicted plan is used to inform the planning process of the modeling agent, to enable coordination. We also look at an extension of this problem to a plan recognition problem. We conclude by presenting an evaluation of our technique through a case study implemented on a real robot.

IROS Conference 2014 Conference Paper

Investigating human perceptions of robot capabilities in remote human-robot team tasks based on first-person robot video feeds

  • Cody Canning
  • Thomas J. Donahue
  • Matthias Scheutz

It is well-known that a robot's appearance and its observable behavior can affect a human interactant's perceptions of the robot's capabilities and propensities in settings where humans and robots are co-located; for remote interactions the specific effects are less clear. Here, we use a remote interaction setting to investigate possible effects of simulated versus real first-person robot video feeds. The first experiment uses subject-level comparisons of the two video conditions in a multi-robot setting while the second and third experiments focus on a single robot, single video condition using a larger population (via Amazon Mechanical Turk) to study between-subjects effects. The latter experiments also probe the effects of robot appearance, video feed type, and the stake humans have in the task. We observe a complex interplay between interaction, robot appearance, and video feed type as they affect perceived collaboration, utility, competence, and warmth of the robot.

AAAI Conference 2014 Conference Paper

Learning to Recognize Novel Objects in One Shot through Human-Robot Interactions in Natural Language Dialogues

  • Evan Krause
  • Michael Zillich
  • Thomas Williams
  • Matthias Scheutz

Being able to quickly and naturally teach robots new knowledge is critical for many future open-world human-robot interaction scenarios. In this paper we present a novel approach to using natural language context for one-shot learning of visual objects, where the robot is immediately able to recognize the described object. We describe the architectural components and demonstrate the proposed approach on a robotic platform in a proof-of-concept evaluation.

AAAI Conference 2013 Conference Paper

A Hybrid Architectural Approach to Understanding and Appropriately Generating Indirect Speech Acts

  • Gordon Briggs
  • Matthias Scheutz

Current approaches to handling indirect speech acts (ISAs) do not account for their sociolinguistic underpinnings (i.e., politeness strategies). Deeper understanding and appropriate generation of indirect acts will require mechanisms that integrate natural language (NL) understanding and generation with social information about agent roles and obligations, which we introduce in this paper. Additionally, we tackle the problem of understanding and handling indirect answers that take the form of either speech acts or physical actions, which requires an inferential, plan-reasoning approach. In order to enable artificial agents to handle an even wider variety of ISAs, we present a hybrid approach, utilizing both the idiomatic and inferential strategies. We then demonstrate our system successfully generating indirect requests and handling indirect answers, and discuss avenues of future research.

AAAI Conference 2013 Conference Paper

Grounding Natural Language References to Unvisited and Hypothetical Locations

  • Thomas Williams
  • Rehj Cantrell
  • Gordon Briggs
  • Paul Schermerhorn
  • Matthias Scheutz

While much research exists on resolving spatial natural language references to known locations, little work deals with handling references to unknown locations. In this paper we introduce and evaluate algorithms integrated into a cognitive architecture which allow an agent to learn about its environment while resolving references to both known and unknown locations. We also describe how multiple components in the architecture jointly facilitate these capabilities.

ICRA Conference 2012 Conference Paper

Abstract planning for reactive robots

  • Saket Joshi
  • Paul W. Schermerhorn
  • Roni Khardon
  • Matthias Scheutz

Hybrid reactive-deliberative architectures in robotics combine reactive sub-policies for fast action execution with goal sequencing and deliberation. The need for replanning, however, presents a challenge for reactivity and hinders the potential for guarantees about the plan quality. In this paper, we argue that one can integrate abstract planning provided by symbolic dynamic programming in first order logic into a reactive robotic architecture, and that such an integration is in fact natural and has advantages over traditional approaches. In particular, it allows the integrated system to spend off-line time planning for a policy, and then use the policy reactively in open worlds, in situations with unexpected outcomes, and even in new environments, all by simply reacting to a state change and executing a new action proposed by the policy. We demonstrate the viability of the approach by integrating the FODD-Planner with the robotic DIARC architecture, showing how an appropriate interface can be defined and that this integration can yield robust goal-based action execution on robots in open worlds.

AAAI Conference 2012 Conference Paper

Crossing Boundaries: Multi-Level Introspection in a Complex Robotic Architecture for Automatic Performance Improvements

  • Evan Krause
  • Paul Schermerhorn
  • Matthias Scheutz

Introspection mechanisms are employed in agent architectures to improve agent performance. However, there is currently no approach to introspection that makes automatic adjustments at multiple levels in the implemented agent system. We introduce our novel multi-level introspection framework that can be used to automatically adjust architectural configurations based on the introspection results at the agent, infrastructure and component level. We demonstrate the utility of such adjustments in a concrete implementation on a robot where the high-level goal of the robot is used to automatically configure the vision system in a way that minimizes resource consumption while improving overall task performance.

AAMAS Conference 2012 Conference Paper

What am I doing? Automatic Construction of an Agent's State-Transition Diagram through Introspection

  • Constantin Berzan
  • Matthias Scheutz

Infrastructures for implementing agent architectures are currently unaware of what tasks the implemented agent is performing. Such knowledge would allow the infrastructure to improve the agent's autonomy and reliability. For example, the infrastructure could detect abnormal system states, predict likely faults and take preventive measures ahead of time, or balance system load based on predicted computational needs. In this paper we introduce a learning algorithm to automatically discover a state-transition model of the agent's behavior. The algorithm monitors the communication between architectural components, in the form of function calls, and finds the frequencies at which various functions are polled. It then determines the states according to what polling frequencies are active at any time. The two main novel features of the algorithm are that it is completely unsupervised (it requires no human input) and task-agnostic (it can be applied to any new task or architecture with minimal effort).

AAAI Conference 2010 Conference Paper

Integrating a Closed World Planner with an Open World Robot: A Case Study

  • Kartik Talamadupula
  • J. Benton
  • Paul Schermerhorn
  • Subbarao Kambhampati
  • Matthias Scheutz

In this paper, we present an integrated planning and robotic architecture that actively directs an agent engaged in an urban search and rescue (USAR) scenario. We describe three salient features that comprise the planning component of this system, namely (1) the ability to plan in a world open with respect to objects, (2) execution monitoring and replanning abilities, and (3) handling soft goals, and detail the interaction of these parts in representing and solving the USAR scenario at hand. We show that though insufficient in an individual capacity, the integration of this trio of features is sufficient to solve the scenario that we present. We test our system with an example problem that involves soft and hard goals, as well as goal deadlines and action costs, and show that the planner is capable of incorporating sensing actions and execution monitoring in order to produce goal-fulfilling plans that maximize the net benefit accrued.

TIST Journal 2010 Journal Article

Planning for human-robot teaming in open worlds

  • Kartik Talamadupula
  • J. Benton
  • Subbarao Kambhampati
  • Paul Schermerhorn
  • Matthias Scheutz

As the number of applications for human-robot teaming continues to rise, there is an increasing need for planning technologies that can guide robots in such teaming scenarios. In this article, we focus on adapting planning technology to Urban Search And Rescue (USAR) with a human-robot team. We start by showing that several aspects of state-of-the-art planning technology, including temporal planning, partial satisfaction planning, and replanning, can be gainfully adapted to this scenario. We then note that human-robot teaming also throws up an additional critical challenge, namely, enabling existing planners, which work under closed-world assumptions, to cope with the open worlds that are characteristic of teaming problems such as USAR. In response, we discuss the notion of conditional goals, and describe how we represent and handle a specific class of them called open world quantified goals. Finally, we describe how the planner, and its open world extensions, are integrated into a robot control architecture, and provide an empirical evaluation over USAR experimental runs to establish the effectiveness of the planning components.

ICRA Conference 2010 Conference Paper

Using logic to handle conflicts between system, component, and infrastructure goals in complex robotic architectures

  • Paul W. Schermerhorn
  • Matthias Scheutz

Complex robots with many interacting components in their control architectures are subject to component failures from which neither the control architecture nor the implementing infrastructure can recover. Moreover, the operating conditions for these components might be at odds with goals the robot might have adopted (e.g., through external commands or in the course of the execution of the current task). We argue that the best (if not the only) way to resolve any difficulties that arise from the different requirements at the agent, component and infrastructure levels is to use a common formal logical goal representation for all three layers. We discuss how these representations can be integrated into a complex robotic architecture and demonstrate in an experimental evaluation on a robot how the architecture can recover from a failure situation that it would not have been able to handle without explicit multi-level unified goal representations and their associated monitoring and reasoning processes.

IROS Conference 2009 Conference Paper

Finding and exploiting goal opportunities in real-time during plan execution

  • Paul W. Schermerhorn
  • J. Benton 0001
  • Matthias Scheutz
  • Kartik Talamadupula
  • Subbarao Kambhampati

Autonomous robots that operate in real-world domains face multiple challenges that make planning and goal selection difficult. Not only must planning and execution occur in real time; newly acquired knowledge can also invalidate previous plans, and goals and their utilities can change during plan execution. However, these events can also provide opportunities, if the architecture is designed to react appropriately. We present here an architecture that integrates the SapaReplan planner with the DIARC robot architecture, allowing the architecture to react dynamically to changes in the robot's goal structures.

IROS Conference 2009 Conference Paper

Gendered voice and robot entities: Perceptions and reactions of male and female subjects

  • Charles R. Crowell
  • Michael Villano
  • Matthias Scheutz
  • Paul W. Schermerhorn

There is recent evidence that males and females view robots differently, from the way robots are conceptualized, to the way humans respond when they interact with them. In this paper, we further explore gender-based differences in human-robot interaction. Moreover, we provide the first available evidence for sex-related differences in reactions to gendered synthetic voices that are either disembodied or physically embodied within a robot. Results indicate that physical embodiment and perceived entity gender may interact with human sex-related characteristics and pre-experimental attitudes in determining how people respond to artificial entities.

ICRA Conference 2009 Conference Paper

What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution

  • Juraj Dzifcak
  • Matthias Scheutz
  • Chitta Baral
  • Paul W. Schermerhorn

Robots that can be given instructions in spoken language need to be able to parse a natural language utterance quickly, determine its meaning, generate a goal representation from it, check whether the new goal conflicts with existing goals, and if acceptable, produce an action sequence to achieve the new goal (ideally being sensitive to the existing goals). In this paper, we describe an integrated robotic architecture that can achieve the above steps by translating natural language instructions incrementally and simultaneously into formal logical goal description and action languages, which can be used both to reason about the achievability of a goal as well as to generate new action scripts to pursue the goal. We demonstrate the implementation of our approach on a robot taking spoken natural language instructions in an office environment.

AAMAS Conference 2008 Conference Paper

Physical Parameter Optimization in Swarms of Ultra-Low Complexity Agents

  • Ryan Connaughton
  • Paul Schermerhorn
  • Matthias Scheutz

Physical agents (such as wheeled vehicles, UAVs, hovercraft, etc.) with simple control systems are often sensitive to changes in their physical design and control parameters. As such, it is crucial to evaluate the agent’s control systems together with the agent’s physical implementation. This can consequently lead to an explosion in the parameter space to be considered. In this paper we investigate the use of swarms of ultra-low complexity agents, and address the issue of finding workable physical agent parameters. We describe a technique for reducing the dimensionality of the search space by performing evaluation tasks that can be used to predict near-optimal parameter values for agents in related multi-agent tasks. We validate our approach on an example task, and demonstrate that this technique can greatly reduce the computational resources required to design a multi-agent system.

IROS Conference 2007 Conference Paper

"Talk to me!": enabling communication between robotic architectures and their implementing infrastructures

  • James F. Kramer
  • Matthias Scheutz
  • Paul W. Schermerhorn

Complex, autonomous robots integrate a large set of sometimes very diverse algorithms across at least three levels of system organization: the agent architecture, the implementation environment, and the hardware devices. Insofar as a distinction is maintained between them, the levels serve different purposes and thus exhibit different characteristic strengths and weaknesses. Exchanging information among organizational levels can be used to mitigate the shortcomings of one level by making use of the strengths of another. In this paper, we highlight the roles, characteristics, and relations between the infrastructure and the architecture of complex robots, describing a novel form of integration that results from enabling the exchange of information between these two levels, which otherwise is maintained internally. The information from the infrastructure is especially amenable for use by the architecture to achieve a higher level of robustness and system awareness. We demonstrate the functionality and utility of the proposed mechanisms in a set of experiments in which failures of architectural components are induced on an actual robot engaged in a joint human-robot team task.

ICRA Conference 2007 Conference Paper

Reflection and Reasoning Mechanisms for Failure Detection and Recovery in a Distributed Robotic Architecture for Complex Robots

  • Matthias Scheutz
  • James F. Kramer

Complex robots that interact naturally with humans require the integration, coordination and maintenance of many diverse software components and algorithms. An architecture that incorporates explicit knowledge about the relationships among these components and the overall system state can be used for introspection and consequently to reason about the best configurations of the computing environment under changing conditions; potential uses include maintaining the system's integrity, promoting its health, and providing the ability to dynamically reconfigure system components (e.g., after component failure). In this paper, we describe a rudimentary reasoning system, part of our distributed integrated affect reflection cognition (DIARC) architecture for human-robot interaction, that can autonomously perform failure detection, failure recovery, and system reconfiguration of distributed architectural components to ensure sustained operation and interactions. We demonstrate the functionality and utility of the proposed mechanisms on a robot, where architectural components are forcefully removed by hand and automatically recovered by the system while the robot is continuing its interactions with humans as part of a joint human-robot task.

IROS Conference 2007 Conference Paper

Speech and action: integration of action and language for mobile robots

  • Timothy R. Brick
  • Paul W. Schermerhorn
  • Matthias Scheutz

We describe the tight integration of incremental natural language understanding, goal management, and action processing in a complex robotic architecture, which is required for natural interactions between robots and humans. Specifically, the natural language components need to process utterances while they are still spoken to be able to initiate feedback actions in a timely fashion, while the action manager might need information at various points during action execution that must be obtained from humans. We argue that a finer-grained integration provides much more natural human-robot interactions and much more reasonable multitasking.

IROS Conference 2006 Conference Paper

ADE: A Framework for Robust Complex Robotic Architectures

  • James F. Kramer
  • Matthias Scheutz

Robots that can interact naturally with humans require the integration and coordination of many different components with heavy computational demands. We argue that an architecture framework with facilities for dynamic, reliable, fault-recovering, remotely accessible, distributed computing is needed for the development and operation of applications that support and enhance human activities and capabilities. We describe a robotic architecture development system, called ADE, that is built on top of a multi-agent system in order to provide all of the above features. Specifically, we discuss support for autonomic computing in ADE, briefly comparing it to related features of other commonly used robotic systems. We also report our experiences with ADE in the development of an architecture for an intelligent robot assistant and provide experimental results demonstrating the system's utility.

AAAI Conference 2004 System Paper

A Robotic Model of Human Reference Resolution

  • Matthias Scheutz

Evidence from psychology suggests that humans process definite descriptions that refer to objects present in a visual scene incrementally upon hearing them, rather than constructing explicit parse trees after the whole sentence has been said, which are then used to determine the referents. In this paper, we describe a real-time distributed robotic architecture for human reference resolution that demonstrates various interactions of auditory, visual, and semantic processing components hypothesized to underlie human processes.

IROS Conference 2004 Conference Paper

Fast, reliable, adaptive, bimodal people tracking for indoor environments

  • Matthias Scheutz
  • John McRaven
  • György Cserey

We present a real-time system for a mobile robot that can reliably detect and track people in uncontrolled indoor environments. The system uses a combination of leg detection based on distance information from a laser range sensor and visual face detection based on an analogical algorithm implemented on specialized hardware (the CNN universal machine). Results from tests in a variety of environments with different lighting conditions, a different number of appearing and disappearing people, and different obstacles are reported to demonstrate that the system can find and subsequently track several, possibly moving, people simultaneously in indoor environments. Applications of the system include, in particular, service robots for social events.

AAAI Conference 2004 Conference Paper

Useful Roles of Emotions in Artificial Agents: A Case Study from Artificial Life

  • Matthias Scheutz

In this paper, we discuss the role of emotions in AI and possible ways to determine their utility for the design of artificial agents. We propose a research methodology for determining the utility of emotional control and apply it to the study of autonomous agents that compete for resources in an artificial life environment. The results show that emotional control can improve performance in some circumstances.