Author name cluster

Mehul Bhatt

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

KR Conference 2025 System Paper

ASP-Driven Visual Commonsense: A General Framework for Reasoning About Embodied Interaction in the Wild

  • Jakob Suchan
  • Mehul Bhatt
  • Julius Monsen

We present a general framework for declaratively grounded visual commonsense (reasoning) about embodied interaction in naturalistic, in-the-wild settings relevant to a range of AI application domains. The core computational capabilities of the framework pertaining to visual commonsense are driven by a robust neurosymbolic architecture primarily consisting of: (1) answer set programming based modelling of foundational aspects pertaining to spatio-temporal dynamics, encompassing space, time, events, action, motion; (2) modularly integrated visual computing techniques constituting the neural substrate linking quantitative perceptual features serving as low-level counterparts to high-level semantic characterisations of (inter)active visual commonsense. Practically, we also present a first open release of the developed framework with the aim to promote independent extensions and real-world applied KRR. The release comprises: (a) demonstrated case studies in domains such as autonomous driving, psychology and media studies; (b) systematic evaluation mechanisms for community benchmarking; and (c) supporting material such as tutorials and datasets.

AIJ Journal 2021 Journal Article

Commonsense visual sensemaking for autonomous driving – On generalised neurosymbolic online abduction integrating vision and semantics

  • Jakob Suchan
  • Mehul Bhatt
  • Srikrishna Varadarajan

We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for real-time perception and control. We evaluate and demonstrate with community established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As a use case, we focus on the significance of human-centred visual sensemaking (e.g., involving semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations.

ECAI Conference 2020 Conference Paper

Cognitive Vision and Perception

  • Mehul Bhatt
  • Jakob Suchan

Semantic interpretation of dynamic visuospatial imagery calls for a general and systematic integration of methods in knowledge representation and computer vision. Towards this, we highlight research articulating and developing deep semantics, characterised by the existence of declarative models (e.g., pertaining to space and motion) and corresponding formalisation and reasoning methods supporting capabilities such as semantic question-answering, relational visuospatial learning, and (non-monotonic) visuospatial explanation. We position a working model for deep semantics by highlighting select recent, closely related works from IJCAI [8, 4], AAAI [10], ILP [7], and ACS [9]. We posit that human-centred, explainable visual sensemaking necessitates both high-level semantics and low-level visual computing, with the highlighted works providing a model for systematic, modular integration of diverse multifaceted techniques developed in AI, ML, and Computer Vision.

ECAI Conference 2020 Conference Paper

Driven by Commonsense

  • Jakob Suchan
  • Mehul Bhatt
  • Srikrishna Varadarajan

Within the autonomous driving domain, there is now a clear need and tremendous potential for hybrid solutions (e.g., integrating semantics, learning, visual computing) towards fulfilling essential legal and ethical responsibilities involving explainability (e.g., for diagnosis), human-centred AI (e.g., interaction design), and industrial standardisation (e.g., pertaining to representation, realisation of rules and norms). In these contexts, this highlight paper positions recent research from IJCAI 2019 [4] aimed at advancing human-centred AI principles in the backdrop of the autonomous driving application domain. From a technical viewpoint, the highlighted research provides a model for advancing the state of the art in reasoning about space and motion, combining reasoning and learning, nonmonotonic reasoning, and computational modelling of high-level visuospatial commonsense. In addition to demonstrating the significance of integrated vision and semantics solutions in autonomous driving, we also highlight open questions emphasising the need for interdisciplinary mixed-methods research (involving AI, Psychology, HCI) to better appreciate the complexity and spectrum of varied human-centred challenges in diverse naturalistic driving situations.

NeSy Conference 2019 Conference Paper

Deep Semantics for Explainable Visuospatial Intelligence: Perspectives on Integrating Commonsense Spatial Abstractions and Low-Level Neural Features

  • Mehul Bhatt
  • Jakob Suchan
  • Srikrishna Varadarajan

High-level semantic interpretation of dynamic visual imagery calls for general and systematic methods integrating techniques in knowledge representation and computer vision. Towards this, we position deep semantics, denoting the existence of declarative models such as those pertaining to space and motion, and corresponding formalisation and methods supporting domain-independent explainability capabilities such as semantic question-answering, relational and relationally-driven visuospatial learning, and non-monotonic visuospatial abduction. Rooted in recent work, we summarise and report the status quo on deep visuospatial semantics, and our approach to neurosymbolic integration and explainable visuospatial computing in that context, with developed methods and tools in diverse settings such as behavioural research in psychology, art and social sciences, and autonomous driving.

IJCAI Conference 2019 Conference Paper

Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving

  • Jakob Suchan
  • Mehul Bhatt
  • Srikrishna Varadarajan

We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework usable within hybrid architectures for perception and control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As a use case, we focus on the significance of human-centred visual sensemaking (e.g., semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations.

AAAI Conference 2018 Conference Paper

Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning About Moving Objects

  • Jakob Suchan
  • Mehul Bhatt
  • Przemysław Wałega
  • Carl Schultz

We propose a hybrid architecture for systematically computing robust visual explanation(s) encompassing hypothesis formation, belief revision, and default reasoning with video data. The architecture consists of two tightly integrated synergistic components: (1) (functional) answer set programming based abductive reasoning with SPACE-TIME TRACKLETS as native entities; and (2) a visual processing pipeline for detection based object tracking and motion analysis. We present the formal framework, its general implementation as a (declarative) method in answer set programming, and an example application and evaluation based on two diverse video datasets: the MOTChallenge benchmark developed by the vision community, and a recently developed Movie Dataset.

IJCAI Conference 2016 Conference Paper

Robust Natural Language Processing - Combining Reasoning, Cognitive Semantics, and Construction Grammar for Spatial Language

  • Michael Spranger
  • Jakob Suchan
  • Mehul Bhatt

We present a system for generating and understanding dynamic and static spatial relations in robotic interaction setups. Robots describe an environment of moving blocks using English phrases that include spatial relations such as "across" and "in front of." We evaluate the system in robot-robot interactions and show that the system can robustly deal with visual perception errors, language omissions and ungrammatical utterances.

IJCAI Conference 2016 Conference Paper

Semantic Question-Answering with Video and Eye-Tracking Data: AI Foundations for Human Visual Perception Driven Cognitive Film Studies

  • Jakob Suchan
  • Mehul Bhatt

We present a computational framework for the grounding and semantic interpretation of dynamic visuo-spatial imagery consisting of video and eye-tracking data. Driven by cognitive film studies and visual perception research, we demonstrate key technological capabilities aimed at investigating attention and recipient effects vis-a-vis the motion picture; this encompasses high-level analysis of subjects' visual fixation patterns and correlating this with (deep) semantic analysis of the dynamic visual data (e.g., fixation on movie characters, influence of cinematographic devices such as cuts). The framework and its application as a general AI-based assistive technology platform, integrating vision and KR, for cognitive film studies are highlighted.

JAIR Journal 2015 Journal Article

Learning Relational Event Models from Video

  • Krishna S. R. Dubba
  • Anthony G. Cohn
  • David C. Hogg
  • Mehul Bhatt
  • Frank Dylla

Event models obtained automatically from video can be used in applications ranging from abnormal event detection to content based video retrieval. When multiple agents are involved in the events, characterizing events naturally suggests encoding interactions as relations. Learning event models from this kind of relational spatio-temporal data using relational learning techniques such as Inductive Logic Programming (ILP) holds promise, but has not been successfully applied to the very large datasets that result from video data. In this paper, we present a novel framework REMIND (Relational Event Model INDuction) for supervised relational learning of event models from large video datasets using ILP. Efficiency is achieved through the learning from interpretations setting and using a typing system that exploits the type hierarchy of objects in a domain. The use of types also helps prevent overgeneralization. Furthermore, we also present a type-refining operator and prove that it is optimal. The learned models can be used for recognizing events from previously unseen videos. We also present an extension to the framework by integrating an abduction step that improves the learning performance when there is noise in the input data. The experimental results on several hours of video data from two challenging real world domains (an airport domain and a physical action verbs domain) suggest that the techniques are suitable to real world scenarios.

ECAI Conference 2014 Conference Paper

Declarative Spatial Reasoning with Boolean Combinations of Axis-Aligned Rectangular Polytopes

  • Carl Schultz
  • Mehul Bhatt

We present a formal framework and implementation for declarative spatial representation and reasoning about the topological relationships between boolean combinations of regions (i.e., union, intersection, difference, xor). Regions of space here correspond to arbitrary axis aligned n-polytope objects, with geometric parameters either fully grounded, partially grounded, or completely unspecified. The framework is implemented in the context of CLP(𝒬𝒮): A Declarative Spatial Reasoning System (www.spatial-reasoning.com).
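
To give a rough sense of what reasoning about topological relationships between axis-aligned regions involves, here is a minimal Python sketch. It is not taken from the paper or from CLP(𝒬𝒮). It assumes two-dimensional rectangles with fully grounded coordinates, handles neither boolean combinations nor partially specified parameters, and the names `Rect` and `relation` and the relation labels are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with x1 < x2 and y1 < y2 (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def relation(a: Rect, b: Rect) -> str:
    """Classify the topological relation between two fully grounded rectangles."""
    if a == b:
        return "equal"
    # Width and height of the intersection of the two closed rectangles.
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if w < 0 or h < 0:
        return "disconnected"
    if w == 0 or h == 0:
        # Boundaries touch along an edge or at a corner; interiors do not meet.
        return "externally_connected"
    if a.x1 >= b.x1 and a.y1 >= b.y1 and a.x2 <= b.x2 and a.y2 <= b.y2:
        return "inside"
    if b.x1 >= a.x1 and b.y1 >= a.y1 and b.x2 <= a.x2 and b.y2 <= a.y2:
        return "contains"
    return "partially_overlapping"


if __name__ == "__main__":
    print(relation(Rect(0, 0, 2, 2), Rect(1, 1, 3, 3)))
    print(relation(Rect(0, 0, 1, 1), Rect(1, 0, 2, 1)))
```

A declarative system of the kind described in the abstract would go further, treating such relations as constraints that can also be solved when some coordinates are unknown, whereas this sketch only checks relations on concrete values.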

ECAI Conference 2012 Conference Paper

Towards a Declarative Spatial Reasoning System

  • Carl Schultz
  • Mehul Bhatt

We present early results on the development of a declarative spatial reasoning system within the context of the Constraint Logic Programming (CLP) framework. The system is capable of modelling and reasoning about qualitative spatial relations pertaining to multiple spatial domains, i.e., one or more aspects of space such as topology, and intrinsic and extrinsic orientation. It provides a seamless mechanism for combining formal qualitative spatial calculi within one framework, and provides a Prolog-based declarative interface for AI applications to abstract and reason about quantitative, geometric information in a qualitative manner. Based on previous work concerning the formalisation of the framework [2], we present ongoing work to develop the theoretical result into a comprehensive reasoning system (and Prolog-based library) which may be used independently, or as a logic-based module within hybrid intelligent systems.