AIJ Journal 2026 Journal Article
From monotonic graph neural networks to datalog and back: Expressive power and practical applications
- David Tena Cucala
- Bernardo Cuenca Grau
- Boris Motik
- Egor V. Kostylev
AIJ Journal 2026 Journal Article
KR Conference 2023 Conference Paper
Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors, which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.
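As a toy illustration of the GNN-to-rule correspondence described in the abstract (the feature layout, weights, and names below are our own choices for exposition, not the paper's construction), a single max-aggregation layer with non-negative weights and a step activation behaves like one round of Datalog rule application:

```python
# Toy sketch: one monotonic GNN layer with max aggregation, non-negative
# weights, and a step activation. Feature position 0 encodes the fact A(v);
# position 1 encodes B(v). All names here are illustrative assumptions.

def step(x, threshold=0.5):
    return 1.0 if x >= threshold else 0.0

def max_gnn_layer(features, edges):
    """One message-passing round: keep A(v) and B(v) via identity self-weights,
    and send a neighbour's A-value into the B-position of v (weight 1.0)."""
    out = {}
    for v, h in features.items():
        agg = [0.0, 0.0]
        for (u, w) in edges:
            if w == v:                      # max over incoming neighbours
                agg = [max(a, b) for a, b in zip(agg, features[u])]
        out[v] = [step(h[0]), step(h[1] + agg[0])]
    return out

feats = {0: [1.0, 0.0],                     # A(0) holds
         1: [0.0, 0.0]}                     # nothing holds at node 1
result = max_gnn_layer(feats, [(0, 1)])
# Mirrors one application of the rule  B(y) <- A(x), edge(x, y): since all
# weights are non-negative, adding input facts can only add derived facts.
```

The non-negative weights are what make the layer monotonic: larger inputs never suppress an output, which is the property that lets a round of the layer be captured by rule application.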
ICLR Conference 2022 Conference Paper
Graph Neural Networks (GNNs) are often used to learn transformations of graph data. While effective in practice, such approaches make predictions via numeric manipulations so their output cannot be easily explained symbolically. We propose a new family of GNN-based transformations of graph data that can be trained effectively, but where all predictions can be explained symbolically as logical inferences in Datalog—a well-known rule-based formalism. In particular, we show how to encode an input knowledge graph into a graph with numeric feature vectors, process this graph using a GNN, and decode the result into an output knowledge graph. We use a new class of monotonic GNNs (MGNNs) to ensure that this process is equivalent to a round of application of a set of Datalog rules. We also show that, given an arbitrary MGNN, we can automatically extract rules that completely characterise the transformation. We evaluate our approach by applying it to classification tasks in knowledge graph completion.
KR Conference 2022 Conference Paper
Rule learning involves developing machine learning models that can be applied to a set of logical facts to predict additional facts, as well as providing methods for extracting from the learned model a set of logical rules that explain symbolically the model's predictions. Existing such approaches, however, do not describe formally the relationship between the model's predictions and the derivations of the extracted rules; rather, it is often claimed without justification that the extracted rules 'approximate' or 'explain' the model, and rule quality is evaluated by manual inspection. In this paper, we study the formal properties of Neural-LP—a prominent rule learning approach. We show that the rules extracted from Neural-LP models can be both unsound and incomplete: on the same input dataset, the extracted rules can derive facts not predicted by the model, and the model can make predictions not derived by the extracted rules. We also propose a modification to the Neural-LP model that ensures that the extracted rules are always sound and complete. Finally, we show that, on several prominent benchmarks, the classification performance of our modified model is comparable to that of the standard Neural-LP model. Thus, faithful learning of rules is feasible from both a theoretical and practical point of view.
AIJ Journal 2022 Journal Article
AIJ Journal 2019 Journal Article
AAAI Conference 2019 Conference Paper
The seminaïve algorithm can be used to materialise all consequences of a datalog program, and it also forms the basis for algorithms that incrementally update a materialisation as the input facts change. Certain (combinations of) rules, however, can be handled much more efficiently using custom algorithms. To integrate such algorithms into a general reasoning approach that can handle arbitrary rules, we propose a modular framework for computing and maintaining a materialisation. We split a datalog program into modules that can be handled using specialised algorithms, and we handle the remaining rules using the seminaïve algorithm. We also present two algorithms for computing the transitive and the symmetric–transitive closure of a relation that can be used within our framework. Finally, we show empirically that our framework can handle arbitrary datalog programs while outperforming existing approaches, often by orders of magnitude.
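The seminaïve strategy mentioned in the abstract can be sketched in a few lines. This is an illustrative transitive-closure example under our own naming, not the paper's modular framework:

```python
# Minimal seminaive evaluation sketch for the linear recursive program
#   path(x, y) <- edge(x, y)
#   path(x, z) <- edge(x, y), path(y, z)
# Each round joins the static edge relation only against facts derived in
# the *previous* round (the "delta"), avoiding redundant rederivations.
def seminaive_paths(edges):
    path = set(edges)          # path(x, y) <- edge(x, y)
    delta = set(edges)         # facts new in the previous round
    while delta:
        new = set()
        for (x, y) in edges:
            for (y2, z) in delta:
                if y == y2 and (x, z) not in path:
                    new.add((x, z))
        path |= new
        delta = new            # next round considers only fresh derivations
    return path

closure = seminaive_paths({(1, 2), (2, 3), (3, 4)})
# closure == {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
```

Joining only against the delta is the whole point: a naive fixpoint would re-evaluate every rule over the full relation in every round.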
JAIR Journal 2018 Journal Article
Classification of description logic (DL) ontologies is a key computational problem in modern data management applications, so considerable effort has been devoted to the development and optimisation of practical reasoning calculi. Consequence-based calculi combine ideas from hypertableau and resolution in a way that has proved very effective in practice. However, existing consequence-based calculi can handle either Horn DLs (which do not support disjunction) or DLs without number restrictions. In this paper, we overcome this important limitation and present the first consequence-based calculus for deciding concept subsumption in the DL ALCHIQ+. Our calculus runs in exponential time assuming unary coding of numbers, and on ELH ontologies it runs in polynomial time. The extension to disjunctions and number restrictions is technically involved: we capture the relevant consequences using first-order clauses, and our inference rules adapt paramodulation techniques from first-order theorem proving. By using a well-known preprocessing step, the calculus can also decide concept subsumptions in SRIQ—a rich DL that covers all features of OWL 2 DL apart from nominals and datatypes. We have implemented our calculus in a new reasoner called Sequoia. We present the architecture of our reasoner and discuss several novel and important implementation techniques such as clause indexing and redundancy elimination. Finally, we present the results of an extensive performance evaluation, which revealed Sequoia to be competitive with existing reasoners. Thus, the calculus and the techniques we present in this paper provide an important addition to the repertoire of practical implementation techniques for description logic reasoning.
AAAI Conference 2018 Conference Paper
Inspired by the magic sets for Datalog, we present a novel goal-driven approach for answering queries over terminating existential rules with equality (aka TGDs and EGDs). Our technique improves the performance of query answering by pruning the consequences that are not relevant for the query. This is challenging in our setting because equalities can potentially affect all predicates in a dataset. We address this problem by combining the existing singularization technique with two new ingredients: an algorithm for identifying the rules relevant to a query and a new magic sets algorithm. We show empirically that our technique can significantly improve the performance of query answering, and that it can mean the difference between answering a query in a few seconds or not being able to process the query at all.
AAAI Conference 2018 Conference Paper
To efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate rules ‘backwards’ by matching their heads to a fact and evaluating the partially instantiated rule bodies as queries. We show that this can be a considerable source of overhead even on very small updates. In contrast, the Counting algorithm does not evaluate the rules ‘backwards’, but it can handle only nonrecursive rules. We present two hybrid approaches that combine DRed and B/F with Counting so as to reduce or even eliminate ‘backward’ rule evaluation while still handling arbitrary datalog programs. We show empirically that our hybrid algorithms are usually significantly faster than existing approaches, sometimes by orders of magnitude.
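To give the flavour of the Counting idea discussed in the abstract, here is a minimal hypothetical sketch, not the paper's hybrid algorithms: each derived fact carries a count of its derivations, so removing one derivation never requires evaluating rules 'backwards'.

```python
from collections import Counter

# Counting-style maintenance sketch (illustrative) for nonrecursive rules:
# track how many derivations support each fact; a fact disappears only when
# its last derivation is retracted.
class CountedView:
    def __init__(self):
        self.counts = Counter()            # fact -> number of derivations

    def insert(self, fact):
        self.counts[fact] += 1

    def delete(self, fact):
        self.counts[fact] -= 1
        if self.counts[fact] == 0:
            del self.counts[fact]          # no derivations remain

    def holds(self, fact):
        return self.counts[fact] > 0

view = CountedView()
view.insert(("r", "a"))                    # derivation via one body match
view.insert(("r", "a"))                    # an independent second derivation
view.delete(("r", "a"))                    # one derivation retracted:
# the fact survives, because a second derivation still supports it
```

As the abstract notes, this bookkeeping breaks down for recursive rules (a fact can support its own derivation), which is why hybrid schemes are needed.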
IJCAI Conference 2018 Conference Paper
There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of limit programs was proposed, which is sufficiently powerful to capture many analysis tasks and has a decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks.
AAAI Conference 2018 Conference Paper
In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Stream reasoning algorithms, however, must be able to stream out query answers as soon as possible, and can only keep a limited number of previous input facts in memory. In this paper, we propose novel reasoning problems to deal with these challenges, and study their computational properties on Datalog extended with a temporal sort and the successor function—a core rule-based language for stream reasoning applications.
IJCAI Conference 2017 Conference Paper
Motivated by applications in declarative data analysis, we study DatalogZ—an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In limit DatalogZ, predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailment is coNExpTime-complete in combined and coNP-complete in data complexity. Moreover, an additional stability requirement causes the complexity to drop to ExpTime and PTime, respectively. Finally, we show that stable DatalogZ can express many useful data analysis tasks, and so our results provide a sound foundation for the development of advanced information systems.
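As an illustration of the kind of data analysis task such limit predicates can capture, here is a sketch of shortest distances under min-limit semantics, where a distance predicate keeps only the minimal value per node; the encoding and names are ours, not the paper's syntax.

```python
# Min-limit-flavoured sketch (illustrative): dist(v, d) keeps only the
# minimal d per node v, as a limit predicate would. The fixpoint loop plays
# the role of rule application for
#   dist(v, d + w) <- dist(u, d), edge(u, v, w)
def min_limit_distances(edges, source):
    INF = float("inf")
    dist = {source: 0}                     # dist(source, 0) is an input fact
    changed = True
    while changed:                         # iterate to the fixpoint
        changed = False
        for (u, v, w) in edges:
            cand = dist.get(u, INF) + w
            if cand < dist.get(v, INF):    # limit semantics: keep the minimum
                dist[v] = cand
                changed = True
    return dist

d = min_limit_distances([("a", "b", 3), ("b", "c", 1), ("a", "c", 5)], "a")
# d == {"a": 0, "b": 3, "c": 4}
```

Keeping only the extremal value per tuple of non-numeric arguments is what restores decidability relative to unrestricted integer arithmetic in rules.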
KR Conference 2016 Conference Paper
Consequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they were proposed for either Horn DLs (which do not support disjunction), or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus for SRIQ—a rich DL that supports both features. This extension is nontrivial since the intermediate consequences that need to be derived during reasoning cannot be captured using DLs themselves. The results of our preliminary performance evaluation suggest the feasibility of our approach in practice.
AAAI Conference 2015 Conference Paper
Answering conjunctive queries (CQs) over EL knowledge bases (KBs) with complex role inclusions is PSPACE-hard and in PSPACE in certain cases; however, if complex role inclusions are restricted to role transitivity, a tight upper complexity bound has so far been unknown. Furthermore, the existing algorithms cannot handle reflexive roles, and they are not practicable. Finally, the problem is tractable for acyclic CQs and ELH, and NP-complete for unrestricted CQs and ELHO KBs. In this paper we complete the complexity landscape of CQ answering for several important cases. In particular, we present a practicable NP algorithm for answering CQs over ELHOs KBs—a logic containing all of OWL 2 EL, but with complex role inclusions restricted to role transitivity. Our preliminary evaluation suggests that the algorithm can be suitable for practical use. Moreover, we show that, even for a restricted class of so-called arborescent acyclic queries, CQ answering over EL KBs becomes NP-hard in the presence of either transitive or reflexive roles. Finally, we show that answering arborescent CQs over ELHO KBs is tractable, whereas answering acyclic CQs is NP-hard.
IJCAI Conference 2015 Conference Paper
Materialisation precomputes all consequences of a set of facts and a datalog program so that queries can be evaluated directly (i.e., independently of the program). Rewriting optimises materialisation for datalog programs with equality by replacing all equal constants with a single representative, and incremental maintenance algorithms can efficiently update a materialisation for small changes in the input facts. Both techniques are critical to practical applicability of datalog systems; however, we are unaware of an approach that combines rewriting and incremental maintenance. In this paper we present the first such combination, and we show empirically that it can speed up updates by several orders of magnitude compared to using either rewriting or incremental maintenance in isolation.
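The rewriting idea above, replacing equal constants with a single representative, is commonly implemented with a union-find structure; the following is a minimal sketch under assumed names, not the combined algorithm of the paper.

```python
# Rewriting sketch (illustrative): pick a canonical representative for each
# equivalence class of constants via union-find, then rewrite every fact to
# mention only representatives, reducing equality reasoning to plain datalog.
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, c):
        self.parent.setdefault(c, c)
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]   # path halving
            c = self.parent[c]
        return c

    def union(self, a, b):                 # record a derived equality a = b
        self.parent[self.find(a)] = self.find(b)

uf = UnionFind()
uf.union("alice", "a_smith")               # e.g. a derived owl:sameAs fact
facts = {("knows", "alice", "bob"), ("knows", "a_smith", "carol")}
rewritten = {(p, uf.find(s), uf.find(o)) for (p, s, o) in facts}
# both facts now share the single representative of {alice, a_smith}
```

The incremental-maintenance difficulty the abstract alludes to is visible even here: retracting an equality splits a class, so previously merged facts must be re-expanded, which is what makes combining the two techniques nontrivial.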
AAAI Conference 2015 Conference Paper
Rewriting is widely used to optimise owl:sameAs reasoning in materialisation-based OWL 2 RL systems. We investigate issues related to both the correctness and efficiency of rewriting, and present an algorithm that guarantees correctness, improves efficiency, and can be effectively parallelised. Our evaluation shows that our approach can reduce reasoning times on practical data sets by orders of magnitude.
AAAI Conference 2015 Conference Paper
Datalog-based systems often materialise all consequences of a datalog program and the data, allowing users’ queries to be evaluated directly in the materialisation. This process, however, can be computationally intensive, so most systems update the materialisation incrementally when input data changes. We argue that existing solutions, such as the well-known Delete/Rederive (DRed) algorithm, can be inefficient in cases when facts have many alternate derivations. As a possible remedy, we propose a novel Backward/Forward (B/F) algorithm that tries to reduce the amount of work by a combination of backward and forward chaining. In our evaluation, the B/F algorithm was several orders of magnitude more efficient than the DRed algorithm on some inputs, and it was never significantly less efficient.
AIJ Journal 2014 Journal Article
AAAI Conference 2014 Conference Paper
We present a novel approach to parallel materialisation (i.e., fixpoint computation) of datalog programs in centralised, main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates. Our empirical evaluation shows that our approach parallelises computation very well: with 16 physical cores, materialisation can be up to 13.9 times faster than with just one core.
IJCAI Conference 2013 Conference Paper
Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution-based procedure for SHI ontologies that, in case it terminates, produces a datalog rewriting of the ontology. We also show that our procedure necessarily terminates on DL-Lite^{H,+}_bool ontologies—an extension of OWL 2 QL with transitive roles and Boolean connectives.
AAAI Conference 2013 Conference Paper
So-called combined approaches answer a conjunctive query over a description logic ontology in three steps: first, they materialise certain consequences of the ontology and the data; second, they evaluate the query over the data; and third, they filter the result of the second phase to eliminate unsound answers. Such approaches were developed for various members of the DL-Lite and the EL families of languages, but none of them can handle ontologies containing nominals. In our work, we bridge this gap and present a combined query answering approach for ELHO^r_⊥—a logic that contains all features of the OWL 2 EL standard apart from transitive roles and complex role inclusions. This extension is nontrivial because nominals require equality reasoning, which introduces complexity into the first and the third step. Our empirical evaluation suggests that our technique is suitable for practical application, and so it provides a practical basis for conjunctive query answering in a large fragment of OWL 2 EL.
KR Conference 2012 Conference Paper
Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a key problem in knowledge representation and databases. This problem can be solved using the chase (aka materialisation) algorithm; however, CQ answering is undecidable for general existential rules, so the chase is not guaranteed to terminate. Several acyclicity conditions provide sufficient conditions for chase termination. In this paper, we present two novel such conditions—model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA)—that generalise many of the acyclicity conditions known so far in the literature. Materialisation provides the basis for several widely-used OWL 2 DL reasoners. In order to avoid termination problems, many of these systems handle only the OWL 2 RL profile of OWL 2 DL; furthermore, some systems go beyond OWL 2 RL, but they provide no termination guarantees. In this paper we investigate whether various acyclicity conditions can provide a principled and practical solution to these problems. On the theoretical side, we show that query answering for acyclic ontologies is of lower complexity than for general ontologies. On the practical side, we show that many of the commonly used OWL 2 DL ontologies are MSA, and that the facts obtained via materialisation are not too large. Thus, our results suggest that principled extensions to materialisation-based OWL 2 DL reasoners may be practically feasible.
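A minimal sketch of the chase with one generating rule may help fix intuitions; the encoding, names, and round cap below are our own, and a real implementation would apply an acyclicity check rather than a budget.

```python
from itertools import count

# Illustrative chase sketch for one generating rule:
#   person(x) -> exists y. hasParent(x, y), person(y)
# Fresh labelled nulls can cascade forever, which is exactly why acyclicity
# conditions matter; here we simply cap the number of rounds.
def chase(facts, max_rounds):
    fresh = count()
    for _ in range(max_rounds):
        people = {t[1] for t in facts if t[0] == "person"}
        with_parent = {t[1] for t in facts if t[0] == "hasParent"}
        triggers = people - with_parent        # unsatisfied existentials
        if not triggers:
            return facts                       # chase has terminated
        for x in sorted(triggers):
            null = f"_n{next(fresh)}"          # fresh labelled null for y
            facts |= {("hasParent", x, null), ("person", null)}
    return facts                               # round budget exhausted

result = chase({("person", "ann")}, max_rounds=3)
# each round introduces one new person null, so this rule never saturates
```

An acyclicity condition such as MFA or MSA would reject this rule up front, whereas an acyclic rule set is guaranteed to reach the `return facts` branch.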
LPAR Conference 2012 Conference Paper
An important goal of research in description logics (DLs) and related logic-based KR formalisms is to identify the worst-case complexity of reasoning. Such results, however, measure the complexity of a logic as a whole. For example, reasoning in the basic DL \(\mathcal{ALCI}\) is ExpTime-complete, which means that \(\mathcal{ALCI}\) constructors can be used in a way so that exponential time is strictly required for solving a reasoning problem. It is, however, well known that, given two \(\mathcal{ALCI}\) knowledge bases of roughly the same size, reasoning with one knowledge base may be much more difficult than with the other, depending on the interaction of the axioms in the KBs. Thus, existing worst-case complexity results provide only a very coarse measure of reasoning complexity, and they do not tell us much about the “hardness” of each individual knowledge base.
KR Conference 2010 Conference Paper
There is currently a growing interest in techniques for hiding parts of the signature of an ontology Kh that is being reused by another ontology Kv. Towards this goal, Cuenca Grau, Motik, and Kazakov (2009) recently proposed the import-by-query framework, which makes the content of Kh accessible through a limited query interface. If Kv reuses the symbols from Kh in a certain restricted way, one can reason over Kv ∪ Kh by accessing only Kv and the query interface. In this paper, we map out the landscape of the import-by-query problem. We show that certain restrictions of our original framework are strictly necessary to make reasoning possible, we propose extensions that overcome some of the expressivity limitations, we present several novel reasoning algorithms, and we outline the limitations of the new framework.
IJCAI Conference 2009 Conference Paper
To enable ontology reuse, the Web Ontology Language (OWL) allows an ontology Kv to import an ontology Kh. To reason with such a Kv, a reasoner needs physical access to the axioms of Kh. For copyright and/or privacy reasons, however, the authors of Kh might not want to publish the axioms of Kh; instead, they might prefer to provide an oracle that can answer a (limited) set of queries over Kh, thus allowing Kv to import Kh “by query.” In this paper, we study import-by-query algorithms, which can answer questions about Kv ∪ Kh by accessing only Kv and the oracle. We show that no such algorithm exists in general, and present restrictions under which importing by query becomes feasible.
AIJ Journal 2009 Journal Article
IS Journal 2008 Journal Article
The recipients of the 2008 IEEE Intelligent Systems 10 to Watch award—Philipp Cimiano, Dmitri Dolgov, Anat Levin, Peter Mika, Brian Milch, Louis-Philippe Morency, Boris Motik, Jennifer Neville, Erik Sudderth, and Luis von Ahn—discuss their current research and their visions of AI for the future.
AAAI Conference 2008 Conference Paper
Applications of Semantic Web technologies often require the management of metalevel information—that is, information that provides additional detail about domain-level information, such as provenance or access rights policies. Existing OWL-based tools provide little or no support for the representation and management of metalevel information. To fill this gap, we propose a framework based on metaviews—ontologies that describe facts in the application domain. We have implemented our framework in the KAON2 reasoner, and have successfully applied it in a nontrivial scenario.
KR Conference 2008 Conference Paper
State-of-the-art ontology languages are often not sufficiently expressive to accurately represent domains consisting of objects connected in a complex way. As a possible remedy, in our previous work we have proposed an extension of ontology languages with description graphs. In this paper, we extend this formalism by allowing for multiple graphs that can be combined in complex ways, thus obtaining a powerful language for modeling structured objects. By imposing a particular acyclicity restriction on the relationships between the graphs, we ensure that checking satisfiability of knowledge bases expressed in our language is decidable. We also present a practical reasoning algorithm.
IJCAI Conference 2007 Conference Paper
Integrating description logics (DL) and logic programming (LP) would produce a very powerful and useful formalism. However, DLs and LP are based on quite different principles, so achieving a seamless integration is not trivial. In this paper, we introduce hybrid MKNF knowledge bases that faithfully integrate DLs with LP using the logic of Minimal Knowledge and Negation as Failure (MKNF) [Lifschitz, 1991]. We also give reasoning algorithms and tight data complexity bounds for several interesting fragments of our logic.
LPAR Conference 2006 Conference Paper
Many modern applications of description logics (DLs) require answering queries over large data quantities, structured according to relatively simple ontologies. For such applications, we conjectured that reusing ideas of deductive databases might improve scalability of DL systems. Hence, in our previous work, we developed an algorithm for reducing a DL knowledge base to a disjunctive datalog program. To test our conjecture, we implemented our algorithm in a new DL reasoner KAON2, which we describe in this paper. Furthermore, we created a comprehensive test suite and used it to conduct a performance evaluation. Our results show that, on knowledge bases with large ABoxes but with simple TBoxes, our technique indeed shows good performance; in contrast, on knowledge bases with large and complex TBoxes, existing techniques still perform better. This allowed us to gain important insights into strengths and weaknesses of both approaches.
IJCAI Conference 2005 Conference Paper
LPAR Conference 2004 Conference Paper
Resolution-based calculi are among the most widely used calculi for theorem proving in first-order logic. Numerous refinements of resolution are nowadays available, such as basic superposition, a calculus highly optimized for theorem proving with equality. However, even such an advanced calculus does not restrict inferences enough to obtain decision procedures for complex logics, such as \(\mathcal{SHIQ}\). In this paper, we present a new decomposition inference rule, which can be combined with any resolution-based calculus compatible with the standard notion of redundancy. We combine decomposition with basic superposition to obtain three new decision procedures: (i) for the description logic \(\mathcal{SHIQ}\), (ii) for the description logic \(\mathcal{ALCHIQ}b\), and (iii) for answering conjunctive queries over \(\mathcal{SHIQ}\) knowledge bases. The first two procedures are worst-case optimal and, based on the vast experience in building efficient theorem provers, we expect them to be suitable for practical usage.
ECAI Conference 2004 Conference Paper
KR Conference 2004 Conference Paper
As applications of description logics proliferate, efficient reasoning with large ABoxes (sets of individuals with descriptions) becomes ever more important. Motivated by the prospects of reusing optimization techniques from deductive databases, in this paper, we present a novel approach to checking consistency of ABoxes, instance checking and query answering, w.r.t. ontologies formulated using a slight restriction of the description logic SHIQ. Our approach proceeds in three steps: (i) the ontology is translated into first-order clauses, (ii) TBox and RBox clauses are saturated using a resolution-based decision procedure, and (iii) the saturated set of clauses is translated into a disjunctive datalog program. Thus, query answering can be performed using the resulting program, while applying all existing optimization techniques, such as join-order optimizations or magic sets. Equally important, the resolution-based decision procedure we present is worst-case optimal for unary coding of numbers, i.e., it runs in ExpTime.