Arrow Research search

Author name cluster

Leonid Libkin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers
2 author rows

Possible papers

27

AIJ Journal 2022 Journal Article

Propositional and predicate logics of incomplete information

  • Marco Console
  • Paolo Guagliardo
  • Leonid Libkin

One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it.

KR Conference 2020 Conference Paper

Knowledge-Preserving Certain Answers for SQL-like Queries

  • Etienne Toussaint
  • Paolo Guagliardo
  • Leonid Libkin

Answering queries over incomplete data is based on finding answers that are certainly true, independently of how missing values are interpreted. This informal description has given rise to several different mathematical definitions of certainty. To unify them, a framework based on "explanations", or extra information about incomplete data, was recently proposed. It partly succeeded in justifying query answering methods for relational databases under set semantics, but had two major limitations. First, it was firmly tied to the set data model, and a fixed way of comparing incomplete databases with respect to their information content. These assumptions fail for real-life database queries in languages such as SQL that use bag semantics instead. Second, it was restricted to queries that only manipulate data, while in practice most analytical SQL queries invent new values, typically via arithmetic operations and aggregation. To leverage our understanding of the notion of certainty for queries in SQL-like languages, we consider incomplete databases whose information content may be enriched by additional knowledge. The knowledge order among them is derived from their semantics, rather than being fixed a priori. The resulting framework allows us to capture and justify existing notions of certainty, and extend these concepts to other data models and query languages. As natural applications, we provide for the first time a well-founded definition of certain answers for the relational bag data model and for value-inventing queries on incomplete databases, addressing the key shortcomings of previous approaches.

KR Conference 2020 Conference Paper

Reasoning about Measures of Unmeasurable Sets

  • Marco Console
  • Matthias Hofer
  • Leonid Libkin

In a variety of reasoning tasks, one estimates the likelihood of events by means of volumes of sets they define. Such sets need to be measurable, which is usually achieved by putting bounds, sometimes ad hoc, on them. We address the question how unbounded or unmeasurable sets can be measured nonetheless. Intuitively, we want to know how likely a randomly chosen point is to be in a given set, even in the absence of a uniform distribution over the entire space. To address this, we follow a recently proposed approach of taking intersection of a set with balls of increasing radius, and defining the measure by means of the asymptotic behavior of the proportion of such balls taken by the set. We show that this approach works for every set definable in first-order logic with the usual arithmetic over the reals (addition, multiplication, exponentiation, etc. ), and every uniform measure over the space, of which the usual Lebesgue measure (area, volume, etc. ) is an example. In fact we establish a correspondence between the good asymptotic behavior and the finiteness of the VC dimension of definable families of sets. Towards computing the measure thus defined, we show how to avoid the asymptotics and characterize it via a specific subset of the unit sphere. Using definability of this set, and known techniques for sampling from the unit sphere, we give two algorithms for estimating our measure of unbounded unmeasurable sets, with deterministic and probabilistic guarantees, the latter being more efficient. Finally we show that a discrete analog of this measure exists and is similarly well-behaved.

IJCAI Conference 2019 Conference Paper

Do We Need Many-valued Logics for Incomplete Information?

  • Marco Console
  • Paolo Guagliardo
  • Leonid Libkin

One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it.

IJCAI Conference 2019 Conference Paper

Measuring the Likelihood of Numerical Constraints

  • Marco Console
  • Matthias Hofer
  • Leonid Libkin

Our goal is to measure the likelihood of the satisfaction of numerical constraints in the absence of prior information. We study expressive constraints, involving arithmetic and complex numerical functions, and even quantification over numbers. Such problems arise in processing incomplete data, or analyzing conditions in programs without a priori bounds on variables. We show that for constraints on n variables, the proper way to define such a measure is as the limit of the part of the n-dimensional ball that consists of points satisfying the constraints, when the radius increases. We prove that the existence of such a limit is closely related to the notion of o-minimality from model theory. Thus, for constraints definable with the usual arithmetic and exponentiation, the likelihood is well defined, but adding trigonometric functions is problematic. We look at computing and approximating such likelihoods for order and linear constraints, and prove an impossibility result for approximating with multiplicative error. However, as the likelihood is a number between 0 and 1, an approximation scheme with additive error is acceptable, and we give it for arbitrary linear constraints.

KR Conference 2018 Conference Paper

Approximating Certainty in Querying Data and Metadata

  • Cristina Civili
  • Leonid Libkin

Metadata, such as mappings or constraints, is used in a variety of scenarios to facilitate query answering; these include data integration and exchange, consistent query answering, and ontology-based data access. A common feature of these scenarios is that data and metadata together produce multiple databases, and answers to queries must be certain, i. e. , true in all such databases. This usually incurs prohibitively high complexity outside very restricted classes of queries such as conjunctive queries and their unions. To overcome this, we propose to approximate such query answering by reducing it to another scenario where multiple databases need to be taken into account, namely incomplete information in databases. For them, well-behaved approximation schemes exist for much larger classes of queries. We give a generic representation of query answering via incomplete data, and show how it works in the scenarios listed above. We use the connection to show how to effectively approximate several intractable query answering problems, and discuss differences between applying this framework under open and closed world semantics.

IJCAI Conference 2018 Conference Paper

Explainable Certain Answers

  • Giovanni Amendola
  • Leonid Libkin

When a dataset is not fully specified and can represent many possible worlds, one commonly answers queries by computing certain answers to them. A natural way of defining certainty is to say that an answer is certain if it is consistent with query answers in all possible worlds, and is furthermore the most informative answer with this property. However, the existence and complexity of such answers is not yet well understood even for relational databases. Thus in applications one tends to use different notions, essentially the intersection of query answers in possible worlds. However, justification of such notions has long been questioned. This leads to two problems: are certain answers based on informativeness feasible in applications? and can a clean justification be provided for intersection-based notions? Our goal is to answer both. For the former, we show that such answers may not exist, or be very large, even in simple cases of querying incomplete data. For the latter, we add the concept of explanations to the notion of informativeness: it shows not only that one object is more informative than the other, but also says why this is so. This leads to a modified notion of certainty: explainable certain answers. We present a general framework for reasoning about them, and show that for open and closed world relational databases, they are precisely the common intersection-based notions of certainty.

KR Conference 2018 Conference Paper

Propositional and Predicate Logics of Incomplete Information

  • Marco Console
  • Paolo Guagliardo
  • Leonid Libkin

One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene’s logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene’s logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it.

IJCAI Conference 2017 Conference Paper

On Querying Incomplete Information in Databases under Bag Semantics

  • Marco Console
  • Paolo Guagliardo
  • Leonid Libkin

Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i. e. , true in every possible world represented by an incomplete database. For positive queries, expressed either in positive relational algebra or as unions of conjunctive queries, finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency.

KR Conference 2016 Conference Paper

Approximations and Refinements of Certain Answers via Many-Valued Logics

  • Marco Console
  • Paolo Guagliardo
  • Leonid Libkin

Computing certain answers is the preferred way of answering queries in scenarios involving incomplete data. This, however, is computationally expensive, so practical systems use efficient techniques based on a particular three-valued logic, even though this often leads to incorrect results. Our goal is to provide a general manyvalued framework for correctly approximating certain answers. We do so by defining the semantics of manyvalued answers and queries, following the principle that additional knowledge about the input must translate into additional knowledge about the output. This framework lets us compare query outputs and evaluation procedures in terms of their informativeness. For each manyvalued logic with a knowledge ordering on its truth values, one can build a syntactic evaluation procedure for all first-order queries, that correctly approximates certain answers; additional truth values are used to refine information about certain answers. For concrete examples, we show that a recently proposed approach fixing some of the inconsistencies of SQL query evaluation is an immediate consequence of our framework, and we further refine it by adding a fourth truth value. We show that no evaluation procedure based on Boolean logic delivers correctness guarantees. Finally, we study the relative power of evaluation procedures based on the informativeness of the answers they produce.

AIJ Journal 2016 Journal Article

Certain answers as objects and knowledge

  • Leonid Libkin

The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this “one-size-fits-all” definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. The idea of the framework is to move away from the standard, in the database literature, assumption that query results be given in the form of a database object, and to allow instead two alternative representations of answers: as objects defining all other answers, or as knowledge we can deduce with certainty about all such answers. We show that the latter is often easier to achieve than the former, that in general certain answers need not be defined as intersection, and may well contain missing values in them. We also show that with a proper choice of semantics, we can often reduce computing certain answers – as either objects or knowledge – to standard query evaluation. We describe the framework in the most general way, applicable to a variety of data models, and test it on three concrete relational semantics of incompleteness: open, closed, and weak closed world.

IJCAI Conference 2015 Conference Paper

How to Define Certain Answers

  • Leonid Libkin

The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this “one-size-fitsall” definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. We combine three previously used approaches, based on the semantics and representation systems, on ordering incomplete databases in terms of their informativeness, and on viewing databases as knowledge expressed in a logical language, to come up with a well justified and principled notion of certain answers. Using it, we show that for queries satisfying some natural conditions (like not losing information if a more informative input is given), computing certain answers is surprisingly easy, and avoids the complexity issues that have been associated with the classical definition.

KR Conference 2014 Conference Paper

Certain Answers as Objects and Knowledge

  • Leonid Libkin

The central problem in the database field is of course query answering, and in the presence of incompleteness, one looks for certain answers: those that do not depend on the interpretation of unknown data. The concept was first mentioned in (Grant 1977) and formally defined 35 years ago in (Lipski 1979) as follows. Assume that the semantics [[D]] of an incomplete database D is given as the set of all complete databases D0 which D can represent. Then the certain answer to Q on D is defined as cert∩ (Q, D) = T {Q(D0) | D0 ∈ [[D]]}. Since database queries produce collections (sets, multisets, etc.), it makes sense to talk about their intersection. This definition has been universally applied to all the semantics of incompleteness, and in all the scenarios such as those listed above. The intuition is that this gives us the set of tuples independent of the interpretation of the missing information in D. Such a universal adoption of this basic definition has another consequence: certain answers themselves contain no missing information. In fact many algorithms for computing certain answers have, as the last step, elimination of any objects (say, rows in relational databases) with missing data. The question that we address here is the following: Is this standard “one-size-fits-all” definition really the right one to use for all the semantics, and all the applications? The answer, as we shall argue, is negative: the standard intersection semantics, as well as the assumption that no missing information is present in the answers, leads to many problems, and crucially to producing meaningless query answers. To argue that this is the case, and to explain some basic ideas behind the alternative approach we present here, note that in the database field, one tends to operate with objects (i. e., relations, XML documents, graph databases, etc.); in particular, queries take objects and return objects. Thus, the idea behind certain answers is to find an object A representing the set of objects Q([[D]]) = {Q(D0) | D0 ∈ [[D]]}. Such an object A must contain information common to all the objects in Q([[D]]): that is, it must be no more informative than any of the objects in Q([[D]]). Now take a simple example: we have a relation R = {(1, 2), (3, ⊥)} in a database, where ⊥ represents a null, or a missing value. The query Q returns R itself. Then cert∩ (Q, D) = {(1, 2)} under every reasonable semantics of incompleteness. But is it less informative than all of Q(D0) for D0 ∈ [[D]]? The answer depends on the semantics. Under The standard way of answering queries over incomplete databases is to compute certain answers, defined as the intersection of query answers on all complete databases that the incomplete database represents. But is this universally accepted definition correct? We argue that this “one-size-fits-all” definition can often lead to counterintuitive or just plain wrong results, and propose an alternative framework for defining certain answers. The idea of the framework is to move away from the standard, in the database literature, assumption that query results be given in the form of a database object, and to allow instead two alternative representations of answers: as objects defining all other answers, or as knowledge we can deduce with certainty about all such answers. We show that the latter is often easier to achieve than the former, that in general certain answers need not be defined as intersection, and may well contain missing information in them. We also show that with a proper choice of semantics, we can often reduce computing certain answers – as either objects or knowledge – to standard query evaluation. We describe the framework in the most general way, applicable to a variety of data models, and test it on three concrete relational semantics of incompleteness: open, closed, and weak closed world.

Highlights Conference 2013 Conference Abstract

Incomplete information, monotonicity, and homomorphism preservation

  • Amélie Gheerbrant
  • Leonid Libkin
  • Cristina Sirangelo

Incomplete information is ubiquitous in database applications, and yet our understanding of it is still rather basic. Since an incomplete database can represent multiple databases, query answering boils down to solving an entailment problem, while database systems are good at performing model-checking algorithms on first-order structures. Thus, to take advantage of them, we need to understand when entailment can be reduced to model-checking. We show that this happens for queries which are monotone with respect to the semantics of incompleteness, and that such monotonicity can be described via homomorphism preservation, for different notions of homomorphisms. Combining the latter with preservation theorems from logic (existing as well as new), we obtain classes of queries for which standard database systems can correctly compute answers over incomplete databases, or, logically speaking, for which entailment can be solved by simple model-checking.

TCS Journal 2013 Journal Article

Parameterized regular expressions and their languages

  • Pablo Barceló
  • Juan Reutter
  • Leonid Libkin

We study regular expressions that use variables, or parameters, which are interpreted as alphabet letters. We consider two classes of languages denoted by such expressions: under the possibility semantics, a word belongs to the language if it is denoted by some regular expression obtained by replacing variables with letters; under the certainty semantics, the word must be denoted by every such expression. Such languages are regular, and we show that they naturally arise in several applications such as querying graph databases and program analysis. As the main contribution of the paper, we provide a complete characterization of the complexity of the main computational problems related to such languages: nonemptiness, universality, containment, membership, as well as the problem of constructing NFAs capturing such languages. We also look at the extension when domains of variables could be arbitrary regular languages, and show that under the certainty semantics, languages remain regular and the complexity of the main computational problems does not change.

LPAR Conference 2012 Conference Paper

Regular Expressions for Data Words

  • Leonid Libkin
  • Domagoj Vrgoc

Abstract In data words, each position carries not only a letter form a finite alphabet, as the usual words do, but also a data value coming from an infinite domain. There has been a renewed interest in them due to applications in querying and reasoning about data models with complex structural properties, notably XML, and more recently, graph databases. Logical formalisms designed for querying such data often require concise and easily understandable presentations of regular languages over data words. Our goal, therefore, is to define and study regular expressions for data words. As the automaton model, we take register automata, which are a natural analog of NFAs for data words. We first equip standard regular expressions with limited memory, and show that they capture the class of data words defined by register automata. The complexity of the main decision problems for these expressions (nonemptiness, membership) also turns out to be the same as for register automata. We then look at a subclass of these regular expressions that can define many properties of interest in applications of data words, and show that the main decision problems can be solved efficiently for it.

LPAR Conference 2008 Conference Paper

Reasoning about XML with Temporal Logics and Automata

  • Leonid Libkin
  • Cristina Sirangelo

Abstract We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-Buchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views.

LPAR Conference 2008 Conference Paper

Recurrent Reachability Analysis in Regular Model Checking

  • Anthony W. Lin
  • Leonid Libkin

Abstract We consider the problem of recurrent reachability over infinite systems given by regular relations on words and trees, i. e, whether a given regular set of states can be reached infinitely often from a given initial state in the given transition system. Under the condition that the transitive closure of the transition relation is regular, we show that the problem is decidable, and the set of all initial states satisfying the property is regular. Moreover, our algorithm constructs an automaton for this set in polynomial time, assuming that a transducer of the transitive closure can be computed in poly-time. We then demonstrate that transition systems generated by pushdown systems, regular ground tree rewrite systems, and the well-known process algebra PA satisfy our condition and transducers for their transitive closures can be computed in poly-time. Our result also implies that model checking EF-logic extended by recurrent reachability predicate (EGF) over such systems is decidable.

CSL Conference 2004 Conference Paper

Game-Based Notions of Locality Over Finite Models

  • Marcelo Arenas
  • Pablo Barceló
  • Leonid Libkin

Abstract Locality notions in logic say that the truth value of a formula can be determined locally, by looking at the isomorphism type of a small neighborhood of its free variables. Such notions have proved to be useful in many applications. They all, however, refer to isomorphism of neighborhoods, which most local logics cannot test for. A more relaxed notion of locality says that the truth value of a formula is determined by what the logic itself can say about that small neighborhood. Or, since most logics are characterized by games, the truth value of a formula is determined by the type, with respect to a game, of that small neighborhood. Such game-based notions of locality can often be applied when traditional isomorphism-based locality cannot. Our goal is to study game-based notions of locality. We work with an abstract view of games that subsumes games for many logics. We look at three, progressively more complicated locality notions. The easiest requires only very mild conditions on the game and works for most logics of interest. The other notions, based on Hanf’s and Gaifman’s theorems, require more restrictions. We state those restrictions and give examples of logics that satisfy and fail the respective game-based notions of locality.

TCS Journal 2003 Journal Article

Expressive power of SQL

  • Leonid Libkin

It is a folk result in database theory that SQL cannot express recursive queries such as reachability; in fact, a new construct was added to SQL3 to overcome this limitation. However, the evidence for this claim is usually given in the form of a reference to a proof that relational algebra cannot express such queries. SQL, on the other hand, in all its implementations has three features that fundamentally distinguish it from relational algebra: namely, grouping, arithmetic operations, and aggregation. In the past few years, most questions about the additional power provided by these features have been answered. This paper surveys those results, and presents new simple and self-contained proofs of the main results on the expressive power of SQL. Somewhat surprisingly, tiny differences in the language definition affect the results in a dramatic way: under some very natural assumptions, it can be proved that SQL cannot define recursive queries, no matter what aggregate functions and arithmetic operations are allowed. But relaxing these assumptions just a tiny bit makes the problem of proving expressivity bounds for SQL as hard as some long-standing open problems in complexity theory.

TCS Journal 2002 Journal Article

Lower bounds for invariant queries in logics with counting

  • Leonid Libkin
  • Limsoon Wong

We study the expressive power of counting logics in the presence of auxiliary relations such as orders and preorders. The simplest such logic is the first-order logic with counting. This logic captures the complexity class TC 0 over ordered structures. We also consider first-order logic with arbitrary unary quantifiers and with infinitary extensions. We start by giving a simple direct proof that first-order logic with counting, in the presence of pre-orders that are almost-everywhere linear orders, cannot express the transitive closure of a binary relation. The proof is based on locality of formulae. We then show that the technique cannot be extended to linear orders. We further show that this result does not say anything about the power of invariant queries in first-order logic with counting vs. the class TC 0, in the presence of these preorders. In the second part of the paper, we prove a separation result showing that, for all the counting logics above, a linear order is more powerful than a preorder that is a linear order almost everywhere. In fact, we prove that the expressive power of invariant queries in the presence of such preorders can be characterized by a property normally associated with first-order definability over unordered structures. We do this by using locality techniques from finite-model theory. However, as some standard notions of locality fail in this setting, we have to modify them to prove the main result.

TCS Journal 2000 Journal Article

Local properties of query languages

  • Guozhu Dong
  • Leonid Libkin
  • Limsoon Wong

In this paper we study the expressiveness of local queries. By locality we mean — informally — that in order to check if a tuple belongs to the result of a query, one only has to look at a certain predetermined portion of the input. Examples include all relational calculus queries. We start by proving a general result describing outputs of local queries. This result leads to many easy inexpressibility proofs for local queries. We then consider a closely related property, namely, the bounded degree property. It describes the outputs of local queries on structures that locally look “simple. ” Every query that is local is shown to have the bounded degree property. Since every relational calculus (first-order) query is local, the general results proved for local queries can be viewed as “off-the-shelf” strategies for proving inexpressibility results, which are often easier to apply than Ehrenfeucht–Fraı̈ssé games. We also show that some generalizations of the bounded degree property that were conjectured to hold, fail for relational calculus. We then prove that the language obtained from relational calculus by adding grouping and aggregates, which is essentially plain SQL, has the bounded degree property, thus answering a question that has been open for several years. Consequently, first-order queries with Härtig or Rescher quantifiers also have the bounded degree property. Finally, we apply our results to incremental maintenance of views, and show that SQL and relational calculus are incapable of maintaining the transitive closure view even in the presence of auxiliary relations of moderate degree.

TCS Journal 1998 Journal Article

Models of approximation in databases

  • Leonid Libkin

Partial information in databases can arise when information from several databases is combined. Even if each database is complete for some “world”, the combined databases will not be, and answers to queries against such combined databases can only be approximated. In this paper we describe various situations in which a precise answer cannot be obtained for a query asked against multiple databases. Based on an analysis of these situations, we propose a classification of constructs that can be used to model approximations. The main goal of the paper is to study several formal models of approximations and their semantics. In particular, we obtain universality properties for these models of approximations. Universality properties suggest syntax for languages with approximations based on the operations which are naturally associated with them. We prove universality properties for most of the approximation constructs. Then we design languages built around datatypes given by the approximation constructs. A straightforward approach results in languages that have a number of limitations. In an attempt to overcome those limitations, we explain how all the languages can be embedded into a language for conjunctive and disjunctive sets from Libkin and Wong (1996) and demonstrate its usefulness in querying independent databases. We also discuss the semantics of approximation constructs and the relationship between them.