Author name cluster

Leopoldo Bertossi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

1 author row

JMLR Journal 2023 Journal Article

On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

Marcelo Arenas
Pablo Barcelo
Leopoldo Bertossi
Mikael Monet

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential~ Shap-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, we prove a strong positive result stating that the Shap-score can be computed in polynomial time over deterministic and decomposable Boolean circuits under the so-called product distributions on entities. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). Our positive result extends even beyond binary classifiers, as it continues to hold if each feature is associated with a finite domain of possible values. We also establish the computational limits of the notion of Shap-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the Shap-score intractable (namely, $\#P$-hard). It also implies that computing Shap-scores is $\#P$-hard even over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing Shap-scores over such class. In stark contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists (under widely believed complexity assumptions) for the computation of Shap-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the Shap-score of $x$ in $\varphi$ is smaller than the Shap-score of $y$ in $\varphi$. [abs] [ pdf ][ bib ] &copy JMLR 2023. ( edit, beta )

PDF Details

AAAI Conference 2021 Conference Paper

The Tractability of SHAP-Score-Based Explanations for Classification over Deterministic and Decomposable Boolean Circuits

Marcelo Arenas
Pablo Barceló
Leopoldo Bertossi
Mikaël Monet

Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAPscore, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).

PDF Details

KR Conference 2012 Conference Paper

Declarative Entity Resolution via Matching Dependencies and Answer Set Programs

Zeinab Bahmani
Leopoldo Bertossi
Solmaz Kolahi
Laks Lakshmanan

Entity resolution (ER) is an important and common problem in data cleaning. It is about identifying and merging records in a database that represent the same real-world entity. Recently, matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER. An ER process induced by MDs over a dirty instance leads to multiple clean instances, in general. In this work, we present disjunctive answer set programs (with stable model semantics) that capture through their models the class of alternative clean instances obtained after an ER process based on MDs. With these programs, we can obtain clean answers to queries, i. e. those that are invariant under the clean instances, by skeptically reasoning from the program. We investigate the ER programs in terms of expressive power for the ER task at hand. As an important special and practical case of ER, we provide a declarative reconstruction of the so-called union-case ER methodology, as presented through a generic approach to ER (the so-called Swoosh approach). 1 R(D) t1 t2 t3 t4 A a1 a1 a2 a3 B b1 b2 b3 b4 R(D0) t1 t2 t3 t4 A a1 a1 a2 a3 B b1 b1 b5 b5 D does not satisfy the MD, and is a dirty instance. After applying the MD, we could get the instance D0 on the right-hand side (RHS), where values for B have been identified. In principle, nothing prevents us from choosing a new value b5 from the domain to do the matching. D0 is stable in the sense that the MD holds in the traditional sense of an implication on D0. We call D0 a clean instance. In general, for a dirty instance and a set of MDs, multiple clean instances may exist. Notice that if we add the MD. R[B] ≈ R[B] → R[A] = R[A], creating a set of interacting MDs, a matching based on one MD may create new similarities that could enable a different MD in the set. A dynamic semantics for MDs was introduced in [Fan et al. 2009], that requires a pair of instances: a first one where the similarities hold and second one where the matchings are enforced, like D and D0 in Example 1. The MDs, as introduced in [Fan et al. 2009], do not specify how to match values. As we did in the example, we could even pick up a new value, e. g. b5 above, for the value in common. This semantics was refined and extended in [Bertossi, Kolahi and Lakshmanan 2011] (cf. also [Bertossi, Kolahi and Lakshmanan 2012]), using a matching function (MF) to guide the matchings, for each of the participating attribute domains. The MFs induce a lattice-theoretic structure on the latter [Bertossi, Kolahi and Lakshmanan 2011]. An alternative dynamic semantics was introduced in [Gardezi, Bertossi and Kiringa 2011]. It does not appeal to MFs, but matchings have to be mandatory (as also in [Bertossi, Kolahi and Lakshmanan 2011]) and minimal, i. e. a minimum number of

IJCAI Conference 2003 Conference Paper

Logic Programs for Consistently Querying Data Integration Systems

Loreto Bravo
Leopoldo Bertossi

We solve the problem of obtaining answers to queries posed to a mediated integration system un der the local-as-view paradigm that are consistent wrt to certain global integrity constraints. For this, the query program is combined with logic program ming specifications under the stable model seman tics of the class of minimal global instances, and of the class of their repairs.

PDF

TCS Journal 2003 Journal Article

Scalar aggregation in inconsistent databases

Marcelo Arenas
Leopoldo Bertossi
Jan Chomicki
Xin He
Vijay Raghavan
Jeremy Spinrad

We consider here scalar aggregation queries in databases that may violate a given set of functional dependencies. We define consistent answers to such queries to be greatest-lowest/least-upper bounds on the value of the scalar function across all (minimal) repairs of the database. We show how to compute such answers. We provide a complete characterization of the computational complexity of this problem. We also show how tractability can be improved in several special cases (one involves a novel application of Boyce–Codd Normal Form) and present a practical hybrid query evaluation method.

Details DOI