Arrow Research search

Author name cluster

Antonio Vergari

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers (30)

UAI Conference 2025 Conference Paper

A Probabilistic Neuro-symbolic Layer for Algebraic Constraint Satisfaction

  • Leander Kurscheidt
  • Paolo Morettin
  • Roberto Sebastiani
  • Andrea Passerini
  • Antonio Vergari

In safety-critical applications, guaranteeing the satisfaction of constraints over continuous environments is crucial, e.g., an autonomous agent should never crash over obstacles or go off-road. Neural models struggle in the presence of these constraints, especially when they involve intricate algebraic relationships. To address this, we introduce a differentiable probabilistic layer that guarantees the satisfaction of non-convex algebraic constraints over continuous variables. This probabilistic algebraic layer (PAL) can be seamlessly plugged into any neural architecture and trained via maximum likelihood without requiring approximations. PAL defines a distribution over conjunctions and disjunctions of linear inequalities, parametrized by polynomials. This formulation enables efficient and exact renormalization via symbolic integration, which can be amortized across different data points and easily parallelized on a GPU. We showcase PAL and our integration scheme on a number of benchmarks for algebraic constraint integration and on real-world trajectory data.
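
The exact-renormalization idea can be illustrated at toy scale (this is not PAL's amortized, GPU-parallel scheme): a density parametrized by a polynomial and restricted to a conjunction of linear inequalities can be normalized exactly by symbolic integration, e.g. with sympy.

```python
# Toy sketch (not PAL's amortized GPU scheme): normalize a polynomial density
# over a region defined by linear inequalities, via exact symbolic integration.
import sympy as sp

x, y = sp.symbols("x y")
p = 1 + x * y + x**2             # unnormalized polynomial density (non-negative on the region below)

# Constraint region: x >= 0, y >= 0, x + y <= 1 (a conjunction of linear inequalities).
Z = sp.integrate(sp.integrate(p, (y, 0, 1 - x)), (x, 0, 1))   # exact partition function

density = p / Z
print(Z)                                                       # rational normalizer: 5/8
print(sp.integrate(sp.integrate(density, (y, 0, 1 - x)), (x, 0, 1)))  # -> 1
```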

ICML Conference 2025 Conference Paper

Is Complex Query Answering Really Complex?

  • Cosimo Gregucci
  • Bo Xiong 0001
  • Daniel Hernández 0002
  • Lorenzo Loconte
  • Pasquale Minervini
  • Steffen Staab
  • Antonio Vergari

Complex query answering (CQA) on knowledge graphs (KGs) is gaining momentum as a challenging reasoning task. In this paper, we show that the current benchmarks for CQA might not be as complex as we think, as the way they are built distorts our perception of progress in this field. For example, we find that in these benchmarks most queries (up to 98% for some query types) can be reduced to simpler problems, e.g., link prediction, where only one link needs to be predicted. The performance of state-of-the-art CQA models decreases significantly when such models are evaluated on queries that cannot be reduced to easier types. Thus, we propose a set of more challenging benchmarks composed of queries that require models to reason over multiple hops and better reflect the construction of real-world KGs. In a systematic empirical investigation, the new benchmarks show that current CQA methods still leave much to be desired.

ICLR Conference 2025 Conference Paper

Logically Consistent Language Models via Neuro-Symbolic Integration

  • Diego Calanzone
  • Stefano Teso
  • Antonio Vergari

Current large language models (LLMs) are far from reliable: they are prone to generate non-factual information and, more crucially, to contradict themselves when prompted to reason about relations between real entities of the world. These problems are currently addressed with large scale fine-tuning or by delegating consistent reasoning to external tools. In this work, we strive for a middle ground and leverage a training objective based on a principled neuro-symbolic loss that teaches an LLM to be consistent with external knowledge in the form of a set of facts and rules. Fine-tuning with such a loss on a limited set of facts enables our LLMs to be more logically consistent than previous baselines for a given constraint. Our approach also makes it easy to combine multiple logical constraints at once in a principled way, delivering LLMs that are more consistent w.r.t. all the selected rules. Moreover, our method allows LLMs to extrapolate to unseen but semantically similar factual knowledge, represented in unseen datasets, more systematically.
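
A minimal sketch of a semantic-loss-style consistency objective under a factorized (independence) assumption; this only illustrates the general idea behind such neuro-symbolic losses, not the paper's exact training objective, and the fact probabilities below are placeholders.

```python
# Minimal sketch of a semantic-loss-style objective (not the paper's exact loss):
# for a rule A -> B, treating the two facts as independent,
# P(rule satisfied) = 1 - P(A) * (1 - P(B)), and we minimize its negative log.
import torch

def implication_consistency_loss(p_a: torch.Tensor, p_b: torch.Tensor) -> torch.Tensor:
    """p_a, p_b: model probabilities (in (0, 1)) for facts A and B."""
    p_satisfied = 1.0 - p_a * (1.0 - p_b)
    return -torch.log(p_satisfied + 1e-12)

# Example: the model strongly believes A but not B, so the rule is likely violated.
p_a, p_b = torch.tensor(0.95), torch.tensor(0.10)
print(implication_consistency_loss(p_a, p_b))  # large loss -> pushes P(B) up (or P(A) down)
```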

NeurIPS Conference 2025 Conference Paper

Neurosymbolic Diffusion Models

  • Emile van Krieken
  • Pasquale Minervini
  • Edoardo Maria Ponti
  • Antonio Vergari

Neurosymbolic (NeSy) predictors combine neural perception with symbolic reasoning to solve tasks like visual reasoning. However, standard NeSy predictors assume conditional independence between the symbols they extract, thus limiting their ability to model interactions and uncertainty --- often leading to overconfident predictions and poor out-of-distribution generalisation. To overcome the limitations of the independence assumption, we introduce neurosymbolic diffusion models (NeSyDMs), a new class of NeSy predictors that use discrete diffusion to model dependencies between symbols. Our approach reuses the independence assumption from NeSy predictors at each step of the diffusion process, enabling scalable learning while capturing symbol dependencies and uncertainty quantification. Across both synthetic and real-world benchmarks — including high-dimensional visual path planning and rule-based autonomous driving — NeSyDMs achieve state-of-the-art accuracy among NeSy predictors and demonstrate strong calibration.

NeSy Conference 2025 Conference Paper

Neurosymbolic Reasoning Shortcuts under the Independence Assumption

  • Emile van Krieken
  • Pasquale Minervini
  • Edoardo M. Ponti
  • Antonio Vergari

The ubiquitous independence assumption among symbolic concepts in neurosymbolic (NeSy) predictors is a convenient simplification: NeSy predictors use it to speed up probabilistic reasoning. Recent works like van Krieken et al. (2024) and Marconato et al. (2024) argued that the independence assumption can hinder learning of NeSy predictors and, more crucially, prevent them from correctly modelling uncertainty. There is, however, scepticism in the NeSy community around the scenarios in which the independence assumption actually limits NeSy systems (Faronius and Dos Martires, 2025). In this work, we settle this question by formally showing that assuming independence among symbolic concepts entails that a model can never represent uncertainty over certain concept combinations. Thus, the model fails to be aware of reasoning shortcuts, i.e., the pathological behaviour of NeSy predictors that produce correct downstream predictions for the wrong reasons.
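
A minimal worked example of the kind of impossibility the paper formalizes, for two binary concepts (the paper's result is more general):

```latex
% Two binary concepts c_1, c_2 with marginals p = P(c_1 = 1), q = P(c_2 = 1). Under independence:
\[
P(1,0) = p(1-q), \quad P(0,1) = (1-p)q, \quad P(1,1) = pq, \quad P(0,0) = (1-p)(1-q).
\]
% Requiring P(1,0) = P(0,1) = 1/2 and P(1,1) = P(0,0) = 0 forces pq = 0, i.e. p = 0 or q = 0,
% which contradicts p(1-q) = 1/2: no fully factorised model can spread its mass over
% exactly these two valid concept combinations.
```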

AAAI Conference 2025 Conference Paper

Sum of Squares Circuits

  • Lorenzo Loconte
  • Stefan Mengel
  • Antonio Vergari

Designing expressive generative models that support exact and efficient inference is a core question in probabilistic ML. Probabilistic circuits (PCs) offer a framework where this tractability-vs-expressiveness trade-off can be analyzed theoretically. Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper we provide a more precise theoretical characterization of the expressiveness relationships among these models. First, we prove that squared PCs can be less expressive than monotonic ones. Second, we formalize a novel class of PCs – sum of squares PCs – that can be exponentially more expressive than both squared and monotonic PCs. Around sum of squares PCs, we build an expressiveness hierarchy that allows us to precisely unify and separate different tractable model classes such as Born Machines and PSD models, as well as other recently introduced tractable probabilistic models that use complex parameters. Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation.
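
Schematically, at the level of shallow mixtures over input functions φ_i (the paper's results concern deep circuits), the three model classes differ only in how a non-negative output is obtained:

```latex
\[
\underbrace{\sum_i w_i\,\varphi_i(x)}_{\text{monotonic PC: } w_i \ge 0}
\qquad
\underbrace{\Big(\sum_i w_i\,\varphi_i(x)\Big)^{2}}_{\text{squared PC: } w_i \in \mathbb{R}}
\qquad
\underbrace{\sum_j \Big(\sum_i w_{ij}\,\varphi_i(x)\Big)^{2}}_{\text{sum of squares PC}}
\]
```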

TMLR Journal 2025 Journal Article

Tractable Representation Learning with Probabilistic Circuits

  • Steven Braun
  • Sahil Sidheekh
  • Antonio Vergari
  • Martin Mundt
  • Sriraam Natarajan
  • Kristian Kersting

Probabilistic circuits (PCs) are powerful probabilistic models that enable exact and tractable inference, making them highly suitable for probabilistic reasoning and inference tasks. While representation learning is well established for neural networks, it remains underexplored for PCs, with prior approaches relying on external neural embeddings or activation-based encodings. To address this gap, we introduce autoencoding probabilistic circuits (APCs), a novel framework leveraging the tractability of PCs to model probabilistic embeddings explicitly. APCs extend PCs by jointly modeling data and embeddings, obtaining embedding representations through tractable probabilistic inference. The PC encoder allows the framework to natively handle arbitrary missing data and is seamlessly integrated with a neural decoder in a hybrid, end-to-end trainable architecture enabled by differentiable sampling. Our empirical evaluation demonstrates that APCs outperform existing PC-based autoencoding methods in reconstruction quality, generate embeddings competitive with neural autoencoders, and exhibit superior robustness in handling missing data. These results highlight APCs as a powerful and flexible representation learning method that exploits the probabilistic inference capabilities of PCs, showing promising directions for robust inference, out-of-distribution detection, and knowledge distillation.

TMLR Journal 2025 Journal Article

What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

  • Lorenzo Loconte
  • Antonio Mari
  • Gennaro Gala
  • Robert Peharz
  • Cassio de Campos
  • Erik Quaeghebeur
  • Gennaro Vessio
  • Antonio Vergari

This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular “Lego block” approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.
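
As a toy illustration of the correspondence (the paper covers far more general tensorized and hierarchical factorizations), a rank-R CP factorization is exactly a shallow circuit: a weighted sum node over R product nodes over univariate input units.

```python
# Toy sketch: a rank-R CP factorization written as a (shallow) circuit computation.
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A, B, C = (rng.random((d, R)) for d in (I, J, K))   # "input units", one factor matrix per variable
w = rng.random(R)                                    # sum-node weights

# Tensor entry T[i,j,k] = sum_r w[r] * A[i,r] * B[j,r] * C[k,r]:
# a sum node over R product nodes over univariate inputs.
T = np.einsum("r,ir,jr,kr->ijk", w, A, B, C)

# Evaluating the circuit at one index assignment (i, j, k) touches only O(R) terms:
i, j, k = 1, 2, 3
print(T[i, j, k], np.sum(w * A[i] * B[j] * C[k]))    # same value
```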

NeurIPS Conference 2024 Conference Paper

A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts

  • Samuele Bortolotti
  • Emanuele Marconato
  • Tommaso Carraro
  • Paolo Morettin
  • Emile van Krieken
  • Antonio Vergari
  • Stefano Teso
  • Andrea Passerini

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.

UAI Conference 2024 Conference Paper

BEARS Make Neuro-Symbolic Models Aware of their Reasoning Shortcuts

  • Emanuele Marconato
  • Samuele Bortolotti
  • Emile van Krieken
  • Antonio Vergari
  • Andrea Passerini
  • Stefano Teso

Neuro-Symbolic (NeSy) predictors that conform to symbolic knowledge (encoding, e.g., safety constraints) can be affected by Reasoning Shortcuts (RSs): They learn concepts consistent with the symbolic knowledge by exploiting unintended semantics. RSs compromise reliability and generalization and, as we show in this paper, they are linked to NeSy models being overconfident about the predicted concepts. Unfortunately, the only trustworthy mitigation strategy requires collecting costly dense supervision over the concepts. Rather than attempting to avoid RSs altogether, we propose to ensure NeSy models are aware of the semantic ambiguity of the concepts they learn, thus enabling their users to identify and distrust low-quality concepts. Starting from three simple desiderata, we derive bears (BE Aware of Reasoning Shortcuts), an ensembling technique that calibrates the model’s concept-level confidence without compromising prediction accuracy, thus encouraging NeSy architectures to be uncertain about concepts affected by RSs. We show empirically that bears improves RS-awareness of several state-of-the-art NeSy models, and also facilitates acquiring informative dense annotations for mitigation purposes.

ICML Conference 2024 Conference Paper

On the Independence Assumption in Neurosymbolic Learning

  • Emile van Krieken
  • Pasquale Minervini
  • Edoardo M. Ponti
  • Antonio Vergari

State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that the minima of such loss functions are usually highly disconnected and non-convex, and thus difficult to optimise. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.

NeurIPS Conference 2024 Conference Paper

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

  • Gennaro Gala
  • Cassio de Campos
  • Antonio Vergari
  • Erik Quaeghebeur

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out, otherwise they can be approximated by tractable probabilistic circuits (PCs) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training. In extensive experiments, we showcase the effectiveness of functional sharing and the superiority of QPCs over traditional PCs.

ICLR Conference 2024 Conference Paper

Subtractive Mixture Models via Squaring: Representation and Learning

  • Lorenzo Loconte
  • Aleksanteri M. Sladek
  • Stefan Mengel
  • Martin Trapp 0001
  • Arno Solin
  • Nicolas Gillis
  • Antonio Vergari

Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
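
A numerical toy sketch of the squaring idea (the normalizer is computed here by brute-force quadrature purely for illustration; the paper shows how squared circuits keep renormalization tractable):

```python
# Toy sketch of a subtractive mixture made valid by squaring.
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-10, 10, 20001)
dx = xs[1] - xs[0]
f = 1.0 * gauss(xs, 0.0, 2.0) - 0.6 * gauss(xs, 0.0, 1.0)   # negative weight: f dips below zero
g = f ** 2                                                    # squaring makes it non-negative
density = g / (np.sum(g) * dx)                                # brute-force numerical normalizer

print(f.min() < 0, density.min() >= 0, np.sum(density) * dx)  # True True ~1.0
```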

AAAI Conference 2024 Conference Paper

Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification

  • Andreas Grivas
  • Antonio Vergari
  • Adam Lopez

Sigmoid output layers are widely used in multi-label classification (MLC) tasks, in which multiple labels can be assigned to any input. In many practical MLC tasks, the number of possible labels is in the thousands, often exceeding the number of input features and resulting in a low-rank output layer. In multi-class classification, it is known that such a low-rank output layer is a bottleneck that can result in unargmaxable classes: classes which cannot be predicted for any input. In this paper, we show that for MLC tasks, the analogous sigmoid bottleneck results in exponentially many unargmaxable label combinations. We explain how to detect these unargmaxable outputs and demonstrate their presence in three widely used MLC datasets. We then show that they can be prevented in practice by introducing a Discrete Fourier Transform (DFT) output layer, which guarantees that all sparse label combinations with up to k active labels are argmaxable. Our DFT layer trains faster and is more parameter efficient, matching the F1@k score of a sigmoid layer while using up to 50% fewer trainable parameters. Our code is publicly available at https://github.com/andreasgrv/sigmoid-bottleneck.
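
The bottleneck phenomenon can be reproduced with a brute-force toy check (an illustration only; the paper's detection procedure uses an exact feasibility test rather than random search): with a rank-2, zero-bias output layer and 4 labels, at most 8 of the 16 binary label combinations can ever be the thresholded output.

```python
# Toy sketch of the low-rank sigmoid bottleneck.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))              # 4 labels, but only 2-dimensional features

reachable = set()
for _ in range(200_000):                     # brute-force search over feature vectors
    h = rng.standard_normal(2) * 10.0
    labels = tuple((W @ h > 0).astype(int))  # sigmoid(logit) > 0.5  <=>  logit > 0
    reachable.add(labels)

print(f"reachable: {len(reachable)} / 16 label combinations")  # at most 8 with zero bias
```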

NeurIPS Conference 2023 Conference Paper

How to Turn Your Knowledge Graph Embeddings into Generative Models

  • Lorenzo Loconte
  • Nicola Di Mauro
  • Robert Peharz
  • Antonio Vergari

Some of the most successful knowledge graph embedding (KGE) models for link prediction – CP, RESCAL, TuckER, ComplEx – can be interpreted as energy-based models. Under this perspective, they are not amenable to exact maximum-likelihood estimation (MLE) or sampling, and they struggle to integrate logical constraints. This work re-interprets the score functions of these KGEs as circuits – constrained computational graphs allowing efficient marginalisation. Then, we design two recipes to obtain efficient generative circuit models by either restricting their activations to be non-negative or squaring their outputs. Our interpretation comes with little or no loss of performance for link prediction, while the circuits framework unlocks exact learning by MLE, efficient sampling of new triples, and the guarantee that logical constraints are satisfied by design. Furthermore, our models scale more gracefully than the original KGEs on graphs with millions of entities.

NeurIPS Conference 2023 Conference Paper

Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts

  • Emanuele Marconato
  • Stefano Teso
  • Antonio Vergari
  • Andrea Passerini

Neuro-Symbolic (NeSy) predictive models hold the promise of improved compliance with given constraints, systematic generalization, and interpretability, as they allow inferring labels that are consistent with some prior knowledge by reasoning over high-level concepts extracted from sub-symbolic inputs. It was recently shown that NeSy predictors are affected by reasoning shortcuts: they can attain high accuracy but only by leveraging concepts with unintended semantics, thus coming short of their promised advantages. Yet, a systematic characterization of reasoning shortcuts and of potential mitigation strategies is missing. This work fills this gap by characterizing them as unintended optima of the learning objective and identifying four key conditions behind their occurrence. Based on this, we derive several natural mitigation strategies, and analyze their efficacy both theoretically and empirically. Our analysis shows reasoning shortcuts are difficult to deal with, casting doubts on the trustworthiness and interpretability of existing NeSy solutions.

NeSy Conference 2023 Conference Paper

Semantic Probabilistic Layers for Neuro-Symbolic Learning

  • Kareem Ahmed
  • Stefano Teso
  • Kai-Wei Chang 0001
  • Guy Van den Broeck
  • Antonio Vergari

In this extended abstract, we briefly outline Semantic Probabilistic Layers [1], a new layer that can be plugged into any neural network to guarantee its predictions are consistent with a set of predefined symbolic constraints while being amenable to end-to-end learning via maximum likelihood. SPLs can faithfully, and efficiently, model complex structured-output prediction (SOP) tasks beyond the reach of alternative neuro-symbolic layers. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on an array of challenging structured-output prediction tasks.

NeurIPS Conference 2022 Conference Paper

Semantic Probabilistic Layers for Neuro-Symbolic Learning

  • Kareem Ahmed
  • Stefano Teso
  • Kai-Wei Chang
  • Guy Van den Broeck
  • Antonio Vergari

We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network guaranteeing its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on challenging SOP tasks such as hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.
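
A brute-force sketch of the underlying idea, enumerating a tiny output space (the actual SPL performs this renormalization tractably with probabilistic circuits instead of enumeration; the constraint and probabilities below are made up for illustration):

```python
# Brute-force sketch: renormalize a factorized distribution over structured outputs
# so that all probability mass lies on constraint-satisfying ones.
import itertools
import numpy as np

def constrained_distribution(p_bits, satisfies):
    """p_bits: per-bit probabilities from a neural net; satisfies: y -> bool."""
    ys = list(itertools.product([0, 1], repeat=len(p_bits)))
    scores = np.array([
        np.prod([p if b else 1 - p for p, b in zip(p_bits, y)]) * satisfies(y)
        for y in ys
    ])
    return {y: s / scores.sum() for y, s in zip(ys, scores) if s > 0}

# Toy constraint: exactly one label active.
dist = constrained_distribution([0.9, 0.8, 0.1], lambda y: sum(y) == 1)
print(dist)   # support only on one-hot outputs; probabilities sum to 1
```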

NeurIPS Conference 2021 Conference Paper

A Compositional Atlas of Tractable Circuit Operations for Probabilistic Inference

  • Antonio Vergari
  • YooJung Choi
  • Anji Liu
  • Stefano Teso
  • Guy Van den Broeck

Circuit representations are becoming the lingua franca to express and reason about tractable generative and discriminative models. In this paper, we show how complex inference scenarios for these models that commonly arise in machine learning---from computing the expectations of decision tree ensembles to information-theoretic divergences of sum-product networks---can be represented in terms of tractable modular operations over circuits. Specifically, we characterize the tractability of simple transformations---sums, products, quotients, powers, logarithms, and exponentials---in terms of sufficient structural constraints of the circuits they operate on, and present novel hardness results for the cases in which these properties are not satisfied. Building on these operations, we derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.

AAAI Conference 2021 System Paper

Juice: A Julia Package for Logic and Probabilistic Circuits

  • Meihua Dang
  • Pasha Khosravi
  • Yitao Liang
  • Antonio Vergari
  • Guy Van den Broeck

JUICE is an open-source Julia package providing tools for logic and probabilistic reasoning and learning based on logic circuits (LCs) and probabilistic circuits (PCs). It provides a range of efficient algorithms for probabilistic inference queries, such as computing marginal probabilities (MAR), as well as many more advanced queries. Certain structural circuit properties are needed to achieve this tractability, which JUICE helps validate. Additionally, it supports several parameter and structure learning algorithms proposed in the recent literature. By leveraging parallelism (on both CPU and GPU), JUICE provides a fast implementation of circuit-based algorithms, which makes it suitable for tackling large-scale datasets and models.

UAI Conference 2021 Conference Paper

Tractable computation of expected kernels

  • Wenzhe Li
  • Zhe Zeng 0001
  • Antonio Vergari
  • Guy Van den Broeck

Computing the expectation of kernel functions is a ubiquitous task in machine learning, with applications from classical support vector machines to exploiting kernel embeddings of distributions in probabilistic modeling, statistical inference, causal discovery, and deep learning. In all these scenarios, we tend to resort to Monte Carlo estimates as expectations of kernels are intractable in general. In this work, we characterize the conditions under which we can compute expected kernels exactly and efficiently, by leveraging recent advances in probabilistic circuit representations. We first construct a circuit representation for kernels and propose an approach to such tractable computation. We then demonstrate possible advancements for kernel embedding frameworks by exploiting tractable expected kernels to derive new algorithms for two challenging scenarios: 1) reasoning under missing data with kernel support vector regressors; 2) devising a collapsed black-box importance sampling scheme. Finally, we empirically evaluate both algorithms and show that they outperform standard baselines on a variety of datasets.
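
For context, the Monte Carlo baseline that the paper replaces with exact computation looks like the sketch below (the distributions and kernel are arbitrary placeholders; the circuit-based exact computation itself is not reproduced here):

```python
# Monte Carlo estimate of an expected kernel E_{x~p, x'~q}[k(x, x')].
import numpy as np

rng = np.random.default_rng(0)
k = lambda x, y: np.exp(-np.sum((x - y) ** 2, axis=-1))      # RBF kernel

xs = rng.normal(0.0, 1.0, size=(100_000, 2))                  # samples from p
ys = rng.normal(0.5, 1.0, size=(100_000, 2))                  # samples from q
print(np.mean(k(xs, ys)))                                      # sampling-based estimate
```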

ICML Conference 2020 Conference Paper

Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits

  • Robert Peharz
  • Steven Lang
  • Antonio Vergari
  • Karl Stelzner
  • Alejandro Molina 0001
  • Martin Trapp 0001
  • Guy Van den Broeck
  • Kristian Kersting

Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent “deep-learning-style” implementations of PCs strive for better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.
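
The core implementation idea can be sketched in a few lines (shown in linear space with made-up shapes; practical EiNet implementations work in log-space and fuse many more operations): the weighted sums over pairwise products of child units for a whole layer reduce to a single einsum.

```python
# Toy sketch of an einsum-based sum-product layer.
import numpy as np

B, N, K = 32, 10, 4            # batch size, vectorized nodes, components per node
rng = np.random.default_rng(0)

L = rng.random((B, N, K))      # values of left-child units
R = rng.random((B, N, K))      # values of right-child units
W = rng.random((N, K, K, K))   # per-node weights: K outputs, each mixing K*K products
W /= W.sum(axis=(2, 3), keepdims=True)

# out[b, n, o] = sum_{i, j} W[n, o, i, j] * L[b, n, i] * R[b, n, j]
out = np.einsum("noij,bni,bnj->bno", W, L, R)
print(out.shape)               # (32, 10, 4): products and weighted sums in one einsum
```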

ICLR Conference 2020 Conference Paper

From Variational to Deterministic Autoencoders

  • Partha Ghosh
  • Mehdi S. M. Sajjadi
  • Antonio Vergari
  • Michael J. Black
  • Bernhard Schölkopf

Variational Autoencoders (VAEs) provide a theoretically-backed and popular framework for deep generative models. However, learning a VAE from data poses still unanswered theoretical questions and considerable practical challenges. In this work, we propose an alternative framework for generative modeling that is simpler, easier to train, and deterministic, yet has many of the advantages of the VAE. We observe that sampling a stochastic encoder in a Gaussian VAE can be interpreted as simply injecting noise into the input of a deterministic decoder. We investigate how substituting this kind of stochasticity, with other explicit and implicit regularization schemes, can lead to an equally smooth and meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism to sample new data points, we introduce an ex-post density estimation step that can be readily applied to the proposed framework as well as existing VAEs, improving their sample quality. We show, in a rigorous empirical study, that the proposed regularized deterministic autoencoders are able to generate samples that are comparable to, or better than, those of VAEs and more powerful alternatives when applied to images as well as to structured data such as molecules.
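
A minimal sketch of the ex-post density estimation step, assuming `encoder` and `decoder` belong to an already-trained deterministic autoencoder (all names here are hypothetical placeholders, not the paper's code): fit a simple density model, e.g. a GMM, on the latent codes, then sample it and decode.

```python
# Sketch of ex-post density estimation for a deterministic autoencoder.
from sklearn.mixture import GaussianMixture

def fit_and_sample(encoder, decoder, X, n_components=10, n_samples=16):
    Z = encoder(X)                                        # deterministic latent codes
    gmm = GaussianMixture(n_components=n_components).fit(Z)
    z_new, _ = gmm.sample(n_samples)                      # sample the fitted latent density
    return decoder(z_new)                                 # decode into new data points

# Usage (hypothetical): samples = fit_and_sample(encoder, decoder, X_train)
```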

NeurIPS Conference 2020 Conference Paper

Probabilistic Inference with Algebraic Constraints: Theoretical Limits and Practical Approximations

  • Zhe Zeng
  • Paolo Morettin
  • Fanqi Yan
  • Antonio Vergari
  • Guy Van den Broeck

Weighted model integration (WMI) is a framework to perform advanced probabilistic inference on hybrid domains, i.e., on distributions over mixed continuous-discrete random variables and in the presence of complex logical and arithmetic constraints. In this work, we advance the WMI framework on both the theoretical and algorithmic side. First, we exactly trace the boundaries of tractability for WMI inference by proving that to be amenable to exact and efficient inference a WMI problem has to possess a tree-shaped structure with logarithmic diameter. While this result deepens our theoretical understanding of WMI, it hinders the practical applicability of exact WMI solvers to real-world problems. To overcome this, we propose the first approximate WMI solver that does not resort to sampling, but performs exact inference on an approximate model. Our solution performs message passing in a relaxed problem structure iteratively to recover certain lost dependencies and, as our experiments suggest, is competitive with other SOTA WMI solvers.

ICML Conference 2020 Conference Paper

Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing

  • Zhe Zeng 0001
  • Paolo Morettin
  • Fanqi Yan
  • Antonio Vergari
  • Guy Van den Broeck

Weighted model integration (WMI) is an appealing framework for probabilistic inference: it allows for expressing the complex dependencies in real-world problems, where variables are both continuous and discrete, via the language of Satisfiability Modulo Theories (SMT), as well as to compute probabilistic queries with complex logical and arithmetic constraints. Yet, existing WMI solvers are not ready to scale to these problems. They either ignore the intrinsic dependency structure of the problem entirely, or they are limited to overly restrictive structures. To narrow this gap, we derive a factorized WMI computation enabling us to devise a scalable WMI solver based on message passing, called MP-WMI. Namely, MP-WMI is the first WMI solver that can (i) perform exact inference on the full class of tree-structured WMI problems, and (ii) perform inter-query amortization, e.g., to compute all marginal densities simultaneously. Experimental results show that our solver dramatically outperforms the existing WMI solvers on a large set of benchmarks.

AAAI Conference 2019 Conference Paper

Automatic Bayesian Density Analysis

  • Antonio Vergari
  • Alejandro Molina
  • Robert Peharz
  • Zoubin Ghahramani
  • Kristian Kersting
  • Isabel Valera

Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.

NeurIPS Conference 2019 Conference Paper

On Tractable Computation of Expected Predictions

  • Pasha Khosravi
  • YooJung Choi
  • Yitao Liang
  • Antonio Vergari
  • Guy Van den Broeck

Computing expected predictions of discriminative models is a fundamental task in machine learning that appears in many interesting applications such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations for classification scenarios in which exact computations are intractable. Our framework to compute expected predictions allows for handling of missing data during prediction time in a principled and accurate way and enables reasoning about the behavior of discriminative models. We empirically show our algorithm to consistently outperform standard imputation techniques on a variety of datasets. Finally, we illustrate how our framework can be used for exploratory data analysis.

UAI Conference 2019 Conference Paper

Random Sum-Product Networks: A Simple and Effective Approach to Probabilistic Deep Learning

  • Robert Peharz
  • Antonio Vergari
  • Karl Stelzner
  • Alejandro Molina 0001
  • Martin Trapp 0001
  • Xiaoting Shao
  • Kristian Kersting
  • Zoubin Ghahramani

Sum-product networks (SPNs) are expressive probabilistic models with a rich set of exact and efficient inference routines. However, in order to guarantee exact inference, they require specific structural constraints, which complicate learning SPNs from data. Thereby, most SPN structure learners proposed so far are tedious to tune, do not scale easily, and are not easily integrated with deep learning frameworks. In this paper, we follow a simple “deep learning” approach, by generating unspecialized random structures, scalable to millions of parameters, and subsequently applying GPU-based optimization. Somewhat surprisingly, our models often perform on par with state-of-the-art SPN structure learners and deep neural networks on a diverse range of generative and discriminative scenarios. At the same time, our models yield well-calibrated uncertainties, and stand out among most deep generative and discriminative models in being robust to missing features and being able to detect anomalies.

AAAI Conference 2018 Conference Paper

Mixed Sum-Product Networks: A Deep Architecture for Hybrid Domains

  • Alejandro Molina
  • Antonio Vergari
  • Nicola Di Mauro
  • Sriraam Natarajan
  • Floriana Esposito
  • Kristian Kersting

While all kinds of mixed data—from personal data, through panel and scientific data, to public and commercial data—are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time in identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. This relieves the user from deciding a-priori the parametric form of the random variables but is still expressive enough to effectively approximate any distribution and permits efficient learning and inference. Our experiments show that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.

AAAI Conference 2018 Conference Paper

Sum-Product Autoencoding: Encoding and Decoding Representations Using Sum-Product Networks

  • Antonio Vergari
  • Robert Peharz
  • Nicola Di Mauro
  • Alejandro Molina
  • Kristian Kersting
  • Floriana Esposito

Sum-Product Networks (SPNs) are a deep probabilistic architecture that up to now has been successfully employed for tractable inference. Here, we extend their scope towards unsupervised representation learning: we encode samples into continuous and categorical embeddings and show that they can also be decoded back into the original input space by leveraging MPE inference. We characterize when this Sum-Product Autoencoding (SPAE) leads to equivalent reconstructions and extend it towards dealing with missing embedding information. Our experimental results on several multilabel classification problems demonstrate that SPAE is competitive with state-of-the-art autoencoder architectures, even if the SPNs were never trained to reconstruct their inputs.