Author name cluster

Guy Van den Broeck

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

96 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Daniel Israel
Guy Van den Broeck
Aditya Grover

The generation speed of LLMs is bottlenecked by autoregressive decoding, where tokens are predicted sequentially one by one. Alternatively, diffusion large language models (dLLMs) theoretically allow for parallel token generation, but in practice struggle to achieve the speed of autoregressive models without significantly sacrificing quality. We therefore introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. We achieve this by defining a multiplicative mixture between the dLLM marginal probabilities and the joint probability of sequences under a small auxiliary autoregressive model. This inverts the standard setup of speculative decoding, where the goal is to sample from a large autoregressive verifier by drafting from a smaller model. We further optimize APD by enabling KV caching and limiting the size of the masked input. Altogether, our method puts forward three tunable parameters to flexibly trade off throughput and quality. We show that APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.

ICLR Conference 2025 Conference Paper

Controllable Generation via Locally Constrained Resampling

Kareem Ahmed
Kai-Wei Chang 0001
Guy Van den Broeck

Autoregressive models have demonstrated an unprecedented ability at modeling the intricacies of natural language. However, they continue to struggle with generating complex outputs that adhere to logical constraints. Sampling from a fully-independent distribution subject to a constraint is hard. Sampling from an autoregressive distribution subject to a constraint is doubly hard: We have to contend not only with the hardness of the constraint but also the distribution's lack of structure. We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint. By factoring in information about the entire sequence, our approach offers better contextual awareness during constrained generation compared to current greedy approaches. Starting from a model sample, we induce a local, factorized distribution which we can tractably condition on the constraint. To generate samples that satisfy the constraint, we sample from the conditional distribution, correct for biases in the sample weights, and resample. The resulting samples closely approximate the target distribution and are guaranteed to satisfy the constraints. We evaluate our approach on several tasks, including LLM detoxification and solving Sudoku puzzles. We show that by disallowing a list of toxic expressions our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification. We also show that our approach achieves a perfect accuracy on Sudoku, compared to less than $50\%$ for GPT4-o and Gemini 1.5.

ICLR Conference 2025 Conference Paper

Discrete Copula Diffusion

Anji Liu
Oliver Broadrick
Mathias Niepert
Guy Van den Broeck

Discrete diffusion models have recently shown significant progress in modeling complex data, such as natural languages and DNA sequences. However, unlike diffusion models for continuous data, which can generate high-quality samples in just a few denoising steps, modern discrete diffusion models still require hundreds or even thousands of denoising steps to perform well. In this paper, we identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps -- they fail to capture dependencies between output variables at each denoising step. To address this issue, we provide a formal explanation and introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model. Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps. When we apply this approach to autoregressive copula models, the combined model outperforms both models individually in unconditional and conditional text generation. Specifically, the hybrid model achieves better (un)conditional text generation using 8 to 32 times fewer denoising steps than the diffusion model alone. In addition to presenting an effective discrete diffusion generation algorithm, this paper emphasizes the importance of modeling inter-variable dependencies in discrete diffusion.

ICLR Conference 2025 Conference Paper

Learning to Discretize Denoising Diffusion ODEs

Vinh Tong
Dung-Trung Hoang
Anji Liu
Guy Van den Broeck
Mathias Niepert

Diffusion Probabilistic Models (DPMs) are generative models showing competitive performance in various domains, including image synthesis and 3D point cloud generation. Sampling from pre-trained DPMs involves multiple neural function evaluations (NFEs) to transform Gaussian noise samples into images, resulting in higher computational costs compared to single-step generative models such as GANs or VAEs. Therefore, reducing the number of NFEs while preserving generation quality is crucial. To address this, we propose LD3, a lightweight framework designed to learn the optimal time discretization for sampling. LD3 can be combined with various samplers and consistently improves generation quality without having to retrain resource-intensive neural networks. We demonstrate analytically and empirically that LD3 improves sampling efficiency with much less computational overhead. We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 (10 NFE), and 2.27 (10 NFE) on unconditional CIFAR10 and AFHQv2 in 5-10 minutes of training. LD3 offers an efficient approach to sampling from pre-trained diffusion models. Code is available at https://github.com/vinhsuhi/LD3.

TMLR Journal 2025 Journal Article

On the Challenges and Opportunities in Generative AI

Laura Manduchi
Clara Meister
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Däubener
Sophie Fellenz
Asja Fischer

The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. In this work, our objective is to identify these issues and highlight key unresolved challenges in modern generative AI paradigms that should be addressed to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.

AAAI Conference 2025 Conference Paper

On the Relationship Between Monotone and Squared Probabilistic Circuits

Benjie Wang
Guy Van den Broeck

Probabilistic circuits are a unifying representation of functions as computation graphs of weighted sums and products. Their primary application is in probabilistic modeling, where circuits with non-negative weights (monotone circuits) can be used to represent and learn density/mass functions, with tractable marginal inference. Recently, it was proposed to instead represent densities as the square of the circuit function (squared circuits); this allows the use of negative weights while retaining tractability, and can be exponentially more expressive efficient than monotone circuits. Unfortunately, we show the reverse also holds, meaning that monotone circuits and squared circuits are incomparable in general. This raises the question of whether we can reconcile, and indeed improve upon the two modeling approaches. We answer in the positive by proposing Inception PCs, a novel type of circuit that naturally encompasses both monotone circuits and squared circuits as special cases, and employs complex parameters. Empirically, we validate that Inception PCs can outperform both monotone and squared circuits on a range of tabular and image datasets.
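As a minimal sketch of what "tractable marginal inference" means for a monotone circuit, the toy example below builds a two-variable circuit by hand: one sum node with non-negative weights over two product nodes of Bernoulli leaves. The structure, the weights, and the names `leaf` and `circuit` are hypothetical and chosen only for illustration; they are not taken from the paper. A variable is marginalized by letting its leaves evaluate to 1, so a marginal costs a single pass over the circuit.

```python
def leaf(p_true, value):
    """Bernoulli leaf. value=None means the variable is summed out,
    in which case the leaf contributes p_true + (1 - p_true) = 1."""
    if value is None:
        return 1.0
    return p_true if value else 1.0 - p_true


def circuit(x1, x2):
    """Toy monotone circuit: a weighted sum of two products of leaves.
    All weights are non-negative and sum to 1 (illustrative values)."""
    comp_a = leaf(0.9, x1) * leaf(0.2, x2)  # product node A
    comp_b = leaf(0.1, x1) * leaf(0.7, x2)  # product node B
    return 0.3 * comp_a + 0.7 * comp_b      # sum node


# Marginal P(x1 = True) from one evaluation, with x2 summed out.
marginal = circuit(True, None)

# The same quantity by brute-force enumeration over x2.
brute_force = circuit(True, True) + circuit(True, False)
```

Both `marginal` and `brute_force` evaluate to 0.34 up to floating-point rounding. The single-pass version scales with circuit size rather than with the number of assignments being summed over.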

NeurIPS Conference 2025 Conference Paper

Plug-and-Play Context Feature Reuse for Efficient Masked Generation

Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang

Masked generative models (MGMs) have emerged as a powerful framework for image synthesis, combining parallel decoding with strong bidirectional context modeling. However, generating high-quality samples typically requires many iterative decoding steps, resulting in high inference costs. A straightforward way to speed up generation is by decoding more tokens in each step, thereby reducing the total number of steps. However, when many tokens are decoded simultaneously, the model can only estimate the univariate marginal distributions independently, failing to capture the dependency among them. As a result, reducing the number of steps significantly compromises generation fidelity. In this work, we introduce ReCAP (Reused Context-Aware Prediction), a plug-and-play module that accelerates inference in MGMs by constructing low-cost steps via reusing feature embeddings from previously decoded context tokens. ReCAP interleaves standard full evaluations with lightweight steps that cache and reuse context features, substantially reducing computation while preserving the benefits of fine-grained, iterative generation. We demonstrate its effectiveness on top of three representative MGMs (MaskGIT, MAGE, and MAR), including both discrete and continuous token spaces and covering diverse architectural designs. In particular, on ImageNet256 class-conditional generation, ReCAP achieves up to 2.4$\times$ faster inference than the base model with minimal performance drop, and consistently delivers better efficiency–fidelity trade-offs under various generation settings.

NeurIPS Conference 2025 Conference Paper

Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion

Vinh Tong
Trung-Dung Hoang
Anji Liu
Guy Van den Broeck
Mathias Niepert

In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model. Two main strategies have emerged for learning invariant distributions: designing equivariant network architectures and using data augmentation to approximate equivariance. While equivariant architectures preserve symmetry by design, they often involve greater complexity and pose optimization challenges. Data augmentation, on the other hand, offers flexibility but may fall short in fully capturing symmetries. Our framework enhances both approaches by reducing training variance and providing a provably lower-variance gradient estimator. We achieve this by interpreting data augmentation as a Monte Carlo estimator of the training gradient and applying Rao–Blackwellization. This leads to more stable optimization, faster convergence, and reduced variance, all while requiring only a single forward and backward pass per sample. We also present a practical implementation of this estimator—incorporating the loss and sampling procedure—through a method we call Orbit Diffusion. Theoretically, we guarantee that our loss admits equivariant minimizers. Empirically, Orbit Diffusion achieves state-of-the-art results on GEOM-QM9 for molecular conformation generation, improves crystal structure prediction, and advances text-guided crystal generation on the Perov-5 and MP-20 benchmarks. Additionally, it enhances protein designability in protein structure generation. Code is available at https://github.com/vinhsuhi/Orbit-Diffusion.git.

ICML Conference 2025 Conference Paper

Scaling Probabilistic Circuits via Monarch Matrices

Honghua Zhang
Meihua Dang
Benjie Wang 0001
Stefano Ermon
Nanyun Peng 0001
Guy Van den Broeck

Probabilistic Circuits (PCs) are tractable representations of probability distributions allowing for exact and efficient computation of likelihoods and marginals. Recent advancements have improved the scalability of PCs either by leveraging their sparse properties or through the use of tensorized operations for better hardware utilization. However, no existing method fully exploits both aspects simultaneously. In this paper, we propose a novel sparse and structured parameterization for the sum blocks in PCs. By replacing dense matrices with sparse Monarch matrices, we significantly reduce the memory and computation costs, enabling unprecedented scaling of PCs. From a theory perspective, our construction arises naturally from circuit multiplication; from a practical perspective, compared to previous efforts on scaling up tractable probabilistic models, our approach not only achieves state-of-the-art generative modeling performance on challenging benchmarks like Text8, LM1B and ImageNet, but also demonstrates superior scaling behavior, achieving the same performance with substantially less compute as measured by the number of floating-point operations (FLOPs) during training.

ICML Conference 2025 Conference Paper

The Limits of Tractable Marginalization

Oliver Broadrick
Sanyam Agarwal
Guy Van den Broeck
Markus Bläser

Marginalization – summing a function over all assignments to a subset of its inputs – is a fundamental computational problem with applications from probabilistic inference to formal verification. Despite its computational hardness in general, there exist many classes of functions (e.g., probabilistic models) for which marginalization remains tractable, and they can all be commonly expressed by arithmetic circuits computing multilinear polynomials. This raises the question, can all functions with polynomial time marginalization algorithms be succinctly expressed by such circuits? We give a negative answer, exhibiting simple functions with tractable marginalization yet no efficient representation by known models, assuming $\mathsf{FP} \neq \#\mathsf{P}$ (an assumption implied by $\mathsf{P} \neq \mathsf{NP}$). To this end, we identify a hierarchy of complexity classes corresponding to stronger forms of marginalization, all of which are efficiently computable on the known circuit models. We conclude with a completeness result, showing that whenever there is an efficient real RAM performing virtual evidence marginalization for a function, then there are small arithmetic circuits for that function’s multilinear representation.

ICML Conference 2025 Conference Paper

TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation

Gwen Yidou Weng
Benjie Wang 0001
Guy Van den Broeck

As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token predictions and struggle with global properties that require looking ahead. Existing solutions either post-train LMs for each new attribute—expensive and inflexible—or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce TRACE (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable probabilistic reasoning and lightweight control. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM’s predicted futures. This EAP is then used to reweigh the LM’s next-token probabilities for globally compliant continuations. Empirically, TRACE achieves state-of-the-art detoxification results with only 20% decoding overhead, yields 76 low-resource personalized LMs within seconds, and seamlessly extends to composite attributes.
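The reweighting step mentioned in this abstract can be pictured with a small sketch. It assumes, purely for illustration, that each candidate token's LM probability is multiplied by an expected attribute probability for that token and the result is renormalized. The function name `reweight_next_token`, the dictionary interface, and the numbers are hypothetical; the abstract does not specify the exact functional form, and the EAP values here are supplied by hand rather than computed from an HMM.

```python
def reweight_next_token(lm_probs, eap):
    """Multiply each token's LM probability by its expected attribute
    probability, then renormalize so the result sums to 1.

    lm_probs: dict mapping token -> LM next-token probability
    eap:      dict mapping token -> expected attribute probability in [0, 1]
              (hand-supplied here; an assumption of this sketch)
    """
    scores = {tok: p * eap[tok] for tok, p in lm_probs.items()}
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no token has positive reweighted mass")
    return {tok: s / total for tok, s in scores.items()}


# Two equally likely tokens; the attribute strongly favors "kind".
steered = reweight_next_token(
    {"kind": 0.5, "rude": 0.5},
    {"kind": 0.9, "rude": 0.1},
)
```

In this toy case the steered distribution puts about 0.9 of its mass on "kind", since the two tokens start out equally likely and differ only in their attribute scores.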

ICML Conference 2025 Conference Paper

Tractable Transformers for Flexible Conditional Generation

Anji Liu
Xuejie Liu
Dayuan Zhao
Mathias Niepert
Yitao Liang
Guy Van den Broeck

Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based generative model that is more robust to different conditional generation tasks. Unlike existing models that rely solely on global contextual features derived from full inputs, Tracformers incorporate a sparse Transformer encoder to capture both local and global contextual information. This information is routed through a decoder for conditional generation. Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines.

NeurIPS Conference 2024 Conference Paper

A Compositional Atlas for Algebraic Circuits

Benjie Wang
Denis D. Mauá
Guy Van den Broeck
YooJung Choi

Circuits based on sum-product structure have become a ubiquitous representation to compactly encode knowledge, from Boolean functions to probability distributions. By imposing constraints on the structure of such circuits, certain inference queries become tractable, such as model counting and most probable configuration. Recent works have explored analyzing probabilistic and causal inference queries as compositions of basic operators to derive tractability conditions. In this paper, we take an algebraic perspective for compositional inference, and show that a large class of queries—including marginal MAP, probabilistic answer set programming inference, and causal backdoor adjustment—correspond to a combination of basic operators over semirings: aggregation, product, and elementwise mapping. Using this framework, we uncover simple and general sufficient conditions for tractable composition of these operators, in terms of circuit properties (e.g., marginal determinism, compatibility) and conditions on the elementwise mappings. Applying our analysis, we derive novel tractability conditions for many such compositional queries. Our results unify tractability conditions for existing problems on circuits, while providing a blueprint for analysing novel compositional inference queries.

NeurIPS Conference 2024 Conference Paper

A Tractable Inference Perspective of Offline RL

Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang

A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return. In addition to obtaining accurate sequence models, this paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an important role in offline RL. Specifically, due to the fundamental stochasticity from the offline data-collection policies and the environment dynamics, highly non-trivial conditional/constrained generation is required to elicit rewarding actions. While it is still possible to approximate such queries, we observe that such crude estimates undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern tractable generative models to bridge the gap between good sequence models and high expected returns at evaluation time. Empirically, Trifle achieves $7$ state-of-the-art scores and the highest average scores in $9$ Gym-MuJoCo benchmarks against strong baselines. Further, Trifle significantly outperforms prior approaches in stochastic environments and safe RL tasks with minimum algorithmic modifications.

NeurIPS Conference 2024 Conference Paper

Adaptable Logical Control for Large Language Models

Honghua Zhang
Po-Nien Kung
Masahiro Yoshida
Guy Van den Broeck
Nanyun Peng

Despite the success of Large Language Models (LLMs) on various tasks following human instructions, controlling model generation to follow strict constraints at inference time poses a persistent challenge. In this paper, we introduce Ctrl-G, a neuro-symbolic framework that enables tractable and adaptable control of LLM generation to follow logical constraints reliably. Ctrl-G combines any production-ready LLM with a Hidden Markov Model (HMM), guiding LLM outputs to adhere to logical constraints represented as deterministic finite automata. We show that Ctrl-G, when a TULU2-7B model is coupled with a 2B-parameter HMM, outperforms GPT4 in text editing: on the task of generating text insertions/continuations following logical constraints, our approach achieves over 30% higher satisfaction rate in human evaluation. When applied to medium-size language models (e.g., GPT2-large), Ctrl-G also beats its counterparts on standard benchmarks by large margins. Additionally, as a proof-of-concept study, we use Ctrl-G to assist LLM reasoning on the GSM benchmark, foreshadowing the application of Ctrl-G, as well as other constrained generation approaches, beyond traditional language generation tasks.

ICLR Conference 2024 Conference Paper

Image Inpainting via Tractable Steering of Diffusion Models

Anji Liu
Mathias Niepert
Guy Van den Broeck

Diffusion models are the current state of the art for generating photorealistic images. Controlling the sampling process for constrained image generation tasks such as inpainting, however, remains challenging since exact conditioning on such constraints is intractable. While existing methods use various techniques to approximate the constrained posterior, this paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior, and to leverage this signal to steer the denoising process of diffusion models. Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs). Building upon prior advances, we further scale up PCs and make them capable of guiding the image generation process of diffusion models. Empirical results suggest that our approach can consistently improve the overall quality and semantic coherence of inpainted images across three natural image datasets (i.e., CelebA-HQ, ImageNet, and LSUN) with only ~10% additional computational overhead brought by the TPM. Further, with the help of an image encoder and decoder, our method can readily accept semantic constraints on specific regions of the image, which opens up the potential for more controlled image generation tasks. In addition to proposing a new framework for constrained image generation, this paper highlights the benefit of more tractable models and motivates the development of expressive TPMs.

UAI Conference 2024 Conference Paper

Polynomial Semantics of Tractable Probabilistic Circuits

Oliver Broadrick
Honghua Zhang
Guy Van den Broeck

Probabilistic circuits compute multilinear polynomials that represent probability distributions. They are tractable models that support efficient marginal inference. However, various polynomial semantics have been considered in the literature (e.g., network polynomials, likelihood polynomials, generating functions, Fourier transforms, and characteristic polynomials). The relationships between these polynomial encodings of distributions are largely unknown. In this paper, we prove that for binary distributions, each of these probabilistic circuit models is equivalent in the sense that any circuit for one of them can be transformed into a circuit for any of the others with only a polynomial increase in size. They are therefore all tractable for marginal inference on the same class of distributions. Finally, we explore the natural extension of one such polynomial semantics, called probabilistic generating circuits, to categorical random variables, and establish that marginal inference becomes #P-hard.

ICLR Conference 2024 Conference Paper

Probabilistically Rewired Message-Passing Neural Networks

Chendi Qian
Andrei Manolache
Kareem Ahmed
Zhe Zeng 0001
Guy Van den Broeck
Mathias Niepert
Christopher Morris 0001

Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable k-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.

ICML Conference 2024 Conference Paper

Scaling Tractable Probabilistic Circuits: A Systems Perspective

Anji Liu
Kareem Ahmed
Guy Van den Broeck

Probabilistic Circuits (PCs) are a general framework for tractable deep generative models, which support exact and efficient probabilistic inference on their learned distributions. Recent modeling and training advancements have enabled their application to complex real-world tasks. However, the time and memory inefficiency of existing PC implementations hinders further scaling up. This paper proposes PyJuice, a general GPU implementation design for PCs that improves prior art in several regards. Specifically, PyJuice is 1-2 orders of magnitude faster than existing systems (including very recent ones) at training large-scale PCs. Moreover, PyJuice consumes 2-5x less GPU memory, which enables us to train larger models. At the core of our system is a compilation process that converts a PC into a compact representation amenable to efficient block-based parallelization, which significantly reduces IO and makes it possible to leverage Tensor Cores available in modern GPUs. Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. We further establish a new set of baselines on natural image and language datasets by benchmarking existing PC structures but with much larger sizes and more training epochs, with the hope of incentivizing future research. Code is available at https://github.com/Tractables/pyjuice.

NeurIPS Conference 2023 Conference Paper

A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints

Kareem Ahmed
Kai-Wei Chang
Guy Van den Broeck

Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. This often requires maximizing the likelihood of a symbolic constraint w.r.t. the neural network's output distribution. Such output distributions are typically assumed to be fully-factorized. This limits the applicability of neuro-symbolic learning to the more expressive auto-regressive distributions, e.g., transformers. Under such distributions, computing the likelihood of even simple constraints is #P-hard. Instead of attempting to enforce the constraint on the entire likelihood distribution, we propose to do so on a random, local approximation thereof. More precisely, we approximate the likelihood of the constraint with the pseudolikelihood of the constraint centered around a model sample. Our approach is factorizable, allowing us to reuse solutions to sub-problems---a main tenet for the efficient computation of neuro-symbolic losses. It also provides a local, high fidelity approximation of the likelihood: it exhibits low entropy and KL-divergence around the model sample. We tested our approach on Sudoku and shortest-path prediction cast as auto-regressive generation, and observe that we greatly improve upon the base model's ability to predict logically-consistent outputs. We also tested our approach on the task of detoxifying large language models. We observe that using a simple constraint disallowing a list of toxic words, we are able to steer the model's outputs away from toxic generations, achieving SoTA compared to previous approaches.

NeurIPS Conference 2023 Conference Paper

A Unified Approach to Count-Based Weakly Supervised Learning

Vinay Shukla
Zhe Zeng
Kareem Ahmed
Guy Van den Broeck

High-quality labels are often very scarce, whereas unlabeled data with inferred weak labels occurs more naturally. In many cases, these weak labels dictate the frequency of each respective class over a set of instances. In this paper, we develop a unified approach to learning from such weakly-labeled data, which we call *count-based weakly-supervised learning*. At the heart of our approach is the ability to compute the probability of exactly $k$ out of $n$ outputs being set to true. This computation is differentiable, exact, and efficient. Building upon the previous computation, we derive a *count loss* penalizing the model for deviations in its distribution from an arithmetic constraint defined over label counts.
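The central quantity in this abstract, the probability that exactly $k$ of $n$ outputs are true, can be computed with a standard dynamic program when the outputs are treated as independent Bernoulli variables. The sketch below makes that independence assumption explicitly and uses a hypothetical function name, `count_distribution`; it shows the textbook recurrence and is not necessarily the authors' implementation. The recurrence uses only sums and products, so the same computation written in an automatic-differentiation framework would be differentiable in the input probabilities.

```python
def count_distribution(probs):
    """Return a list dist where dist[k] is the probability that exactly k
    of the outputs are true, assuming the outputs are independent
    Bernoulli variables with the given success probabilities.

    Runs in O(n^2) time for n outputs.
    """
    dist = [1.0]  # with zero outputs, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this output is false: count stays k
            nxt[k + 1] += mass * p      # this output is true: count becomes k + 1
        dist = nxt
    return dist


# Two fair outputs: the count follows Binomial(2, 0.5).
two_fair = count_distribution([0.5, 0.5])
```

For two fair outputs, `two_fair` is `[0.25, 0.5, 0.25]`. A count loss of the kind the abstract describes could then, for instance, take the negative log of the entry at the observed count, though that particular choice is an assumption of this note.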

AAAI Conference 2023 Conference Paper

Certifying Fairness of Probabilistic Circuits

Nikil Roashan Selvam
Guy Van den Broeck
YooJung Choi

With the increased use of machine learning systems for decision making, questions about the fairness properties of such systems start to take center stage. Most existing work on algorithmic fairness assumes complete observation of features at prediction time, as is the case for popular notions like statistical parity and equal opportunity. However, this is not sufficient for models that can make predictions with partial observation as we could miss patterns of bias and incorrectly certify a model to be fair. To address this, a recently introduced notion of fairness asks whether the model exhibits any discrimination pattern, in which an individual—characterized by (partial) feature observations—receives vastly different decisions merely by disclosing one or more sensitive attributes such as gender and race. By explicitly accounting for partial observations, this provides a much more fine-grained notion of fairness. In this paper, we propose an algorithm to search for discrimination patterns in a general class of probabilistic models, namely probabilistic circuits. Previously, such algorithms were limited to naive Bayes classifiers which make strong independence assumptions; by contrast, probabilistic circuits provide a unifying framework for a wide range of tractable probabilistic models and can even be compiled from certain classes of Bayesian networks and probabilistic programs, making our method much more broadly applicable. Furthermore, for an unfair model, it may be useful to quickly find discrimination patterns and distill them for better interpretability. As such, we also propose a sampling-based approach to more efficiently mine discrimination patterns, and introduce new classes of patterns such as minimal, maximal, and Pareto optimal patterns that can effectively summarize exponentially many discrimination patterns.

NeurIPS Conference 2023 Conference Paper

Collapsed Inference for Bayesian Deep Learning

Zhe Zeng
Guy Van den Broeck

Bayesian neural networks (BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while its alternatives tend to be computationally prohibitively expensive. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte-Carlo sample by limiting sampling to a subset of the network weights while pairing it with some closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently despite the non-linearity of neural networks by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation as well as predictive performance.

IJCAI Conference 2023 Conference Paper

On the Paradox of Learning to Reason from Data

Honghua Zhang
Liunian Harold Li
Tao Meng
Kai-Wei Chang
Guy Van den Broeck

Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has, in fact, learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models (e.g., T5) and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.

AAAI Conference 2023 Conference Paper

Out-of-Distribution Generalization by Neural-Symbolic Joint Training

Anji Liu
Hongming Xu
Guy Van den Broeck
Yitao Liang

This paper develops a novel methodology to simultaneously learn a neural network and extract generalized logic rules. Different from prior neural-symbolic methods that require background knowledge and candidate logical rules to be provided, we aim to induce task semantics with minimal priors. This is achieved by a two-step learning framework that iterates between optimizing neural predictions of task labels and searching for a more accurate representation of the hidden task semantics. Notably, supervision works in both directions: (partially) induced task semantics guide the learning of the neural network and induced neural predictions admit an improved semantic representation. We demonstrate that our proposed framework is capable of achieving superior out-of-distribution generalization performance on two tasks: (i) learning multi-digit addition, where it is trained on short sequences of digits and tested on long sequences of digits; (ii) predicting the optimal action in the Tower of Hanoi, where the model is challenged to discover a policy independent of the number of disks in the puzzle.

UAI Conference 2023 Conference Paper

Scaling integer arithmetic in probabilistic programs

William X. Cao
Poorva Garg
Ryan Tjoa
Steven Holtzen
Todd D. Millstein
Guy Van den Broeck

Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today’s probabilistic programming languages (PPLs). The core challenge comes from discrete structure: many of today’s PPL inference strategies rely on enumeration, sampling, or differentiation in order to scale, which fail for high-dimensional complex discrete distributions involving integers. Our insight is that there is structure in arithmetic that these approaches are not using. We present a binary encoding strategy for discrete distributions that exploits the rich logical structure of integer operations like summation and comparison. We leverage this structured encoding with knowledge compilation to perform exact probabilistic inference, and show that this approach scales to much larger integer distributions with arithmetic.

ICLR Conference 2023 Conference Paper

Scaling Up Probabilistic Circuits by Latent Variable Distillation

Anji Liu
Honghua Zhang
Guy Van den Broeck

Probabilistic Circuits (PCs) are a unified framework for tractable probabilistic models that support efficient computation of various probabilistic queries (e.g., marginal probabilities). One key challenge is to scale PCs to model large and high-dimensional real-world datasets: we observe that as the number of parameters in PCs increases, their performance immediately plateaus. This phenomenon suggests that the existing optimizers fail to exploit the full expressive power of large PCs. We propose to overcome such bottleneck by latent variable distillation: we leverage the less tractable but more expressive deep generative models to provide extra supervision over the latent variables of PCs. Specifically, we extract information from Transformer-based generative models to assign values to latent variables of PCs, providing guidance to PC optimizers. Experiments on both image and language modeling benchmarks (e.g., ImageNet and WikiText-2) show that latent variable distillation substantially boosts the performance of large PCs compared to their counterparts without latent variable distillation. In particular, on the image modeling benchmarks, PCs achieve competitive performance against some of the widely-used deep generative models, including variational autoencoders and flow-based models, opening up new avenues for tractable generative modeling. Our code can be found at https://github.com/UCLA-StarAI/LVD.

NeSy Conference 2023 Conference Paper

Semantic Probabilistic Layers for Neuro-Symbolic Learning

Kareem Ahmed
Stefano Teso
Kai-Wei Chang 0001
Guy Van den Broeck
Antonio Vergari

In this extended abstract, we briefly outline Semantic Probabilistic Layers [1], a new layer that can be plugged into any neural network to guarantee its predictions are consistent with a set of predefined symbolic constraints while being amenable to end-to-end learning via maximum likelihood. SPLs can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic layers. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on an array of challenging structured-output prediction tasks.

ICLR Conference 2023 Conference Paper

SIMPLE: A Gradient Estimator for k-Subset Sampling

Kareem Ahmed
Zhe Zeng 0001
Mathias Niepert
Guy Van den Broeck

$k$-subset sampling is ubiquitous in machine learning, enabling regularization and interpretability through sparsity. The challenge lies in rendering $k$-subset sampling amenable to end-to-end learning. This has typically involved relaxing the reparameterized samples to allow for backpropagation, but introduces both bias and variance. In this work, we fall back to discrete $k$-subset sampling on the forward pass. This is coupled with using the gradient with respect to the exact marginals, computed efficiently, as a proxy for the true gradient. We show that our gradient estimator exhibits lower bias and variance compared to state-of-the-art estimators. Empirical results show improved performance on learning to explain and sparse models benchmarks. We provide an algorithm for computing the exact ELBO for the $k$-subset distribution, obtaining significantly lower loss compared to state-of-the-art discrete sparse VAEs. All of our algorithms are exact and efficient.
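
The exact marginals mentioned in this abstract can be illustrated with a small sketch. It assumes the $k$-subset distribution is a set of independent Bernoulli variables conditioned on exactly $k$ of them being true, and it uses a plain leave-one-out dynamic program; the function names are illustrative, and this is not claimed to be the paper's algorithm or its complexity.

```python
def count_distribution(probs):
    """Return [P(count = 0), ..., P(count = n)] for independent Bernoullis."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return dist


def k_subset_marginals(probs, k):
    """Return P(z_i = 1 | sum(z) = k) for every i.

    Assumption: z_i ~ Bernoulli(probs[i]) independently before conditioning.
    """
    n = len(probs)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and len(probs)")
    normalizer = count_distribution(probs)[k]  # P(sum(z) = k)
    if normalizer == 0.0:
        raise ValueError("the event sum(z) = k has probability zero")
    marginals = []
    for i, p in enumerate(probs):
        rest = probs[:i] + probs[i + 1:]
        # z_i = 1 and sum(z) = k requires exactly k - 1 true among the others.
        others = count_distribution(rest)[k - 1] if k >= 1 else 0.0
        marginals.append(p * others / normalizer)
    return marginals


if __name__ == "__main__":
    # Hypothetical inclusion probabilities for four items, subsets of size 2.
    print(k_subset_marginals([0.8, 0.1, 0.5, 0.3], 2))
```

A quick sanity check on any input is that the marginals sum to $k$, since every sampled subset contains exactly $k$ items.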

ICML Conference 2023 Conference Paper

Tractable Control for Autoregressive Language Generation

Honghua Zhang
Meihua Dang
Nanyun Peng 0001
Guy Van den Broeck

Despite the success of autoregressive large language models in text generation, it remains a major challenge to generate text that satisfies complex constraints: sampling from the conditional distribution ${\Pr}(\text{text} | \alpha)$ is intractable for even the simplest lexical constraints $\alpha$. To overcome this challenge, we propose to use tractable probabilistic models (TPMs) to impose lexical constraints in autoregressive text generation models, which we refer to as GeLaTo (Generating Language with Tractable Constraints). To demonstrate the effectiveness of this framework, we use distilled hidden Markov models, where we can efficiently compute ${\Pr}(\text{text} | \alpha)$, to guide autoregressive generation from GPT2. GeLaTo achieves state-of-the-art performance on challenging benchmarks for constrained text generation (e.g., CommonGen), beating various strong baselines by a large margin. Our work not only opens up new avenues for controlling large language models but also motivates the development of more expressive TPMs.

ICML Conference 2023 Conference Paper

Understanding the Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits

Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang

Probabilistic Circuits (PCs) are a general and unified computational framework for tractable probabilistic models that support efficient computation of various inference tasks (e.g., computing marginal probabilities). Towards enabling such reasoning capabilities in complex real-world tasks, Liu et al. (2022) propose to distill knowledge (through latent variable assignments) from less tractable but more expressive deep generative models. However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. Therefore, instead of performing distillation from the most expressive deep generative model, we study what properties the teacher model and the PC should have in order to achieve good distillation performance. This leads to a generic algorithmic improvement as well as other data-type-specific ones over the existing latent variable distillation pipeline. Empirically, we outperform SoTA TPMs by a large margin on challenging image modeling benchmarks. In particular, on ImageNet32, PCs achieve 4.06 bits-per-dimension, which is only 0.34 behind variational diffusion models (Kingma et al., 2021).

ICLR Conference 2022 Conference Paper

Lossless Compression with Probabilistic Circuits

Anji Liu
Stephan Mandt
Guy Van den Broeck

Despite extensive progress on image generation, common deep generative model architectures are not easily applied to lossless compression. For example, VAEs suffer from a compression cost overhead due to their latent variables. This overhead can only be partially eliminated with elaborate schemes such as bits-back coding, often resulting in poor single-sample compression rates. To overcome such problems, we establish a new class of tractable lossless compression models that permit efficient encoding and decoding: Probabilistic Circuits (PCs). These are a class of neural networks involving $|p|$ computational units that support efficient marginalization over arbitrary subsets of the $D$ feature dimensions, enabling efficient arithmetic coding. We derive efficient encoding and decoding schemes that both have time complexity $\mathcal{O} (\log(D) \cdot |p|)$, where a naive scheme would have linear costs in $D$ and $|p|$, making the approach highly scalable. Empirically, our PC-based (de)compression algorithm runs 5-40 times faster than neural compression algorithms that achieve similar bitrates. By scaling up the traditional PC structure learning pipeline, we achieve state-of-the-art results on image datasets such as MNIST. Furthermore, PCs can be naturally integrated with existing neural compression algorithms to improve the performance of these base models on natural image datasets. Our results highlight the potential impact that non-standard learning architectures may have on neural data compression.

UAI Conference 2022 Conference Paper

Neuro-symbolic entropy regularization

Kareem Ahmed
Eric Wang
Kai-Wei Chang 0001
Guy Van den Broeck

In structured output prediction, the goal is to jointly predict several output variables that together encode a structured object – a path in a graph, an entity-relation triple, or an ordering of objects. Such a large output space makes learning hard and requires vast amounts of labeled data. Different approaches leverage alternate sources of supervision. One approach – entropy regularization – posits that decision boundaries should lie in low-probability regions. It extracts supervision from unlabeled examples, but remains agnostic to the structure of the output space. Conversely, neuro-symbolic approaches exploit the knowledge that not every prediction corresponds to a valid structure in the output space. Yet, they do not further restrict the learned output distribution. This paper introduces a framework that unifies both approaches. We propose a loss, neuro-symbolic entropy regularization, that encourages the model to confidently predict a valid object. It is obtained by restricting entropy regularization to the distribution over only the valid structures. This loss can be computed efficiently when the output constraint is expressed as a tractable logic circuit. Moreover, it seamlessly integrates with other neuro-symbolic losses that eliminate invalid predictions. We demonstrate the efficacy of our approach on a series of semi-supervised and fully-supervised structured-prediction experiments, where it leads to models whose predictions are more accurate as well as more likely to be valid.
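
The loss described in this abstract, entropy restricted to the distribution over valid structures, can be sketched by brute force on a tiny output space. The sketch assumes a fully factorized output distribution and enumerates all assignments, so it is exponential in the number of outputs; the abstract states that an efficient computation is available when the constraint is a tractable logic circuit, which this example does not implement. The names `constrained_entropy` and `exactly_one` are illustrative.

```python
import itertools
import math


def constrained_entropy(probs, is_valid):
    """Entropy (in nats) of a factorized distribution conditioned on validity.

    Assumptions: output i is 1 with probability probs[i], independently,
    and is_valid(assignment) returns True for structures that satisfy the
    constraint. Enumeration is used purely for illustration.
    """
    weights = []
    for assignment in itertools.product([0, 1], repeat=len(probs)):
        if not is_valid(assignment):
            continue
        weight = 1.0
        for p, bit in zip(probs, assignment):
            weight *= p if bit else 1.0 - p
        weights.append(weight)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no valid structure has positive probability")
    entropy = 0.0
    for weight in weights:
        q = weight / total  # probability of this structure given validity
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy


def exactly_one(assignment):
    """Hypothetical constraint: exactly one output variable is set."""
    return sum(assignment) == 1


if __name__ == "__main__":
    print(constrained_entropy([0.7, 0.2, 0.1], exactly_one))
```

Minimizing this quantity pushes probability mass toward a single valid structure, which matches the abstract's description of encouraging the model to confidently predict a valid object.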

JAIR Journal 2022 Journal Article

On the Tractability of SHAP Explanations

Guy Van den Broeck
Anton Lykov
Maximilian Schleich
Dan Suciu

SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.
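
For the fully-factorized setting discussed in this abstract, the SHAP score can be written directly from the Shapley formula, with the value of a feature subset defined as the expected model output when those features are fixed to the instance's values. The sketch below assumes binary features and enumerates everything, so it takes exponential time and only illustrates the definition; the function names and the example model are hypothetical.

```python
import itertools
import math


def expected_value(model, x, probs, observed):
    """E[model(X)] with features in `observed` fixed to x, the rest independent.

    Assumption: feature i is binary and equals 1 with probability probs[i].
    """
    free = [i for i in range(len(x)) if i not in observed]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(free)):
        point = list(x)
        weight = 1.0
        for i, bit in zip(free, bits):
            point[i] = bit
            weight *= probs[i] if bit else 1.0 - probs[i]
        total += weight * model(point)
    return total


def shap_values(model, x, probs):
    """Shapley values of each feature of x under a fully-factorized distribution."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            coeff = (math.factorial(size) * math.factorial(n - size - 1)
                     / math.factorial(n))
            for subset in itertools.combinations(others, size):
                with_i = expected_value(model, x, probs, set(subset) | {i})
                without_i = expected_value(model, x, probs, set(subset))
                phi += coeff * (with_i - without_i)
        values.append(phi)
    return values


if __name__ == "__main__":
    # Hypothetical linear model over three binary features.
    def linear(v):
        return 2 * v[0] + 3 * v[1] - v[2]
    print(shap_values(linear, [1, 0, 1], [0.5, 0.25, 0.1]))
```

Each call to `expected_value` is an expectation of the model under a product distribution, which is the quantity the abstract ties the complexity of SHAP to in this setting.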

AAAI Conference 2022 System Paper

PYLON: A PyTorch Framework for Learning with Constraints

Kareem Ahmed
Tao Li
Thy Ton
Quan Guo
Kai-Wei Chang
Parisa Kordjamshidi
Vivek Srikumar
Guy Van den Broeck

Deep learning excels at learning task information from large amounts of data, but struggles with learning from declarative high-level knowledge that can be more succinctly expressed directly. In this work, we introduce PYLON, a neuro-symbolic training framework that builds on PyTorch to augment procedurally trained models with declaratively specified knowledge. PYLON lets users programmatically specify constraints as Python functions and compiles them into a differentiable loss, thus training predictive models that fit the data whilst satisfying the specified constraints. PYLON includes both exact as well as approximate compilers to efficiently compute the loss, employing fuzzy logic, sampling methods, and circuits, ensuring scalability even to complex models and constraints. Crucially, a guiding principle in designing PYLON is the ease with which any existing deep learning codebase can be extended to learn from constraints in a few lines of code: a function that expresses the constraint, and a single line to compile it into a loss. Our demo comprises models in NLP, computer vision, logical games, and knowledge graphs that can be interactively trained using constraints as supervision.

NeurIPS Conference 2022 Conference Paper

Semantic Probabilistic Layers for Neuro-Symbolic Learning

Kareem Ahmed
Stefano Teso
Kai-Wei Chang
Guy Van den Broeck
Antonio Vergari

We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network guaranteeing its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on challenging SOP tasks such as hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.

NeurIPS Conference 2022 Conference Paper

Sparse Probabilistic Circuits via Pruning and Growing

Meihua Dang
Anji Liu
Guy Van den Broeck

Probabilistic circuits (PCs) are a tractable representation of probability distributions allowing for exact and efficient computation of likelihoods and marginals. There has been significant recent progress on improving the scale and expressiveness of PCs. However, PC training performance plateaus as model size increases. We discover that most capacity in existing large PC structures is wasted: fully-connected parameter layers are only sparsely used. We propose two operations, pruning and growing, that exploit the sparsity of PC structures. Specifically, the pruning operation removes unimportant sub-networks of the PC for model compression and comes with theoretical guarantees. The growing operation increases model capacity by increasing the dimensions of latent states. By alternatingly applying pruning and growing, we increase the capacity that is meaningfully used, allowing us to significantly scale up PC learning. Empirically, our learner achieves state-of-the-art likelihoods on MNIST-family image datasets and on Penn Tree Bank language data compared to other PC learners and less tractable deep generative models such as flow-based models and variational autoencoders (VAEs).

NeurIPS Conference 2021 Conference Paper

A Compositional Atlas of Tractable Circuit Operations for Probabilistic Inference

Antonio Vergari
YooJung Choi
Anji Liu
Stefano Teso
Guy Van den Broeck

Circuit representations are becoming the lingua franca to express and reason about tractable generative and discriminative models. In this paper, we show how complex inference scenarios for these models that commonly arise in machine learning---from computing the expectations of decision tree ensembles to information-theoretic divergences of sum-product networks---can be represented in terms of tractable modular operations over circuits. Specifically, we characterize the tractability of simple transformations---sums, products, quotients, powers, logarithms, and exponentials---in terms of sufficient structural constraints of the circuits they operate on, and present novel hardness results for the cases in which these properties are not satisfied. Building on these operations, we derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.

AAAI Conference 2021 Conference Paper

Group Fairness by Probabilistic Modeling with Latent Fair Decisions

YooJung Choi
Meihua Dang
Guy Van den Broeck

Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring fairness of these systems is critical. This is often challenging as the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. In order to closely model the data distribution, we employ probabilistic circuits, an expressive and tractable probabilistic model, and propose an algorithm to learn them from incomplete data. We show on real-world datasets that our approach not only is a better model of how the data was generated than existing methods but also achieves competitive accuracy. Moreover, we also evaluate our approach on a synthetic dataset in which observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved.

AAAI Conference 2021 System Paper

Juice: A Julia Package for Logic and Probabilistic Circuits

Meihua Dang
Pasha Khosravi
Yitao Liang
Antonio Vergari
Guy Van den Broeck

JUICE is an open-source Julia package providing tools for logic and probabilistic reasoning and learning based on logic circuits (LCs) and probabilistic circuits (PCs). It provides a range of efficient algorithms for probabilistic inference queries, such as computing marginal probabilities (MAR), as well as many more advanced queries. Certain structural circuit properties are needed to achieve this tractability, which JUICE helps validate. Additionally, it supports several parameter and structure learning algorithms proposed in the recent literature. By leveraging parallelism (on both CPU and GPU), JUICE provides a fast implementation of circuit-based algorithms, which makes it suitable for tackling large-scale datasets and models.

AAAI Conference 2021 Conference Paper

On the Tractability of SHAP Explanations

Guy Van den Broeck
Anton Lykov
Maximilian Schleich
Dan Suciu

SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we establish the complexity of computing the SHAP explanation in three important settings. First, we consider fully-factorized data distributions, and show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model. This fully-factorized setting is often used to simplify the SHAP computation, yet our results show that the computation can be intractable for commonly used models such as logistic regression. Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting: computing SHAP explanations of trivial classifiers over naive Bayes distributions. Finally, we show that even computing SHAP over the empirical distribution is #P-hard.

AIJ Journal 2021 Journal Article

Open-world probabilistic databases: Semantics, algorithms, complexity

İsmail İlkan Ceylan
Adnan Darwiche
Guy Van den Broeck

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry. They are continuously extended with new data, powered by modern information extraction tools that associate probabilities with knowledge base facts. The state of the art to store and process such data is founded on probabilistic databases. Many systems based on probabilistic databases, however, still have certain semantic deficiencies, which limit their potential applications. We revisit the semantics of probabilistic databases, and argue that the closed-world assumption of probabilistic databases, i.e., the assumption that facts not appearing in the database have the probability zero, conflicts with the everyday use of large-scale probabilistic knowledge bases. To address this discrepancy, we propose open-world probabilistic databases, as a new probabilistic data model. In this new data model, the probabilities of unknown facts, also called open facts, can be assigned any probability value from a default probability interval. Our analysis entails that our model aligns better with many real-world tasks such as query answering, relational learning, knowledge base completion, and rule mining. We make various technical contributions. We show that the data complexity dichotomy, between polynomial time and #P, for evaluating unions of conjunctive queries on probabilistic databases can be lifted to our open-world model. This result is supported by an algorithm that computes the probabilities of the so-called safe queries efficiently. Based on this algorithm, we prove that evaluating safe queries is in linear time for probabilistic databases, under reasonable assumptions. This remains true in open-world probabilistic databases for a more restricted class of safe queries. We extend our data complexity analysis beyond unions of conjunctive queries, and obtain a host of complexity results for both classical and open-world probabilistic databases. We conclude our analysis with an in-depth investigation of the combined complexity in the respective models.
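
The contrast between closed-world and open-world semantics can be made concrete with a very small example. The sketch assumes tuple independence, a default interval of the form $[0, \lambda]$ for open facts, and the single monotone query "some tuple of relation R is present", for which the lower and upper query probabilities are reached at the interval endpoints. The function names, the interval shape, and the numbers are illustrative assumptions, not details taken from the article.

```python
def exists_probability(tuple_probs):
    """P(at least one tuple is present), assuming independent tuples."""
    none_present = 1.0
    for p in tuple_probs:
        none_present *= 1.0 - p
    return 1.0 - none_present


def open_world_bounds(known_probs, num_open_facts, lam):
    """Lower and upper probability of the query 'exists x. R(x)'.

    Assumptions: every open fact may take any probability in [0, lam];
    because the query is monotone, setting all open facts to 0 gives the
    lower bound (the closed-world answer) and setting them to lam gives
    the upper bound.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be a probability")
    lower = exists_probability(known_probs)
    upper = exists_probability(list(known_probs) + [lam] * num_open_facts)
    return lower, upper


if __name__ == "__main__":
    # One stored fact with probability 0.5 and two hypothetical open facts.
    print(open_world_bounds([0.5], num_open_facts=2, lam=0.1))
```

Under the closed-world assumption only the lower value would be reported; the open-world model instead returns an interval of possible query probabilities.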

ICML Conference 2021 Conference Paper

Probabilistic Generating Circuits

Honghua Zhang
Brendan Juba
Guy Van den Broeck

Generating functions, which are widely used in combinatorics and probability theory, encode function values into the coefficients of a polynomial. In this paper, we explore their use as a tractable probabilistic model, and propose probabilistic generating circuits (PGCs) for their efficient representation. PGCs are strictly more expressive efficient than many existing tractable probabilistic models, including determinantal point processes (DPPs), probabilistic circuits (PCs) such as sum-product networks, and tractable graphical models. We contend that PGCs are not just a theoretical framework that unifies vastly different existing models, but also show great potential in modeling realistic data. We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of PCs and DPPs, and obtain competitive performance on a suite of density estimation benchmarks. We also highlight PGCs’ connection to the theory of strongly Rayleigh distributions.

IJCAI Conference 2021 Conference Paper

Probabilistic Sufficient Explanations

Eric Wang
Pasha Khosravi
Guy Van den Broeck

Understanding the behavior of learned classifiers is an important task, and various black-box explanations, logical reasoning approaches, and model-specific methods have been proposed. In this paper, we introduce probabilistic sufficient explanations, which formulate explaining an instance of classification as choosing the "simplest" subset of features such that only observing those features is "sufficient" to explain the classification. That is, sufficient to give us strong probabilistic guarantees that the model will behave similarly when all features are observed under the data distribution. In addition, we leverage tractable probabilistic reasoning tools such as probabilistic circuits and expected predictions to design a scalable algorithm for finding the desired explanations while keeping the guarantees intact. Our experiments demonstrate the effectiveness of our algorithm in finding sufficient explanations, and showcase its advantages compared to Anchors and logical explanations.

UAI Conference 2021 Conference Paper

Tractable computation of expected kernels

Wenzhe Li
Zhe Zeng 0001
Antonio Vergari
Guy Van den Broeck

Computing the expectation of kernel functions is a ubiquitous task in machine learning, with applications from classical support vector machines to exploiting kernel embeddings of distributions in probabilistic modeling, statistical inference, causal discovery, and deep learning. In all these scenarios, we tend to resort to Monte Carlo estimates as expectations of kernels are intractable in general. In this work, we characterize the conditions under which we can compute expected kernels exactly and efficiently, by leveraging recent advances in probabilistic circuit representations. We first construct a circuit representation for kernels and propose an approach to such tractable computation. We then demonstrate possible advancements for kernel embedding frameworks by exploiting tractable expected kernels to derive new algorithms for two challenging scenarios: 1) reasoning under missing data with kernel support vector regressors; 2) devising a collapsed black-box importance sampling scheme. Finally, we empirically evaluate both algorithms and show that they outperform standard baselines on a variety of datasets.

NeurIPS Conference 2021 Conference Paper

Tractable Regularization of Probabilistic Circuits

Anji Liu
Guy Van den Broeck

Probabilistic Circuits (PCs) are a promising avenue for probabilistic modeling. They combine advantages of probabilistic graphical models (PGMs) with those of neural networks (NNs). Crucially, however, they are tractable probabilistic models, supporting efficient and exact computation of many probabilistic inference queries, such as marginals and MAP. Further, since PCs are structured computation graphs, they can take advantage of deep-learning-style parameter updates, which greatly improves their scalability. However, this innovation also makes PCs prone to overfitting, which has been observed in many standard benchmarks. Despite the existence of abundant regularization techniques for both PGMs and NNs, they are not effective enough when applied to PCs. Instead, we re-think regularization for PCs and propose two intuitive techniques, data softening and entropy regularization, that both take advantage of PCs' tractability and still have an efficient implementation as a computation graph. Specifically, data softening provides a principled way to add uncertainty in datasets in closed form, which implicitly regularizes PC parameters. To learn parameters from a softened dataset, PCs only need linear time by virtue of their tractability. In entropy regularization, the exact entropy of the distribution encoded by a PC can be regularized directly, which is again infeasible for most other density estimation models. We show that both methods consistently improve the generalization performance of a wide variety of PCs. Moreover, when paired with a simple PC structure, we achieved state-of-the-art results on 10 out of 20 standard discrete density estimation benchmarks. Open-source code and experiments are available at https://github.com/UCLA-StarAI/Tractable-PC-Regularization.

NeurIPS Conference 2020 Conference Paper

Counterexample-Guided Learning of Monotonic Neural Networks

Aishwarya Sivaraman
Golnoosh Farnadi
Todd Millstein
Guy Van den Broeck

The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time. Additionally, we propose a technique to use monotonicity as an inductive bias for deep learning. It works by iteratively incorporating monotonicity counterexamples in the learning process. Contrary to prior work in monotonic learning, we target general ReLU neural networks and do not further restrict the hypothesis space. We have implemented these techniques in a tool called COMET. Experiments on real-world datasets demonstrate that our approach achieves state-of-the-art results compared to existing monotonic learners, and can improve the model quality compared to those that were trained without taking monotonicity constraints into account.
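
To make the notion of a monotonicity counterexample concrete, here is a minimal sketch. It swaps in an exhaustive search over a finite grid, which gives no guarantee of the kind the abstract describes as provable. The function `find_monotonicity_counterexample`, the grids, and the two hand-written ReLU networks are hypothetical and are not taken from COMET.

```python
import itertools


def find_monotonicity_counterexample(f, grids, monotone_idx):
    """Search a finite grid for a pair (x, x_up) that differs only in one
    monotone feature, with x_up larger there, yet f(x_up) < f(x).
    Returns the pair, or None if the grid contains no violation."""
    for x in itertools.product(*grids):
        for i in monotone_idx:
            for higher in grids[i]:
                if higher > x[i]:
                    x_up = x[:i] + (higher,) + x[i + 1:]
                    if f(x_up) < f(x):
                        return x, x_up
    return None


def relu(z):
    return max(0.0, z)


def non_monotone_net(x):
    # Hypothetical ReLU network whose output drops once x[0] exceeds 1.
    return relu(x[0]) - 2.0 * relu(x[0] - 1.0) + 0.5 * x[1]


def monotone_net(x):
    # Hypothetical ReLU network that is non-decreasing in x[0].
    return relu(x[0]) + 0.5 * relu(x[0] - 1.0) + 0.5 * x[1]


# Assumed grid: feature 0 should be monotone, feature 1 is unconstrained.
grids = [[0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0]]
violation = find_monotonicity_counterexample(non_monotone_net, grids, [0])
no_violation = find_monotonicity_counterexample(monotone_net, grids, [0])
```

A pair returned by the search could be added to the training data as a correction, which is the general idea behind counterexample-guided learning. A `None` result only means that the grid contains no violation.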

ICML Conference 2020 Conference Paper

Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits

Robert Peharz
Steven Lang
Antonio Vergari
Karl Stelzner
Alejandro Molina 0001
Martin Trapp 0001
Guy Van den Broeck
Kristian Kersting

Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent “deep-learning-style” implementations of PCs strive for a better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a novel implementation design for PCs, improving prior art in several regards. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation, leading to speedups and memory savings of up to two orders of magnitude, in comparison to previous implementations. As an algorithmic contribution, we show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation. Furthermore, we demonstrate that EiNets scale well to datasets which were previously out of reach, such as SVHN and CelebA, and that they can be used as faithful generative image models.

AAAI Conference 2020 Conference Paper

Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns

YooJung Choi
Golnoosh Farnadi
Behrouz Babaki
Guy Van den Broeck

As machine learning is increasingly used to make real-world decisions, recent research efforts aim to define and ensure fairness in algorithmic decision making. Existing methods often assume a fixed set of observable features to define individuals, but lack a discussion of certain features not being observed at test time. In this paper, we study fairness of naive Bayes classifiers, which allow partial observations. In particular, we introduce the notion of a discrimination pattern, which refers to an individual receiving different classifications depending on whether some sensitive attributes were observed. Then a model is considered fair if it has no such pattern. We propose an algorithm to discover and mine for discrimination patterns in a naive Bayes classifier, and show how to learn maximum-likelihood parameters subject to these fairness constraints. Our approach iteratively discovers and eliminates discrimination patterns until a fair model is learned. An empirical evaluation on three real-world datasets demonstrates that we can remove exponentially many discrimination patterns by only adding a small fraction of them as constraints.

UAI Conference 2020 Conference Paper

On the Relationship Between Probabilistic Circuits and Determinantal Point Processes

Honghua Zhang
Steven Holtzen
Guy Van den Broeck

Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these tractable models.

NeurIPS Conference 2020 Conference Paper

Probabilistic Inference with Algebraic Constraints: Theoretical Limits and Practical Approximations

Zhe Zeng
Paolo Morettin
Fanqi Yan
Antonio Vergari
Guy Van den Broeck

Weighted model integration (WMI) is a framework to perform advanced probabilistic inference on hybrid domains, i.e., on distributions over mixed continuous-discrete random variables and in the presence of complex logical and arithmetic constraints. In this work, we advance the WMI framework on both the theoretical and algorithmic side. First, we exactly trace the boundaries of tractability for WMI inference by proving that, to be amenable to exact and efficient inference, a WMI problem has to possess a tree-shaped structure with logarithmic diameter. While this result deepens our theoretical understanding of WMI, it hinders the practical applicability of exact WMI solvers to real-world problems. To overcome this, we propose the first approximate WMI solver that does not resort to sampling, but performs exact inference on approximate models. Our solution performs message passing in a relaxed problem structure iteratively to recover certain lost dependencies and, as our experiments suggest, is competitive with other SOTA WMI solvers.

ICML Conference 2020 Conference Paper

Scaling up Hybrid Probabilistic Inference with Logical and Arithmetic Constraints via Message Passing

Zhe Zeng 0001
Paolo Morettin
Fanqi Yan
Antonio Vergari
Guy Van den Broeck

Weighted model integration (WMI) is an appealing framework for probabilistic inference: it allows for expressing the complex dependencies in real-world problems, where variables are both continuous and discrete, via the language of Satisfiability Modulo Theories (SMT), as well as to compute probabilistic queries with complex logical and arithmetic constraints. Yet, existing WMI solvers are not ready to scale to these problems. They either ignore the intrinsic dependency structure of the problem entirely, or they are limited to overly restrictive structures. To narrow this gap, we derive a factorized WMI computation enabling us to devise a scalable WMI solver based on message passing, called MP-WMI. Namely, MP-WMI is the first WMI solver that can (i) perform exact inference on the full class of tree-structured WMI problems, and (ii) perform inter-query amortization, e.g., to compute all marginal densities simultaneously. Experimental results show that our solver dramatically outperforms the existing WMI solvers on a large set of benchmarks.

UAI Conference 2020 Conference Paper

Symbolic Querying of Vector Spaces: Probabilistic Databases Meets Relational Embeddings

Tal Friedman
Guy Van den Broeck

We propose unifying techniques from probabilistic databases and relational embedding models with the goal of performing complex queries on incomplete and uncertain data. We formalize a probabilistic database model with respect to which all queries are done. This allows us to leverage the rich literature of theory and algorithms from probabilistic databases for solving problems. While this formalization can be used with any relational embedding model, the lack of a well-defined joint probability distribution causes simple query problems to become provably hard. With this in mind, we introduce TractOR, a relational embedding model designed to be a tractable probabilistic database, by exploiting typical embedding assumptions within the probabilistic framework. Using a principled, efficient inference algorithm that can be derived from its definition, we empirically demonstrate that TractOR is an effective and general model for these querying tasks.

UAI Conference 2019 Conference Paper

Efficient Search-Based Weighted Model Integration

Zhe Zeng 0001
Guy Van den Broeck

Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous domains. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programming. Yet, state-of-the-art tools for WMI are limited in terms of performance and ignore the independence structure that is crucial to improving efficiency. To address this limitation, we propose an efficient model integration algorithm for theories with tree primal graphs. We exploit the sparse graph structure by using search to perform integration. Our algorithm greatly improves the computational efficiency on such problems and exploits context-specific independence between variables. Experimental results show dramatic speedups compared to existing WMI solvers on problems with tree-shaped dependencies.

UAI Conference 2019 Conference Paper

Generating and Sampling Orbits for Lifted Probabilistic Inference

Steven Holtzen
Todd D. Millstein
Guy Van den Broeck

A key goal in the design of probabilistic inference algorithms is identifying and exploiting properties of the distribution that make inference tractable. Lifted inference algorithms identify symmetry as a property that enables efficient inference and seek to scale with the degree of symmetry of a probability model. A limitation of existing exact lifted inference techniques is that they do not apply to non-relational representations like factor graphs. In this work we provide the first example of an exact lifted inference algorithm for arbitrary discrete factor graphs. In addition we describe a lifted Markov-Chain Monte-Carlo algorithm that provably mixes rapidly in the degree of symmetry of the distribution.

AAAI Conference 2019 Conference Paper

Learning Logistic Circuits

Yitao Liang
Guy Van den Broeck

This paper proposes a new classification model called logistic circuits. On MNIST and Fashion datasets, our learning algorithm outperforms neural networks that have an order of magnitude more parameters. Yet, logistic circuits have a distinct origin in symbolic AI, forming a discriminative counterpart to probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that parameter learning for logistic circuits is convex optimization, and that a simple local search algorithm can induce strong model structures from data.

IJCAI Conference 2019 Conference Paper

On Constrained Open-World Probabilistic Databases

Tal Friedman
Guy Van den Broeck

Increasing amounts of available data have led to a heightened need for representing large-scale probabilistic knowledge bases. One approach is to use a probabilistic database, a model with strong assumptions that allow for efficiently answering many interesting queries. Recent work on open-world probabilistic databases strengthens the semantics of these probabilistic databases by discarding the assumption that any information not present in the data must be false. While intuitive, these semantics are not sufficiently precise to give reasonable answers to queries. We propose overcoming these issues by using constraints to restrict this open world. We provide an algorithm for one class of queries, and establish a basic hardness result for another. Finally, we propose an efficient and tight approximation for a large class of queries.

NeurIPS Conference 2019 Conference Paper

On Tractable Computation of Expected Predictions

Pasha Khosravi
YooJung Choi
Yitao Liang
Antonio Vergari
Guy Van den Broeck

Computing expected predictions of discriminative models is a fundamental task in machine learning that appears in many interesting applications such as fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been proven to be hard in general. In fact, the task is intractable even for simple models such as logistic regression and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm to approximate the expectations for classification scenarios in which exact computations are intractable. Our framework to compute expected predictions allows for handling of missing data during prediction time in a principled and accurate way and enables reasoning about the behavior of discriminative models. We empirically show our algorithm to consistently outperform standard imputation techniques on a variety of datasets. Finally, we illustrate how our framework can be used for exploratory data analysis.

NeurIPS Conference 2019 Conference Paper

Smoothing Structured Decomposable Circuits

Andy Shih
Guy Van den Broeck
Paul Beame
Antoine Amarilli

We study the task of smoothing a circuit, i.e., ensuring that all children of a plus-gate mention the same variables. Circuits serve as the building blocks of state-of-the-art inference algorithms on discrete probabilistic graphical models and probabilistic programs. They are also important for discrete density estimation algorithms. Many of these tasks require the input circuit to be smooth. However, smoothing has not been studied in its own right yet, and only a trivial quadratic algorithm is known. This paper studies efficient smoothing for structured decomposable circuits. We propose a near-linear time algorithm for this task and explore lower bounds for smoothing decomposable circuits, using existing results on range-sum queries. Further, for the important case of All-Marginals, we show a more efficient linear-time algorithm. We validate experimentally the performance of our methods.

NeurIPS Conference 2019 Conference Paper

Towards Hardware-Aware Tractable Learning of Probabilistic Models

Laura Galindez Olascoaga
Wannes Meert
Nimish Shah
Marian Verhelst
Guy Van den Broeck

Smart portable applications increasingly rely on edge computing due to privacy and latency concerns. But guaranteeing always-on functionality comes with two major challenges: heavily resource-constrained hardware and dynamic application conditions. Probabilistic models present an ideal solution to these challenges: they are robust to missing data, allow for joint predictions and have small data needs. In addition, ongoing efforts in the field of tractable learning have resulted in probabilistic models with strict inference efficiency guarantees. However, the current notions of tractability are often limited to model complexity, disregarding the hardware's specifications and constraints. We propose a novel resource-aware cost metric that takes into consideration the hardware's properties in determining whether the inference task can be efficiently deployed. We use this metric to evaluate the performance versus resource trade-off relevant to the application of interest, and we propose a strategy that selects the device-settings that can optimally meet users' requirements. We showcase our framework on a mobile activity recognition scenario, and on a variety of benchmark datasets representative of the field of tractable learning and of the applications of interest.

IJCAI Conference 2019 Conference Paper

What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features

Pasha Khosravi
Yitao Liang
YooJung Choi
Guy Van den Broeck

While discriminative classifiers often yield strong predictive performance, missing feature values at prediction time can still be a challenge. Classifiers may not behave as expected under certain ways of substituting the missing values, since they inherently make assumptions about the data distribution they were trained on. In this paper, we propose a novel framework that classifies examples with missing features by computing the expected prediction with respect to a feature distribution. Moreover, we use geometric programming to learn a naive Bayes distribution that embeds a given logistic regression classifier and can efficiently take its expected predictions. Empirical evaluations show that our model achieves the same performance as the logistic regression with all features observed, and outperforms standard imputation techniques when features go missing during prediction time. Furthermore, we demonstrate that our method can be used to generate "sufficient explanations" of logistic regression classifications, by removing features that do not affect the classification.
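
The difference between an expected prediction and imputation can be illustrated with a small sketch. It assumes binary features that are fully independent, which is a simplification of the naive Bayes distribution in the abstract, and it enumerates all completions of the missing features instead of using the paper's method. The names `expected_prediction` and `mean_imputation_prediction` are hypothetical.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expected_prediction(weights, bias, observed, feature_probs):
    """Average the logistic regression output over every completion of the
    missing binary features. observed maps feature index to 0 or 1;
    feature_probs[i] is the assumed P(X_i = 1), independent across features."""
    missing = [i for i in range(len(weights)) if i not in observed]
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(missing)):
        x = dict(observed)
        prob = 1.0
        for i, v in zip(missing, values):
            x[i] = v
            prob *= feature_probs[i] if v == 1 else 1.0 - feature_probs[i]
        z = bias + sum(w * x[i] for i, w in enumerate(weights))
        total += prob * sigmoid(z)
    return total


def mean_imputation_prediction(weights, bias, observed, feature_probs):
    """Baseline: replace each missing feature by its mean, then predict once."""
    z = bias + sum(
        w * (observed[i] if i in observed else feature_probs[i])
        for i, w in enumerate(weights)
    )
    return sigmoid(z)


# Hypothetical classifier with feature 0 missing and feature 1 observed as 1.
weights, bias = [4.0, -1.0], 0.0
observed = {1: 1}
feature_probs = [0.5, 0.3]
expected = expected_prediction(weights, bias, observed, feature_probs)
imputed = mean_imputation_prediction(weights, bias, observed, feature_probs)
```

Because the sigmoid is nonlinear, the two quantities generally differ. The enumeration is exponential in the number of missing features, so the sketch is only meant for small illustrations.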

ICML Conference 2018 Conference Paper

A Semantic Loss Function for Deep Learning with Symbolic Knowledge

Jingyi Xu
Zilu Zhang
Tal Friedman
Yitao Liang
Guy Van den Broeck

This paper develops a novel methodology for using symbolic knowledge in deep learning. From first principles, we derive a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output. An experimental evaluation shows that it effectively guides the learner to achieve (near-)state-of-the-art results on semi-supervised multi-class classification. Moreover, it significantly increases the ability of the neural network to predict structured objects, such as rankings and paths. These discrete concepts are tremendously difficult to learn, and benefit from a tight integration of deep learning and symbolic reasoning methods.
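
One reading of "how close the neural network is to satisfying the constraints" is the negative log-probability that an output sampled independently per unit satisfies the constraint. The sketch below computes this quantity by brute-force enumeration under that assumption. The function names and the exactly-one constraint are illustrative choices, and the enumeration is exponential in the number of outputs.

```python
import itertools
import math


def semantic_loss(probs, constraint):
    """Negative log of the probability that a Boolean vector, sampled with
    independent per-output probabilities probs, satisfies the constraint."""
    total = 0.0
    for assignment in itertools.product([False, True], repeat=len(probs)):
        if constraint(assignment):
            weight = 1.0
            for p, value in zip(probs, assignment):
                weight *= p if value else 1.0 - p
            total += weight
    # A constraint with zero probability mass gets infinite loss.
    return math.inf if total == 0.0 else -math.log(total)


def exactly_one(assignment):
    """Illustrative constraint: exactly one output is true."""
    return sum(assignment) == 1


confident_loss = semantic_loss([1.0, 0.0, 0.0], exactly_one)
uncertain_loss = semantic_loss([0.5, 0.5], exactly_one)
```

An output vector that puts all its mass on a satisfying assignment incurs zero loss, while uniform uncertainty over two outputs incurs a loss of log 2 under the exactly-one constraint.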

NeurIPS Conference 2018 Conference Paper

Approximate Knowledge Compilation by Online Collapsed Importance Sampling

Tal Friedman
Guy Van den Broeck

We introduce collapsed compilation, a novel approximate inference algorithm for discrete probabilistic graphical models. It is a collapsed sampling algorithm that incrementally selects which variable to sample next based on the partial compilation obtained so far. This online collapsing, together with knowledge compilation inference on the remaining variables, naturally exploits local structure and context-specific independence in the distribution. These properties are used implicitly in exact inference, but are difficult to harness for approximate inference. Moreover, by having a partially compiled circuit available during sampling, collapsed compilation has access to a highly effective proposal distribution for importance sampling. Our experimental evaluation shows that collapsed compilation performs well on standard benchmarks. In particular, when the amount of exact inference is equally limited, collapsed compilation is competitive with the state of the art, and outperforms it on several benchmarks.

IJCAI Conference 2018 Conference Paper

On Robust Trimming of Bayesian Network Classifiers

YooJung Choi
Guy Van den Broeck

This paper considers the problem of removing costly features from a Bayesian network classifier. We want the classifier to be robust to these changes, and maintain its classification behavior. To this end, we propose a closeness metric between Bayesian classifiers, called the expected classification agreement (ECA). Our corresponding trimming algorithm finds an optimal subset of features and a new classification threshold that maximize the expected agreement, subject to a budgetary constraint. It utilizes new theoretical insights to perform branch-and-bound search in the space of feature sets, while computing bounds on the ECA. Our experiments investigate both the runtime cost of trimming and its effect on the robustness and accuracy of the final classifier.

ICML Conference 2018 Conference Paper

Sound Abstraction and Decomposition of Probabilistic Programs

Steven Holtzen
Guy Van den Broeck
Todd D. Millstein

Probabilistic programming languages are a flexible tool for specifying statistical models, but this flexibility comes at the cost of efficient analysis. It is currently difficult to compactly represent the subtle independence properties of a probabilistic program, and exploit independence properties to decompose inference. Classical graphical model abstractions do capture some properties of the underlying distribution, enabling inference algorithms to operate at the level of the graph topology. However, we observe that graph-based abstractions are often too coarse to capture interesting properties of programs. We propose a form of sound abstraction for probabilistic programs wherein the abstractions are themselves simplified programs. We provide a theoretical foundation for these abstractions, as well as an algorithm to generate them. Experimentally, we also illustrate the practical benefits of our framework as a tool to decompose probabilistic program inference.

UAI Conference 2017 Conference Paper

Learning the Structure of Probabilistic Sentential Decision Diagrams

Yitao Liang
Jessa Bekker
Guy Van den Broeck

The probabilistic sentential decision diagram (PSDD) was recently introduced as a tractable representation of probability distributions that are subject to logical constraints. Meanwhile, efforts in tractable learning achieved great success inducing complex joint distributions from data without constraints, while guaranteeing efficient exact probabilistic inference; for instance by learning arithmetic circuits (ACs) or sum-product networks (SPNs). This paper studies the efficacy of PSDDs for the standard tractable learning task without constraints and develops the first PSDD structure learning algorithm, called LearnPSDD. Experiments on standard benchmarks show competitive performance, despite the fact that PSDDs are more tractable and more restrictive than their alternatives. LearnPSDD compares favorably to SPNs, particularly in terms of model size, which is a proxy for tractability. We report state-of-the-art likelihood results on six datasets. Moreover, LearnPSDD retains the ability to learn PSDD structures in probability spaces subject to logical constraints, which is beyond the reach of other representations.

IJCAI Conference 2017 Conference Paper

Open-World Probabilistic Databases: An Abridged Report

Ismail Ilkan Ceylan
Adnan Darwiche
Guy Van den Broeck

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to default intervals. For this open-world setting, we lift the existing data complexity dichotomy of probabilistic databases, and propose an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.

IJCAI Conference 2017 Conference Paper

Optimal Feature Selection for Decision Robustness in Bayesian Networks

YooJung Choi
Adnan Darwiche
Guy Van den Broeck

In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based, Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach.

UAI Conference 2017 Conference Paper

Probabilistic Program Abstractions

Steven Holtzen
Todd D. Millstein
Guy Van den Broeck

Abstraction is a fundamental tool for reasoning about complex systems. Program abstraction has been utilized to great effect for analyzing deterministic programs. At the heart of program abstraction is the relationship between a concrete program, which is difficult to analyze, and an abstract program, which is more tractable. Program abstractions, however, are typically not probabilistic. We generalize non-deterministic program abstractions to probabilistic program abstractions by explicitly quantifying the non-deterministic choices. Our framework upgrades key definitions and properties of abstractions to the probabilistic context. We also discuss preliminary ideas for performing inference on probabilistic abstractions and general probabilistic programs.

AAAI Conference 2016 Conference Paper

Component Caching in Hybrid Domains with Piecewise Polynomial Densities

Vaishak Belle
Guy Van den Broeck
Andrea Passerini

Counting the models of a propositional formula is an important problem: for example, it serves as the backbone of probabilistic inference by weighted model counting. A key algorithmic insight is component caching (CC), in which disjoint components of a formula, generated dynamically during a DPLL search, are cached so that they only have to be solved once. In recent years, driven by SMT technology and probabilistic inference in hybrid domains, there is an increasing interest in counting the models of linear arithmetic sentences. To date, however, solvers for these are block-clause implementations, which are nonviable on large problem instances. In this paper, as a first step in extending CC to hybrid domains, we show how propositional CC systems can be leveraged when limited to piecewise polynomial densities. Our experiments demonstrate a large gap in performance when compared to existing approaches based on a variety of block-clause strategies.

AIJ Journal 2016 Journal Article

Exploiting local and repeated structure in Dynamic Bayesian Networks

Jonas Vlasselaer
Wannes Meert
Guy Van den Broeck
Luc De Raedt

We introduce the structural interface algorithm for exact probabilistic inference in Dynamic Bayesian Networks. It unifies state-of-the-art techniques for inference in static and dynamic networks, by combining principles of knowledge compilation with the interface algorithm. The resulting algorithm not only exploits the repeated structure in the network, but also the local structure, including determinism, parameter equality and context-specific independence. Empirically, we show that the structural interface algorithm speeds up inference in the presence of local structure, and scales to larger and more complex networks.

IJCAI Conference 2016 Conference Paper

First-Order Model Counting in a Nutshell

Guy Van den Broeck

First-order model counting recently emerged as a computational tool for high-level probabilistic reasoning. It is concerned with counting satisfying assignments to sentences in first-order logic and upgrades the successful propositional model counting approaches to probabilistic reasoning. We give an overview of model counting as it is applied in statistical relational learning, probabilistic programming, databases, and hybrid reasoning. A short tutorial illustrates the principles behind these solvers. Finally, we show that first-order counting is a fundamentally different problem from the propositional counting techniques that inspired it.
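
As a small illustration of counting models of a first-order sentence, the sketch below uses the sentence "for all x, y: Smokes(x) and Friends(x, y) implies Smokes(y)" over a domain of n individuals. This sentence is chosen here for illustration and does not appear in the abstract. The brute-force counter enumerates every interpretation, while the closed form groups interpretations by the number k of smokers: the k(n - k) Friends atoms from a smoker to a non-smoker must be false, and the rest are free.

```python
import itertools
from math import comb


def brute_force_count(n):
    """Count interpretations of Smokes/1 and Friends/2 over n individuals that
    satisfy: Smokes(x) and Friends(x, y) implies Smokes(y), for all x, y."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    count = 0
    for smokes in itertools.product([False, True], repeat=n):
        for bits in itertools.product([False, True], repeat=len(pairs)):
            friends = dict(zip(pairs, bits))
            if all(
                (not (smokes[x] and friends[(x, y)])) or smokes[y]
                for x, y in pairs
            ):
                count += 1
    return count


def lifted_count(n):
    """Closed form: choose the k smokers, then leave every Friends atom free
    except the k * (n - k) atoms pointing from a smoker to a non-smoker."""
    return sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))


small_counts = [(brute_force_count(n), lifted_count(n)) for n in (1, 2, 3)]
```

The brute-force counter visits 2^(n + n^2) interpretations, whereas the closed form needs only n + 1 terms, which is the kind of saving that reasoning at the first-order level is meant to deliver.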

IJCAI Conference 2016 Conference Paper

Hashing-Based Approximate Probabilistic Inference in Hybrid Domains: An Abridged Report

Vaishak Belle
Guy Van den Broeck
Andrea Passerini

In recent years, there has been considerable progress on fast randomized algorithms that approximate probabilistic inference with tight tolerance and confidence guarantees. The idea here is to formulate inference as a counting task over an annotated propositional theory, called weighted model counting (WMC), which can be partitioned into smaller tasks using universal hashing. An inherent limitation of this approach, however, is that it only admits the inference of discrete probability distributions. In this work, we consider the problem of approximating inference tasks for a probability distribution defined over discrete and continuous random variables. Building on a notion called weighted model integration, which is a strict generalization of WMC and is based on annotating Boolean and arithmetic constraints, we show how probabilistic inference in hybrid domains can be put within reach of hashing-based WMC solvers. Empirical evaluations demonstrate the applicability and promise of the proposal.

NeurIPS Conference 2016 Conference Paper

New Liftable Classes for First-Order Probabilistic Inference

Seyed Mehran Kazemi
Angelika Kimmig
Guy Van den Broeck
David Poole

Statistical relational models provide compact encodings of probabilistic dependencies in relational domains, but result in highly intractable graphical models. The goal of lifted inference is to carry out probabilistic inference without needing to reason about each individual separately, by instead treating exchangeable, undistinguished objects as a whole. In this paper, we study the domain recursion inference rule, which, despite its central role in early theoretical results on domain-lifted inference, has later been believed redundant. We show that this rule is more powerful than expected, and in fact significantly extends the range of models for which lifted inference runs in time polynomial in the number of individuals in the domain. This includes an open problem called S4, the symmetric transitivity model, and a first-order logic encoding of the birthday paradox. We further identify new classes $S^2FO^2$ and $S^2RU$ of domain-liftable theories, which respectively subsume $FO^2$ and recursively unary theories, the largest classes of domain-liftable theories known so far, and show that using domain recursion can achieve exponential speedup even in theories that cannot fully be lifted with the existing set of inference rules.

KR Conference 2016 Conference Paper

Open-World Probabilistic Databases

Ismail Ilkan Ceylan
Adnan Darwiche
Guy Van den Broeck

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to intervals. While still assuming a finite domain, this semantics can provide meaningful answers when some probabilities are not precisely known. For this open-world setting, we propose an efficient evaluation algorithm for unions of conjunctive queries. Our open-world algorithm incurs no overhead compared to closed-world reasoning and runs in time linear in the size of the database for tractable queries. All other queries are #P-hard, implying a data complexity dichotomy between linear time and #P. For queries involving negation, however, open-world reasoning can become NP-, or even $\mathrm{NP}^{\mathrm{PP}}$-hard. Finally, we discuss additional knowledge-representation layers that can further strengthen open-world reasoning about big uncertain data.
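
The effect of relaxing open facts to intervals can be sketched for the simplest monotone query, "there exists x such that R(x)", over a tuple-independent table. The sketch assumes each fact absent from the table has a probability somewhere in [0, lam], where `lam` is a hypothetical threshold, and the function name is illustrative. Because the query is monotone, the lower bound is reached with all open facts at probability 0 and the upper bound with all of them at `lam`.

```python
def exists_probability_interval(known_probs, domain_size, lam):
    """Probability interval for the query 'exists x. R(x)'.
    known_probs: probabilities of the facts R(c) listed in the database.
    domain_size: number of constants in the finite domain.
    lam: assumed upper bound on the probability of each unlisted fact."""
    num_open = domain_size - len(known_probs)
    if num_open < 0:
        raise ValueError("domain_size must cover all listed facts")
    none_listed = 1.0
    for p in known_probs:
        none_listed *= 1.0 - p
    lower = 1.0 - none_listed  # closed-world answer: open facts are false
    upper = 1.0 - none_listed * (1.0 - lam) ** num_open
    return lower, upper


lower, upper = exists_probability_interval([0.5], domain_size=3, lam=0.1)
```

With `lam` set to 0 the interval collapses to the closed-world answer, which matches the abstract's description of the closed-world assumption as assigning probability zero to unlisted facts.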

IJCAI Conference 2015 Conference Paper

Anytime Inference in Probabilistic Logic Programs with Tp-Compilation

Jonas Vlasselaer
Guy Van den Broeck
Angelika Kimmig
Wannes Meert
Luc De Raedt

Existing techniques for inference in probabilistic logic programs are sequential: they first compute the relevant propositional formula for the query of interest, then compile it into a tractable target representation and finally, perform weighted model counting on the resulting representation. We propose TP-compilation, a new inference technique based on forward reasoning. TP-compilation proceeds incrementally in that it interleaves the knowledge compilation step for weighted model counting with forward reasoning on the logic program. This leads to a novel anytime algorithm that provides hard bounds on the inferred probabilities. Furthermore, an empirical evaluation shows that TP-compilation effectively handles larger instances of complex real-world problems than current sequential approaches, both for exact and for anytime approximate inference.

UAI Conference 2015 Conference Paper

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data

Guy Van den Broeck
Karthika Mohan
Arthur Choi
Adnan Darwiche
Judea Pearl

We propose a family of efficient algorithms for learning the parameters of a Bayesian network from incomplete data. Our approach is based on recent theoretical analyses of missing data problems, which utilize a graphical representation, called the missingness graph. In the case of MCAR and MAR data, this graph need not be explicit, and yet we can still obtain closed-form, asymptotically consistent parameter estimates, without the need for inference. When this missingness graph is explicated (based on background knowledge), even partially, we can obtain even more accurate estimates with less data. Empirically, we illustrate how we can learn the parameters of large networks from large datasets, which are beyond the scope of algorithms like EM (which require inference).

UAI Conference 2015 Conference Paper

Hashing-Based Approximate Probabilistic Inference in Hybrid Domains

Vaishak Belle
Guy Van den Broeck
Andrea Passerini

In recent years, there has been considerable progress on fast randomized algorithms that approximate probabilistic inference with tight tolerance and confidence guarantees. The idea here is to formulate inference as a counting task over an annotated propositional theory, called weighted model counting (WMC), which can be partitioned into smaller tasks using universal hashing. An inherent limitation of this approach, however, is that it only admits the inference of discrete probability distributions. In this work, we consider the problem of approximating inference tasks for a probability distribution defined over discrete and continuous random variables. Building on a notion called weighted model integration, which is a strict generalization of WMC and is based on annotating Boolean and arithmetic constraints, we show how probabilistic inference in hybrid domains can be put within reach of hashing-based WMC solvers. Empirical evaluations demonstrate the applicability and promise of the proposal.

IJCAI Conference 2015 Conference Paper

Inducing Probabilistic Relational Rules from Probabilistic Examples

Luc De Raedt
Anton Dries
Ingo Thon
Guy Van den Broeck
Mathias Verbeke

We study the problem of inducing logic programs in a probabilistic setting, in which both the example descriptions and their classification can be probabilistic. The setting is incorporated in the probabilistic rule learner ProbFOIL+, which combines principles of the rule learner FOIL with ProbLog, a probabilistic Prolog. We illustrate the approach by applying it to the knowledge base of NELL, the Never-Ending Language Learner.

AAAI Conference 2015 Conference Paper

Lifted Probabilistic Inference for Asymmetric Graphical Models

Guy Van den Broeck
Mathias Niepert

Lifted probabilistic inference algorithms have been successfully applied to a large number of symmetric graphical models. Unfortunately, the majority of real-world graphical models is asymmetric. This is even the case for relational representations when evidence is given. Therefore, more recent work in the community moved to making the models symmetric and then applying existing lifted inference algorithms. However, this approach has two shortcomings. First, all existing over-symmetric approximations require a relational representation such as Markov logic networks. Second, the induced symmetries often change the distribution significantly, making the computed probabilities highly biased. We present a framework for probabilistic sampling-based inference that only uses the induced approximate symmetries to propose steps in a Metropolis-Hastings style Markov chain. The framework, therefore, leads to improved probability estimates while remaining unbiased. Experiments demonstrate that the approach outperforms existing MCMC algorithms.

AAAI Conference 2015 Conference Paper

On the Role of Canonicity in Knowledge Compilation

Guy Van den Broeck
Adnan Darwiche

Knowledge compilation is a powerful reasoning paradigm with many applications across AI and computer science more broadly. We consider the problem of bottom-up compilation of knowledge bases, which is usually predicated on the existence of a polytime function for combining compilations using Boolean operators (usually called an Apply function). While such a polytime Apply function is known to exist for certain languages (e.g., OBDDs) and not exist for others (e.g., DNNFs), its existence for certain languages remains unknown. Among the latter is the recently introduced language of Sentential Decision Diagrams (SDDs): while a polytime Apply function exists for SDDs, it was unknown whether such a function exists for the important subset of compressed SDDs which are canonical. We resolve this open question in this paper and consider some of its theoretical and practical implications. Some of the findings we report question the common wisdom on the relationship between bottom-up compilation, language canonicity and the complexity of the Apply function.

IJCAI Conference 2015 Conference Paper

Probabilistic Inference in Hybrid Domains by Weighted Model Integration

Vaishak Belle
Andrea Passerini
Guy Van den Broeck

Weighted model counting (WMC) on a propositional knowledge base is an effective and general approach to probabilistic inference in a variety of formalisms, including Bayesian and Markov Networks. However, an inherent limitation of WMC is that it only admits the inference of discrete probability distributions. In this paper, we introduce a strict generalization of WMC called weighted model integration that is based on annotating Boolean and arithmetic constraints, and combinations thereof. This methodology is shown to capture discrete, continuous and hybrid Markov networks. We then consider the task of parameter learning for a fragment of the language. An empirical evaluation demonstrates the applicability and promise of the proposal.

NeurIPS Conference 2015 Conference Paper

Tractable Learning for Complex Probability Queries

Jessa Bekker
Jesse Davis
Arthur Choi
Adnan Darwiche
Guy Van den Broeck

Tractable learning aims to learn probabilistic models where inference is guaranteed to be efficient. However, the particular class of queries that is tractable depends on the model and underlying representation. Usually this class is MPE or conditional probabilities $\Pr(\mathbf{x} \mid \mathbf{y})$ for joint assignments $\mathbf{x}, \mathbf{y}$. We propose a tractable learner that guarantees efficient inference for a broader class of queries. It simultaneously learns a Markov network and its tractable circuit representation, in order to guarantee and measure tractability. Our approach differs from earlier work by using Sentential Decision Diagrams (SDD) as the tractable language instead of Arithmetic Circuits (AC). SDDs have desirable properties, which more general representations such as ACs lack, that enable basic primitives for Boolean circuit compilation. This allows us to support a broader class of complex probability queries, including counting, threshold, and parity, in polytime.

IJCAI Conference 2015 Conference Paper

Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions

Arthur Choi
Guy Van den Broeck
Adnan Darwiche

Probabilistic sentential decision diagrams (PSDDs) are a tractable representation of structured probability spaces, which are characterized by complex logical constraints on what constitutes a possible world. We develop general-purpose techniques for probabilistic reasoning and learning with PSDDs, allowing one to compute the probabilities of arbitrary logical formulas and to learn PSDDs from incomplete data. We illustrate the effectiveness of these techniques in the context of learning preference distributions, to which considerable work has been devoted in the past. We show, analytically and empirically, that our proposed framework is general enough to support diverse and complex data and query types. In particular, we show that it can learn maximum-likelihood models from partial rankings, pairwise preferences, and arbitrary preference constraints. Moreover, we show that it can efficiently answer many queries exactly, from expected and most likely rankings, to the probability of pairwise preferences, and diversified recommendations. This case study illustrates the effectiveness and flexibility of the developed PSDD framework as a domain-independent tool for learning and reasoning with structured probability spaces.

AAAI Conference 2014 Conference Paper

Explanation-Based Approximate Weighted Model Counting for Probabilistic Logics

Joris Renkens
Angelika Kimmig
Guy Van den Broeck
Luc De Raedt

Probabilistic inference can be realized using weighted model counting. Despite a lot of progress, computing weighted model counts exactly is still infeasible for many problems of interest, and one typically has to resort to approximation methods. We contribute a new bounded approximation method for weighted model counting based on probabilistic logic programming principles. Our bounded approximation algorithm is an anytime algorithm that provides lower and upper bounds on the weighted model count. An empirical evaluation on probabilistic logic programs shows that our approach is effective in many cases that are currently beyond the reach of exact methods.

KR Conference 2014 Conference Paper

Probabilistic Sentential Decision Diagrams

Doga Kisa
Guy Van den Broeck
Arthur Choi
Adnan Darwiche

to perform weighted model counting efficiently. Second, that probabilistic reasoning can be reduced to weighted model counting. This development, which has its first roots in Darwiche (2002), has been underlying an increasing number of probabilistic reasoning systems in the last decade. This is especially true for representations that employ both logical and probabilistic elements (e.g., Chavira, Darwiche, and Jaeger (2006) and Fierens et al. (2011)). Moreover, the technique has been extended recently to certain first-order representations as well (Van den Broeck et al. 2011). This paper is concerned with an orthogonal contribution to this interplay between propositional logic and probability theory. The problem we tackle here is that of developing a representation of probability distributions in the presence of massive, logical constraints. That is, given a propositional logic theory which represents domain constraints, our goal is to develop a representation that induces a unique probability distribution over the models of the given theory. Moreover, the proposed representation should satisfy requirements that are sometimes viewed as necessary for the practical employment of such representations. These include a clear semantics of the representation parameters; an ability to reason with the representation efficiently; and an ability to learn its parameters from data, also efficiently. Our proposal is called a Probabilistic Sentential Decision Diagram (PSDD). It is based on the recently proposed Sentential Decision Diagram (SDD) for representing propositional theories (Darwiche 2011; Xue, Choi, and Darwiche 2012; Choi and Darwiche 2013). While the SDD is comprised of logical decision nodes, the PSDD is comprised of probabilistic decision nodes, which are induced by supplying a distribution over the branches of a logical decision node. Similar to SDDs, the PSDD is a canonical representation, but under somewhat more interesting conditions.
Moreover, computing the probability of a term can be done in time linear in the PSDD size. In fact, the probability of each and every literal can be computed in only two passes over the PSDD. It is particularly notable that the local parameters of a PSDD have clear semantics with respect to the global distribution induced by the PSDD. We will also show that these parameters can be learned efficiently from complete data. This paper is structured as follows. We start by a concrete discussion on some of the applications that have driven the development of PSDDs and follow by an intuitive expo-

We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.

KR Conference 2014 Conference Paper

Skolemization for Weighted First-Order Model Counting

Guy Van den Broeck
Wannes Meert
Adnan Darwiche

probabilistic inference to a WMC problem on a propositional knowledge base (Chavira, Darwiche, and Jaeger 2006; Fierens et al. 2011; 2013). Encoding first-order probabilistic models into propositional logic retains a key advantage of the Bayesian network algorithms: WMC naturally exploits determinism and local structure in the probabilistic model (Boutilier et al. 1996; Chavira and Darwiche 2005). A disadvantage is that the high-level first-order structure is lost. Poole (2003) observed that knowing the symmetries that are abundant in first-order structure can speed up probabilistic inference. Lifted inference algorithms reason about groups of objects as a whole, similar to the high-level reasoning of first-order resolution. This has led Van den Broeck et al. (2011) and Gogate and Domingos (2011) to propose weighted first-order model counting (WFOMC) as the core reasoning task underlying lifted inference algorithms. WFOMC assigns a weight to interpretations in finite-domain, function-free first-order logic, and computes the sum of the weights of all models. Counting models at the first-order level has computational advantages. For certain classes of theories, knowing the first-order structure gives exponential speedups (Van den Broeck 2011). For example, counting the models of a first-order universally quantified CNF with up to two logical variables per clause can always be done in time polynomial in the size of the domain of discourse. In contrast, a propositionalization of these CNFs will often have a treewidth polynomial in the domain size, and propositional model counting runs in exponential time. One major limitation of first-order model counters, however, is that they require input in Skolem normal form (i.e., without existential quantifiers). This is a common requirement for first-order automated reasoning algorithms, such as theorem provers. It is usually dealt with by Skolemization, which introduces Skolem constants and functions.
However, the introduction of functions is problematic for first-order model counters as they expect a function-free input. The main contribution of this paper is a Skolemization procedure that is specific for weighted first-order model counting. The procedure maps a logical input theory to an output theory that is devoid of existential quantifiers and functions, yet has an identical weighted first-order model count. The procedure is modular, in that it remains sound when extending the input and output theories with a new

First-order model counting emerged recently as a novel reasoning task, at the core of efficient algorithms for probabilistic logics. We present a Skolemization algorithm for model counting problems that eliminates existential quantifiers from a first-order logic theory without changing its weighted model count. For certain subsets of first-order logic, lifted model counters were shown to run in time polynomial in the number of objects in the domain of discourse, where propositional model counters require exponential time. However, these guarantees apply only to Skolem normal form theories (i.e., no existential quantifiers) as the presence of existential quantifiers reduces lifted model counters to propositional ones. Since textbook Skolemization is not sound for model counting, these restrictions precluded efficient model counting for directed models, such as probabilistic logic programs, which rely on existential quantification. Our Skolemization procedure extends the applicability of first-order model counters to these representations. Moreover, it simplifies the design of lifted model counting algorithms.
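
The gap between lifted and propositional counting can be seen on a small example. The sketch below counts the models of the universally quantified sentence "for all x, y: Smokes(x) and Friends(x, y) implies Smokes(y)", which uses two logical variables. The sentence, the function names, and the closed-form sum are a standard illustration chosen here as an assumption; they are not quoted from the paper, and all weights are taken to be 1 (plain model counting).

```python
from itertools import product
from math import comb

def lifted_count(n):
    """Model count as a sum of n + 1 terms, one per number k of smokers."""
    # With k smokers, the k*(n-k) Friends atoms from a smoker to a
    # non-smoker must be false; the other n*n - k*(n-k) atoms are free.
    return sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))

def ground_count(n):
    """Brute-force count over all 2**(n + n*n) interpretations."""
    people = range(n)
    count = 0
    for smokes in product([False, True], repeat=n):
        for bits in product([False, True], repeat=n * n):
            # bits[x * n + y] is the truth value of Friends(x, y)
            if all(not (smokes[x] and bits[x * n + y]) or smokes[y]
                   for x in people for y in people):
                count += 1
    return count

for n in range(1, 4):
    assert lifted_count(n) == ground_count(n)
    print(n, lifted_count(n))
```

The brute-force loop already visits 4096 interpretations at n = 3 and grows as 2^(n + n^2), while the lifted sum only needs n + 1 terms.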

AAAI Conference 2014 Conference Paper

Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference

Mathias Niepert
Guy Van den Broeck

Exchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probabilistic inference. The theory is complementary to that of independence and conditional independence. We show that tractable inference in probabilistic models with high treewidth and millions of variables can be explained with the notion of finite (partial) exchangeability. We also show that existing lifted inference algorithms implicitly utilize a combination of conditional independence and partial exchangeability.

UAI Conference 2014 Conference Paper

Understanding the Complexity of Lifted Inference and Asymmetric Weighted Model Counting

Eric Gribkoff
Guy Van den Broeck
Dan Suciu

In this paper we study lifted inference for the Weighted First-Order Model Counting problem (WFOMC), which counts the assignments that satisfy a given sentence in first-order logic (FOL); it has applications in Statistical Relational Learning (SRL) and Probabilistic Databases (PDB). We present several results. First, we describe a lifted inference algorithm that generalizes prior approaches in SRL and PDB. Second, we provide a novel dichotomy result for a non-trivial fragment of FO CNF sentences, showing that for each sentence the WFOMC problem is either in PTIME or #P-hard in the size of the input domain; we prove that, in the first case our algorithm solves the WFOMC problem in PTIME, and in the second case it fails. Third, we present several properties of the algorithm. Finally, we discuss limitations of lifted inference for symmetric probabilistic databases (where the weights of ground literals depend only on the relation name, and not on the constants of the domain), and prove the impossibility of a dichotomy result for the complexity of probabilistic inference for the entire language FOL.

NeurIPS Conference 2013 Conference Paper

On the Complexity and Approximation of Binary Evidence in Lifted Inference

Guy Van den Broeck
Adnan Darwiche

Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model's symmetries, which preempts standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this grim result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically.

AAAI Conference 2012 Conference Paper

Conditioning in First-Order Knowledge Compilation and Lifted Probabilistic Inference

Guy Van den Broeck
Jesse Davis

Knowledge compilation is a powerful technique for compactly representing and efficiently reasoning about logical knowledge bases. It has been successfully applied to numerous problems in artificial intelligence, such as probabilistic inference and conformant planning. Conditioning, which updates a knowledge base with observed truth values for some propositions, is one of the fundamental operations employed for reasoning. In the propositional setting, conditioning can be efficiently applied in all cases. Recently, people have explored compilation for first-order knowledge bases. The majority of this work has centered around using first-order d-DNNF circuits as the target compilation language. However, conditioning has not been studied in this setting. This paper explores how to condition a first-order d-DNNF circuit. We show that it is possible to efficiently condition these circuits on unary relations. However, we prove that conditioning on higher arity relations is #P-hard. We study the implications of these findings on the application of performing lifted inference for first-order probabilistic models. This leads to a better understanding of which types of queries lifted inference can address.

UAI Conference 2012 Conference Paper

Lifted Relax, Compensate and then Recover: From Approximate to Exact Lifted Probabilistic Inference

Guy Van den Broeck
Arthur Choi
Adnan Darwiche

We propose an approach to lifted approximate inference for first-order probabilistic models, such as Markov logic networks. It is based on performing exact lifted inference in a simplified first-order model, which is found by relaxing first-order constraints, and then compensating for the relaxation. These simplified models can be incrementally improved by carefully recovering constraints that have been relaxed, also at the first-order level. This leads to a spectrum of approximations, with lifted belief propagation on one end, and exact lifted inference on the other. We discuss how relaxation, compensation, and recovery can be performed, all at the first-order level, and show empirically that our approach substantially improves on the approximations of both propositional solvers and lifted belief propagation.

AAAI Conference 2011 Conference Paper

An Algebraic Prolog for Reasoning about Possible Worlds

Angelika Kimmig
Guy Van den Broeck
Luc De Raedt

We introduce aProbLog, a generalization of the probabilistic logic programming language ProbLog. An aProbLog program consists of a set of definite clauses and a set of algebraic facts; each such fact is labeled with an element of a semiring. A wide variety of labels is possible, ranging from probability values to reals (representing costs or utilities), polynomials, Boolean functions or data structures. The semiring is then used to calculate labels of possible worlds and of queries. We formally define the semantics of aProbLog and study the aProbLog inference problem, which is concerned with computing the label of a query. Two conditions are introduced that allow one to simplify the inference problem, resulting in four different algorithms and settings. Representative basic problems for each of these four settings are: is there a possible world where a query is true (SAT), how many such possible worlds are there (#SAT), what is the probability of a query being true (PROB), and what is the most likely world where the query is true (MPE). We further illustrate these settings with a number of tasks requiring more complex semirings.
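
The way one semiring-parameterized computation covers SAT, #SAT, PROB and MPE can be sketched by brute-force enumeration of possible worlds. The code below is only an illustration under stated assumptions: the `SEMIRINGS` table, the `query_label` helper, the two example facts, and the use of a Python predicate in place of definite clauses are all hypothetical, and the enumeration is not one of the four inference algorithms from the paper.

```python
from itertools import product
from functools import reduce

# A semiring is given here as (plus, times, zero, one).
SEMIRINGS = {
    "SAT":  (lambda a, b: a or b, lambda a, b: a and b, False, True),
    "#SAT": (lambda a, b: a + b,  lambda a, b: a * b,   0,     1),
    "PROB": (lambda a, b: a + b,  lambda a, b: a * b,   0.0,   1.0),
    "MPE":  (max,                 lambda a, b: a * b,   0.0,   1.0),
}

def query_label(semiring, labels, query):
    """Semiring sum, over worlds where `query` holds, of the semiring
    product of the labels of the literals in that world.

    labels -- dict: fact name -> (label if true, label if false)
    query  -- predicate over a dict: fact name -> bool
    """
    plus, times, zero, one = semiring
    names = list(labels)
    total = zero
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if query(world):
            lits = [labels[f][0] if world[f] else labels[f][1] for f in names]
            total = plus(total, reduce(times, lits, one))
    return total

probs = {"rain": 0.3, "sprinkler": 0.5}
wet = lambda w: w["rain"] or w["sprinkler"]  # stand-in for a definite-clause query

prob_labels = {f: (p, 1.0 - p) for f, p in probs.items()}
results = {
    "SAT":  query_label(SEMIRINGS["SAT"],  {f: (True, True) for f in probs}, wet),
    "#SAT": query_label(SEMIRINGS["#SAT"], {f: (1, 1) for f in probs}, wet),
    "PROB": query_label(SEMIRINGS["PROB"], prob_labels, wet),
    "MPE":  query_label(SEMIRINGS["MPE"],  prob_labels, wet),
}
print(results)
```

Only the semiring and the fact labels change between the four calls. Note that the MPE entry here is the probability of the most likely world satisfying the query, not the world itself.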

UAI Conference 2011 Conference Paper

Inference in Probabilistic Logic Programs using Weighted CNF's

Daan Fierens
Guy Van den Broeck
Ingo Thon
Bernd Gutmann
Luc De Raedt

Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. Several classical probabilistic inference tasks (such as MAP and computing marginals) have not yet received a lot of attention for this formalism. The contribution of this paper is that we develop efficient inference algorithms for these tasks. This is based on a conversion of the probabilistic logic program and the query and evidence to a weighted CNF formula. This allows us to reduce the inference tasks to well-studied tasks such as weighted model counting. To solve such tasks, we employ state-of-the-art methods. We consider multiple methods for the conversion of the programs as well as for inference on the weighted CNF. The resulting approach is evaluated experimentally and shown to improve upon the state-of-the-art in probabilistic logic programming.
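
As a rough illustration of the reduction to weighted model counting, the sketch below hand-encodes a two-fact program as a weighted CNF and counts by brute force. The example program, the variable numbering, the manual conversion, and the function name `weighted_model_count` are assumptions made for this sketch; the paper's conversion methods and the state-of-the-art solvers it uses are not reproduced here.

```python
from itertools import product

def weighted_model_count(clauses, weights):
    """Brute-force WMC.

    clauses -- CNF as a list of clauses; a clause is a list of non-zero ints,
               where v means variable v is true and -v means it is false
    weights -- dict: variable -> (weight if true, weight if false)
    """
    variables = sorted(weights)
    total = 0.0
    for values in product([True, False], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            w = 1.0
            for v in variables:
                w *= weights[v][0] if assign[v] else weights[v][1]
            total += w
    return total

# Program:  0.3::rain.  0.5::sprinkler.  wet :- rain.  wet :- sprinkler.
# Variables: 1 = rain, 2 = sprinkler, 3 = wet.
# Hand-written CNF for  wet <-> (rain or sprinkler):
theory = [[-3, 1, 2], [3, -1], [3, -2]]
weights = {1: (0.3, 0.7), 2: (0.5, 0.5), 3: (1.0, 1.0)}

z = weighted_model_count(theory, weights)               # normalization constant
p_wet = weighted_model_count(theory + [[3]], weights) / z
print(p_wet)
```

The query is added as a unit clause, and evidence could be conjoined the same way before dividing by the count of the theory with evidence only.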

IJCAI Conference 2011 Conference Paper

Lifted Probabilistic Inference by First-Order Knowledge Compilation

Guy Van den Broeck
Nima Taghipour
Wannes Meert
Jesse Davis
Luc De Raedt

Probabilistic logical languages provide powerful formalisms for knowledge representation and learning. Yet performing inference in these languages is extremely costly, especially if it is done at the propositional level. Lifted inference algorithms, which avoid repeated computation by treating indistinguishable groups of objects as one, help mitigate this cost. Seeking inspiration from logical inference, where lifted inference (e.g., resolution) is commonly performed, we develop a model theoretic approach to probabilistic lifted inference. Our algorithm compiles a first-order probabilistic theory into a first-order deterministic decomposable negation normal form (d-DNNF) circuit. Compilation offers the advantage that inference is polynomial in the size of the circuit. Furthermore, by borrowing techniques from the knowledge compilation literature our algorithm effectively exploits the logical structure (e.g., context-specific independencies) within the first-order model, which allows more computation to be done at the lifted level. An empirical comparison demonstrates the utility of the proposed approach.

AAAI Conference 2010 Conference Paper

DTProbLog: A Decision-Theoretic Probabilistic Prolog

Guy Van den Broeck
Ingo Thon
Martijn van Otterlo
Luc De Raedt

We introduce DTPROBLOG, a decision-theoretic extension of Prolog and its probabilistic variant ProbLog. DTPROBLOG is a simple but expressive probabilistic programming language that allows the modeling of a wide variety of domains, such as viral marketing. In DTPROBLOG, the utility of a strategy (a particular choice of actions) is defined as the expected reward for its execution in the presence of probabilistic effects. The key contribution of this paper is the introduction of exact, as well as approximate, solvers to compute the optimal strategy for a DTPROBLOG program and the decision problem it represents, by making use of binary and algebraic decision diagrams. We also report on experimental results that show the effectiveness and the practical usefulness of the approach.