Arrow Research search

Author name cluster

Cassio de Campos

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers (23)

ECAI Conference 2025 Conference Paper

From Benchmarking to Understanding FairML

  • Mykola Pechenizkiy
  • Hilde J. P. Weerts
  • Cassio de Campos
  • Yuya Sasaki 0001
  • Julia Stoyanovich

Benchmarks play a central role in machine learning (ML), offering standardized datasets and metrics that enable comparison and drive progress. In fairness-aware ML (fairML), however, benchmarks pose distinctive challenges. Fairness is not a purely technical property but a socio-technical concept, shaped by normative choices and institutional context. Benchmarks strip away this context: they reduce fairness to intrinsic metrics, obscure what is comparable, and collapse distinct notions of justice—from distributive allocation in credit scoring to basic rights in criminal justice—into a single optimization task. Moreover, when used as measures of progress, benchmarks risk enshrining oversimplified metrics as community standards, assuming an exception to Goodhart’s law. We argue that while benchmarking has value for building baselines and organizing competition, responsible evaluation of fairML requires complementary frameworks: ones that combine intrinsic with extrinsic, context-sensitive assessments, and that make explicit the normative assumptions underlying fairness interventions.

ECAI Conference 2025 Conference Paper

Towards Privacy-Aware Bayesian Networks: A Credal Approach

  • Niccolò Rocchi
  • Fabio Stella
  • Cassio de Campos

Bayesian networks (BN) are versatile probabilistic graphical models that enable efficient knowledge representation and inference. These models have proven effective across diverse domains, including healthcare, bioinformatics, economics, law, and image processing. The structure and parameters of a BN can be obtained by domain experts or directly learned from available data. However, as privacy concerns escalate, it becomes increasingly critical for publicly released models to safeguard sensitive information in training data. Typically, released models do not prioritize privacy by design, and the issue equally affects BNs. In particular, tracing attacks from adversaries can combine the released BN with auxiliary data to determine whether specific individuals belong to the data from which the BN was learned. The current approach to addressing this privacy issue involves introducing noise into the learned parameters. While this method offers robust protection against tracing attacks, it also significantly impacts the model’s utility, in terms of both the significance and accuracy of the resulting inferences. Hence, high privacy may be attained, but at the cost of releasing a possibly ineffective model. This paper introduces credal networks (CN) as a novel and practical solution for balancing the model’s privacy and utility. Specifically, after adapting the notion of tracing attacks, we demonstrate that a CN enables the masking of the learned BN, thereby reducing the probability of successful tracing attacks. As CNs are obfuscated but not noisy versions of BNs, they can achieve meaningful inferences while safeguarding the privacy of the released model. Moreover, we identify key learning information that must be concealed to prevent attackers from recovering the BN underlying the released CN. Finally, we conduct a set of numerical experiments to analyze how privacy gains can be modulated by tuning the CN hyperparameters. Our results confirm that CNs provide a principled, practical, and effective approach towards the development of privacy-aware probabilistic graphical models.

TMLR Journal 2025 Journal Article

What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

  • Lorenzo Loconte
  • Antonio Mari
  • Gennaro Gala
  • Robert Peharz
  • Cassio de Campos
  • Erik Quaeghebeur
  • Gennaro Vessio
  • Antonio Vergari

This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular “Lego block” approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.
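As a minimal illustration of the correspondence described above (a standard rank-R CP factorization with toy sizes, not code from the paper): a CP-factorized probability table is exactly a shallow circuit, one sum unit over R product units of univariate input distributions.

```python
import numpy as np

# Sketch: a rank-R CP factorization of a joint table
#   P[x1, x2, x3] = sum_r w[r] * f1[r, x1] * f2[r, x2] * f3[r, x3]
# read as a circuit: a sum unit (weights w) over R product units,
# each multiplying three univariate input distributions.
rng = np.random.default_rng(0)
R, K = 4, 3  # rank and variable cardinality (arbitrary toy sizes)

w = rng.dirichlet(np.ones(R))                              # sum-unit weights
f = [rng.dirichlet(np.ones(K), size=R) for _ in range(3)]  # per-variable factors

# Tensor view: materialize the full joint table.
P = np.einsum('r,ra,rb,rc->abc', w, f[0], f[1], f[2])

# Circuit view: evaluate the same model at one assignment without
# materializing the table -- products of input units, then a weighted sum.
x = (1, 0, 2)
p_circuit = float(w @ (f[0][:, x[0]] * f[1][:, x[1]] * f[2][:, x[2]]))

assert np.isclose(P[x], p_circuit)   # both views agree
assert np.isclose(P.sum(), 1.0)      # a valid distribution by construction
```

The simplex-valued weights and factors make normalization automatic, which is one reason the circuit view keeps tractability while the factorization view keeps compactness.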

NeurIPS Conference 2024 Conference Paper

Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

  • Gennaro Gala
  • Cassio de Campos
  • Antonio Vergari
  • Erik Quaeghebeur

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if LVs can be analytically integrated out, otherwise they can be approximated by tractable probabilistic circuits (PC) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues, and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training. In extensive experiments, we showcase the effectiveness of functional sharing and the superiority of QPCs over traditional PCs.

AAAI Conference 2023 Conference Paper

Continuous Mixtures of Tractable Probabilistic Models

  • Alvaro H.C. Correia
  • Gennaro Gala
  • Erik Quaeghebeur
  • Cassio de Campos
  • Robert Peharz

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, and thus are capable of performing exact inference efficiently but often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de-facto exact. Moreover, for a finite set of integration points, the integration method effectively compiles the continuous mixture into a standard PC. In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
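The compilation idea in the abstract, approximating a continuous mixture by numerical integration over a finite set of points, can be sketched on a toy one-dimensional model (the Gaussian latent model below is illustrative, not the paper's architecture):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Toy continuous mixture: p(x) = integral of p(x|z) p(z) dz with
# x|z ~ N(2z, 1) and prior z ~ N(0, 1). A grid of integration points
# turns it into an ordinary finite mixture (a small circuit).
z, dz = np.linspace(-5, 5, 201, retstep=True)   # integration points
weights = normal_pdf(z, 0.0, 1.0) * dz          # quadrature weights
weights /= weights.sum()                         # normalize: mixture weights

x = 0.7
p_quadrature = float(np.sum(weights * normal_pdf(x, 2.0 * z, 1.0)))

# Analytic check: marginally x ~ N(0, sqrt(5)), so the finite mixture
# is de-facto exact for enough integration points.
p_exact = normal_pdf(x, 0.0, np.sqrt(5.0))
assert abs(p_quadrature - p_exact) < 1e-4
```

The normalized quadrature weights are exactly the mixture weights of the compiled finite mixture, which is what makes the result a standard tractable model.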

UAI Conference 2023 Conference Paper

Probabilistic Multi-Dimensional Classification

  • Vu-Linh Nguyen
  • Yang Yang
  • Cassio de Campos

Multi-dimensional classification (MDC) can be employed in a range of applications where one needs to predict multiple class variables for each given instance. Many existing MDC methods suffer from at least one of inaccuracy, scalability, limited use to certain types of data, hardness of interpretation or lack of probabilistic (uncertainty) estimations. This paper is an attempt to address all these disadvantages simultaneously. We propose a formal framework for probabilistic MDC in which learning an optimal multi-dimensional classifier can be decomposed, without loss of generality, into learning a set of (smaller) single-variable multi-class probabilistic classifiers and a directed acyclic graph. Current and future developments of both probabilistic classification and graphical model learning can directly enhance our framework, which is flexible and provably optimal. A collection of experiments is conducted to highlight the usefulness of this MDC framework.
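The decomposition described above can be illustrated on a toy instance (hand-made conditional tables for one fixed input, not the paper's learner): with a DAG Y1 → Y2 over two binary class variables, the jointly optimal prediction maximizes the product of the local probabilistic classifiers, and can differ from predicting each class variable independently.

```python
import itertools

# For one fixed instance x (made-up numbers):
# p1[y1] = P(Y1=y1 | x); p2[y1][y2] = P(Y2=y2 | Y1=y1, x).
p1 = [0.6, 0.4]
p2 = [[0.5, 0.5], [0.1, 0.9]]

# Joint prediction over the DAG Y1 -> Y2: maximize the product of the
# single-variable probabilistic classifiers.
best = max(itertools.product([0, 1], repeat=2),
           key=lambda y: p1[y[0]] * p2[y[0]][y[1]])

# Independent per-variable prediction, using the marginal of Y2.
marginal_best = (max(range(2), key=lambda v: p1[v]),
                 max(range(2), key=lambda v: sum(p1[u] * p2[u][v]
                                                 for u in range(2))))

assert best == (1, 1)            # jointly optimal: 0.4*0.9 beats 0.6*0.5
assert best != marginal_best     # independent prediction picks (0, 1)
```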

UAI Conference 2017 Conference Paper

Approximation Complexity of Maximum A Posteriori Inference in Sum-Product Networks

  • Diarmaid Conaty
  • Cassio de Campos
  • Denis Deratani Mauá

We discuss the computational complexity of approximating maximum a posteriori inference in sum-product networks. We first show NP-hardness in trees of height two by a reduction from maximum independent set; this implies non-approximability within a sublinear factor. We show that this is a tight bound, as we can find an approximation within a linear factor in networks of height two. We then show that, in trees of height three, it is NP-hard to approximate the problem within a factor 2^f(n) for any sublinear function f of the size of the input n. Again, this bound is tight, as we prove that the usual max-product algorithm finds (in any network) approximations within factor 2^(c·n) for some constant c < 1. Last, we present a simple algorithm, and show that it provably produces solutions at least as good as, and potentially much better than, the max-product algorithm. We empirically analyze the proposed algorithm against max-product using synthetic and real-world data.

AAAI Conference 2017 Conference Paper

Learning Bayesian Networks with Incomplete Data by Augmentation

  • Tameem Adel
  • Cassio de Campos

We present new algorithms for learning Bayesian networks from data with missing values using a data augmentation approach. An exact Bayesian network learning algorithm is obtained by recasting the problem into a standard Bayesian network learning problem without missing data. As expected, the exact algorithm does not scale to large domains. We build on the exact method to create an approximate algorithm using a hill-climbing technique. This algorithm scales to large domains so long as a suitable standard structure learning method for complete data is available. We perform a wide range of experiments to demonstrate the benefits of learning Bayesian networks with this new approach.

AAAI Conference 2016 Conference Paper

Learning Bayesian Networks with Bounded Tree-width via Guided Search

  • Siqi Nie
  • Cassio de Campos
  • Qiang Ji

Bounding the tree-width of a Bayesian network can reduce the chance of overfitting, and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by learning from k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with maximum Informative scores, which is a measure of quality for the k-tree in yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and can discover better networks than existing approximate methods can in large domains. It also provides an optimal elimination order of variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.

NeurIPS Conference 2016 Conference Paper

Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

  • Mauro Scanagatta
  • Giorgio Corani
  • Cassio de Campos
  • Marco Zaffalon

We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and to large treewidths. Our novel approach consistently outperforms the state of the art on experiments with up to thousands of variables.

NeurIPS Conference 2015 Conference Paper

Learning Bayesian Networks with Thousands of Variables

  • Mauro Scanagatta
  • Cassio de Campos
  • Giorgio Corani
  • Marco Zaffalon

We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach is made of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. On very large data sets containing up to ten thousand nodes, our novel approach consistently outperforms the state of the art.

NeurIPS Conference 2014 Conference Paper

Advances in Learning Bayesian Networks of Bounded Treewidth

  • Siqi Nie
  • Denis Maua
  • Cassio de Campos
  • Qiang Ji

This work presents novel algorithms for learning Bayesian networks of bounded treewidth. Both exact and approximate methods are developed. The exact method combines mixed integer linear programming formulations for structure learning and treewidth computation. The approximate method consists in sampling k-trees (maximal graphs of treewidth k), and subsequently selecting, exactly or approximately, the best structure whose moral graph is a subgraph of that k-tree. The approaches are empirically compared to each other and to state-of-the-art methods on a collection of public data sets with up to 100 variables.

NeurIPS Conference 2014 Conference Paper

Global Sensitivity Analysis for MAP Inference in Graphical Models

  • Jasper De Bock
  • Cassio de Campos
  • Alessandro Antonucci

We study the sensitivity of a MAP configuration of a discrete probabilistic graphical model with respect to perturbations of its parameters. These perturbations are global, in the sense that simultaneous perturbations of all the parameters (or any chosen subset of them) are allowed. Our main contribution is an exact algorithm that can check whether the MAP configuration is robust with respect to given perturbations. Its complexity is essentially the same as that of obtaining the MAP configuration itself, so it can be promptly used with minimal effort. We use our algorithm to identify the largest global perturbation that does not induce a change in the MAP configuration, and we successfully apply this robustness measure in two practical scenarios: the prediction of facial action units with posed images and the classification of multiple real public data sets. A strong correlation between the proposed robustness measure and accuracy is verified in both scenarios.

AAAI Conference 2013 Conference Paper

Complexity of Inferences in Polytree-shaped Semi-Qualitative Probabilistic Networks

  • Cassio de Campos
  • Fabio Cozman

Semi-qualitative probabilistic networks (SQPNs) merge two important graphical model formalisms: Bayesian networks and qualitative probabilistic networks. They provide a very general modeling framework by allowing the combination of numeric and qualitative assessments over a discrete domain, and can be compactly encoded by exploiting the same factorization of joint probability distributions that are behind the Bayesian networks. This paper explores the computational complexity of semi-qualitative probabilistic networks, and takes the polytree-shaped networks as its main target. We show that the inference problem is coNP-Complete for binary polytrees with multiple observed nodes. We also show that inferences can be performed in time linear in the number of nodes if there is a single observed node. Because our proof is constructive, we obtain an efficient linear time algorithm for SQPNs under such assumptions. To the best of our knowledge, this is the first exact polynomial-time algorithm for SQPNs. Together these results provide a clear picture of the inferential complexity in polytree-shaped SQPNs.

UAI Conference 2013 Conference Paper

On the Complexity of Strong and Epistemic Credal Networks

  • Denis Deratani Mauá
  • Cassio de Campos
  • Alessio Benavoli
  • Alessandro Antonucci 0001

Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with ternary variables. We prove that under epistemic irrelevance the polynomial time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected networks). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding computational complexity.

UAI Conference 2012 Conference Paper

The Complexity of Approximately Solving Influence Diagrams

  • Denis Deratani Mauá
  • Cassio de Campos
  • Marco Zaffalon

Influence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the treewidth and the cardinality of the variables are bounded the problem admits a fully polynomial-time approximation scheme.

AAAI Conference 2010 Conference Paper

Properties of Bayesian Dirichlet Scores to Learn Bayesian Network Structures

  • Cassio de Campos
  • Qiang Ji

This paper addresses exact learning of Bayesian network structure from data based on the Bayesian Dirichlet score function and its derivations. We describe useful properties that strongly reduce the computational costs of many known methods without losing global optimality guarantees. We show empirically the advantages of the properties in terms of time and memory consumption, demonstrating that state-of-the-art methods, with the use of such properties, might handle larger data sets than those currently possible.
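For reference, the Bayesian Dirichlet (BDeu) local score underlying this line of work has a standard closed form; a minimal sketch (the formula is the standard one, the counts and `ess` value below are made up):

```python
from math import lgamma

import numpy as np

def bdeu_local_score(counts, ess=1.0):
    """BDeu local score of a variable X given a parent set, from counts.

    counts: array of shape (q, r) with N_jk, the number of records where
    the parents take their j-th configuration and X its k-th state.
    Uses alpha_jk = ess / (q * r) and alpha_j = ess / q.
    """
    q, r = counts.shape
    a_jk, a_j = ess / (q * r), ess / q
    score = 0.0
    for j in range(q):
        score += lgamma(a_j) - lgamma(a_j + counts[j].sum())
        for k in range(r):
            score += lgamma(a_jk + counts[j, k]) - lgamma(a_jk)
    return score

# Toy counts for binary X with one binary parent (q = 2, r = 2).
counts = np.array([[8.0, 2.0], [1.0, 9.0]])
s = bdeu_local_score(counts)
assert s < 0.0   # a log marginal likelihood of discrete data is negative
```

The score is decomposable over families, which is exactly the property that lets pruning rules like those in the paper bound whole branches of the search.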

ICML Conference 2009 Conference Paper

Structure learning of Bayesian networks using constraints

  • Cassio de Campos
  • Zhi Zeng
  • Qiang Ji

This paper addresses exact learning of Bayesian network structure from data and expert's knowledge based on score functions that are decomposable. First, it describes useful properties that strongly reduce the time and memory costs of many known methods such as hill-climbing, dynamic programming and sampling variable orderings. Secondly, a branch and bound algorithm is presented that integrates parameter and structural constraints with data in a way to guarantee global optimality with respect to the score function. It is an any-time procedure because, if stopped, it provides the best current solution and an estimation about how far it is from the global solution. We show empirically the advantages of the properties and the constraints, and the applicability of the algorithm to large data sets (up to one hundred variables) that cannot be handled by other current methods (limited to around 30 variables).

UAI Conference 2008 Conference Paper

Strategy Selection in Influence Diagrams using Imprecise Probabilities

  • Cassio de Campos
  • Qiang Ji

This paper describes a new algorithm to solve the decision making problem in Influence Diagrams based on algorithms for credal networks. Decision nodes are associated to imprecise probability distributions and a reformulation is introduced that finds the global maximum strategy with respect to the expected utility. We work with Limited Memory Influence Diagrams, which generalize most Influence Diagram proposals and handle simultaneous decisions. Besides the global optimum method, we explore an anytime approximate solution with a guaranteed maximum error and show that imprecise probabilities are handled in a straightforward way. Complexity issues and experiments with random diagrams and an effects-based military planning problem are discussed.

UAI Conference 2005 Conference Paper

Belief Updating and Learning in Semi-Qualitative Probabilistic Networks

  • Cassio de Campos
  • Fábio Gagliardi Cozman

This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NP^PP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt with effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given a SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates.
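The Imprecise Dirichlet Model mentioned above yields set-valued estimates in closed form; a minimal sketch (standard IDM interval, toy counts):

```python
def idm_interval(n_k, N, s=2.0):
    """Standard IDM probability interval for outcome k.

    Given n_k occurrences of outcome k in N observations and
    hyperparameter s, P(k) lies in [n_k/(N+s), (n_k+s)/(N+s)].
    """
    return n_k / (N + s), (n_k + s) / (N + s)

lo, hi = idm_interval(n_k=7, N=10, s=2.0)
assert lo < 7 / 10 < hi                     # point estimate lies inside
assert abs((hi - lo) - 2.0 / 12.0) < 1e-12  # interval width is s / (N + s)
```

The interval shrinks as N grows, so the set-valued estimate converges to the relative frequency, matching the learning behaviour the abstract describes.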

UAI Conference 2004 Conference Paper

Propositional and Relational Bayesian Networks Associated with Imprecise and Qualitative Probabilistic Assessments

  • Fábio Gagliardi Cozman
  • Cassio de Campos
  • Jaime Shinsuke Ide
  • José Carlos Ferreira da Rocha

This paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their superior performance, compared to existing ones, is shown empirically.

UAI Conference 2003 Conference Paper

Inference in Polytrees with Sets of Probabilities

  • José Carlos Ferreira da Rocha
  • Fábio Gagliardi Cozman
  • Cassio de Campos

Inferences in directed acyclic graphs associated with probability sets and probability intervals are NP-hard, even for polytrees. In this paper we focus on such inferences, and propose: 1) a substantial improvement on Tessem's A/R algorithm for polytrees with probability intervals; 2) a new algorithm for direction-based local search (in sets of probability) that improves on existing methods; 3) a collection of branch-and-bound algorithms that combine the previous techniques. The first two techniques lead to approximate solutions, while branch-and-bound procedures can produce either exact or approximate solutions. We report on dramatic improvements on existing techniques for inference with probability sets and intervals, in some cases reducing the computational effort by many orders of magnitude.