Arrow Research search

Author name cluster

Arthur Choi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers

KR Conference 2023 Conference Paper

On Training Neurons with Bounded Compilations

  • Lance Kennedy
  • Issouf Kindo
  • Arthur Choi

Knowledge compilation offers a formal approach to explaining and verifying the behavior of machine learning systems, such as neural networks. Unfortunately, compiling even an individual neuron into a tractable representation such as an Ordered Binary Decision Diagram (OBDD) is an NP-hard problem. In this paper, we consider the problem of training a neuron from data, subject to the constraint that it has a compact representation as an OBDD. Our approach is based on the observation that a neuron can be compiled into an OBDD in polytime if (1) the neuron has integer weights, and (2) its aggregate weight is bounded. Unfortunately, we first show that it is also NP-hard to train a neuron subject to these two constraints. On the other hand, we show that if we train a neuron generatively, rather than discriminatively, a neuron with bounded aggregate weight can be trained in pseudo-polynomial time. Hence, we propose the first efficient algorithm for training a neuron that is guaranteed to have a compact representation as an OBDD. Empirically, we show that our approach can train neurons with higher accuracy and more compact OBDDs.
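
The pseudo-polynomial compilation idea behind this result can be sketched directly. Below is a minimal illustration, not the paper's algorithm or code: a 0/1-input neuron with integer weights is unfolded over its reachable partial sums, and equivalent decision nodes are merged, so construction time is bounded by the number of (variable, partial-sum) states, i.e., pseudo-polynomial in the aggregate weight.

```python
from functools import lru_cache

def compile_neuron_obdd(weights, threshold):
    """Compile the threshold function [sum_i w_i * x_i >= threshold]
    over inputs x_i in {0, 1} into a reduced OBDD. Distinct (level,
    partial-sum) states are bounded by n * (sum |w_i| + 1), so the
    construction is pseudo-polynomial in the aggregate weight."""
    n = len(weights)
    unique = {}   # reduction table: (var, lo, hi) -> node id
    nodes = {}    # node id -> (var, lo, hi); terminals are "T" / "F"

    def mk(var, lo, hi):
        if lo == hi:               # both branches agree: skip the test
            return lo
        key = (var, lo, hi)
        if key not in unique:
            unique[key] = len(nodes) + 1
            nodes[unique[key]] = key
        return unique[key]

    @lru_cache(maxsize=None)       # merges states with equal partial sums
    def build(i, acc):
        if i == n:
            return "T" if acc >= threshold else "F"
        return mk(i, build(i + 1, acc), build(i + 1, acc + weights[i]))

    return build(0, 0), nodes
```

For instance, compile_neuron_obdd([2, -1, 3], 2) returns the root and node table of the OBDD deciding 2*x0 - x1 + 3*x2 >= 2.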

KR Conference 2020 Conference Paper

On Tractable Representations of Binary Neural Networks

  • Weijia Shi
  • Andy Shih
  • Adnan Darwiche
  • Arthur Choi

We consider the compilation of a binary neural network’s decision function into tractable representations such as Ordered Binary Decision Diagrams (OBDDs) and Sentential Decision Diagrams (SDDs). Obtaining this function as an OBDD/SDD facilitates the explanation and formal verification of a neural network’s behavior. First, we consider the task of verifying the robustness of a neural network, and show how we can compute the expected robustness of a neural network, given an OBDD/SDD representation of it. Next, we consider a more efficient approach for compiling neural networks, based on a pseudo-polynomial time algorithm for compiling a neuron. We then provide a case study on a handwritten digits dataset, highlighting how two neural networks trained from the same dataset can have very high accuracies, yet have very different levels of robustness. Finally, in experiments, we show that it is feasible to obtain compact representations of neural networks as SDDs.
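
Once the decision function is compiled, counting-style queries behave as the abstract suggests; for example, the number of inputs the network maps to 1, one primitive behind such robustness computations, follows from a single memoized traversal. A sketch, reusing the assumed (var, lo, hi) node table from the block above:

```python
from functools import lru_cache

def obdd_model_count(root, nodes, n):
    """Count satisfying assignments of a reduced OBDD over variables
    0..n-1, where `nodes` maps node ids to (var, lo, hi) triples and
    the terminals are the strings "T" and "F"."""
    def level(u):
        return n if u in ("T", "F") else nodes[u][0]

    @lru_cache(maxsize=None)
    def count(u):
        # number of models over variables level(u)..n-1
        if u == "T":
            return 1
        if u == "F":
            return 0
        var, lo, hi = nodes[u]
        # variables skipped by reduction contribute a factor of 2 each
        return (count(lo) << (level(lo) - var - 1)) + \
               (count(hi) << (level(hi) - var - 1))

    return count(root) << level(root)  # variables above the root are free
```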

AAAI Conference 2019 Conference Paper

Compiling Bayesian Network Classifiers into Decision Graphs

  • Andy Shih
  • Arthur Choi
  • Adnan Darwiche

We propose an algorithm for compiling Bayesian network classifiers into decision graphs that mimic the input and output behavior of the classifiers. In particular, we compile Bayesian network classifiers into ordered decision graphs, which are tractable and can be exponentially smaller in size than decision trees. This tractability facilitates reasoning about the behavior of Bayesian network classifiers, including the explanation of decisions they make. Our compilation algorithm comes with guarantees on the time of compilation and the size of compiled decision graphs. We apply our compilation algorithm to classifiers from the literature and discuss some case studies in which we show how to automatically explain their decisions and verify properties of their behavior.

ICML Conference 2019 Conference Paper

Conditional Independence in Testing Bayesian Networks

  • Yujia Shen
  • Haiying Huang 0002
  • Arthur Choi
  • Adnan Darwiche

Testing Bayesian Networks (TBNs) were introduced recently to represent a set of distributions, one of which is selected based on the given evidence and used for reasoning. TBNs are more expressive than classical Bayesian Networks (BNs): Marginal queries correspond to multi-linear functions in BNs and to piecewise multi-linear functions in TBNs. Moreover, TBN queries are universal approximators, like neural networks. In this paper, we study conditional independence in TBNs, showing that it can be inferred from d-separation as in BNs. We also study the role of TBN expressiveness and independence in dealing with the problem of learning with incomplete models (i.e., ones that miss nodes or edges from the data-generating model). Finally, we illustrate our results on a number of concrete examples, including a case study on Hidden Markov Models.

AAAI Conference 2019 Conference Paper

Structured Bayesian Networks: From Inference to Learning with Routes

  • Yujia Shen
  • Anchal Goyanka
  • Adnan Darwiche
  • Arthur Choi

Structured Bayesian networks (SBNs) are a recently proposed class of probabilistic graphical models which integrate background knowledge in two forms: conditional independence constraints and Boolean domain constraints. In this paper, we propose the first exact inference algorithm for SBNs, based on compiling a given SBN to a Probabilistic Sentential Decision Diagram (PSDD). We further identify a tractable subclass of SBNs, which have PSDDs of polynomial size. These SBNs yield a tractable model of route distributions, whose structure can be learned from GPS data, using a simple algorithm that we propose. Empirically, we demonstrate the utility of our inference algorithm, showing that it can be an order-of-magnitude more efficient than more traditional approaches to exact inference. We demonstrate the utility of our learning algorithm, showing that it can learn more accurate models and classifiers from GPS data.

SAT Conference 2019 Conference Paper

Verifying Binarized Neural Networks by Angluin-Style Learning

  • Andy Shih
  • Adnan Darwiche
  • Arthur Choi

We consider the problem of verifying the behavior of binarized neural networks on some input region. We propose an Angluin-style learning algorithm to compile a neural network on a given region into an Ordered Binary Decision Diagram (OBDD), using a SAT solver as an equivalence oracle. The OBDD allows us to efficiently answer a range of verification queries, including counting, computing the probability of counterexamples, and identifying common characteristics of counterexamples. We also present experimental results on verifying binarized neural networks that recognize images of handwritten digits.
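
The loop the abstract describes can be written as a compact skeleton. The `learner` and `equivalence_oracle` interfaces below are hypothetical stand-ins, not the paper's API:

```python
def compile_by_learning(network, learner, equivalence_oracle):
    """Angluin-style compilation loop: the learner proposes an OBDD
    hypothesis; a SAT solver acting as the equivalence oracle either
    certifies agreement with the network on the input region (returns
    None) or produces an input on which the two disagree."""
    hypothesis = learner.initial_hypothesis()
    while True:
        counterexample = equivalence_oracle(network, hypothesis)
        if counterexample is None:     # UNSAT: equivalent on the region
            return hypothesis
        # membership query: ask the network for its label on the
        # disagreeing input, and refine the hypothesis accordingly
        learner.refine(counterexample, network.classify(counterexample))
        hypothesis = learner.current_hypothesis()
```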

IJCAI Conference 2018 Conference Paper

A Symbolic Approach to Explaining Bayesian Network Classifiers

  • Andy Shih
  • Arthur Choi
  • Adnan Darwiche

We propose an approach for explaining Bayesian network classifiers, which is based on compiling such classifiers into decision functions that have a tractable and symbolic form. We introduce two types of explanations for why a classifier may have classified an instance positively or negatively and suggest algorithms for computing these explanations. The first type of explanation identifies a minimal set of the currently active features that is responsible for the current classification, while the second type of explanation identifies a minimal set of features whose current state (active or not) is sufficient for the classification. We consider in particular the compilation of Naive and Latent-Tree Bayesian network classifiers into Ordered Decision Diagrams (ODDs), providing a context for evaluating our proposal using case studies and experiments based on classifiers from the literature.
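
The second type of explanation admits a simple greedy sketch once the classifier has a tractable form. `fixes_decision` below is a hypothetical oracle, assumed to answer whether every completion of a partial instance yields the same class; on a compiled ODD this test is a linear-time traversal rather than an exponential enumeration:

```python
def minimal_sufficient_features(instance, fixes_decision):
    """Greedily shrink a full instance to a subset-minimal set of
    feature settings that still forces the classification. Correct
    because sufficiency is monotone: any superset of a sufficient
    set of feature settings is itself sufficient."""
    partial = dict(instance)
    for feature in list(partial):
        value = partial.pop(feature)   # tentatively drop the feature
        if not fixes_decision(partial):
            partial[feature] = value   # it was needed; restore it
    return partial
```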

AAAI Conference 2018 Conference Paper

Conditional PSDDs: Modeling and Learning With Modular Knowledge

  • Yujia Shen
  • Arthur Choi
  • Adnan Darwiche

Probabilistic Sentential Decision Diagrams (PSDDs) have been proposed for learning tractable probability distributions from a combination of data and background knowledge (in the form of Boolean constraints). In this paper, we propose a variant on PSDDs, called conditional PSDDs, for representing a family of distributions that are conditioned on the same set of variables. Conditional PSDDs can also be learned from a combination of data and (modular) background knowledge. We use conditional PSDDs to define a more structured version of Bayesian networks, in which nodes can have an exponential number of states, hence expanding the scope of domains where Bayesian networks can be applied. Compared to classical PSDDs, the new representation exploits the independencies captured by a Bayesian network to decompose the learning process into localized learning tasks, which enables the learning of better models while using less computation. We illustrate the promise of conditional PSDDs and structured Bayesian networks empirically, and by providing a case study on modeling distributions over routes on a map.

UAI Conference 2017 Conference Paper

A Tractable Probabilistic Model for Subset Selection

  • Yujia Shen
  • Arthur Choi
  • Adnan Darwiche

Subset selection tasks, such as top-k ranking, induce datasets where examples have cardinalities that are known a priori. In this paper, we propose a tractable probabilistic model for subset selection and show how it can be learned from data. Our proposed model is interpretable and subsumes a previously introduced model based on logistic regression. We show how the parameters of our model can be estimated in closed form given complete data, and propose an algorithm for learning its structure in an interpretable space. We highlight the intuitive structures that we learn via case studies. We finally show how our proposed model can be viewed as an instance of the recently proposed Probabilistic Sentential Decision Diagram.

AIJ Journal 2017 Journal Article

Learning Bayesian network parameters under equivalence constraints

  • Tiansheng Yao
  • Arthur Choi
  • Adnan Darwiche

We propose a principled approach for learning parameters in Bayesian networks from incomplete datasets, where the examples of a dataset are subject to equivalence constraints. These equivalence constraints arise from datasets where examples are tied together, in that we may not know the value of a particular variable, but whatever that value is, we know it must be the same across different examples. We formalize the problem by defining the notion of a constrained dataset and a corresponding constrained likelihood that we seek to optimize. We further propose a new learning algorithm that can effectively learn more accurate Bayesian networks using equivalence constraints, which we demonstrate empirically. Moreover, we highlight how our general approach can be brought to bear on more specialized learning tasks, such as those in semi-supervised clustering and topic modeling, where more domain-specific approaches were previously developed.

ICML Conference 2017 Conference Paper

On Relaxing Determinism in Arithmetic Circuits

  • Arthur Choi
  • Adnan Darwiche

The past decade has seen a significant interest in learning tractable probabilistic representations. Arithmetic circuits (ACs) were among the first proposed tractable representations, with some subsequent representations being instances of ACs with weaker or stronger properties. In this paper, we provide a formal basis under which variants on ACs can be compared, and where the precise roles and semantics of their various properties can be made more transparent. This allows us to place some recent developments on ACs in a clearer perspective and to also derive new results for ACs. This includes an exponential separation between ACs with and without determinism; completeness and incompleteness results; and tractability results (or lack thereof) when computing most probable explanations (MPEs).

NeurIPS Conference 2017 Conference Paper

Tractability in Structured Probability Spaces

  • Arthur Choi
  • Yujia Shen
  • Adnan Darwiche

Recently, the Probabilistic Sentential Decision Diagram (PSDD) has been proposed as a framework for systematically inducing and learning distributions over structured objects, including combinatorial objects such as permutations and rankings, paths and matchings on a graph, etc. In this paper, we study the scalability of such models in the context of representing and learning distributions over routes on a map. In particular, we introduce the notion of a hierarchical route distribution and show how they can be leveraged to construct tractable PSDDs over route distributions, allowing them to scale to larger maps. We illustrate the utility of our model empirically, in a route prediction task, showing how accuracy can be increased significantly compared to Markov models.

NeurIPS Conference 2016 Conference Paper

Learning Bayesian networks with ancestral constraints

  • Eunice Yuh-Jie Chen
  • Yujia Shen
  • Arthur Choi
  • Adnan Darwiche

We consider the problem of learning Bayesian networks optimally, when subject to background knowledge in the form of ancestral constraints. Our approach is based on a recently proposed framework for optimal structure learning based on non-decomposable scores, which is general enough to accommodate ancestral constraints. The proposed framework exploits oracles for learning structures using decomposable scores, which cannot accommodate ancestral constraints since they are non-decomposable. We show how to empower these oracles by passing them decomposable constraints that they can handle, which are inferred from ancestral constraints that they cannot handle. Empirically, we demonstrate that our approach can be orders-of-magnitude more efficient than alternative frameworks, such as those based on integer linear programming.

KR Conference 2016 Conference Paper

Solving PP^PP-complete Problems using Knowledge Compilation

  • Umut Oztok
  • Arthur Choi
  • Adnan Darwiche

Knowledge compilation has been successfully used to solve beyond-NP problems, including some PP-complete and NP^PP-complete problems for Bayesian networks. In this work we show how knowledge compilation can be used to solve problems in the more intractable complexity class PP^PP. This class contains NP^PP and includes interesting AI problems, such as non-myopic value of information. We show how to solve the prototypical PP^PP-complete problem MAJ-MAJSAT in linear time once the problem instance is compiled into a special class of Sentential Decision Diagrams. To show the practical value of our approach, we adapt it to answer the Same-Decision Probability (SDP) query, which was recently introduced for Bayesian networks. The SDP problem is also PP^PP-complete. It is a value-of-information query that quantifies the robustness of threshold-based decisions and comes with a corresponding algorithm that was also recently proposed. We present favorable experimental results, comparing our new algorithm based on knowledge compilation with the state-of-the-art algorithm for computing the SDP.

AAAI Conference 2016 Conference Paper

Structured Features in Naive Bayes Classification

  • Arthur Choi
  • Nazgol Tavabi
  • Adnan Darwiche

We propose the structured naive Bayes (SNB) classifier, which augments the ubiquitous naive Bayes classifier with structured features. SNB classifiers facilitate the use of complex features, such as combinatorial objects (e.g., graphs, paths and orders) in a general but systematic way. Underlying the SNB classifier is the recently proposed Probabilistic Sentential Decision Diagram (PSDD), which is a tractable representation of probability distributions over structured spaces. We illustrate the utility and generality of the SNB classifier via case studies. First, we show how we can distinguish players of simple games in terms of play style and skill level based purely on observing the games they play. Second, we show how we can detect anomalous paths taken on graphs based purely on observing the paths themselves.

NeurIPS Conference 2016 Conference Paper

Tractable Operations for Arithmetic Circuits of Probabilistic Models

  • Yujia Shen
  • Arthur Choi
  • Adnan Darwiche

We consider tractable representations of probability distributions and the polytime operations they support. In particular, we consider a recently proposed arithmetic circuit representation, the Probabilistic Sentential Decision Diagram (PSDD). We show that PSDDs support a polytime multiplication operator, while they do not support a polytime operator for summing out variables. A polytime multiplication operator makes PSDDs suitable for a broader class of applications compared to arithmetic circuits, which do not in general support multiplication. As one example, we show that PSDD multiplication leads to a very simple but effective compilation algorithm for probabilistic graphical models: represent each model factor as a PSDD, and then multiply them.
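
That compilation recipe is short enough to state as code. A sketch, with `as_psdd` and `multiply` as hypothetical stand-ins for a PSDD library's factor constructor and its polytime multiplication operator:

```python
from functools import reduce

def compile_graphical_model(factors, as_psdd, multiply):
    """Compile a probabilistic graphical model by representing each
    factor as a PSDD and folding multiplication over the collection."""
    return reduce(multiply, (as_psdd(f) for f in factors))
```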

UAI Conference 2015 Conference Paper

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data

  • Guy Van den Broeck
  • Karthika Mohan
  • Arthur Choi
  • Adnan Darwiche
  • Judea Pearl

We propose a family of efficient algorithms for learning the parameters of a Bayesian network from incomplete data. Our approach is based on recent theoretical analyses of missing data problems, which utilize a graphical representation, called the missingness graph. In the case of MCAR and MAR data, this graph need not be explicit, and yet we can still obtain closed-form, asymptotically consistent parameter estimates, without the need for inference. When this missingness graph is explicated (based on background knowledge), even partially, we can obtain even more accurate estimates with less data. Empirically, we illustrate how we can learn the parameters of large networks from large datasets, which are beyond the scope of algorithms like EM (which require inference).
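
For the MCAR case, the flavor of these closed-form, inference-free estimates can be shown in a few lines. This is a simplified sketch, not the paper's full method (which also exploits MAR data and partially explicated missingness graphs):

```python
from collections import Counter

def closed_form_cpt(records, child, parents):
    """Estimate Pr(child | parents) from incomplete records, using only
    those records where the child and all of its parents are observed
    (None marks a missing value). Under MCAR, these deletion-based
    estimates are asymptotically consistent, and no inference is run."""
    joint, margin = Counter(), Counter()
    for r in records:
        if r.get(child) is None or any(r.get(p) is None for p in parents):
            continue
        pa = tuple(r[p] for p in parents)
        joint[(r[child], pa)] += 1
        margin[pa] += 1
    return {(x, pa): c / margin[pa] for (x, pa), c in joint.items()}
```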

NeurIPS Conference 2015 Conference Paper

Tractable Learning for Complex Probability Queries

  • Jessa Bekker
  • Jesse Davis
  • Arthur Choi
  • Adnan Darwiche
  • Guy Van den Broeck

Tractable learning aims to learn probabilistic models where inference is guaranteed to be efficient. However, the particular class of queries that is tractable depends on the model and underlying representation. Usually this class is MPE or conditional probabilities Pr(x | y) for joint assignments x, y. We propose a tractable learner that guarantees efficient inference for a broader class of queries. It simultaneously learns a Markov network and its tractable circuit representation, in order to guarantee and measure tractability. Our approach differs from earlier work by using Sentential Decision Diagrams (SDD) as the tractable language instead of Arithmetic Circuits (AC). SDDs have desirable properties, which more general representations such as ACs lack, that enable basic primitives for Boolean circuit compilation. This allows us to support a broader class of complex probability queries, including counting, threshold, and parity, in polytime.

IJCAI Conference 2015 Conference Paper

Tractable Learning for Structured Probability Spaces: A Case Study in Learning Preference Distributions

  • Arthur Choi
  • Guy Van den Broeck
  • Adnan Darwiche

Probabilistic sentential decision diagrams (PSDDs) are a tractable representation of structured probability spaces, which are characterized by complex logical constraints on what constitutes a possible world. We develop general-purpose techniques for probabilistic reasoning and learning with PSDDs, allowing one to compute the probabilities of arbitrary logical formulas and to learn PSDDs from incomplete data. We illustrate the effectiveness of these techniques in the context of learning preference distributions, to which considerable work has been devoted in the past. We show, analytically and empirically, that our proposed framework is general enough to support diverse and complex data and query types. In particular, we show that it can learn maximum-likelihood models from partial rankings, pairwise preferences, and arbitrary preference constraints. Moreover, we show that it can efficiently answer many queries exactly, from expected and most likely rankings, to the probability of pairwise preferences, and diversified recommendations. This case study illustrates the effectiveness and flexibility of the developed PSDD framework as a domain-independent tool for learning and reasoning with structured probability spaces.

AAAI Conference 2015 Conference Paper

Value of Information Based on Decision Robustness

  • Suming Chen
  • Arthur Choi
  • Adnan Darwiche

There are many criteria for measuring the value of information (VOI), each based on a different principle that is usually suitable for specific applications. We propose a new criterion for measuring the value of information, which values information that leads to robust decisions (i.e., ones that are unlikely to change due to new information). We also introduce an algorithm for Naive Bayes networks that selects features with maximal VOI under the new criterion. We discuss the application of the new criterion to classification tasks, showing how it can be used to trade off the budget allotted for acquiring information against classification accuracy. In particular, we show empirically that the new criterion can reduce the expended budget significantly while reducing the classification accuracy only slightly. We also show empirically that the new criterion leads to decisions that are much more robust than those based on traditional VOI criteria, such as information gain and classification loss. This makes the new criterion particularly suitable for certain decision-making applications.

NeurIPS Conference 2014 Conference Paper

Decomposing Parameter Estimation Problems

  • Khaled Refaat
  • Arthur Choi
  • Adnan Darwiche

We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems. Our technique applies to incomplete datasets and exploits variables that are either hidden or observed in the given dataset. We show empirically that the proposed technique can lead to orders-of-magnitude savings in learning time. We explain, analytically and empirically, the reasons behind our reported savings, and compare the proposed technique to related ones that are sometimes used by inference algorithms.

KR Conference 2014 Conference Paper

Probabilistic Sentential Decision Diagrams

  • Doga Kisa
  • Guy Van den Broeck
  • Arthur Choi
  • Adnan Darwiche

We propose the Probabilistic Sentential Decision Diagram (PSDD): A complete and canonical representation of probability distributions defined over the models of a given propositional theory. Each parameter of a PSDD can be viewed as the (conditional) probability of making a decision in a corresponding Sentential Decision Diagram (SDD). The SDD itself is a recently proposed complete and canonical representation of propositional theories. We explore a number of interesting properties of PSDDs, including the independencies that underlie them. We show that the PSDD is a tractable representation. We further show how the parameters of a PSDD can be efficiently estimated, in closed form, from complete data. We empirically evaluate the quality of PSDDs learned from data, when we have knowledge, a priori, of the domain logical constraints.
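
The tractable query computation mentioned in the abstract is a single bottom-up pass. The tuple encoding below is an illustrative assumption, not the paper's data structure: ("lit", var, sign) is a fixed literal, ("var", var, theta) a parameterized variable with Pr(var = 1) = theta, and ("dec", elements) a probabilistic decision node whose elements (prime, sub, theta) have mutually exclusive primes:

```python
def psdd_probability(node, evidence):
    """Compute Pr(evidence) by one upward pass over a PSDD; memoizing
    on node identity makes this linear in the size of a DAG-shaped
    PSDD. `evidence` maps a subset of variables to 0/1 values."""
    kind = node[0]
    if kind == "lit":             # fixed literal: consistent or not
        _, var, sign = node
        return 1.0 if evidence.get(var, sign) == sign else 0.0
    if kind == "var":             # parameterized variable node
        _, var, theta = node
        if var not in evidence:
            return 1.0            # unobserved: marginalizes to 1
        return theta if evidence[var] else 1.0 - theta
    _, elements = node            # decision node: primes are disjoint,
    return sum(theta              # so the element values simply add up
               * psdd_probability(prime, evidence)
               * psdd_probability(sub, evidence)
               for prime, sub, theta in elements)
```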

IJCAI Conference 2013 Conference Paper

An Exact Algorithm for Computing the Same-Decision Probability

  • Suming Chen
  • Arthur Choi
  • Adnan Darwiche

When using graphical models for decision making, the presence of unobserved variables may hinder our ability to reach the correct decision. A fundamental question here is whether or not one is ready to make a decision (stopping criterion), and if not, what additional observations should be made in order to better prepare for a decision (selection criterion). A recently introduced notion, the Same-Decision Probability (SDP), has been shown to be useful as both a stopping and a selection criterion. This query has been shown to be highly intractable, being PP^PP-complete, and is exemplary of a class of queries which correspond to the computation of certain expectations. We propose the first exact algorithm for computing the SDP in this paper, and demonstrate its effectiveness on several real and synthetic networks. We also present a new complexity result for computing the SDP on models with a Naive Bayes structure.
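
For contrast with the exact algorithm, the SDP itself is straightforward to state by brute force. A definitional sketch, exponential in the number of hidden variables, with `posterior(var, evidence)` as a hypothetical inference routine returning Pr(var = 1 | evidence) over binary variables:

```python
from itertools import product

def same_decision_probability(posterior, query, threshold,
                              hidden_vars, evidence):
    """SDP by definition: the probability, under the current posterior,
    that observing the hidden variables would leave the threshold-based
    decision on `query` unchanged."""
    decision = posterior(query, evidence) >= threshold
    sdp = 0.0
    for values in product((0, 1), repeat=len(hidden_vars)):
        # chain rule: Pr(h | e) = prod_i Pr(h_i | e, h_1 .. h_{i-1})
        e, pr_h = dict(evidence), 1.0
        for var, val in zip(hidden_vars, values):
            q = posterior(var, e)
            pr_h *= q if val else 1.0 - q
            e[var] = val
        if (posterior(query, e) >= threshold) == decision:
            sdp += pr_h
    return sdp
```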

AAAI Conference 2013 Conference Paper

Dynamic Minimization of Sentential Decision Diagrams

  • Arthur Choi
  • Adnan Darwiche

The Sentential Decision Diagram (SDD) is a recently proposed representation of Boolean functions, containing Ordered Binary Decision Diagrams (OBDDs) as a distinguished subclass. While OBDDs are characterized by total variable orders, SDDs are characterized more generally by vtrees. As both OBDDs and SDDs have canonical representations, searching for OBDDs and SDDs of minimal size simplifies to searching for variable orders and vtrees, respectively. For OBDDs, there are effective heuristics for dynamic reordering, based on locally swapping variables. In this paper, we propose an analogous approach for SDDs which navigates the space of vtrees via two operations: one based on tree rotations and a second based on swapping children in a vtree. We propose a particular heuristic for dynamically searching the space of vtrees, showing that it can find SDDs that are an order-of-magnitude more succinct than OBDDs found by dynamic reordering.
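
The search procedure has the shape of greedy local search over vtrees, analogous to dynamic OBDD reordering. The skeleton below is only illustrative, with every name a hypothetical stand-in for an SDD library's API; the paper's heuristic is more refined than plain greedy descent:

```python
def minimize_sdd(manager, internal_vtree_nodes, local_moves, sdd_size):
    """Greedy vtree search: at each internal vtree node, try each local
    move (a rotation or a child swap, paired with its inverse), keep it
    if the SDD shrinks, undo it otherwise; stop at a local minimum."""
    improved = True
    while improved:
        improved = False
        for node in internal_vtree_nodes():
            for apply_move, undo_move in local_moves:
                before = sdd_size(manager)
                if not apply_move(manager, node):  # move not applicable
                    continue
                if sdd_size(manager) < before:
                    improved = True                # keep the smaller SDD
                else:
                    undo_move(manager, node)       # revert the move
    return manager
```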

NeurIPS Conference 2013 Conference Paper

EDML for Learning Parameters in Directed and Undirected Graphical Models

  • Khaled Refaat
  • Arthur Choi
  • Adnan Darwiche

EDML is a recently proposed algorithm for learning parameters in Bayesian networks. It was originally derived in terms of approximate inference on a meta-network, which underlies the Bayesian approach to parameter estimation. While this initial derivation helped discover EDML in the first place and provided a concrete context for identifying some of its properties (e.g., in contrast to EM), the formal setting was somewhat tedious in the number of concepts it drew on. In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization. The new perspective has several advantages. First, it makes immediate some results that were non-trivial to prove initially. Second, it facilitates the design of EDML algorithms for new graphical models, leading to a new algorithm for learning parameters in Markov networks. We derive this algorithm in this paper, and show, empirically, that it can sometimes learn better estimates from complete data, several times faster than commonly used optimization methods, such as conjugate gradient and L-BFGS.

AAAI Conference 2012 Conference Paper

Basing Decisions on Sentences in Decision Diagrams

  • Yexiang Xue
  • Arthur Choi
  • Adnan Darwiche

The Sentential Decision Diagram (SDD) is a recently proposed representation of Boolean functions, containing Ordered Binary Decision Diagrams (OBDDs) as a distinguished subclass. While OBDDs are characterized by total variable orders, SDDs are characterized by dissections of variable orders, known as vtrees. Despite this generality, SDDs retain a number of properties, such as canonicity and a polytime Apply operator, that have been critical to the practical success of OBDDs. Moreover, upper bounds on the size of SDDs were also given, which are tighter than comparable upper bounds on the size of OBDDs. In this paper, we analyze more closely some of the theoretical properties of SDDs and their size. In particular, we consider the impact of basing decisions on sentences (using dissections as in SDDs), in comparison to basing decisions on variables (using total variable orders as in OBDDs). Here, we identify a class of Boolean functions where basing decisions on sentences using dissections of a variable order can lead to exponentially more compact SDDs, compared to OBDDs based on the same variable order. Moreover, we identify a fundamental property of the decompositions that underlie SDDs and use it to show how certain changes to a vtree can also lead to exponential differences in the size of an SDD.

UAI Conference 2012 Conference Paper

Lifted Relax, Compensate and then Recover: From Approximate to Exact Lifted Probabilistic Inference

  • Guy Van den Broeck
  • Arthur Choi
  • Adnan Darwiche

We propose an approach to lifted approximate inference for first-order probabilistic models, such as Markov logic networks. It is based on performing exact lifted inference in a simplified first-order model, which is found by relaxing first-order constraints, and then compensating for the relaxation. These simplified models can be incrementally improved by carefully recovering constraints that have been relaxed, also at the first-order level. This leads to a spectrum of approximations, with lifted belief propagation on one end, and exact lifted inference on the other. We discuss how relaxation, compensation, and recovery can be performed, all at the first-order level, and show empirically that our approach substantially improves on the approximations of both propositional solvers and lifted belief propagation.

UAI Conference 2012 Conference Paper

New Advances and Theoretical Insights into EDML

  • Khaled S. Refaat
  • Arthur Choi
  • Adnan Darwiche

EDML is a recently proposed algorithm for learning MAP parameters in Bayesian networks. In this paper, we present a number of new advances and insights on the EDML algorithm. First, we provide the multivalued extension of EDML, originally proposed for Bayesian networks over binary variables. Next, we identify a simplified characterization of EDML that further implies a simple fixed-point algorithm for the convex optimization problem that underlies it. This characterization further reveals a connection between EDML and EM: a fixed point of EDML is a fixed point of EM, and vice versa. We thus identify also a new characterization of EM fixed points, but in the semantics of EDML. Finally, we propose a hybrid EDML/EM algorithm that takes advantage of the improved empirical convergence behavior of EDML, while maintaining the monotonic improvement property of EM.

UAI Conference 2011 Conference Paper

EDML: A Method for Learning Parameters in Bayesian Networks

  • Arthur Choi
  • Khaled S. Refaat
  • Adnan Darwiche

We propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically.

NeurIPS Conference 2009 Conference Paper

Approximating MAP by Compensating for Structural Relaxations

  • Arthur Choi
  • Adnan Darwiche

We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models, that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively yields tighter approximations.

UAI Conference 2008 Conference Paper

Approximating the Partition Function by Deleting and then Correcting for Model Edges

  • Arthur Choi
  • Adnan Darwiche

We propose an approach for approximating the partition function which is based on two steps: (1) computing the partition function of a simplified model which is obtained by deleting model edges, and (2) rectifying the result by applying an edge-by-edge correction. The approach leads to an intuitive framework in which one can trade-off the quality of an approximation with the complexity of computing it. It also includes the Bethe free energy approximation as a degenerate case. We develop the approach theoretically in this paper and provide a number of empirical results that reveal its practical utility.

AAAI Conference 2008 Conference Paper

Focusing Generalizations of Belief Propagation on Targeted Queries

  • Arthur Choi

A recent formalization of Iterative Belief Propagation (IBP) has shown that it can be understood as an exact inference algorithm on an approximate model that results from deleting every model edge. This formalization has led to (1) new realizations of Generalized Belief Propagation (GBP) in which edges are recovered incrementally to improve approximation quality, and (2) edge-recovery heuristics that are motivated by improving the approximation quality of all node marginals in a graphical model. In this paper, we propose new edge-recovery heuristics, which are focused on improving the approximations of targeted node marginals. The new heuristics are based on newly-identified properties of edge deletion, and in turn IBP, which guarantee the exactness of edge deletion in simple and idealized cases. These properties also suggest new improvements to IBP approximations which are based on performing edge-by-edge corrections on targeted marginals, which are less costly than improvements based on edge recovery.

AAAI Conference 2008 Conference Paper

Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations

  • Arthur Choi

We consider the problem of computing mutual information between many pairs of variables in a Bayesian network. This task is relevant to a new class of Generalized Belief Propagation (GBP) algorithms that characterizes Iterative Belief Propagation (IBP) as a polytree approximation found by deleting edges in a Bayesian network. By computing, in the simplified network, the mutual information between variables across a deleted edge, we can estimate the impact that recovering the edge might have on the approximation. Unfortunately, it is computationally impractical to compute such scores for networks over many variables having large state spaces. So that edge recovery can scale to such networks, we propose in this paper an approximation of mutual information which is based on a soft extension of d-separation (a graphical test of independence in Bayesian networks). We focus primarily on polytree networks, which are sufficient for the application we consider, although we discuss potential extensions of the approximation to general networks as well. Empirically, we show that our proposal is often as effective as mutual information for the task of edge recovery, with orders of magnitude savings in computation time in larger networks. Our results lead to a concrete realization of GBP, admitting improvements to IBP approximations with only a modest amount of computational effort.
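
The quantity being approximated is ordinary mutual information over a deleted edge's endpoints. For reference, the exact computation from a joint table, which is the expensive step the soft d-separation score avoids repeating for many pairs:

```python
import math

def mutual_information(joint):
    """Exact I(X; Y) in nats, given the joint distribution as a table
    {(x, y): Pr(x, y)} over the two variables' finite domains."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p    # marginal of X
        py[y] = py.get(y, 0.0) + p    # marginal of Y
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0.0)
```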

UAI Conference 2007 Conference Paper

Node Splitting: A Scheme for Generating Upper Bounds in Bayesian Networks

  • Arthur Choi
  • Mark Chavira
  • Adnan Darwiche

We formulate in this paper the mini-bucket algorithm for approximate inference in terms of exact inference on an approximate model produced by splitting nodes in a Bayesian network. The new formulation leads to a number of theoretical and practical implications. First, we show that branch-and-bound search algorithms that use mini-bucket bounds may operate in a drastically reduced search space. Second, we show that the proposed formulation inspires new mini-bucket heuristics and allows us to analyze existing heuristics from a new perspective. Finally, we show that this new formulation allows mini-bucket approximations to benefit from recent advances in exact inference, allowing one to significantly increase the reach of these approximations.

UAI Conference 2006 Conference Paper

A Variational Approach for Approximating Bayesian Networks by Edge Deletion

  • Arthur Choi
  • Adnan Darwiche

We consider in this paper the formulation of approximate inference in Bayesian networks as a problem of exact inference on an approximate network that results from deleting edges (to reduce treewidth). We have shown in earlier work that deleting edges calls for introducing auxiliary network parameters to compensate for lost dependencies, and proposed intuitive conditions for determining these parameters. We have also shown that our method corresponds to IBP when enough edges are deleted to yield a polytree, and corresponds to some generalizations of IBP when fewer edges are deleted. In this paper, we propose a different criterion for determining auxiliary parameters based on optimizing the KL-divergence between the original and approximate networks. We discuss the relationship between the two methods for selecting parameters, shedding new light on IBP and its generalizations. We also discuss the application of our new method to approximating inference problems which are exponential in constrained treewidth, including MAP and non-myopic value of information.
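
The criterion's objective is the standard KL-divergence between the original and the edge-deleted network. For reference, its discrete form over a shared finite support:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats, for distributions given as {value: prob}
    tables; the auxiliary parameters of the deleted edges are chosen
    to make this quantity small."""
    return sum(p_x * math.log(p_x / q[x])
               for x, p_x in p.items() if p_x > 0.0)
```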

AAAI Conference 2006 Conference Paper

An Edge Deletion Semantics for Belief Propagation and its Practical Impact on Approximation Quality

  • Arthur Choi

We show in this paper that the influential algorithm of iterative belief propagation can be understood in terms of exact inference on a polytree, which results from deleting enough edges from the original network. We show that deleting edges implies adding new parameters into a network, and that the iterations of belief propagation are searching for values of these new parameters which satisfy intuitive conditions that we characterize. The new semantics lead to the following question: Can one improve the quality of approximations computed by belief propagation by recovering some of the deleted edges, while keeping the network easy enough for exact inference? We show in this paper that the answer is yes, leading to another question: How do we choose which edges to recover? To answer, we propose a specific method based on mutual information which is motivated by the edge deletion semantics. Empirically, we provide experimental results showing that the quality of approximations can be improved without incurring much additional computational cost. We also show that recovering certain edges with low mutual information may not be worthwhile as they increase the computational complexity, without necessarily improving the quality of approximations.

UAI Conference 2005 Conference Paper

On Bayesian Network Approximation by Edge Deletion

  • Adnan Darwiche
  • Hei Chan
  • Arthur Choi

We consider the problem of deleting edges from a Bayesian network for the purpose of simplifying models in probabilistic inference. In particular, we propose a new method for deleting network edges, which is based on the evidence at hand. We provide some interesting bounds on the KL-divergence between original and approximate networks, which highlight the impact of given evidence on the quality of approximation and shed some light on good and bad candidates for edge deletion. We finally demonstrate empirically the promise of the proposed edge deletion technique as a basis for approximate inference.