Arrow Research search

Author name cluster

Dan Geiger

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

UAI Conference 2007 Conference Paper

Importance Sampling via Variational Optimization

  • Ydo Wexler
  • Dan Geiger

Computing the exact likelihood of data in large Bayesian networks consisting of thousands of vertices is often a difficult task. When these models contain many deterministic conditional probability tables and when the observed values are extremely unlikely even alternative algorithms such as variational methods and stochastic sampling often perform poorly. We present a new importance sampling algorithm for Bayesian networks which is based on variational techniques. We use the updates of the importance function to predict whether the stochastic sampling converged above or below the true likelihood, and change the proposal distribution accordingly. The validity of the method and its contribution to convergence is demonstrated on hard networks of large genetic linkage analysis tasks.

JMLR Journal 2005 Journal Article

Asymptotic Model Selection for Naive Bayesian Networks

  • Dmitry Rusakov
  • Dan Geiger

We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally incorrect for statistical models that belong to stratified exponential families. This claim stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct asymptotic approximation for the marginal likelihood. [abs] [ pdf ][ bib ] &copy JMLR 2005. ( edit, beta )

UAI Conference 2003 Conference Paper

A Distance-Based Branch and Bound Feature Selection Algorithm

  • Ari Frank
  • Dan Geiger
  • Zohar Yakhini

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling

UAI Conference 2003 Conference Paper

Automated Analytic Asymptotic Evaluation of the Marginal Likelihood for Latent Models

  • Dmitry Rusakov
  • Dan Geiger

We present and implement two algorithms for analytic asymptotic evaluation of the marginal likelihood of data given a Bayesian network with hidden nodes. As shown by previous work, this evaluation is particularly hard for latent Bayesian network models, namely networks that include hidden variables, where asymptotic approximation deviates from the standard BIC score. Our algorithms solve two central difficulties in asymptotic evaluation of marginal likelihood integrals, namely, evaluation of regular dimensionality drop for latent Bayesian network models and computation of non-standard approximation formulas for singular statistics for these models. The presented algorithms are implemented in Matlab and Maple and their usage is demonstrated for marginal likelihood approximations for Bayesian networks with hidden variables

UAI Conference 2002 Conference Paper

Asymptotic Model Selection for Naive Bayesian Networks

  • Dmitry Rusakov
  • Dan Geiger

We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally not valid for statistical models that belong to a stratified exponential family. This stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct approximation for the marginal likelihood.

UAI Conference 2002 Conference Paper

Factorization of Discrete Probability Distributions

  • Dan Geiger
  • Christopher Meek
  • Bernd Sturmfels

We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. This result generalizes the well known Hammersley-Clifford Theorem.

AIJ Journal 2001 Journal Article

A sufficiently fast algorithm for finding close to optimal clique trees

  • Ann Becker
  • Dan Geiger

We offer an algorithm that finds a clique tree such that the size of the largest clique is at most (2α+1)k where k is the size of the largest clique in a clique tree in which this size is minimized and α is the approximation ratio of an α-approximation algorithm for the 3-way vertex cut problem. When α=4/3, our algorithm's complexity is O(24. 67kn·poly(n)) and it errs by a factor of 3. 67 where poly(n) is the running time of linear programming. This algorithm is extended to find clique trees in which the state space of the largest clique is bounded. When k=O(logn), our algorithm yields a polynomial inference algorithm for Bayesian networks.

UAI Conference 2000 Conference Paper

Perfect Tree-like Markovian Distributions

  • Ann Becker
  • Dan Geiger
  • Christopher Meek

We show that if a strictly positive joint probability distribution for a set of binary random variables factors according to a tree, then vertex separation represents all and only the independence relations enclosed in the distribution. The same result is shown to hold also for multivariate strictly positive normal distributions. Our proof uses a new property of conditional independence that holds for these two classes of probability distributions.

UAI Conference 1999 Conference Paper

Parameter Priors for Directed Acyclic Graphical Models and the Characteriration of Several Probability Distributions

  • Dan Geiger
  • David Heckerman

We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n x n, n >= 3, positive-definite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W_{11}-W_{12}W_{22}^{-1}W_{12}' is independent of {W_{12}, W_{22}} for every block partitioning W_{11}, W_{12}, W_{12}', W_{22} of W. Similar characterizations of the normal and normal-Wishart distributions are provided as well. We also show how to construct a prior for every DAG model over X from the prior of a single regression model.

UAI Conference 1999 Conference Paper

Quantifier Elimination for Statistical Problems

  • Dan Geiger
  • Christopher Meek

Recent improvement on Tarski's procedure for quantifier elimination in the first order theory of real numbers makes it feasible to solve small instances of the following problems completely automatically: 1. listing all equality and inequality constraints implied by a graphical model with hidden variables. 2. Comparing graphyical models with hidden variables (i.e., model equivalence, inclusion, and overlap). 3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model. 4. Determing whether a given set of independence assertions. We discuss the foundation of quantifier elimination and demonstrate its application to these problems.

UAI Conference 1999 Conference Paper

Random Algorithms for the Loop Cutset Problem

  • Ann Becker
  • Reuven Bar-Yehuda
  • Dan Geiger

We show how to find a minimum loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in Pearl's method of conditioning for inference. Our random algorithm for finding a loop cutset, called ``Repeated WGuessI", outputs a minimum loop cutset, after O(c 6^k k n) steps, with probability at least 1-(1 over{6^k})^{c 6^k}), where c>1 is a constant specified by the user, k is the size of a minimum weight loop cutset, and n is the number of vertices. We also show empirically that a variant of this algorithm, called WRA, often finds a loop cutset that is closer to the minimum loop cutset than the ones found by the best deterministic algorithms known.

UAI Conference 1998 Conference Paper

Graphical Models and Exponential Families

  • Dan Geiger

We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs), directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs) and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and non-independence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined.

UAI Conference 1996 Conference Paper

A sufficiently fast algorithm for finding close to optimal junction trees

  • Ann Becker
  • Dan Geiger

An algorithm is developed for finding a close to optimal junction tree of a given graph G. The algorithm has a worst case complexity O(c^k n^a) where a and c are constants, n is the number of vertices, and k is the size of the largest clique in a junction tree of G in which this size is minimized. The algorithm guarantees that the logarithm of the size of the state space of the heaviest clique in the junction tree produced is less than a constant factor off the optimal value. When k = O(log n), our algorithm yields a polynomial inference algorithm for Bayesian networks.

UAI Conference 1996 Conference Paper

Asymptotic Model Selection for Directed Networks with Hidden Variables

  • Dan Geiger
  • David Heckerman
  • Christopher Meek

We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node.

AIJ Journal 1996 Journal Article

Knowledge representation and inference in similarity networks and Bayesian multinets

  • Dan Geiger
  • David Heckerman

We examine two representation schemes for uncertain knowledge: the similarity network (Heckerman, 1991) and the Bayesian multinet. These schemes are extensions of the Bayesian network model in that they represent asymmetric independence assertions. We explicate the notion of relevance upon which similarity networks are based and present an efficient inference algorithm that works under the assumption that every event has a nonzero probability. Another inference algorithm is developed that works under no restriction albeit less efficiently. We show that similarity networks are not inferentially complete—namely—not every query can be answered. Nonetheless, we show that a similarity network can always answer any query of the form: “What is the posterior probability of an hypothesis given evidence? ” We call this property diagnostic completeness. Finally, we describe a generalization of similarity networks that can encode more types of asymmetric conditional independence assertions than can ordinary similarity networks.

AIJ Journal 1996 Journal Article

Optimization of Pearl's method of conditioning and greedy-like approximation algorithms for the vertex feedback set problem

  • Ann Becker
  • Dan Geiger

We show how to find a small loop cutset in a Bayesian network. Finding such a loop cutset is the first step in the method of conditioning for inference. Our algorithm for finding a loop cutset, called MGA, finds a loop cutset which is guaranteed in the worst case to contain less than twice the number of variables contained in a minimum loop cutset. The algorithm is based on a reduction to the weighted vertex feedback set problem and a 2-approximation of the latter problem. The complexity of MGA is O(m + n log n) where m and n are the number of edges and vertices respectively. A greedy algorithm, called GA, for the weighted vertex feedback set problem is also analyzed and bounds on its performance are given. We test MGA on randomly generated graphs and find that the average ratio between the number of instances associated with the algorithm's output and the number of instances associated with an optimum solution is far better than the worst-case bound.

UAI Conference 1995 Conference Paper

Learning Bayesian Networks: A Unification for Discrete and Gaussian Domains

  • David Heckerman
  • Dan Geiger

We examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we unify the approaches we presented at last year's conference for discrete and Gaussian domains. We derive a general Bayesian scoring metric, appropriate for both domains. We then use this metric in combination with well-known statistical facts about the Dirichlet and normal--Wishart distributions to derive our metrics for discrete and Gaussian domains.

UAI Conference 1994 Conference Paper

Approximation Algorithms for the Loop Cutset Problem

  • Ann Becker
  • Dan Geiger

We show how to find a small loop curser in a Bayesian network. Finding such a loop cutset is the first step in the method of conditioning for inference. Our algorithm for finding a loop cutset, called MGA, finds a loop cutset which is guaranteed in the worst case to contain less than twice the number of variables contained in a minimum loop cutset. We test MGA on randomly generated graphs and find that the average ratio between the number of instances associated with the algorithms' output and the number of instances associated with a minimum solution is 1.22.

UAI Conference 1994 Conference Paper

Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

  • David Heckerman
  • Dan Geiger
  • Max Chickering

We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Our contributions are threefold. First, we identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user's prior knowledge. In particular, a user can express her knowledge-for the most part-as a single prior Bayesian network for the domain. Second, we describe local search and annealing algorithms to be used in conjunction with scoring metrics. In the special case where each node has at most one parent, we show that heuristic search can be replaced with a polynomial algorithm to identify the networks with the highest score. Third, we describe a methodology for evaluating Bayesian-network learning algorithms. We apply this approach to a comparison of metrics and search procedures.

UAI Conference 1994 Conference Paper

Learning Gaussian Networks

  • Dan Geiger
  • David Heckerman

We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Previous work has concentrated on metrics for domains containing only discrete variables, under the assumption that data represents a multinomial sample. In this paper, we extend this work, developing scoring metrics for domains containing all continuous variables or a mixture of discrete and continuous variables, under the assumption that continuous data is sampled from a multivariate normal distribution. Our work extends traditional statistical approaches for identifying vanishing regression coefficients in that we identify two important assumptions, called event equivalence and parameter modularity, that when combined allow the construction of prior distributions for multivariate normal parameters from a single prior Bayesian network specified by a user.

UAI Conference 1994 Conference Paper

On Testing Whether an Embedded Bayesian Network Represents a Probability Model

  • Dan Geiger
  • Azaria Paz
  • Judea Pearl

Testing the validity of probabilistic models containing unmeasured (hidden) variables is shown to be a hard task. We show that the task of testing whether models are structurally incompatible with the data at hand, requires an exponential number of independence evaluations, each of the form: "X is conditionally independent of Y, given Z." In contrast, a linear number of such evaluations is required to test a standard Bayesian network (one per vertex). On the positive side, we show that if a network with hidden variables G has a tree skeleton, checking whether G represents a given probability model P requires the polynomial number of such independence evaluations. Moreover, we provide an algorithm that efficiently constructs a tree-structured Bayesian network (with hidden variables) that represents P if such a network exists, and further recognizes when such a network does not exist.

UAI Conference 1993 Conference Paper

Inference Algorithms for Similarity Networks

  • Dan Geiger
  • David Heckerman

We examine two types of similarity networks each based on a distinct notion of relevance. For both types of similarity networks we present an efficient inference algorithm that works under the assumption that every event has a nonzero probability of occurrence. Another inference algorithm is developed for type 1 similarity networks that works under no restriction, albeit less efficiently.

UAI Conference 1992 Conference Paper

An Entropy-based Learning Algorithm of Bayesian Conditional Trees

  • Dan Geiger

This article offers a modification of Chow and Liu's learning algorithm in the context of handwritten digit recognition. The modified algorithm directs the user to group digits into several classes consisting of digits that are hard to distinguish and then constructing an optimal conditional tree representation for each class of digits instead of for each single digit as done by Chow and Liu (1968). Advantages and extensions of the new method are discussed. Related works of Wong and Wang (1977) and Wong and Poon (1989) which offer a different entropy-based learning algorithm are shown to rest on inappropriate assumptions.

UAI Conference 1991 Conference Paper

Advances in Probabilistic Reasoning

  • Dan Geiger
  • David Heckerman

This paper discuses multiple Bayesian networks representation paradigms for encoding asymmetric independence assertions. We offer three contributions: (1) an inference mechanism that makes explicit use of asymmetric independence to speed up computations, (2) a simplified definition of similarity networks and extensions of their theory, and (3) a generalized representation scheme that encodes more types of asymmetric independence assertions than do similarity networks.

AAAI Conference 1991 Conference Paper

Optimal Satisficing Tree Searches

  • Dan Geiger

We provide an algorithm that finds optimal search strategies for AND trees and OR trees. Our model includes three outcomes when a node is explored: (1) finding a solution, (2) not finding a solution and realizing that there are no solutions beneath the current node (pruning), and (3) not finding a solution but not pruning the nodes below. The expected cost of examining a node and the probabilities of the three outcomes are given. Based on this input, the algorithm generates an order that minimizes the expected search cost.

AAAI Conference 1990 Conference Paper

Learning Causal Trees from Dependence Information

  • Dan Geiger

In constructing probabilistic networks from human judgments, we use causal relationships to convey useful patterns of dependencies. The converse task, that of inferring causal relationships from patterns of dependencies, is far less understood. Th’ 1s paper establishes conditions under which the directionality of some interactions can be determined from non-temporal probabilistic information - an essential prerequisite for attributing a causal interpretation to these interactions. An efficient algorithm is developed that, given data generated by an undisclosed causal polytree, recovers the structure of the underlying polytree, as well as the directionality of all its identifiable links.

UAI Conference 1990 Conference Paper

separable and transitive graphoids

  • Dan Geiger
  • David Heckerman

We examine three probabilistic formulations of the sentence a and b are totally unrelated with respect to a given set of variables U. First, two variables a and b are totally independent if they are independent given any value of any subset of the variables in U. Second, two variables are totally uncoupled if U can be partitioned into two marginally independent sets containing a and b respectively. Third, two variables are totally disconnected if the corresponding nodes are disconnected in every belief network representation. We explore the relationship between these three formulations of unrelatedness and explain their relevance to the process of acquiring probabilistic knowledge from human experts.

UAI Conference 1989 Conference Paper

d-Separation: From Theorems to Algorithms

  • Dan Geiger
  • Thomas Verma
  • Judea Pearl

An efficient algorithm is developed that identifies all independencies implied by the topology of a Bayesian network. Its correctness and maximality stems from the soundness and completeness of d-separation with respect to probability theory. The algorithm runs in time O ( l E l ) where E is the number of edges in the network.