Author name cluster

David Page

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers

2 author rows

NeurIPS Conference 2024 Conference Paper

On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

Boyao Li
Alexander J. Thomson
Houssam Nassif
Matthew M. Engelhard
David Page

Deep neural networks (DNNs) lack the precise semantics and definitive probabilistic interpretation of probabilistic graphical models (PGMs). In this paper, we propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks. Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are precise in this alternative PGM structure. Not only does our research complement existing studies that describe neural networks as kernel machines or infinite-sized Gaussian processes, it also elucidates a more direct approximation that DNNs make to exact inference in PGMs. Potential benefits include improved pedagogy and interpretation of DNNs, and algorithms that can merge the strengths of PGMs and DNNs.

PDF Details DOI

UAI Conference 2023 Conference Paper

Variable importance matching for causal inference

Quinn Lanners
Harsh Parikh
Alexander Volfovsky
Cynthia Rudin
David Page

Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, yield accurate treatment effect estimates, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the auditability of matches, as well as extensions to more general nonparametric outcome modeling.

ICML Conference 2020 Conference Paper

CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods

Wei Zhang 0058
Thomas Kobber Panum
Somesh Jha
Prasad Chalasani
David Page

We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger causality across a wide variety of event sequences with diverse event interdependency. To address these weaknesses, we propose CAUSE (Causality from AttribUtions on Sequence of Events), a novel framework for the studied task. The key idea of CAUSE is to first implicitly capture the underlying event interdependency by fitting a neural point process, and then extract from the process a Granger causality statistic using an axiomatic attribution method. Across multiple datasets riddled with diverse event interdependency, we demonstrate that CAUSE achieves superior performance on correctly inferring the inter-type Granger causality over a range of state-of-the-art methods.

ICML Conference 2019 Conference Paper

AUCμ: A Performance Metric for Multi-Class Machine Learning Models

Ross Kleiman
David Page

The area under the receiver operating characteristic curve (AUC) is arguably the most common metric in machine learning for assessing the quality of a two-class classification model. As the number and complexity of machine learning applications grows, so too does the need for measures that can gracefully extend to classification models trained for more than two classes. Prior work in this area has proven computationally intractable and/or inconsistent with known properties of AUC, and thus there is still a need for an improved multi-class efficacy metric. We provide in this work a multi-class extension of AUC that we call AUC{\textmu} that is derived from first principles of the binary class AUC. AUC{\textmu} has similar computational complexity to AUC and maintains the properties of AUC critical to its interpretation and use.

UAI Conference 2018 Conference Paper

Stochastic Learning for Sparse Discrete Markov Random Fields with Controlled Gradient Approximation Error

Sinong Geng
Zhaobin Kuang
Jie Liu 0006
Stephen J. Wright 0001
David Page

We study the L1 -regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs), where efficient and scalable learning requires both sparse regularization and approximate inference. To address these challenges, we consider a stochastic learning framework called stochastic proximal gradient (SPG; Honorio 2012a, Atchade et al. 2014, Miasojedow and Rejchel 2016). SPG is an inexact proximal gradient algorithm [Schmidt et al. , 2011], whose inexactness stems from the stochastic oracle (Gibbs sampling) for gradient approximation – exact gradient evaluation is infeasible in general due to the NP-hard inference problem for discrete MRFs [Koller and Friedman, 2009]. Theoretically, we provide novel verifiable bounds to inspect and control the quality of gradient approximation. Empirically, we propose the tighten asymptotically (TAY) learning strategy based on the verifiable bounds to boost the performance of SPG.

ICML Conference 2018 Conference Paper

Temporal Poisson Square Root Graphical Models

Sinong Geng
Zhaobin Kuang
Peggy L. Peissig
David Page

We propose temporal Poisson square root graphical models (TPSQRs), a generalization of Poisson square root graphical models (PSQRs) specifically designed for modeling longitudinal event data. By estimating the temporal relationships for all possible pairs of event types, TPSQRs can offer a holistic perspective about whether the occurrences of any given event type could excite or inhibit any other type. A TPSQR is learned by estimating a collection of interrelated PSQRs that share the same template parameterization. These PSQRs are estimated jointly in a pseudo-likelihood fashion, where Poisson pseudo-likelihood is used to approximate the original more computationally intensive pseudo-likelihood problem stemming from PSQRs. Theoretically, we demonstrate that under mild assumptions, the Poisson pseudolikelihood approximation is sparsistent for recovering the underlying PSQR. Empirically, we learn TPSQRs from a real-world large-scale electronic health record (EHR) with millions of drug prescription and condition diagnosis events, for adverse drug reaction (ADR) detection. Experimental results demonstrate that the learned TPSQRs can recover ADR signals from the EHR effectively and efficiently.

NeurIPS Conference 2017 Conference Paper

A Screening Rule for l1-Regularized Ising Model Estimation

Zhaobin Kuang
Sinong Geng
David Page

We discover a screening rule for l1-regularized Ising model estimation. The simple closed-form screening rule is a necessary and sufficient condition for exactly recovering the blockwise structure of a solution under any given regularization parameters. With enough sparsity, the screening rule can be combined with various optimization procedures to deliver solutions efficiently in practice. The screening rule is especially suitable for large-scale exploratory data analysis, where the number of variables in the dataset can be thousands while we are only interested in the relationship among a handful of variables within moderate-size clusters for interpretability. Experimental results on various datasets demonstrate the efficiency and insights gained from the introduction of the screening rule.

IJCAI Conference 2016 Conference Paper

Baseline Regularization for Computational Drug Repositioning with Longitudinal Observational Data

Zhaobin Kuang
James Thomson
Michael Caldwell
Peggy Peissig
Ron Stewart
David Page

Computational Drug Repositioning (CDR) is the knowledge discovery process of finding new indications for existing drugs leveraging heterogeneous drug-related data. Longitudinal observational data such as Electronic Health Records (EHRs) have become an emerging data source for CDR. To address the high-dimensional, irregular, subject and time-heterogeneous nature of EHRs, we propose Baseline Regularization (BR) and a variant that extend the one-way fixed effect model, which is a standard approach to analyze small-scale longitudinal data. For evaluation, we use the proposed methods to search for drugs that can lower Fasting Blood Glucose (FBG) level in the Marshfield Clinic EHR. Experimental results suggest that the proposed methods are capable of rediscovering drugs that can lower FBG level as well as identifying some potential blood sugar lowering drugs in the literature.

JMLR Journal 2016 Journal Article

Structure-Leveraged Methods in Breast Cancer Risk Prediction

Jun Fan
Yirong Wu
Ming Yuan
David Page
Jie Liu
Irene M. Ong
Peggy Peissig
Elizabeth Burnside

Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structure information in electronic health records. We conducted a retrospective case- control study, garnering 49 mammography descriptors and 77 high- frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. Structured mammography reports and breast imaging features have long been part of a standard electronic health record (EHR), and genetic markers likely will be in the near future. Lasso and its variants are widely used approaches to integrated learning and feature selection, and our methodological contribution is to incorporate the dependence structure among the features into these approaches. More specifically, we propose a new methodology by combining group penalty and $\ell^p$ ($1\leq p\leq2$) fusion penalty to improve breast cancer risk prediction, taking into account structure information in mammography descriptors and SNPs. We demonstrate that our method provides benefits that are both statistically significant and potentially significant to people's lives. [abs] [ pdf ][ bib ] &copy JMLR 2016. ( edit, beta )

ICML Conference 2014 Conference Paper

Multiple Testing under Dependence via Semiparametric Graphical Models

Jie Liu 0006
Chunming Zhang
Elizabeth S. Burnside
David Page

It has been shown that graphical models can be used to leverage the dependence in large-scale multiple testing problems with significantly improved performance (Sun & Cai, 2009; Liu et al. , 2012). These graphical models are fully parametric and require that we know the parameterization of f1, the density function of the test statistic under the alternative hypothesis. However in practice, f1 is often heterogeneous, and cannot be estimated with a simple parametric distribution. We propose a novel semiparametric approach for multiple testing under dependence, which estimates f1 adaptively. This semiparametric approach exactly generalizes the local FDR procedure (Efron et al. , 2001) and connects with the BH procedure (Benjamini & Hochberg, 1995). A variety of simulations show that our semiparametric approach outperforms classical procedures which assume independence and the parametric approaches which capture dependence.

NeurIPS Conference 2013 Conference Paper

Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models

Jie Liu
David Page

In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with stripped Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA's performance is close to Gibbs sampling with exact likelihood calculation. Models learned with Gibbs_SBA also generalize better than the models learned by MLE on real-world Senate voting data.

ICML Conference 2012 Conference Paper

Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events

Jesse Davis
Vítor Santos Costa
Elizabeth Berg
David Page
Peggy L. Peissig
Michael Caldwell

UAI Conference 2012 Conference Paper

Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies

Jie Liu 0006
Chunming Zhang
Catherine A. McCarty
Peggy L. Peissig
Elizabeth S. Burnside
David Page

Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be automatically learned by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.

AAAI Conference 2012 Conference Paper

Identifying Adverse Drug Events by Relational Learning

David Page
Vitor Santos Costa
Sriraam Natarajan
Aubrey Barnard
Peggy Peissig
Michael Caldwell

The pharmaceutical industry, consumer protection groups, users of medications and government oversight agencies are all strongly interested in identifying adverse reactions to drugs. While a clinical trial of a drug may use only a thousand patients, once a drug is released on the market it may be taken by millions of patients. As a result, in many cases adverse drug events (ADEs) are observed in the broader population that were not identified during clinical trials. Therefore, there is a need for continued, postmarketing surveillance of drugs to identify previouslyunanticipated ADEs. This paper casts this problem as a reverse machine learning task, related to relational subgroup discovery and provides an initial evaluation of this approach based on experiments with an actual EMR/EHR and known adverse drug events.

NeurIPS Conference 2012 Conference Paper

Multiplicative Forests for Continuous-Time Processes

Jeremy Weiss
Sriraam Natarajan
David Page

Learning temporal dependencies between variables over continuous time is an important and challenging task. Continuous-time Bayesian networks effectively model such processes but are limited by the number of conditional intensity matrices, which grows exponentially in the number of parents per variable. We develop a partition-based representation using regression trees and forests whose parameter spaces grow linearly in the number of node splits. Using a multiplicative assumption we show how to update the forest likelihood in closed form, producing efficient model updates. Our results show multiplicative forests can be learned from few temporal trajectories with large gains in performance and scalability.

ICML Conference 2012 Conference Paper

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

Kendrick Boyd
Jesse Davis
David Page
Vítor Santos Costa

JMLR Journal 2009 Journal Article

Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions

Lisa Hellerstein
Bernard Rosell
Eric Bach
Soumya Ray
David Page

A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model. [abs] [ pdf ][ bib ] &copy JMLR 2009. ( edit, beta )

IJCAI Conference 2007 Conference Paper

Jesse Davis
Irene Ong
Jan Struyf
Elizabeth Burnside
David Page
V
iacute; tor Santos Costa

Statistical relational learning (SRL) algorithms learn statistical models from relational data, such as that stored in a relational database. We previously introduced view learning for SRL, in which the view of a relational database can be automatically modified, yielding more accurate statistical models. The present paper presents SAYU-VISTA, an algorithm which advances beyond the initial view learning approach in three ways. First, it learns views that introduce new relational tables, rather than merely new fields for an existing table of the database. Second, new tables or new fields are not limited to being approximations to some target concept; instead, the new approach performs a type of predicate invention. The new approach avoids the classical problem with predicate invention, of learning many useless predicates, by keeping only new fields or tables (i. e. , new predicates) that immediately improve the performance of the statistical model. Third, retained fields or tables can then be used in the definitions of further new fields or tables. We evaluate the new view learning approach on three relational classification tasks.

ICML Conference 2007 Conference Paper

An integrated approach to feature invention and model construction for drug activity prediction

Jesse Davis
Vítor Santos Costa
Soumya Ray
David Page

UAI Conference 2007 Conference Paper

Learning Bayesian Network Structure from Correlation-Immune Data

Eric Lantz
Soumya Ray
David Page

Searching the complete space of possible Bayesian networks is intractable for problems of interesting size, so Bayesian network structure learning algorithms, such as the commonly used Sparse Candidate algorithm, employ heuristics. However, these heuristics also restrict the types of relationships that can be learned exclusively from data. They are unable to learn relationships that exhibit "correlation-immunity", such as parity. To learn Bayesian networks in the presence of correlation-immune relationships, we extend the Sparse Candidate algorithm with a technique called "skewing". This technique uses the observation that relationships that are correlation-immune under a specific input distribution may not be correlation-immune under another, sufficiently different distribution. We show that by extending Sparse Candidate with this technique we are able to discover relationships between random variables that are approximately correlation-immune, with a significantly lower computational cost than the alternative of considering multiple parents of a node at a time.

ICML Conference 2005 Conference Paper

Generalized skewing for functions with continuous and nominal attributes

Soumya Ray
David Page

ICML Conference 2005 Conference Paper

Multi-instance tree learning

Hendrik Blockeel
David Page
Ashwin Srinivasan 0001

IJCAI Conference 2005 Conference Paper

View Learning for Statistical Relational Learning: With an Application to Mammography

Jesse Davis
Elizabeth Burnside
Inês Dutra
David Page
Raghu Ramakrishnan
Vítor Santos Costa
Jude

Statistical relational learning (SRL) constructs probabilistic models from relational databases. A key capability of SRL is the learning of arcs (in the Bayes net sense) connecting entries in different rows of a relational table, or in different tables. Nevertheless, SRL approaches currently are constrained to use the existing database schema. For many database applications, users find it profitable to define alternative “views” of the database, in effect defining new fields or tables. Such new fields or tables can also be highly useful in learning. We provide SRL with the capability of learning new views.

ICML Conference 2005 Conference Paper

Why skewing works: learning difficult Boolean functions with greedy tree learners

Bernard Rosell
Lisa Hellerstein
Soumya Ray
David Page

ICML Conference 2004 Conference Paper

Sequential skewing: an improved skewing algorithm

Soumya Ray
David Page

UAI Conference 2003 Conference Paper

CLP(BN): Constraint Logic Programming for Probabilistic Knowledge

Vítor Santos Costa
David Page
Maleeha Qazi
James Cussens

We present CLP(BN), a novel approach that aims at expressing Bayesian networks through the constraint logic programming framework. Arguably, an important limitation of traditional Bayesian networks is that they are propositional, and thus cannot represent relations between multiple similar objects in multiple contexts. Several researchers have thus proposed first-order languages to describe such networks. Namely, one very successful example of this approach are the Probabilistic Relational Models (PRMs), that combine Bayesian networks with relational database technology. The key difficulty that we had to address when designing CLP(cal{BN}) is that logic based representations use ground terms to denote objects. With probabilitic data, we need to be able to uniquely represent an object whose value we are not sure about. We use {sl Skolem functions} as unique new symbols that uniquely represent objects with unknown value. The semantics of CLP(cal{BN}) programs then naturally follow from the general framework of constraint logic programming, as applied to a specific domain where we have probabilistic data. This paper introduces and defines CLP(cal{BN}), and it describes an implementation and initial experiments. The paper also shows how CLP(cal{BN}) relates to Probabilistic Relational Models (PRMs), Ngo and Haddawys Probabilistic Logic Programs, AND Kersting AND De Raedts Bayesian Logic Programs

JMLR Journal 2003 Journal Article

ILP: A Short Look Back and a Longer Look Forward

David Page
Ashwin Srinivasan

Inductive logic programming (ILP) is built on a foundation laid by research in machine learning and computational logic. Armed with this strong foundation, ILP has been applied to important and interesting problems in the life sciences, engineering and the arts. This paper begins by briefly reviewing some example applications, in order to illustrate the benefits of ILP. In turn, the applications have brought into focus the need for more research into specific topics. We enumerate and elaborate five of these: (1) novel search methods; (2) incorporation of explicit probabilities; (3) incorporation of special-purpose reasoners; (4) parallel execution using commodity components; and (5) enhanced human interaction. It is our hypothesis that progress in each of these areas can greatly improve the contributions that can be made with ILP; and that, with assistance from research workers in other areas, significant progress in each of these areas is possible. [abs] [ pdf ][ ps.gz ][ ps ]

IJCAI Conference 2003 Conference Paper

Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction

David Page
Soumya Ray

This paper presents a novel, promising approach that allows greedy decision tree induction algorithms to handle problematic functions such as parity functions. Lookahead is the standard approach to addressing difficult functions for greedy decision tree learners. Nevertheless, this approach is limited to very small problematic functions or subfunctions (2 or 3 variables), because the time complexity grows more than exponentially with the depth of lookahead. In contrast, the approach presented in this paper carries only a constant run-time penalty. Experiments indicate that the approach is effective with only modest amounts of data for problematic functions or subfunctions of up to six or seven variables, where the examples themselves may contain numerous other (irrelevant) variables as well.

ICML Conference 2001 Conference Paper

Multiple Instance Regression

Soumya Ray
David Page

ICML Conference 2000 Conference Paper

Using Multiple Levels of Learning and Diverse Evidence to Uncover Coordinately Controlled Genes

Mark W. Craven
David Page
Jude W. Shavlik
Joseph Bockhorst
Jeremy D. Glasner