Arrow Research search

Author name cluster

Larry Wasserman

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
1 author row

Possible papers

28

JMLR Journal 2024 Journal Article

Decorrelated Variable Importance

  • Isabella Verdinelli
  • Larry Wasserman

Because of the widespread use of black box prediction methods such as random forests and neural nets, there is renewed interest in developing methods for quantifying variable importance as part of the broader goal of interpretable prediction. A popular approach is to define a variable importance parameter --- known as LOCO (Leave Out COvariates) --- based on dropping covariates from a regression model. This is essentially a nonparametric version of $R^2$. This parameter is very general and can be estimated nonparametrically, but it can be hard to interpret because it is affected by correlation between covariates. We propose a method for mitigating the effect of correlation by defining a modified version of LOCO. This new parameter is difficult to estimate nonparametrically, but we show how to estimate it using semiparametric models.
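
As a rough illustration of the baseline LOCO parameter that the paper modifies (this is not the paper's decorrelated estimator), a minimal Python sketch using a held-out split and a random forest; all function and variable names are illustrative:

```python
# Basic LOCO sketch: refit without covariate j and compare held-out errors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def loco_importance(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    full = RandomForestRegressor(random_state=seed).fit(X_tr, y_tr)
    base_err = np.mean((y_te - full.predict(X_te)) ** 2)
    scores = []
    for j in range(X.shape[1]):
        keep = [k for k in range(X.shape[1]) if k != j]
        reduced = RandomForestRegressor(random_state=seed).fit(X_tr[:, keep], y_tr)
        err_j = np.mean((y_te - reduced.predict(X_te[:, keep])) ** 2)
        scores.append(err_j - base_err)  # increase in risk when X_j is dropped
    return np.array(scores)
```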

NeurIPS Conference 2020 Conference Paper

PLLay: Efficient Topological Layer based on Persistent Landscapes

  • Kwangho Kim
  • Jisu Kim
  • Manzil Zaheer
  • Joon Kim
  • Frederic Chazal
  • Larry Wasserman

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure. In this work, we show differentiability with respect to layer inputs, for a general persistent homology with arbitrary filtration. Thus, our proposed layer can be placed anywhere in the network and feed critical information on the topological features of input data into subsequent layers to improve the learnability of the networks toward a given task. A task-optimal structure of PLLay is learned during training via backpropagation, without requiring any input featurization or data preprocessing. We provide a novel adaptation for the DTM function-based filtration, and show that the proposed layer is robust against noise and outliers through a stability analysis. We demonstrate the effectiveness of our approach by classification experiments on various datasets.

JMLR Journal 2018 Journal Article

Robust Topological Inference: Distance To a Measure and Kernel Distance

  • Frédéric Chazal
  • Brittany Fasy
  • Fabrizio Lecci
  • Bertrand Michel
  • Alessandro Rinaldo
  • Larry Wasserman

Let $P$ be a distribution with support $S$. The salient features of $S$ can be quantified with persistent homology, which summarizes topological features of the sublevel sets of the distance function (the distance of any point $x$ to $S$). Given a sample from $P$ we can infer the persistent homology using an empirical version of the distance function. However, the empirical distance function is highly non-robust to noise and outliers. Even one outlier is deadly. The distance-to-a-measure (DTM), introduced by Chazal et al. (2011), and the kernel distance, introduced by Phillips et al. (2014), are smooth functions that provide useful topological information but are robust to noise and outliers. Chazal, Massart, and Michel (2014) derived concentration bounds for the DTM. Building on these results, we derive limiting distributions and confidence sets, and we propose a method for choosing tuning parameters.
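
For intuition, a minimal sketch of the empirical DTM: the root of the average squared distance to the $k = \lceil m_0 n \rceil$ nearest sample points. The mass parameter `m0` and the kd-tree implementation are illustrative choices, not the paper's code:

```python
# Empirical distance-to-a-measure at each query point.
import numpy as np
from scipy.spatial import cKDTree

def dtm(query_points, sample, m0=0.05):
    n = len(sample)
    k = max(1, int(np.ceil(m0 * n)))              # number of neighbors from mass m0
    dists, _ = cKDTree(sample).query(query_points, k=k)
    dists = np.asarray(dists).reshape(len(query_points), k)
    return np.sqrt(np.mean(dists ** 2, axis=1))   # DTM is robust to a few outliers
```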

NeurIPS Conference 2016 Conference Paper

Statistical Inference for Cluster Trees

  • Jisu Kim
  • Yen-Chi Chen
  • Sivaraman Balakrishnan
  • Alessandro Rinaldo
  • Larry Wasserman

A cluster tree provides an intuitive summary of a density function that reveals essential structure about the high-density clusters. The true cluster tree is estimated from a finite sample from an unknown true density. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of different features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyzing their properties and assessing their suitability for our inference task. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we provide a variety of simulations to illustrate our proposed methods and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.

NeurIPS Conference 2015 Conference Paper

Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations

  • Kirthevasan Kandasamy
  • Akshay Krishnamurthy
  • Barnabas Poczos
  • Larry Wasserman
  • James Robins

We propose and analyse estimators for statistical functionals of one or more distributions under nonparametric assumptions. Our estimators are derived from the von Mises expansion and are based on the theory of influence functions, which appear in the semiparametric statistics literature. We show that estimators based either on data-splitting or a leave-one-out technique enjoy fast rates of convergence and other favorable theoretical properties. We apply this framework to derive estimators for several popular information-theoretic quantities, and via empirical evaluation, show the advantage of this approach over existing estimators.
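
The paper's estimators add influence-function correction terms to a plug-in; as a bare-bones illustration of the data-splitting plug-in they start from, here is a sketch for differential entropy (the correction term is omitted, and `gaussian_kde` is an illustrative density estimator):

```python
# Data-splitting plug-in for differential entropy: estimate the density on one
# half of the sample, evaluate -log density on the other half, and average.
import numpy as np
from scipy.stats import gaussian_kde

def split_entropy(x):
    half = len(x) // 2
    kde = gaussian_kde(x[:half])      # density estimate from the first half
    dens = kde(x[half:])              # evaluated on the held-out half
    return -np.mean(np.log(dens))     # plug-in estimate of E[-log p(X)]
```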

IS Journal 2015 Journal Article

On Clinical Pathway Discovery from Electronic Health Record Data

  • Yiye Zhang
  • Rema Padman
  • Larry Wasserman
  • Nirav Patel
  • Pradip Teredesai
  • Qizhi Xie

In this column, the authors propose a practice-based clinical pathway development process using patient data from electronic health records. The authors aim to examine the interaction between patients' health conditions and treatment approaches, and learn the most common pathways of care to inform better patient engagement and outcomes in healthcare practices.

AAAI Conference 2015 Conference Paper

On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions

  • Aaditya Ramdas
  • Sashank Jakkam Reddi
  • Barnabas Poczos
  • Aarti Singh
  • Larry Wasserman

This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Specifically, we differentiate the hardness of estimation of test statistics from the hardness of testing whether these statistics are zero or not, and explicitly discuss a notion of “fair” alternative hypotheses for these problems as dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the median heuristic for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.
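
Both test families depend on a kernel bandwidth, and the paper analyzes the common median heuristic. A minimal sketch of that heuristic together with the (biased, quadratic-time) Gaussian-kernel MMD statistic it feeds into; details such as pooling both samples are illustrative choices:

```python
# Median heuristic: set the Gaussian bandwidth to the median pairwise distance.
import numpy as np
from scipy.spatial.distance import pdist

def median_bandwidth(X, Y):
    pooled = np.vstack([X, Y])
    return np.median(pdist(pooled))   # sigma in k(x,y) = exp(-||x-y||^2 / (2 sigma^2))

def mmd2_biased(X, Y, sigma):
    # Biased V-statistic estimate of the squared maximum mean discrepancy.
    def gram(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d2 / (2 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()
```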

NeurIPS Conference 2015 Conference Paper

Optimal Ridge Detection using Coverage Risk

  • Yen-Chi Chen
  • Christopher Genovese
  • Shirley Ho
  • Larry Wasserman

We introduce the concept of coverage risk as an error measure for density ridge estimation. The coverage risk generalizes the mean integrated square error to set estimation. We propose two risk estimators for the coverage risk and we show that we can select tuning parameters by minimizing the estimated risk. We study the rate of convergence for coverage risk and prove consistency of the risk estimators. We apply our method to three simulated datasets and to cosmology data. In all the examples, the proposed method successfully recovers the underlying density structure.

JMLR Journal 2014 Journal Article

Statistical Analysis of Metric Graph Reconstruction

  • Fabrizio Lecci
  • Alessandro Rinaldo
  • Larry Wasserman

A metric graph is a 1-dimensional stratified metric space consisting of vertices and edges or loops glued together. Metric graphs can be naturally used to represent and model data that take the form of noisy filamentary structures, such as street maps, neurons, networks of rivers and galaxies. We consider the statistical problem of reconstructing the topology of a metric graph embedded in $\mathbb{R}^D$ from a random sample. We derive lower and upper bounds on the minimax risk for the noiseless case and tubular noise case. The upper bound is based on the reconstruction algorithm given in Aanjaneya et al. (2012).

NeurIPS Conference 2013 Conference Paper

Cluster Trees on Manifolds

  • Sivaraman Balakrishnan
  • Srivatsan Narayanan
  • Alessandro Rinaldo
  • Aarti Singh
  • Larry Wasserman

We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. We study a $k$-nearest neighbor based algorithm recently proposed by Chaudhuri and Dasgupta. Under mild assumptions on $f$ and $M$, we obtain rates of convergence that depend on $d$ only but not on the ambient dimension $D$. We also provide a sample complexity lower bound for a natural class of clustering algorithms that use $D$-dimensional neighborhoods.

JMLR Journal 2013 Journal Article

Differential Privacy for Functions and Functional Data

  • Rob Hall
  • Alessandro Rinaldo
  • Larry Wasserman

Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs.
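
A minimal sketch of the mechanism's shape: evaluate the function on a grid and add a scaled Gaussian-process draw whose covariance is the kernel of the RKHS. The sensitivity bound `delta_f` must be derived per application, and the constant used here is the standard Gaussian-mechanism scaling; treat both as assumptions rather than the paper's exact statement:

```python
# Release f + (c(delta) * delta_f / eps) * G, with G a GP draw on the grid.
import numpy as np

def private_function_release(f_values, grid, delta_f, eps, delta,
                             lengthscale=0.5, rng=None):
    rng = np.random.default_rng(rng)
    d2 = (grid[:, None] - grid[None, :]) ** 2
    K = np.exp(-d2 / (2 * lengthscale ** 2))     # Gaussian RKHS kernel on the grid
    K += 1e-10 * np.eye(len(grid))               # jitter for numerical stability
    noise = rng.multivariate_normal(np.zeros(len(grid)), K)
    c_delta = np.sqrt(2 * np.log(2 / delta))     # standard Gaussian-mechanism constant
    return f_values + (c_delta * delta_f / eps) * noise
```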

NeurIPS Conference 2013 Conference Paper

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

  • Martin Azizyan
  • Aarti Singh
  • Larry Wasserman

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood. In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity only depends on the number of relevant dimensions and mean separation, and can be achieved by a simple computationally efficient procedure. Our results provide the first step of a theoretical basis for recent methods that combine feature selection and clustering.

JMLR Journal 2012 Journal Article

A Comparison of the Lasso and Marginal Regression

  • Christopher R. Genovese
  • Jiashun Jin
  • Larry Wasserman
  • Zhigang Yao

The lasso is an important method for sparse, high-dimensional regression problems, with efficient algorithms available, a long history of practical success, and a large body of theoretical results supporting and explaining its performance. But even with the best available algorithms, finding the lasso solutions remains a computationally challenging task in cases where the number of covariates vastly exceeds the number of data points. Marginal regression, where the dependent variable is regressed separately on each covariate, offers a promising alternative in this case because the estimates can be computed roughly two orders of magnitude faster than the lasso solutions. The question that remains is how the statistical performance of the method compares to that of the lasso in these cases. In this paper, we study the relative statistical performance of the lasso and marginal regression for sparse, high-dimensional regression problems. We consider the problem of learning which coefficients are non-zero. Our main results are as follows: (i) we compare the conditions under which the lasso and marginal regression guarantee exact recovery in the fixed design, noise free case; (ii) we establish conditions under which marginal regression provides exact recovery with high probability in the fixed design, noise free, random coefficients case; and (iii) we derive rates of convergence for both procedures, where performance is measured by the number of coefficients with incorrect sign, and characterize the regions in the parameter space where recovery is and is not possible under this metric. In light of the computational advantages of marginal regression in very high dimensional problems, our theoretical and simulation results suggest that the procedure merits further study.
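
To make the computational contrast concrete, a toy sketch of both procedures on a noise-free sparse design (the sizes, the lasso penalty, and the thresholding rule are all illustrative, and no claim is made about the paper's recovery regimes):

```python
# Marginal regression ranks |X^T y| in one O(np) pass; the lasso solves the
# l1-penalized problem over all covariates jointly.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 1000, 5
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p); beta[:s] = 3.0
y = X @ beta                                    # noise-free case

marginal = np.abs(X.T @ y)                      # marginal regression scores
support_marginal = np.argsort(marginal)[-s:]    # keep the s largest

lasso = Lasso(alpha=1e-3, max_iter=50000).fit(X, y)
support_lasso = np.flatnonzero(lasso.coef_)
```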

NeurIPS Conference 2012 Conference Paper

Exponential Concentration for Mutual Information Estimation with Application to Forests

  • Han Liu
  • Larry Wasserman
  • John Lafferty

We prove a new exponential concentration inequality for a plug-in estimator of the Shannon mutual information. Previous results on mutual information estimation only bounded expected error. The advantage of having the exponential inequality is that, combined with the union bound, we can guarantee accurate estimators of the mutual information for many pairs of random variables simultaneously. As an application, we show how to use such a result to optimally estimate the density function and graph of a distribution which is Markov to a forest graph.
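
As a simple illustration of the plug-in idea (the paper works with kernel density estimates; a histogram version is shown here for brevity):

```python
# Histogram plug-in estimator of Shannon mutual information I(X; Y).
import numpy as np

def plugin_mi(x, y, bins=20):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                             # joint probability estimate
    px = pxy.sum(axis=1, keepdims=True)          # marginal of X
    py = pxy.sum(axis=0, keepdims=True)          # marginal of Y
    mask = pxy > 0                               # avoid log(0) on empty cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```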

JMLR Journal 2012 Journal Article

Minimax Manifold Estimation

  • Christopher Genovese
  • Marco Perone-Pacifico
  • Isabella Verdinelli
  • Larry Wasserman

We find the minimax rate of convergence in Hausdorff distance for estimating a manifold $M$ of dimension $d$ embedded in $\mathbb{R}^D$ given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is $n^{-2/(2+d)}$. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which $M$ is embedded.

JMLR Journal 2012 Journal Article

Stability of Density-Based Clustering

  • Alessandro Rinaldo
  • Aarti Singh
  • Rebecca Nugent
  • Larry Wasserman

High density clusters can be characterized by the connected components of a level set $L(\lambda) = \{x : p(x) > \lambda\}$ of the underlying probability density function $p$ generating the data, at some appropriate level $\lambda \geq 0$. The complete hierarchical clustering can be characterized by a cluster tree $T = \bigcup_{\lambda} L(\lambda)$. In this paper, we study the behavior of a density level set estimate $\hat{L}(\lambda)$ and cluster tree estimate $\hat{T}$ based on a kernel density estimator with kernel bandwidth $h$. We define two notions of instability to measure the variability of $\hat{L}(\lambda)$ and $\hat{T}$ as a function of $h$, and investigate the theoretical properties of these instability measures.

JMLR Journal 2012 Journal Article

The huge Package for High-dimensional Undirected Graph Estimation in R

  • Tuo Zhao
  • Han Liu
  • Kathryn Roeder
  • John Lafferty
  • Larry Wasserman

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) it provides additional functions, such as data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency.

JMLR Journal 2011 Journal Article

Forest Density Estimation

  • Han Liu
  • Min Xu
  • Haijie Gu
  • Anupam Gupta
  • John Lafferty
  • Larry Wasserman

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
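
A minimal sketch of the spanning-forest step under the usual Chow-Liu weighting: scipy computes a *minimum* spanning tree, so the estimated mutual informations are negated, and nonpositive-MI edges are dropped so the result is a forest. This omits the paper's held-out selection and restricted tree sizes:

```python
# Maximum-weight spanning forest on estimated pairwise mutual informations.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def best_forest(mi_matrix):
    # Negate so a minimum spanning tree maximizes total MI; zero entries are
    # treated as absent edges, so only positive-MI edges can enter the forest.
    weights = np.where(mi_matrix > 0, -mi_matrix, 0.0)
    tree = minimum_spanning_tree(weights)
    rows, cols = tree.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))
```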

JMLR Journal 2011 Journal Article

Union Support Recovery in Multi-task Learning

  • Mladen Kolar
  • John Lafferty
  • Larry Wasserman

We sharply characterize the performance of different penalization schemes for the problem of selecting the relevant variables in the multi-task setting. Previous work focuses on the regression problem where conditions on the design matrix complicate the analysis. A clearer and simpler picture emerges by studying the Normal means model. This model, often used in the field of statistics, is a simplified model that provides a laboratory for studying complex procedures.

NeurIPS Conference 2010 Conference Paper

Graph-Valued Regression

  • Han Liu
  • Xi Chen
  • Larry Wasserman
  • John Lafferty

Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as ``graph-valued regression''. In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method ``Graph-optimized CART'', or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.

NeurIPS Conference 2010 Conference Paper

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

  • Han Liu
  • Kathryn Roeder
  • Larry Wasserman

A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e., with high probability, all the true edges will be included in the selected model even when the graph size asymptotically increases with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all competing procedures.
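
A minimal sketch of the StARS loop, scanning from the sparsest end of the path and stopping when edge instability exceeds a cutoff $\beta$. The subsample size $b = \lfloor 10\sqrt{n} \rfloor$ follows the paper; the graphical-lasso solver, the edge threshold, and the omission of the paper's monotonized instability are simplifications:

```python
# StARS-style selection: prefer the smallest lambda whose graph is replicable.
import numpy as np
from sklearn.covariance import GraphicalLasso

def stars_select(X, lambdas, n_subsamples=20, beta=0.05, seed=0):
    # lambdas must be sorted from largest (sparsest graph) to smallest.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = int(np.floor(10 * np.sqrt(n)))          # subsample size, as in the paper
    best = lambdas[0]
    for lam in lambdas:
        freq = np.zeros((p, p))
        for _ in range(n_subsamples):
            idx = rng.choice(n, size=b, replace=False)
            prec = GraphicalLasso(alpha=lam).fit(X[idx]).precision_
            freq += (np.abs(prec) > 1e-8) & ~np.eye(p, dtype=bool)
        iu = np.triu_indices(p, 1)
        theta = freq[iu] / n_subsamples          # per-edge selection frequency
        instability = np.mean(2 * theta * (1 - theta))
        if instability > beta:                   # graph no longer replicable
            break
        best = lam
    return best
```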

JMLR Journal 2009 Journal Article

The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

  • Han Liu
  • John Lafferty
  • Larry Wasserman

Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula---or "nonparanormal"---for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many examples.
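
A minimal sketch of the two-step recipe: a Winsorized empirical-CDF-plus-normal-quantile transform per variable, followed by any Gaussian graph estimator. The truncation level matches the form used in the paper, but treat the constants and the downstream solver as illustrative:

```python
# Nonparanormal transform, then a Gaussian graphical model on the result.
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.covariance import GraphicalLasso

def nonparanormal_transform(X):
    n, p = X.shape
    delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))  # truncation level
    U = rankdata(X, axis=0) / n                  # empirical CDF values per column
    U = np.clip(U, delta, 1 - delta)             # Winsorize the tails
    return norm.ppf(U)                           # map to normal scores

Z = nonparanormal_transform(np.random.default_rng(0).standard_normal((200, 10)))
precision = GraphicalLasso(alpha=0.1).fit(Z).precision_
```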

NeurIPS Conference 2008 Conference Paper

Nonparametric regression and classification with joint sparsity constraints

  • Han Liu
  • Larry Wasserman
  • John Lafferty

We propose new families of models and algorithms for high-dimensional nonparametric learning with joint sparsity constraints. Our approach is based on a regularization method that enforces common sparsity patterns across different function components in a nonparametric additive model. The algorithms employ a coordinate descent approach that is based on a functional soft-thresholding operator. The framework yields several new models, including multi-task sparse additive models, multi-response sparse additive models, and sparse additive multi-category logistic regression. The methods are illustrated with experiments on synthetic data and gene microarray data.


NeurIPS Conference 2007 Conference Paper

Compressed Regression

  • Shuheng Zhou
  • Larry Wasserman
  • John Lafferty

Recent research has studied the role of sparsity in high dimensional regression and signal reconstruction, establishing theoretical limits for recovering sparse models from sparse data. In this paper we study a variant of this problem where the original $n$ input variables are compressed by a random linear transformation to $m \ll n$ examples in $p$ dimensions, and establish conditions under which a sparse linear model can be successfully recovered from the compressed data. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We characterize the number of random projections that are required for $\ell_1$-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one, a property called ``sparsistence''. In addition, we show that $\ell_1$-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called ``persistence''. Finally, we characterize the privacy properties of the compression procedure in information-theoretic terms, establishing upper bounds on the rate of information communicated between the compressed and uncompressed data that decay to zero.
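
A toy sketch of the pipeline: compress the $n$ rows with a random Gaussian matrix down to $m$ rows, release only the compressed pair, and run the lasso on it (all sizes and penalties here are illustrative):

```python
# Compressed regression: lasso on a randomly projected copy of the data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, m, s = 500, 200, 80, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(n)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random compression matrix
X_c, y_c = Phi @ X, Phi @ y                      # only this pair is released
fit = Lasso(alpha=0.1).fit(X_c, y_c)
recovered = np.flatnonzero(np.abs(fit.coef_) > 1e-3)
```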

NeurIPS Conference 2007 Conference Paper

SpAM: Sparse Additive Models

  • Han Liu
  • Larry Wasserman
  • John Lafferty
  • Pradeep Ravikumar

We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high dimensional data.
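
A caricature of SpAM-style backfitting: smooth the partial residual for each coordinate, then shrink the fitted component by a soft-thresholding factor on its norm, which zeroes out irrelevant covariates. The crude kernel smoother and fixed penalty below are illustrative stand-ins:

```python
# Backfitting with a functional soft-thresholding step per coordinate.
import numpy as np

def smooth(xj, r, h=0.3):
    # Nadaraya-Watson smooth of residual r against the j-th coordinate.
    w = np.exp(-0.5 * ((xj[:, None] - xj[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

def spam_backfit(X, y, lam=0.1, n_iter=20):
    n, p = X.shape
    f = np.zeros((n, p))                              # fitted additive components
    for _ in range(n_iter):
        for j in range(p):
            r = y - y.mean() - f.sum(axis=1) + f[:, j]   # partial residual
            g = smooth(X[:, j], r)
            norm = np.sqrt(np.mean(g ** 2))
            f[:, j] = max(0.0, 1 - lam / norm) * g if norm > 0 else 0.0
        f -= f.mean(axis=0)                           # center each component
    return f
```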

NeurIPS Conference 2007 Conference Paper

Statistical Analysis of Semi-Supervised Regression

  • Larry Wasserman
  • John Lafferty

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.

NeurIPS Conference 2005 Conference Paper

Active Learning For Identifying Function Threshold Boundaries

  • Brent Bryan
  • Robert Nichol
  • Christopher Genovese
  • Jeff Schneider
  • Christopher Miller
  • Larry Wasserman

We present an efficient algorithm to actively select queries for learning the boundaries separating a function domain into regions where the function is above and below a given threshold. We develop experiment selection methods based on entropy, misclassification rate, variance, and their combinations, and show how they perform on a number of data sets. We then show how these algorithms are used to determine simultaneously valid $1-\alpha$ confidence intervals for seven cosmological parameters. Experimentation shows that the algorithm reduces the computation necessary for the parameter estimation problem by an order of magnitude.
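
The selection rule can be sketched with a "straddle"-style score that is large where the posterior is both uncertain and close to the threshold; the Gaussian-process surrogate and the 1.96 multiplier here are illustrative choices in the spirit of the paper's variance-based criteria:

```python
# Query where a GP posterior plausibly straddles the threshold t.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_query(X_seen, y_seen, candidates, threshold):
    gp = GaussianProcessRegressor().fit(X_seen, y_seen)
    mu, sigma = gp.predict(candidates, return_std=True)
    straddle = 1.96 * sigma - np.abs(mu - threshold)  # high near uncertain boundary
    return candidates[np.argmax(straddle)]
```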

NeurIPS Conference 2005 Conference Paper

Rodeo: Sparse Nonparametric Regression in High Dimensions

  • Larry Wasserman
  • John Lafferty

We present a method for nonparametric regression that performs bandwidth selection and variable selection simultaneously. The approach is based on the technique of incrementally decreasing the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. When the unknown function satisfies a sparsity condition, our approach avoids the curse of dimensionality, achieving the optimal minimax rate of convergence, up to logarithmic factors, as if the relevant variables were known in advance. The method---called rodeo (regularization of derivative expectation operator)---conducts a sequence of hypothesis tests, and is easy to implement. A modified version that replaces hard with soft thresholding effectively solves a sequence of lasso problems.
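
A heavily simplified caricature of the rodeo loop at one evaluation point: numerically probe how a kernel regression estimate changes as each coordinate's bandwidth shrinks, and keep shrinking only where the change looks significant. The paper uses the exact derivative of a local linear estimator and a variance-based threshold; the Nadaraya-Watson smoother and fixed threshold below are placeholder assumptions:

```python
# Greedy bandwidth shrinking guided by a numerical derivative in h_j.
import numpy as np

def nw_estimate(x, X, y, h):
    w = np.exp(-0.5 * np.sum(((X - x) / h) ** 2, axis=1))
    return np.sum(w * y) / np.sum(w)

def rodeo(x, X, y, h0=1.0, beta=0.9, thresh=0.05, max_iter=50):
    h = np.full(X.shape[1], h0)
    active = set(range(X.shape[1]))
    for _ in range(max_iter):
        if not active:
            break
        for j in list(active):
            h_try = h.copy(); h_try[j] *= beta
            Z_j = (nw_estimate(x, X, y, h) - nw_estimate(x, X, y, h_try)) / (h[j] - h_try[j])
            if abs(Z_j) > thresh:
                h[j] *= beta        # signal in direction j: keep shrinking
            else:
                active.discard(j)   # bandwidth for an irrelevant X_j stays large
    return h, nw_estimate(x, X, y, h)
```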