Arrow Research

Author name cluster

Sanjoy Dasgupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers
2 author rows

Possible papers

51

NeurIPS Conference 2025 Conference Paper

Consistency of the $k_n$-nearest neighbor rule under adaptive sampling

  • Robi Bhattacharjee
  • Geelon So
  • Sanjoy Dasgupta

In the adaptive sampling model of online learning, future prediction tasks can be arbitrarily dependent on the past. Every round, an adversary selects an instance to test the learner. After the learner makes a prediction, a noisy label is drawn from an underlying conditional label distribution and is revealed to both learner and adversary. A learner is consistent if it eventually performs no worse than the Bayes predictor. We study the $k_n$-nearest neighbor learner within this setting. In the worst case, the learner will fail because an adaptive process can generate spurious patterns out of noise. However, under the mild smoothing assumption that the process generating the instances is uniformly absolutely continuous and that the choice of $(k_n)_n$ is reasonable, the $k_n$-nearest neighbor rule is online consistent.
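
For intuition, a minimal sketch of the round-by-round protocol with a $k_n$-nearest neighbor learner (illustrative only: the toy 2D stream, the noise model, and the schedule k_n = ceil(log n) are assumptions, not the paper's construction):

```python
import numpy as np

def knn_predict(X_seen, y_seen, x, k):
    """Majority vote among the k nearest previously seen instances."""
    dists = np.linalg.norm(X_seen - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.mean(y_seen[nearest]) > 0.5)

rng = np.random.default_rng(0)
X_seen, y_seen, mistakes = [], [], 0
for n in range(1, 2001):
    x = rng.uniform(size=2)                      # instance chosen for round n (toy stream)
    k = max(1, int(np.ceil(np.log(n))))          # a slowly growing k_n schedule (assumption)
    y_hat = knn_predict(np.array(X_seen), np.array(y_seen), x, k) if X_seen else 0
    p = 0.9 if x[0] > 0.5 else 0.1               # toy conditional label distribution
    y = int(rng.random() < p)                    # noisy label, revealed after the prediction
    mistakes += int(y_hat != y)
    X_seen.append(x); y_seen.append(y)
print("mistake rate:", mistakes / 2000)
```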

NeurIPS Conference 2025 Conference Paper

Low Precision Streaming PCA

  • Sanjoy Dasgupta
  • Syamantak Kumar
  • Shourya Pandey
  • Purnamrita Sarkar

Low-precision Streaming PCA estimates the top principal component in a streaming setting under limited precision. We establish an information-theoretic lower bound on the quantization resolution required to achieve a target accuracy for the leading eigenvector. We study Oja's algorithm for streaming PCA under linear and nonlinear stochastic quantization. The quantized variants use unbiased stochastic quantization of the weight vector and the updates. Under mild moment and spectral-gap assumptions on the data distribution, we show that a batched version achieves the lower bound up to logarithmic factors under both schemes. This leads to a nearly dimension-free quantization error in the nonlinear quantization setting. Empirical evaluations on synthetic streams validate our theoretical findings and demonstrate that our low-precision methods closely track the performance of standard Oja’s algorithm.
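
A rough sketch of the flavor of the method: Oja's update combined with unbiased stochastic rounding to a fixed grid (the grid spacing, step size, and toy stream below are assumptions; this is not the paper's batched algorithm):

```python
import numpy as np

def stochastic_quantize(v, delta):
    """Unbiased stochastic rounding of each coordinate to a grid of spacing delta."""
    low = np.floor(v / delta) * delta
    p_up = (v - low) / delta                    # rounding up with this probability keeps E[q] = v
    return low + delta * (np.random.random(v.shape) < p_up)

def quantized_oja(stream, dim, eta=0.01, delta=0.01):
    w = np.random.randn(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        x_q = stochastic_quantize(x, delta)                     # low-precision data point
        w = w + eta * x_q * (x_q @ w)                           # Oja's rank-one update
        w = stochastic_quantize(w / np.linalg.norm(w), delta)   # store the weights in low precision
    return w / np.linalg.norm(w)

# toy stream whose top principal direction is the first coordinate axis
rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 10)) * np.array([3.0] + [1.0] * 9)
w_hat = quantized_oja(data, dim=10)
print("alignment with e_1:", abs(w_hat[0]))
```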

UAI Conference 2024 Conference Paper

Convergence Behavior of an Adversarial Weak Supervision Method

  • Steven An 0002
  • Sanjoy Dasgupta

Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand-labeled data can be ameliorated. Approaches to combining the rules-of-thumb fall into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results.

ICML Conference 2024 Conference Paper

New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering

  • Sanjoy Dasgupta
  • Eduardo Sany Laber

Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.

NeurIPS Conference 2024 Conference Paper

Online Consistency of the Nearest Neighbor Rule

  • Sanjoy Dasgupta
  • Geelon So

In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction. A learning rule is online consistent if its mistake rate eventually vanishes. The nearest neighbor rule is a fundamental prediction strategy, but it is only known to be consistent under strong statistical or geometric assumptions: the instances come i.i.d. or the label classes are well-separated. We prove online consistency for all measurable functions in doubling metric spaces under the mild assumption that instances are generated by a process that is uniformly absolutely continuous with respect to an underlying finite, upper doubling measure.

ICML Conference 2023 Conference Paper

Data-Copying in Generative Models: A Formal Framework

  • Robi Bhattacharjee
  • Sanjoy Dasgupta
  • Kamalika Chaudhuri

There has been some recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called “data-copying,” was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. Motivated by this and the theory of non-parametric methods, we provide an alternative definition of data-copying that applies more locally. We provide a method to detect data-copying, and prove that it works with high probability when enough data is available. We also provide lower bounds that characterize the sample requirement for reliable detection.

IJCAI Conference 2022 Conference Paper

A Theoretical Perspective on Hyperdimensional Computing (Extended Abstract)

  • Anthony Thomas
  • Sanjoy Dasgupta
  • Tajana Rosing

Hyperdimensional (HD) computing is a set of neurally inspired methods for computing on high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. We present a novel mathematical framework that unifies analysis of HD computing architectures, and provides general, non-asymptotic, sufficient conditions under which HD information processing techniques will succeed.
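
To give a flavor of the representations analyzed, a toy bipolar hypervector example using binding and bundling (the dimension and the role/filler demo are illustrative assumptions, not the paper's framework itself):

```python
import numpy as np

D = 10_000                                        # high-dimensional, low-precision representation
rng = np.random.default_rng(0)
codebook = {s: rng.choice([-1, 1], size=D) for s in "abcdef"}   # random bipolar hypervectors

def bind(u, v):
    return u * v                                  # elementwise product: associates two hypervectors

def bundle(vectors):
    return np.sign(np.sum(vectors, axis=0))       # majority vote: superposes several hypervectors

# bind role/filler pairs, then bundle them into a single record-like hypervector
record = bundle([bind(codebook["a"], codebook["b"]),
                 bind(codebook["c"], codebook["d"]),
                 bind(codebook["e"], codebook["f"])])
# unbinding with the role "a" approximately recovers the filler "b"
query = bind(record, codebook["a"])
for s in "bdf":
    print(s, round(float(query @ codebook[s] / D), 3))
```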

ICML Conference 2022 Conference Paper

Constants Matter: The Performance Gains of Active Learning

  • Stephen Mussmann
  • Sanjoy Dasgupta

Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show, through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling has the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.
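
A bare-bones version of the comparison in question: pool-based uncertainty sampling versus random sampling for logistic regression (the synthetic pool, seed size, and label budget are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(5000, 5))
y_pool = (rng.random(5000) < 1 / (1 + np.exp(-X_pool @ np.ones(5)))).astype(int)

def learn(n_labels, uncertainty=True):
    labeled = list(rng.choice(len(X_pool), size=10, replace=False))   # small random seed set
    while len(set(y_pool[labeled])) < 2:                              # make sure both classes appear
        labeled = list(rng.choice(len(X_pool), size=10, replace=False))
    for _ in range(n_labels - 10):
        clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
        if uncertainty:
            margin = np.abs(clf.decision_function(X_pool))            # distance to the boundary
            nxt = next(i for i in np.argsort(margin) if i not in labeled)
        else:
            nxt = rng.integers(len(X_pool))                           # random sampling baseline
        labeled.append(int(nxt))
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    return clf.score(X_pool, y_pool)                                  # accuracy over the pool

print("uncertainty:", learn(60), " random:", learn(60, uncertainty=False))
```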

ICML Conference 2022 Conference Paper

Framework for Evaluating Faithfulness of Local Explanations

  • Sanjoy Dasgupta
  • Nave Frost
  • Michal Moshkovitz

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.

JAIR Journal 2021 Journal Article

A Theoretical Perspective on Hyperdimensional Computing

  • Anthony Thomas
  • Sanjoy Dasgupta
  • Tajana Rosing

Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.

ICML Conference 2020 Conference Paper

Explainable k-Means and k-Medians Clustering

  • Michal Moshkovitz
  • Sanjoy Dasgupta
  • Cyrus Rashtchian
  • Nave Frost

Many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We study this problem from a theoretical viewpoint, measuring cluster quality by the k-means and k-medians objectives. In terms of negative results, we show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and any clustering based on a tree with k leaves must incur an Omega(log k) approximation factor compared to the optimal clustering. On the positive side, for two means/medians, we show that a single threshold cut can achieve a constant factor approximation, and we give nearly-matching lower bounds; for general k > 2, we design an efficient algorithm that leads to an O(k) approximation to the optimal k-medians and an O(k^2) approximation to the optimal k-means. Prior to our work, no algorithms were known with provable guarantees independent of dimension and input size.
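
For the two-cluster case, a small sketch of clustering with a single axis-aligned threshold cut, scored by the k-means cost (a brute-force illustration of the idea, not the paper's algorithm for general k):

```python
import numpy as np

def kmeans_cost(X, assign):
    return sum(np.sum((X[assign == c] - X[assign == c].mean(axis=0)) ** 2)
               for c in np.unique(assign))

def best_threshold_cut(X):
    """Explainable 2-clustering: a single axis-aligned threshold defines both clusters."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):                       # try every feature ...
        for theta in np.unique(X[:, j])[:-1]:         # ... and every threshold on that feature
            assign = (X[:, j] > theta).astype(int)
            cost = kmeans_cost(X, assign)
            if cost < best[0]:
                best = (cost, j, theta)
    return best                                       # (cost, feature index, threshold)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
cost, feature, theta = best_threshold_cut(X)
print(f"cut on feature {feature} at {theta:.2f}, k-means cost {cost:.1f}")
```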

NeurIPS Conference 2019 Conference Paper

An adaptive nearest neighbor rule for classification

  • Akshay Balsubramani
  • Sanjoy Dasgupta
  • Yoav Freund
  • Shay Moran

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the ``advantage'' which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.
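
One simple way to instantiate the idea is to grow the neighborhood around each query until the empirical majority exceeds a confidence margin of order 1/sqrt(k); the specific stopping rule below is an assumption for illustration, not the rule analyzed in the paper:

```python
import numpy as np

def adaptive_nn_predict(X, y, query, conf=1.0):
    """Grow the neighborhood until one label wins by roughly a 1/sqrt(k) margin."""
    order = np.argsort(np.linalg.norm(X - query, axis=1))    # neighbors, nearest first
    for k in range(1, len(order) + 1):
        votes = y[order[:k]]
        if abs(votes.mean() - 0.5) > conf / np.sqrt(k):      # enough evidence: stop at this k
            return int(votes.mean() > 0.5), k
    return int(y.mean() > 0.5), len(order)                   # fall back to all the points

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = ((X[:, 0] > 0.5) ^ (rng.random(500) < 0.2)).astype(int)  # 20% label noise
print(adaptive_nn_predict(X, y, np.array([0.9, 0.5])))       # deep inside a class: small k
print(adaptive_nn_predict(X, y, np.array([0.5, 0.5])))       # near the boundary: larger k
```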

ICML Conference 2019 Conference Paper

Teaching a black-box learner

  • Sanjoy Dasgupta
  • Daniel J. Hsu
  • Stefanos Poulis
  • Xiaojin Zhu 0001

One widely-studied model of teaching calls for a teacher to provide the minimal set of labeled examples that uniquely specifies a target concept. The assumption is that the teacher knows the learner’s hypothesis class, which is often not true of real-life teaching scenarios. We consider the problem of teaching a learner whose representation and hypothesis class are unknown—that is, the learner is a black box. We show that a teacher who does not interact with the learner can do no better than providing random examples. We then prove, however, that with interaction, a teacher can efficiently find a set of teaching examples that is a provably good approximation to the optimal set. As an illustration, we show how this scheme can be used to shrink training sets for any family of classifiers: that is, to find an approximately-minimal subset of training instances that yields the same classifier as the entire set.

NeurIPS Conference 2018 Conference Paper

Interactive Structure Learning with Structural Query-by-Committee

  • Christopher Tosh
  • Sanjoy Dasgupta

In this work, we introduce interactive structure learning, a framework that unifies many different interactive learning tasks. We present a generalization of the query-by-committee active learning algorithm for this setting, and we study its consistency and rate of convergence, both theoretically and empirically, with and without noise.

NeurIPS Conference 2018 Conference Paper

Learning from discriminative feature feedback

  • Sanjoy Dasgupta
  • Akansha Dey
  • Nicholas Roberts
  • Sivan Sabato

We consider the problem of learning a multi-class classifier from labels as well as simple explanations that we call "discriminative features". We show that such explanations can be provided whenever the target concept is a decision tree, or more generally belongs to a particular subclass of DNF formulas. We present an efficient online algorithm for learning from such feedback and we give tight bounds on the number of mistakes made during the learning process. These bounds depend only on the size of the target concept and not on the overall number of available features, which could be infinite. We also demonstrate the learning procedure experimentally.

ICML Conference 2017 Conference Paper

Diameter-Based Active Learning

  • Christopher Tosh
  • Sanjoy Dasgupta

To date, the tightest upper and lower-bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index. We provide, for the first time, an efficient algorithm that is able to realize this upper bound, and we empirically demonstrate its good performance.

STOC Conference 2016 Conference Paper

A cost function for similarity-based hierarchical clustering

  • Sanjoy Dasgupta

The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions. To help address this situation, we introduce a simple cost function on hierarchies over a set of points, given pairwise similarities between those points. We show that this criterion behaves sensibly in canonical instances and that it admits a top-down construction procedure with a provably good approximation ratio.
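
The cost function charges each pair of points by their similarity times the size of the smallest cluster in the hierarchy that still contains both. A minimal sketch for binary trees encoded as nested tuples (the encoding and the toy similarities are assumptions made here for illustration):

```python
def leaves(tree):
    return [tree] if not isinstance(tree, tuple) else leaves(tree[0]) + leaves(tree[1])

def hierarchy_cost(tree, sim):
    """cost(T) = sum over pairs {i,j} of sim(i,j) * |smallest cluster of T containing both|."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    n_here = len(leaves(tree))
    # pairs separated at this node are together only in this cluster, of size n_here
    cross = sum(sim[min(i, j), max(i, j)] for i in leaves(left) for j in leaves(right))
    return cross * n_here + hierarchy_cost(left, sim) + hierarchy_cost(right, sim)

# four points; 0,1 are similar to each other, as are 2,3
sim = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.1, (0, 3): 0.1, (1, 2): 0.1, (1, 3): 0.1}
print(hierarchy_cost(((0, 1), (2, 3)), sim))   # sensible hierarchy: low cost
print(hierarchy_cost(((0, 2), (1, 3)), sim))   # bad hierarchy: higher cost
```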

NeurIPS Conference 2016 Conference Paper

An algorithm for L1 nearest neighbor search via monotonic embedding

  • Xinan Wang
  • Sanjoy Dasgupta

Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance. Here we develop an approach for L1 distance that begins with an explicit and exact embedding of the points into L2. We show how this embedding can efficiently be combined with random projection methods for L2 NN search, such as locality-sensitive hashing or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation that it is competitive in practice with available alternatives.

ICML Conference 2016 Conference Paper

Interactive Bayesian Hierarchical Clustering

  • Sharad Vikram
  • Sanjoy Dasgupta

Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user’s needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering while still utilizing the geometry of the data by sampling a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.

NeurIPS Conference 2014 Conference Paper

Incremental Clustering: The Case for Extra Clusters

  • Margareta Ackerman
  • Sanjoy Dasgupta

The explosion in the amount of data available for analysis often necessitates a transition from batch to incremental clustering methods, which process one element at a time and typically store only a small subset of the data. In this paper, we initiate the formal analysis of incremental clustering methods focusing on the types of cluster structure that they are able to detect. We find that the incremental setting is strictly weaker than the batch model, proving that a fundamental class of cluster structures that can readily be detected in the batch setting is impossible to identify using any incremental method. Furthermore, we show how the limitations of incremental clustering can be overcome by allowing additional clusters.

ICML Conference 2014 Conference Paper

Lower Bounds for the Gibbs Sampler over Mixtures of Gaussians

  • Christopher Tosh
  • Sanjoy Dasgupta

The mixing time of a Markov chain is the minimum time t necessary for the total variation distance between the distribution of the Markov chain’s current state X_t and its stationary distribution to fall below some ε > 0. In this paper, we present lower bounds for the mixing time of the Gibbs sampler over Gaussian mixture models with Dirichlet priors.

NeurIPS Conference 2014 Conference Paper

Optimal rates for k-NN density and mode estimation

  • Sanjoy Dasgupta
  • Samory Kpotufe

We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators -- based on $k$-NN -- which attain minimax-optimal rates under surprisingly general distributional conditions.
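
The k-NN density estimate in question is f_k(x) = k / (n · vol(B(x, r_k(x)))), with r_k(x) the distance from x to its k-th nearest sample point. A one-dimensional sketch with a crude mode estimate (the choice k ≈ sqrt(n) and the Gaussian example are assumptions, not the paper's estimators):

```python
import numpy as np

def knn_density_1d(samples, x, k):
    """f_k(x) = k / (n * 2 * r_k(x)); in one dimension the ball of radius r_k has volume 2*r_k."""
    r_k = np.sort(np.abs(samples - x))[k - 1]
    return k / (len(samples) * 2 * r_k)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
k = int(np.sqrt(len(samples)))                 # a common default; the paper studies general k
for x in (0.0, 1.0, 2.0):
    est = knn_density_1d(samples, x, k)
    true = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    print(f"x={x}: estimate {est:.3f}, true density {true:.3f}")

# a crude mode estimate: the sample point with the largest k-NN density
mode_hat = max(samples, key=lambda s: knn_density_1d(samples, s, k))
print("mode estimate:", round(float(mode_hat), 3))
```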

NeurIPS Conference 2014 Conference Paper

Rates of Convergence for Nearest Neighbor Classification

  • Kamalika Chaudhuri
  • Sanjoy Dasgupta

We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. These are more general than existing bounds, and enable us, as a by-product, to establish the universal consistency of nearest neighbor in a broader range of data spaces than was previously known. We illustrate our upper and lower bounds by introducing a new smoothness class customized for nearest neighbor classification. We find, for instance, that under the Tsybakov margin condition the convergence rate of nearest neighbor matches recently established lower bounds for nonparametric classification.

NeurIPS Conference 2013 Conference Paper

Moment-based Uniform Deviation Bounds for $k$-means and Friends

  • Matus Telgarsky
  • Sanjoy Dasgupta

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p\geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.

NeurIPS Conference 2013 Conference Paper

The Fast Convergence of Incremental PCA

  • Akshay Balsubramani
  • Sanjoy Dasgupta
  • Yoav Freund

We prove the first finite-sample convergence rates for any incremental PCA algorithm using sub-quadratic time and memory per iteration. The algorithm analyzed is Oja's learning rule, an efficient and well-known scheme for estimating the top principal component. Our analysis of this non-convex problem yields expected and high-probability convergence rates of $\tilde{O}(1/n)$ through a novel technique. We relate our guarantees to existing rates for stochastic gradient descent on strongly convex functions, and extend those results. We also include experiments which demonstrate convergence behaviors predicted by our analysis.

NeurIPS Conference 2010 Conference Paper

Rates of convergence for the cluster tree

  • Kamalika Chaudhuri
  • Sanjoy Dasgupta

For a density f on R^d, a high-density cluster is any connected component of {x: f(x) >= c}, for some c > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present a procedure for estimating the cluster tree given samples from f. We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem.

JMLR Journal 2009 Journal Article

Analysis of Perceptron-Based Active Learning

  • Sanjoy Dasgupta
  • Adam Tauman Kalai
  • Claire Monteleoni

We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε^2) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log 1/ε) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm.

UAI Conference 2009 Conference Paper

Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?

  • Nakul Verma
  • Samory Kpotufe
  • Sanjoy Dasgupta

Recent theory work has found that a special type of spatial partition tree – called a random projection tree – is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search.
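
A toy random projection tree of the kind compared here: each node splits the data at the median of a random linear projection (the stopping rule and the synthetic low-intrinsic-dimension data are assumptions made for illustration):

```python
import numpy as np

def rp_tree(X, min_size=20, rng=np.random.default_rng(0)):
    """Recursively split the data at the median of a random linear projection."""
    if len(X) <= min_size:
        return {"leaf": X}
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)                       # random unit direction
    proj = X @ u
    thresh = np.median(proj)
    left, right = X[proj <= thresh], X[proj > thresh]
    if len(left) == 0 or len(right) == 0:        # degenerate split: stop here
        return {"leaf": X}
    return {"dir": u, "thresh": thresh,
            "left": rp_tree(left, min_size, rng), "right": rp_tree(right, min_size, rng)}

# data with low intrinsic dimension: a 2-dimensional subspace embedded in 50 dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))
tree = rp_tree(X)
print("root threshold:", round(float(tree["thresh"]), 3))
```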

NeurIPS Conference 2007 Conference Paper

A general agnostic active learning algorithm

  • Sanjoy Dasgupta
  • Daniel Hsu
  • Claire Monteleoni

We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fall-back guarantee that bounds the algorithm’s label complexity by the agnostic PAC sample complexity. Our analysis yields asymptotic label complexity improvements for certain hypothesis classes and distributions. We also demonstrate improvements experimentally.

NeurIPS Conference 2007 Conference Paper

A learning framework for nearest neighbor search

  • Lawrence Cayton
  • Sanjoy Dasgupta

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.

JMLR Journal 2007 Journal Article

A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians

  • Sanjoy Dasgupta
  • Leonard Schulman

We show that, given data from a mixture of k well-separated spherical Gaussians in R^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high (d >> ln k). We relate this to previous theoretical and empirical work on the EM algorithm.

UAI Conference 2006 Conference Paper

A Concentration Theorem for Projections

  • Sanjoy Dasgupta
  • Daniel J. Hsu
  • Nakul Verma

Let X in R^D have mean zero and finite second moments. We show that there is a precise sense in which almost all linear projections of X into R^d (for d < D) look like a scale-mixture of spherical Gaussians -- specifically, a mixture of distributions N(0, sigma^2 I_d) where the weight of the particular sigma component is P(|X|^2 = sigma^2 D). The extent of this effect depends upon the ratio of d to D, and upon a particular coefficient of eccentricity of X's distribution. We explore this result in a variety of experiments.

NeurIPS Conference 2004 Conference Paper

Analysis of a greedy active learning strategy

  • Sanjoy Dasgupta

We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
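
The greedy rule in question can be read as generalized binary search over a finite hypothesis class: query the point whose answer splits the remaining version space most evenly. A sketch for threshold classifiers on the line (the class, pool, and target below are illustrative assumptions, not the paper's setting):

```python
import numpy as np

# hypothesis class: thresholds t on [0, 1], with h_t(x) = 1 iff x >= t
thresholds = np.linspace(0, 1, 101)
pool = np.linspace(0.005, 0.995, 200)            # unlabeled points available for querying
true_t = 0.37                                    # the unknown target hypothesis

version_space = np.ones(len(thresholds), dtype=bool)
queries = 0
while version_space.sum() > 1:
    # greedy rule: query the point that splits the surviving hypotheses most evenly
    def imbalance(x):
        pos = np.sum(thresholds[version_space] <= x)
        return abs(2 * pos - version_space.sum())
    x = min(pool, key=imbalance)
    label = int(x >= true_t)                     # ask the oracle for the label
    queries += 1
    version_space &= ((thresholds <= x) == bool(label))
print("queries used:", queries, " surviving thresholds:", thresholds[version_space])
```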

NeurIPS Conference 2003 Conference Paper

An Iterative Improvement Procedure for Hierarchical Clustering

  • David Kauchak
  • Sanjoy Dasgupta

We describe a procedure which finds a hierarchical clustering by hill-climbing. The cost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorderings. We show these can be accomplished efficiently, by exploiting special properties of squared Euclidean distances and by using techniques from scheduling algorithms.

STOC Conference 2002 Conference Paper

The complexity of approximating entropy

  • Tugkan Batu
  • Sanjoy Dasgupta
  • Ravi Kumar 0001
  • Ronitt Rubinfeld

We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the i-th location is the probability of the i-th element, then linear time is both necessary and sufficient for approximating the entropy. We consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a λ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/λ^2) poly(log n)) time for distributions with entropy Ω(λ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(max(1/(2λ^2), 2/(5λ^2 - 2)))). We next consider a hybrid model in which both the explicit distribution as well as independent samples are available. Here, significantly more efficient algorithms can be achieved: a λ-multiplicative approximation to the entropy can be obtained in O(λ^2 …). Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
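
In the samples-only model, the natural baseline is the plug-in estimate, i.e., the entropy of the empirical distribution; a tiny sketch showing how its accuracy depends on the sample size (the random distribution and sample sizes are illustrative; this is not the sublinear-time algorithm analyzed in the paper):

```python
import numpy as np
from collections import Counter

def plug_in_entropy(samples):
    """Entropy (in bits) of the empirical distribution of the samples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
n = 1000                                         # domain size
true_p = rng.dirichlet(np.ones(n))               # a random distribution over n elements
true_H = float(-(true_p * np.log2(true_p)).sum())
for m in (1_000, 10_000, 100_000):
    samples = rng.choice(n, size=m, p=true_p)
    print(f"m={m}: plug-in {plug_in_entropy(samples):.3f} bits, true {true_H:.3f} bits")
```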

NeurIPS Conference 2001 Conference Paper

PAC Generalization Bounds for Co-training

  • Sanjoy Dasgupta
  • Michael Littman
  • David McAllester

The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of

UAI Conference 2000 Conference Paper

A Two-Round Variant of EM for Gaussian Mixtures

  • Sanjoy Dasgupta
  • Leonard J. Schulman

Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more “focused” predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task, and, although frequently used for supervised model selection also, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.

UAI Conference 2000 Conference Paper

Experiments with Random Projection

  • Sanjoy Dasgupta

Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.

FOCS Conference 1999 Conference Paper

Learning Mixtures of Gaussians

  • Sanjoy Dasgupta

Mixtures of Gaussians are among the most fundamental and widely used statistical models. Current techniques for learning such mixtures from data are local search heuristics with weak performance guarantees. We present the first provably correct algorithm for learning a mixture of Gaussians. This algorithm is very simple and returns the true centers of the Gaussians to within the precision specified by the user with high probability. It runs in time only linear in the dimension of the data and polynomial in the number of Gaussians.

UAI Conference 1999 Conference Paper

Learning Polytrees

  • Sanjoy Dasgupta

We consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good approximation to the best polytree. We then show that it is not possible to do very much better, since the learning problem is NP-hard even to approximately solve within some constant factor.
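
The optimal branching (Chow-Liu tree) referred to above is the maximum-weight spanning tree under pairwise empirical mutual information. A compact sketch for binary variables (the chain-structured data generator and the Prim-style construction are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two binary columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_edges(data):
    """Maximum spanning tree over pairwise mutual information (Prim's algorithm)."""
    d = data.shape[1]
    weights = {(i, j): mutual_information(data[:, i], data[:, j])
               for i, j in combinations(range(d), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        i, j = max(((i, j) for (i, j) in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append((i, j))
        in_tree |= {i, j}
    return edges

# sample from a simple chain X0 -> X1 -> X2 -> X3 and recover its skeleton
rng = np.random.default_rng(0)
X = np.zeros((5000, 4), dtype=int)
X[:, 0] = rng.random(5000) < 0.5
for t in range(1, 4):
    flip = rng.random(5000) < 0.1
    X[:, t] = np.where(flip, 1 - X[:, t - 1], X[:, t - 1])
print(chow_liu_edges(X))
```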