Arrow Research

Author name cluster

Sanjoy Dasgupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers
2 author rows

Possible papers

51

NeurIPS Conference 2025 Conference Paper

Consistency of the $k_n$-nearest neighbor rule under adaptive sampling

  • Robi Bhattacharjee
  • Geelon So
  • Sanjoy Dasgupta

In the adaptive sampling model of online learning, future prediction tasks can be arbitrarily dependent on the past. Every round, an adversary selects an instance to test the learner. After the learner makes a prediction, a noisy label is drawn from an underlying conditional label distribution and is revealed to both learner and adversary. A learner is consistent if it eventually performs no worse than the Bayes predictor. We study the $k_n$-nearest neighbor learner within this setting. In the worst case, the learner will fail because an adaptive process can generate spurious patterns out of noise. However, under the mild smoothing assumption that the process generating the instances is uniformly absolutely continuous and that the choice of $(k_n)_n$ is reasonable, the $k_n$-nearest neighbor rule is online consistent.
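
For intuition, a minimal sketch of the round-by-round protocol with a $k_n$-nearest neighbor learner (illustrative only: the toy 2D stream, the noise model, and the schedule k_n = ceil(log n) are assumptions, not the paper's construction):

```python
import numpy as np

def knn_predict(X_seen, y_seen, x, k):
    """Majority vote among the k nearest previously seen instances."""
    dists = np.linalg.norm(X_seen - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.mean(y_seen[nearest]) > 0.5)

rng = np.random.default_rng(0)
X_seen, y_seen, mistakes = [], [], 0
for n in range(1, 2001):
    x = rng.uniform(size=2)                      # instance chosen for round n (toy stream)
    k = max(1, int(np.ceil(np.log(n))))          # a slowly growing k_n schedule (assumption)
    y_hat = knn_predict(np.array(X_seen), np.array(y_seen), x, k) if X_seen else 0
    p = 0.9 if x[0] > 0.5 else 0.1               # toy conditional label distribution
    y = int(rng.random() < p)                    # noisy label, revealed after the prediction
    mistakes += int(y_hat != y)
    X_seen.append(x); y_seen.append(y)
print("mistake rate:", mistakes / 2000)
```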

NeurIPS Conference 2025 Conference Paper

Low Precision Streaming PCA

  • Sanjoy Dasgupta
  • Syamantak Kumar
  • Shourya Pandey
  • Purnamrita Sarkar

Low-precision Streaming PCA estimates the top principal component in a streaming setting under limited precision. We establish an information-theoretic lower bound on the quantization resolution required to achieve a target accuracy for the leading eigenvector. We study Oja's algorithm for streaming PCA under linear and nonlinear stochastic quantization. The quantized variants use unbiased stochastic quantization of the weight vector and the updates. Under mild moment and spectral-gap assumptions on the data distribution, we show that a batched version achieves the lower bound up to logarithmic factors under both schemes. This leads to a nearly dimension-free quantization error in the nonlinear quantization setting. Empirical evaluations on synthetic streams validate our theoretical findings and demonstrate that our low-precision methods closely track the performance of standard Oja’s algorithm.
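
A rough sketch of the flavor of the method: Oja's update combined with unbiased stochastic rounding to a fixed grid (the grid spacing, step size, and toy stream below are assumptions; this is not the paper's batched algorithm):

```python
import numpy as np

def stochastic_quantize(v, delta):
    """Unbiased stochastic rounding of each coordinate to a grid of spacing delta."""
    low = np.floor(v / delta) * delta
    p_up = (v - low) / delta                    # rounding up with this probability keeps E[q] = v
    return low + delta * (np.random.random(v.shape) < p_up)

def quantized_oja(stream, dim, eta=0.01, delta=0.01):
    w = np.random.randn(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        x_q = stochastic_quantize(x, delta)                     # low-precision data point
        w = w + eta * x_q * (x_q @ w)                           # Oja's rank-one update
        w = stochastic_quantize(w / np.linalg.norm(w), delta)   # store the weights in low precision
    return w / np.linalg.norm(w)

# toy stream whose top principal direction is the first coordinate axis
rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 10)) * np.array([3.0] + [1.0] * 9)
w_hat = quantized_oja(data, dim=10)
print("alignment with e_1:", abs(w_hat[0]))
```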

UAI Conference 2024 Conference Paper

Convergence Behavior of an Adversarial Weak Supervision Method

  • Steven An 0002
  • Sanjoy Dasgupta

Labeling data via rules-of-thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand-labeled data can be ameliorated. Approaches to combining the rules-of-thumb fall into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results.

ICML Conference 2024 Conference Paper

New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering

  • Sanjoy Dasgupta
  • Eduardo Sany Laber

Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.

NeurIPS Conference 2024 Conference Paper

Online Consistency of the Nearest Neighbor Rule

  • Sanjoy Dasgupta
  • Geelon So

In the realizable online setting, a learner is tasked with making predictions for a stream of instances, where the correct answer is revealed after each prediction. A learning rule is online consistent if its mistake rate eventually vanishes. The nearest neighbor rule is a fundamental prediction strategy, but it is only known to be consistent under strong statistical or geometric assumptions: the instances come i.i.d. or the label classes are well-separated. We prove online consistency for all measurable functions in doubling metric spaces under the mild assumption that instances are generated by a process that is uniformly absolutely continuous with respect to an underlying finite, upper doubling measure.

ICML Conference 2023 Conference Paper

Data-Copying in Generative Models: A Formal Framework

  • Robi Bhattacharjee
  • Sanjoy Dasgupta
  • Kamalika Chaudhuri

There has been some recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called “data-copying,” was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. Motivated by this and the theory of non-parametric methods, we provide an alternative definition of data-copying that applies more locally. We provide a method to detect data-copying, and prove that it works with high probability when enough data is available. We also provide lower bounds that characterize the sample requirement for reliable detection.

IJCAI Conference 2022 Conference Paper

A Theoretical Perspective on Hyperdimensional Computing (Extended Abstract)

  • Anthony Thomas
  • Sanjoy Dasgupta
  • Tajana Rosing

Hyperdimensional (HD) computing is a set of neurally inspired methods for computing on high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. We present a novel mathematical framework that unifies analysis of HD computing architectures, and provides general, non-asymptotic, sufficient conditions under which HD information processing techniques will succeed.
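
To give a flavor of the representations analyzed, a toy bipolar hypervector example using binding and bundling (the dimension and the role/filler demo are illustrative assumptions, not the paper's framework itself):

```python
import numpy as np

D = 10_000                                        # high-dimensional, low-precision representation
rng = np.random.default_rng(0)
codebook = {s: rng.choice([-1, 1], size=D) for s in "abcdef"}   # random bipolar hypervectors

def bind(u, v):
    return u * v                                  # elementwise product: associates two hypervectors

def bundle(vectors):
    return np.sign(np.sum(vectors, axis=0))       # majority vote: superposes several hypervectors

# bind role/filler pairs, then bundle them into a single record-like hypervector
record = bundle([bind(codebook["a"], codebook["b"]),
                 bind(codebook["c"], codebook["d"]),
                 bind(codebook["e"], codebook["f"])])
# unbinding with the role "a" approximately recovers the filler "b"
query = bind(record, codebook["a"])
for s in "bdf":
    print(s, round(float(query @ codebook[s] / D), 3))
```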

ICML Conference 2022 Conference Paper

Constants Matter: The Performance Gains of Active Learning

  • Stephen Mussmann
  • Sanjoy Dasgupta

Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show, through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling has the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.
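
A bare-bones version of the comparison in question: pool-based uncertainty sampling versus random sampling for logistic regression (the synthetic pool, seed size, and label budget are illustrative assumptions, not the paper's experimental setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(5000, 5))
y_pool = (rng.random(5000) < 1 / (1 + np.exp(-X_pool @ np.ones(5)))).astype(int)

def learn(n_labels, uncertainty=True):
    labeled = list(rng.choice(len(X_pool), size=10, replace=False))   # small random seed set
    while len(set(y_pool[labeled])) < 2:                              # make sure both classes appear
        labeled = list(rng.choice(len(X_pool), size=10, replace=False))
    for _ in range(n_labels - 10):
        clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
        if uncertainty:
            margin = np.abs(clf.decision_function(X_pool))            # distance to the boundary
            nxt = next(i for i in np.argsort(margin) if i not in labeled)
        else:
            nxt = rng.integers(len(X_pool))                           # random sampling baseline
        labeled.append(int(nxt))
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    return clf.score(X_pool, y_pool)                                  # accuracy over the pool

print("uncertainty:", learn(60), " random:", learn(60, uncertainty=False))
```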

ICML Conference 2022 Conference Paper

Framework for Evaluating Faithfulness of Local Explanations

  • Sanjoy Dasgupta
  • Nave Frost
  • Michal Moshkovitz

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.

JAIR Journal 2021 Journal Article

A Theoretical Perspective on Hyperdimensional Computing

  • Anthony Thomas
  • Sanjoy Dasgupta
  • Tajana Rosing

Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.

ICML Conference 2020 Conference Paper

Explainable k-Means and k-Medians Clustering

  • Michal Moshkovitz
  • Sanjoy Dasgupta
  • Cyrus Rashtchian
  • Nave Frost

Many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We study this problem from a theoretical viewpoint, measuring cluster quality by the k-means and k-medians objectives. In terms of negative results, we show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and any clustering based on a tree with k leaves must incur an Omega(log k) approximation factor compared to the optimal clustering. On the positive side, for two means/medians, we show that a single threshold cut can achieve a constant factor approximation, and we give nearly-matching lower bounds; for general k > 2, we design an efficient algorithm that leads to an O(k) approximation to the optimal k-medians and an O(k^2) approximation to the optimal k-means. Prior to our work, no algorithms were known with provable guarantees independent of dimension and input size.
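
For the two-cluster case, a small sketch of clustering with a single axis-aligned threshold cut, scored by the k-means cost (a brute-force illustration of the idea, not the paper's algorithm for general k):

```python
import numpy as np

def kmeans_cost(X, assign):
    return sum(np.sum((X[assign == c] - X[assign == c].mean(axis=0)) ** 2)
               for c in np.unique(assign))

def best_threshold_cut(X):
    """Explainable 2-clustering: a single axis-aligned threshold defines both clusters."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):                       # try every feature ...
        for theta in np.unique(X[:, j])[:-1]:         # ... and every threshold on that feature
            assign = (X[:, j] > theta).astype(int)
            cost = kmeans_cost(X, assign)
            if cost < best[0]:
                best = (cost, j, theta)
    return best                                       # (cost, feature index, threshold)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
cost, feature, theta = best_threshold_cut(X)
print(f"cut on feature {feature} at {theta:.2f}, k-means cost {cost:.1f}")
```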

NeurIPS Conference 2019 Conference Paper

An adaptive nearest neighbor rule for classification

  • Akshay Balsubramani
  • Sanjoy Dasgupta
  • Yoav Freund
  • Shay Moran

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the ``advantage'' which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.
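
One simple way to instantiate the idea is to grow the neighborhood around each query until the empirical majority exceeds a confidence margin of order 1/sqrt(k); the specific stopping rule below is an assumption for illustration, not the rule analyzed in the paper:

```python
import numpy as np

def adaptive_nn_predict(X, y, query, conf=1.0):
    """Grow the neighborhood until one label wins by roughly a 1/sqrt(k) margin."""
    order = np.argsort(np.linalg.norm(X - query, axis=1))    # neighbors, nearest first
    for k in range(1, len(order) + 1):
        votes = y[order[:k]]
        if abs(votes.mean() - 0.5) > conf / np.sqrt(k):      # enough evidence: stop at this k
            return int(votes.mean() > 0.5), k
    return int(y.mean() > 0.5), len(order)                   # fall back to all the points

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = ((X[:, 0] > 0.5) ^ (rng.random(500) < 0.2)).astype(int)  # 20% label noise
print(adaptive_nn_predict(X, y, np.array([0.9, 0.5])))       # deep inside a class: small k
print(adaptive_nn_predict(X, y, np.array([0.5, 0.5])))       # near the boundary: larger k
```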

ICML Conference 2019 Conference Paper

Teaching a black-box learner

  • Sanjoy Dasgupta
  • Daniel J. Hsu
  • Stefanos Poulis
  • Xiaojin Zhu 0001

One widely-studied model of teaching calls for a teacher to provide the minimal set of labeled examples that uniquely specifies a target concept. The assumption is that the teacher knows the learner’s hypothesis class, which is often not true of real-life teaching scenarios. We consider the problem of teaching a learner whose representation and hypothesis class are unknown—that is, the learner is a black box. We show that a teacher who does not interact with the learner can do no better than providing random examples. We then prove, however, that with interaction, a teacher can efficiently find a set of teaching examples that is a provably good approximation to the optimal set. As an illustration, we show how this scheme can be used to shrink training sets for any family of classifiers: that is, to find an approximately-minimal subset of training instances that yields the same classifier as the entire set.

NeurIPS Conference 2018 Conference Paper

Interactive Structure Learning with Structural Query-by-Committee

  • Christopher Tosh
  • Sanjoy Dasgupta

In this work, we introduce interactive structure learning, a framework that unifies many different interactive learning tasks. We present a generalization of the query-by-committee active learning algorithm for this setting, and we study its consistency and rate of convergence, both theoretically and empirically, with and without noise.

NeurIPS Conference 2018 Conference Paper

Learning from discriminative feature feedback

  • Sanjoy Dasgupta
  • Akansha Dey
  • Nicholas Roberts
  • Sivan Sabato

We consider the problem of learning a multi-class classifier from labels as well as simple explanations that we call "discriminative features". We show that such explanations can be provided whenever the target concept is a decision tree, or more generally belongs to a particular subclass of DNF formulas. We present an efficient online algorithm for learning from such feedback and we give tight bounds on the number of mistakes made during the learning process. These bounds depend only on the size of the target concept and not on the overall number of available features, which could be infinite. We also demonstrate the learning procedure experimentally.

ICML Conference 2017 Conference Paper

Diameter-Based Active Learning

  • Christopher Tosh
  • Sanjoy Dasgupta

To date, the tightest upper and lower-bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index. We provide, for the first time, an efficient algorithm that is able to realize this upper bound, and we empirically demonstrate its good performance.

STOC Conference 2016 Conference Paper

A cost function for similarity-based hierarchical clustering

  • Sanjoy Dasgupta

The development of algorithms for hierarchical clustering has been hampered by a shortage of precise objective functions. To help address this situation, we introduce a simple cost function on hierarchies over a set of points, given pairwise similarities between those points. We show that this criterion behaves sensibly in canonical instances and that it admits a top-down construction procedure with a provably good approximation ratio.
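
The cost function charges each pair of points by their similarity times the size of the smallest cluster in the hierarchy that still contains both. A minimal sketch for binary trees encoded as nested tuples (the encoding and the toy similarities are assumptions made here for illustration):

```python
def leaves(tree):
    return [tree] if not isinstance(tree, tuple) else leaves(tree[0]) + leaves(tree[1])

def hierarchy_cost(tree, sim):
    """cost(T) = sum over pairs {i,j} of sim(i,j) * |smallest cluster of T containing both|."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    n_here = len(leaves(tree))
    # pairs separated at this node are together only in this cluster, of size n_here
    cross = sum(sim[min(i, j), max(i, j)] for i in leaves(left) for j in leaves(right))
    return cross * n_here + hierarchy_cost(left, sim) + hierarchy_cost(right, sim)

# four points; 0,1 are similar to each other, as are 2,3
sim = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.1, (0, 3): 0.1, (1, 2): 0.1, (1, 3): 0.1}
print(hierarchy_cost(((0, 1), (2, 3)), sim))   # sensible hierarchy: low cost
print(hierarchy_cost(((0, 2), (1, 3)), sim))   # bad hierarchy: higher cost
```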

NeurIPS Conference 2016 Conference Paper

An algorithm for L1 nearest neighbor search via monotonic embedding

  • Xinan Wang
  • Sanjoy Dasgupta

Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance. Here we develop an approach for L1 distance that begins with an explicit and exact embedding of the points into L2. We show how this embedding can efficiently be combined with random projection methods for L2 NN search, such as locality-sensitive hashing or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation that it is competitive in practice with available alternatives.

ICML Conference 2016 Conference Paper

Interactive Bayesian Hierarchical Clustering

  • Sharad Vikram
  • Sanjoy Dasgupta

Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user’s needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering while still utilizing the geometry of the data by sampling a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.

NeurIPS Conference 2014 Conference Paper

Incremental Clustering: The Case for Extra Clusters

  • Margareta Ackerman
  • Sanjoy Dasgupta

The explosion in the amount of data available for analysis often necessitates a transition from batch to incremental clustering methods, which process one element at a time and typically store only a small subset of the data. In this paper, we initiate the formal analysis of incremental clustering methods focusing on the types of cluster structure that they are able to detect. We find that the incremental setting is strictly weaker than the batch model, proving that a fundamental class of cluster structures that can readily be detected in the batch setting is impossible to identify using any incremental method. Furthermore, we show how the limitations of incremental clustering can be overcome by allowing additional clusters.

ICML Conference 2014 Conference Paper

Lower Bounds for the Gibbs Sampler over Mixtures of Gaussians

  • Christopher Tosh
  • Sanjoy Dasgupta

The mixing time of a Markov chain is the minimum time t necessary for the total variation distance between the distribution of the Markov chain’s current state X_t and its stationary distribution to fall below some ε > 0. In this paper, we present lower bounds for the mixing time of the Gibbs sampler over Gaussian mixture models with Dirichlet priors.

NeurIPS Conference 2014 Conference Paper

Optimal rates for k-NN density and mode estimation

  • Sanjoy Dasgupta
  • Samory Kpotufe

We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators -- based on $k$-NN -- which attain minimax-optimal rates under surprisingly general distributional conditions.
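
The k-NN density estimate in question is f_k(x) = k / (n · vol(B(x, r_k(x)))), with r_k(x) the distance from x to its k-th nearest sample point. A one-dimensional sketch with a crude mode estimate (the choice k ≈ sqrt(n) and the Gaussian example are assumptions, not the paper's estimators):

```python
import numpy as np

def knn_density_1d(samples, x, k):
    """f_k(x) = k / (n * 2 * r_k(x)); in one dimension the ball of radius r_k has volume 2*r_k."""
    r_k = np.sort(np.abs(samples - x))[k - 1]
    return k / (len(samples) * 2 * r_k)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
k = int(np.sqrt(len(samples)))                 # a common default; the paper studies general k
for x in (0.0, 1.0, 2.0):
    est = knn_density_1d(samples, x, k)
    true = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    print(f"x={x}: estimate {est:.3f}, true density {true:.3f}")

# a crude mode estimate: the sample point with the largest k-NN density
mode_hat = max(samples, key=lambda s: knn_density_1d(samples, s, k))
print("mode estimate:", round(float(mode_hat), 3))
```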

NeurIPS Conference 2014 Conference Paper

Rates of Convergence for Nearest Neighbor Classification

  • Kamalika Chaudhuri
  • Sanjoy Dasgupta

We analyze the behavior of nearest neighbor classification in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. These are more general than existing bounds, and enable us, as a by-product, to establish the universal consistency of nearest neighbor in a broader range of data spaces than was previously known. We illustrate our upper and lower bounds by introducing a new smoothness class customized for nearest neighbor classification. We find, for instance, that under the Tsybakov margin condition the convergence rate of nearest neighbor matches recently established lower bounds for nonparametric classification.

NeurIPS Conference 2013 Conference Paper

Moment-based Uniform Deviation Bounds for $k$-means and Friends

  • Matus Telgarsky
  • Sanjoy Dasgupta

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p\geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.

NeurIPS Conference 2013 Conference Paper

The Fast Convergence of Incremental PCA

  • Akshay Balsubramani
  • Sanjoy Dasgupta
  • Yoav Freund

We prove the first finite-sample convergence rates for any incremental PCA algorithm using sub-quadratic time and memory per iteration. The algorithm analyzed is Oja's learning rule, an efficient and well-known scheme for estimating the top principal component. Our analysis of this non-convex problem yields expected and high-probability convergence rates of $\tilde{O}(1/n)$ through a novel technique. We relate our guarantees to existing rates for stochastic gradient descent on strongly convex functions, and extend those results. We also include experiments which demonstrate convergence behaviors predicted by our analysis.

NeurIPS Conference 2010 Conference Paper

Rates of convergence for the cluster tree

  • Kamalika Chaudhuri
  • Sanjoy Dasgupta

For a density f on R^d, a high-density cluster is any connected component of {x: f(x) >= c}, for some c > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present a procedure for estimating the cluster tree given samples from f. We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem.

JMLR Journal 2009 Journal Article

Analysis of Perceptron-Based Active Learning

  • Sanjoy Dasgupta
  • Adam Tauman Kalai
  • Claire Monteleoni

We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε^2) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log 1/ε) labels. This exponential improvement over the usual sample complexity of supervised learning had previously been demonstrated only for the computationally more complex query-by-committee algorithm.

UAI Conference 2009 Conference Paper

Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?

  • Nakul Verma
  • Samory Kpotufe
  • Sanjoy Dasgupta

Recent theory work has found that a special type of spatial partition tree – called a random projection tree – is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search.
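
A toy random projection tree of the kind compared here: each node splits the data at the median of a random linear projection (the stopping rule and the synthetic low-intrinsic-dimension data are assumptions made for illustration):

```python
import numpy as np

def rp_tree(X, min_size=20, rng=np.random.default_rng(0)):
    """Recursively split the data at the median of a random linear projection."""
    if len(X) <= min_size:
        return {"leaf": X}
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)                       # random unit direction
    proj = X @ u
    thresh = np.median(proj)
    left, right = X[proj <= thresh], X[proj > thresh]
    if len(left) == 0 or len(right) == 0:        # degenerate split: stop here
        return {"leaf": X}
    return {"dir": u, "thresh": thresh,
            "left": rp_tree(left, min_size, rng), "right": rp_tree(right, min_size, rng)}

# data with low intrinsic dimension: a 2-dimensional subspace embedded in 50 dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))
tree = rp_tree(X)
print("root threshold:", round(float(tree["thresh"]), 3))
```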

NeurIPS Conference 2007 Conference Paper

A general agnostic active learning algorithm

  • Sanjoy Dasgupta
  • Daniel Hsu
  • Claire Monteleoni

We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fall-back guarantee that bounds the algorithm’s label complexity by the agnostic PAC sample complexity. Our analysis yields asymptotic label complexity improvements for certain hypothesis classes and distributions. We also demonstrate improvements experimentally.

NeurIPS Conference 2007 Conference Paper

A learning framework for nearest neighbor search

  • Lawrence Cayton
  • Sanjoy Dasgupta

Can we leverage learning techniques to build a fast nearest-neighbor (NN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.

JMLR Journal 2007 Journal Article

A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians

  • Sanjoy Dasgupta
  • Leonard Schulman

We show that, given data from a mixture of k well-separated spherical Gaussians in R^d, a simple two-round variant of EM will, with high probability, learn the parameters of the Gaussians to near-optimal precision, if the dimension is high (d >> ln k). We relate this to previous theoretical and empirical work on the EM algorithm.

UAI Conference 2006 Conference Paper

A Concentration Theorem for Projections

  • Sanjoy Dasgupta
  • Daniel J. Hsu
  • Nakul Verma

Let X in R^D have mean zero and finite second moments. We show that there is a precise sense in which almost all linear projections of X into R^d (for d < D) look like a scale-mixture of spherical Gaussians -- specifically, a mixture of distributions N(0, sigma^2 I_d) where the weight of the particular sigma component is P(|X|^2 = sigma^2 D). The extent of this effect depends upon the ratio of d to D, and upon a particular coefficient of eccentricity of X's distribution. We explore this result in a variety of experiments.

NeurIPS Conference 2004 Conference Paper

Analysis of a greedy active learning strategy

  • Sanjoy Dasgupta

We abstract out the core search problem of active learning schemes, to better understand the extent to which adaptive labeling can improve sample complexity. We give various upper and lower bounds on the number of labels which need to be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
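
The greedy rule in question can be read as generalized binary search over a finite hypothesis class: query the point whose answer splits the remaining version space most evenly. A sketch for threshold classifiers on the line (the class, pool, and target below are illustrative assumptions, not the paper's setting):

```python
import numpy as np

# hypothesis class: thresholds t on [0, 1], with h_t(x) = 1 iff x >= t
thresholds = np.linspace(0, 1, 101)
pool = np.linspace(0.005, 0.995, 200)            # unlabeled points available for querying
true_t = 0.37                                    # the unknown target hypothesis

version_space = np.ones(len(thresholds), dtype=bool)
queries = 0
while version_space.sum() > 1:
    # greedy rule: query the point that splits the surviving hypotheses most evenly
    def imbalance(x):
        pos = np.sum(thresholds[version_space] <= x)
        return abs(2 * pos - version_space.sum())
    x = min(pool, key=imbalance)
    label = int(x >= true_t)                     # ask the oracle for the label
    queries += 1
    version_space &= ((thresholds <= x) == bool(label))
print("queries used:", queries, " surviving thresholds:", thresholds[version_space])
```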

NeurIPS Conference 2003 Conference Paper

An Iterative Improvement Procedure for Hierarchical Clustering

  • David Kauchak
  • Sanjoy Dasgupta

We describe a procedure which finds a hierarchical clustering by hill-climbing. The cost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorderings. We show these can be accomplished efficiently, by exploiting special properties of squared Euclidean distances and by using techniques from scheduling algorithms.

STOC Conference 2002 Conference Paper

The complexity of approximating entropy

  • Tugkan Batu
  • Sanjoy Dasgupta
  • Ravi Kumar 0001
  • Ronitt Rubinfeld

We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the i-th location is the probability of the i-th element, then linear time is both necessary and sufficient for approximating the entropy. We consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a λ-multiplicative approximation to the entropy can be obtained in O(n^((1+η)/λ^2) poly(log n)) time for distributions with entropy Ω(λ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^(max(1/(2λ^2), 2/(5λ^2 - 2)))). We next consider a hybrid model in which both the explicit distribution as well as independent samples are available. Here, significantly more efficient algorithms can be achieved: a λ-multiplicative approximation to the entropy can be obtained in O(λ^2 …). Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
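
In the samples-only model, the natural baseline is the plug-in estimate, i.e., the entropy of the empirical distribution; a tiny sketch showing how its accuracy depends on the sample size (the random distribution and sample sizes are illustrative; this is not the sublinear-time algorithm analyzed in the paper):

```python
import numpy as np
from collections import Counter

def plug_in_entropy(samples):
    """Entropy (in bits) of the empirical distribution of the samples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
n = 1000                                         # domain size
true_p = rng.dirichlet(np.ones(n))               # a random distribution over n elements
true_H = float(-(true_p * np.log2(true_p)).sum())
for m in (1_000, 10_000, 100_000):
    samples = rng.choice(n, size=m, p=true_p)
    print(f"m={m}: plug-in {plug_in_entropy(samples):.3f} bits, true {true_H:.3f} bits")
```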

NeurIPS Conference 2001 Conference Paper

PAC Generalization Bounds for Co-training

  • Sanjoy Dasgupta
  • Michael Littman
  • David McAllester

The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences — partial rules and partial labeling of the unlabeled data — and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of

UAI Conference 2000 Conference Paper

A Two-Round Variant of EM for Gaussian Mixtures

  • Sanjoy Dasgupta
  • Leonard J. Schulman

Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast to this, in supervised model selection it is a priori known that the chosen model will be used in the future for prediction tasks involving more “focused” predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task, and, although frequently used for supervised model selection also, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.

UAI Conference 2000 Conference Paper

Experiments with Random Projection

  • Sanjoy Dasgupta

Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gaussians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data.

FOCS Conference 1999 Conference Paper

Learning Mixtures of Gaussians

  • Sanjoy Dasgupta

Mixtures of Gaussians are among the most fundamental and widely used statistical models. Current techniques for learning such mixtures from data are local search heuristics with weak performance guarantees. We present the first provably correct algorithm for learning a mixture of Gaussians. This algorithm is very simple and returns the true centers of the Gaussians to within the precision specified by the user with high probability. It runs in time only linear in the dimension of the data and polynomial in the number of Gaussians.

UAI Conference 1999 Conference Paper

Learning Polytrees

  • Sanjoy Dasgupta

We consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good approximation to the best polytree. We then show that it is not possible to do very much better, since the learning problem is NP-hard even to approximately solve within some constant factor.
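
The optimal branching (Chow-Liu tree) referred to above is the maximum-weight spanning tree under pairwise empirical mutual information. A compact sketch for binary variables (the chain-structured data generator and the Prim-style construction are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two binary columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def chow_liu_edges(data):
    """Maximum spanning tree over pairwise mutual information (Prim's algorithm)."""
    d = data.shape[1]
    weights = {(i, j): mutual_information(data[:, i], data[:, j])
               for i, j in combinations(range(d), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        i, j = max(((i, j) for (i, j) in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append((i, j))
        in_tree |= {i, j}
    return edges

# sample from a simple chain X0 -> X1 -> X2 -> X3 and recover its skeleton
rng = np.random.default_rng(0)
X = np.zeros((5000, 4), dtype=int)
X[:, 0] = rng.random(5000) < 0.5
for t in range(1, 4):
    flip = rng.random(5000) < 0.1
    X[:, t] = np.where(flip, 1 - X[:, t - 1], X[:, t - 1])
print(chow_liu_edges(X))
```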