Arrow Research search

Author name cluster

James Sharpnack

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

ICML Conference 2023 Conference Paper

RLSbench: Domain Adaptation Under Relaxed Label Shift

  • Saurabh Garg
  • Nick Erickson
  • James Sharpnack
  • Alexander J. Smola
  • Sivaraman Balakrishnan
  • Zachary C. Lipton

Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class-conditional distributions is precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportion shifts. While several papers modify these heuristics in attempts to handle label proportion shifts, inconsistencies in evaluation standards, datasets, and baselines make it difficult to gauge the current best practices. In this paper, we introduce RLSbench, a large-scale benchmark for relaxed label shift, consisting of $>$500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. Unlike existing benchmarks, which primarily focus on shifts in class-conditional $p(x|y)$, our benchmark also focuses on label marginal shifts. First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with an estimate of the target label distribution. The meta-algorithm improves existing domain adaptation heuristics under large label proportion shifts, often by 2–10% accuracy points, while having minimal effect ($<$0.5%) when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
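
The two-step meta-algorithm is simple enough to sketch. Below is a minimal, illustrative NumPy version, not the authors' released code (that lives in the linked repository); the helper names and the plug-in target-marginal estimate are assumptions.

```python
import numpy as np

def pseudo_balance_indices(labels, rng):
    """Step (i): resample training indices so every class appears equally
    often in the epoch (on the target side, pseudo-labels would play the
    role of `labels`)."""
    classes, counts = np.unique(labels, return_counts=True)
    per_class = counts.max()
    idx = [rng.choice(np.flatnonzero(labels == c), size=per_class, replace=True)
           for c in classes]
    return rng.permutation(np.concatenate(idx))

def adjust_logits(logits, target_marginal, train_marginal):
    """Step (ii): shift the classifier's logits by the log-ratio of an
    estimated target label marginal to the (balanced) training marginal."""
    return logits + np.log(target_marginal) - np.log(train_marginal)

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [100, 30, 10])           # imbalanced source data
epoch_idx = pseudo_balance_indices(labels, rng)        # 300 balanced indices
adjusted = adjust_logits(np.zeros((5, 3)),             # dummy batch of logits
                         target_marginal=np.array([0.2, 0.3, 0.5]),
                         train_marginal=np.full(3, 1 / 3))
```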

NeurIPS Conference 2022 Conference Paper

Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms

  • Qin Ding
  • Yue Kang
  • Yi-Wei Liu
  • Thomas Chun Man Lee
  • Cho-Jui Hsieh
  • James Sharpnack

The stochastic contextual bandit problem, which models the trade-off between exploration and exploitation, has many real applications, including recommender systems, online advertising, and clinical trials. Like many other machine learning algorithms, contextual bandit algorithms often have one or more hyper-parameters. For example, in most optimal stochastic contextual bandit algorithms, there is an unknown exploration parameter which controls the trade-off between exploration and exploitation. A proper choice of the hyper-parameters is essential for contextual bandit algorithms to perform well. However, it is infeasible to use offline tuning methods to select hyper-parameters in the contextual bandit environment, since there is no pre-collected dataset and decisions have to be made in real time. To tackle this problem, we first propose a two-layer bandit structure for auto-tuning the exploration parameter and further generalize it to the Syndicated Bandits framework, which can learn multiple hyper-parameters dynamically in the contextual bandit environment. We derive regret bounds for the proposed Syndicated Bandits framework and show that it avoids regret that depends exponentially on the number of hyper-parameters to be tuned. Moreover, it achieves optimal regret bounds under certain scenarios. The Syndicated Bandits framework is general enough to handle the tuning tasks in many popular contextual bandit algorithms, such as LinUCB, LinTS, UCB-GLM, etc. Experiments on both synthetic and real datasets validate the effectiveness of the proposed framework.
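
The two-layer structure can be made concrete: an adversarial bandit on top selects a hyper-parameter value each round, and the contextual bandit below acts with it. The toy loop below pairs an EXP3-style top layer with LinUCB; the candidate grid, learning rate, and reward rescaling are illustrative assumptions, not the paper's exact construction or its Syndicated extension to multiple hyper-parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

alphas = np.array([0.0, 0.5, 1.0, 2.0])   # candidate exploration parameters (assumed grid)
w = np.ones(len(alphas))                  # top-layer EXP3 weights
eta = 0.1                                 # top-layer learning rate (assumed)
A, b = np.eye(d), np.zeros(d)             # bottom-layer LinUCB statistics

for t in range(T):
    p = (1 - eta) * w / w.sum() + eta / len(alphas)
    j = rng.choice(len(alphas), p=p)                    # top layer: pick an alpha
    X = rng.normal(size=(K, d)) / np.sqrt(d)            # observed contexts
    A_inv = np.linalg.inv(A)
    ucb = X @ (A_inv @ b) + alphas[j] * np.sqrt(np.einsum('ki,ij,kj->k', X, A_inv, X))
    a = int(np.argmax(ucb))                             # bottom layer: pick an arm
    r = float(X[a] @ theta_star + 0.1 * rng.normal())   # observe reward
    A += np.outer(X[a], X[a])
    b += r * X[a]
    r01 = float(np.clip((r + 1) / 2, 0, 1))             # rescale reward to [0, 1]
    w[j] *= np.exp(eta * r01 / (len(alphas) * p[j]))    # EXP3 importance-weighted update
```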

NeurIPS Conference 2019 Conference Paper

Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

  • Liwei Wu
  • Shuqing Li
  • Cho-Jui Hsieh
  • James Sharpnack

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems, to the Transformer and BERT in natural language tasks. We find that when used along with widely-used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
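
SSE-SE in particular amounts to a one-line change in the embedding lookup: during training, each index is swapped for a uniformly random one with small probability before the table lookup. A minimal PyTorch-style sketch follows; the module name and swap probability are illustrative, not the paper's code.

```python
import torch

class SSEEmbedding(torch.nn.Module):
    """Embedding layer with stochastically shared embeddings (SSE-SE style):
    during training, each index is replaced by a uniformly random index with
    probability p before lookup, which implicitly ties embeddings together."""
    def __init__(self, num_embeddings, dim, p=0.01):
        super().__init__()
        self.emb = torch.nn.Embedding(num_embeddings, dim)
        self.p = p

    def forward(self, idx):
        if self.training and self.p > 0:
            swap = torch.rand_like(idx, dtype=torch.float) < self.p
            random_idx = torch.randint_like(idx, self.emb.num_embeddings)
            idx = torch.where(swap, random_idx, idx)
        return self.emb(idx)

layer = SSEEmbedding(1000, 32, p=0.05)
out = layer(torch.randint(0, 1000, (8, 16)))   # (batch, seq) -> (8, 16, 32)
print(out.shape)
```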

ICML Conference 2018 Conference Paper

SQL-Rank: A Listwise Approach to Collaborative Ranking

  • Liwei Wu
  • Cho-Jui Hsieh
  • James Sharpnack

In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach with previous pointwise and pairwise approaches, which are based on treating either each rating or each pairwise comparison as an independent instance, respectively. By extending the work of ListNet (Cao et al., 2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which assigns probability mass to permutations based on a low-rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the number of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback, such as Weighted-MF and BPR, and achieves favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.
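
The permutation model here assigns mass in a Plackett-Luce/ListNet fashion, with scores coming from a low-rank matrix. Below is a minimal NumPy sketch of the per-user log-likelihood; SQL-Rank's exact link function, tie handling, and missing-data treatment differ, so this is only the generic listwise form.

```python
import numpy as np

def listwise_loglik(scores, ranking):
    """Log-probability of `ranking` (item indices, best to worst) under a
    Plackett-Luce-style permutation model with the given item scores."""
    s = scores[ranking]
    ll = 0.0
    for j in range(len(s)):          # position j chooses s[j] from the tail s[j:]
        tail = s[j:]
        m = tail.max()               # stabilized log-sum-exp over the tail
        ll += s[j] - (m + np.log(np.exp(tail - m).sum()))
    return ll

rng = np.random.default_rng(0)
r = 3                                # latent rank
U, V = rng.normal(size=(50, r)), rng.normal(size=(200, r))
scores = U @ V.T                     # low-rank user-by-item score matrix
top10 = np.argsort(-scores[0])[:10]  # an observed length-10 ranking for user 0
print(listwise_loglik(scores[0], top10))
```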

JMLR Journal 2018 Journal Article

The DFS Fused Lasso: Linear-Time Denoising over General Graphs

  • Oscar Hernan Madrid Padilla
  • James Sharpnack
  • James G. Scott
  • Ryan J. Tibshirani

The fused lasso, also known as (anisotropic) total variation denoising, is widely used for piecewise constant signal estimation with respect to a given undirected graph. The fused lasso estimate is highly nontrivial to compute when the underlying graph is large and has an arbitrary structure. But for a special graph structure, namely, the chain graph, the fused lasso---or simply, 1d fused lasso---can be computed in linear time. In this paper, we revisit a result recently established in the online classification literature (Herbster et al., 2009; Cesa-Bianchi et al., 2013) and show that it has important implications for signal denoising on graphs. The result can be translated to our setting as follows. Given a general graph, if we run the standard depth-first search (DFS) traversal algorithm, then the total variation of any signal over the chain graph induced by DFS is no more than twice its total variation over the original graph. This result leads to several interesting theoretical and computational conclusions. Letting $m$ and $n$ denote the number of edges and nodes, respectively, of the graph in consideration, it implies that for an underlying signal with total variation $t$ over the graph, the fused lasso (properly tuned) achieves a mean squared error rate of $t^{2/3} n^{-2/3}$. Moreover, precisely the same mean squared error rate is achieved by running the 1d fused lasso on the DFS-induced chain graph. Importantly, the latter estimator is simple and computationally cheap, requiring $O(m)$ operations to construct the DFS-induced chain and $O(n)$ operations to compute the 1d fused lasso solution over this chain. Further, for trees that have bounded maximum degree, the error rate of $t^{2/3} n^{-2/3}$ cannot be improved, in the sense that it is the minimax rate for signals that have total variation $t$ over the tree. Finally, several related results also hold---for example, the analogous result holds for a roughness measure defined by the $\ell_0$ norm of differences across edges in place of the total variation metric.
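
The computational recipe is short enough to sketch directly: take the DFS preorder, then solve the 1d fused lasso along that chain. The sketch below uses networkx for the traversal and cvxpy as a stand-in chain solver; the paper's point is that dedicated $O(n)$ dynamic-programming solvers exist for the 1d problem, so treat cvxpy here as illustration only.

```python
import networkx as nx
import numpy as np
import cvxpy as cp

def dfs_fused_lasso(G, y, lam):
    """Denoise the node signal y (a dict keyed by node) by running the
    1d fused lasso along the chain induced by a DFS traversal of G."""
    order = list(nx.dfs_preorder_nodes(G, source=next(iter(G))))
    y_chain = np.array([y[v] for v in order])
    beta = cp.Variable(len(y_chain))
    cp.Problem(cp.Minimize(
        0.5 * cp.sum_squares(y_chain - beta) + lam * cp.tv(beta))).solve()
    fit = dict(zip(order, beta.value))
    return {v: fit[v] for v in G}

rng = np.random.default_rng(0)
G = nx.grid_2d_graph(8, 8)                                  # a small general graph
y = {v: float(v[0] >= 4) + 0.3 * rng.normal() for v in G}   # noisy two-piece signal
denoised = dfs_fused_lasso(G, y, lam=1.0)
```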

NeurIPS Conference 2017 Conference Paper

A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening

  • Kevin Lin
  • James Sharpnack
  • Alessandro Rinaldo
  • Ryan Tibshirani

In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints. This rate is seen to be suboptimal (compared to the minimax rate) by only a factor of $\log\log{n}$. Our proof technique is centered around a novel construction that we call a lower interpolant. We extend our results to misspecified models and exponential family distributions. We also describe the implications of our error analysis for the approximate screening of changepoints.
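
The screening implication is easy to make concrete: fit the 1d fused lasso, then report coordinates where the fitted piecewise-constant vector jumps. A minimal sketch follows; the solver, penalty level, and jump threshold are assumptions.

```python
import numpy as np
import cvxpy as cp

def fused_lasso_changepoints(y, lam, tol=1e-6):
    """Fit the 1d fused lasso and return the indices where the fitted
    mean vector jumps, i.e. the (approximately) screened changepoints."""
    theta = cp.Variable(len(y))
    cp.Problem(cp.Minimize(
        0.5 * cp.sum_squares(y - theta) + lam * cp.tv(theta))).solve()
    return np.flatnonzero(np.abs(np.diff(theta.value)) > tol) + 1

rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(100), 2 * np.ones(100)]) + rng.normal(size=200)
print(fused_lasso_changepoints(y, lam=5.0))   # ideally a point near index 100
```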

NeurIPS Conference 2017 Conference Paper

Higher-Order Total Variation Classes on Grids: Minimax Theory and Trend Filtering Methods

  • Veeranjaneyulu Sadhanala
  • Yu-Xiang Wang
  • James Sharpnack
  • Ryan Tibshirani

We consider the problem of estimating the values of a function over $n$ nodes of a $d$-dimensional grid graph (having equal side lengths $n^{1/d}$) from noisy observations. The function is assumed to be smooth, but is allowed to exhibit different amounts of smoothness at different regions in the grid. Such heterogeneity eludes classical measures of smoothness from nonparametric statistics, such as Hölder smoothness. Meanwhile, total variation (TV) smoothness classes allow for heterogeneity, but are restrictive in another sense: only constant functions count as perfectly smooth (achieve zero TV). To move past this, we define two new higher-order TV classes, based on two ways of compiling the discrete derivatives of a parameter across the nodes. We relate these two new classes to Hölder classes, and derive lower bounds on their minimax errors. We also analyze two naturally associated trend filtering methods; when $d=2$, each is seen to be rate optimal over the appropriate class.
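
The discrete derivatives underlying these classes can be assembled with Kronecker products. The sketch below builds the first-order difference operator on a $d = 2$ grid, so that $\|D\beta\|_1$ is the usual anisotropic TV; the paper's higher-order classes come from two specific ways of compiling iterated differences, which this generic first-order construction only gestures at.

```python
import numpy as np
import scipy.sparse as sp

def diff1(n):
    """(n-1) x n first-difference matrix."""
    return sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))

def grid_diff_operator(n):
    """First-order difference operator D on an n x n grid: differences within
    each row and within each column stacked, so ||D beta||_1 is the
    (anisotropic) total variation of the row-major-raveled signal beta."""
    D, I = diff1(n), sp.eye(n)
    return sp.vstack([sp.kron(I, D),    # differences within each row
                      sp.kron(D, I)])   # differences within each column

n = 16
D = grid_diff_operator(n)
beta = (np.add.outer(np.arange(n), np.arange(n)) >= n).astype(float).ravel()
print(D.shape, np.abs(D @ beta).sum())  # TV of a two-piece indicator signal
```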

JMLR Journal 2016 Journal Article

Trend Filtering on Graphs

  • Yu-Xiang Wang
  • James Sharpnack
  • Alexander J. Smola
  • Ryan J. Tibshirani

We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences. This generalizes the idea of trend filtering (Kim et al., 2009; Tibshirani, 2014), used for univariate nonparametric regression, to graphs. Analogous to the univariate case, graph trend filtering exhibits a level of local adaptivity unmatched by the usual $\ell_2$-based graph smoothers. It is also defined by a convex minimization problem that is readily solved (e.g., by fast ADMM or Newton algorithms). We demonstrate the merits of graph trend filtering through both examples and theory.
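
The estimator is a single convex program, which makes a toy implementation easy. The sketch below solves the $k = 0$ case (the graph fused lasso) with cvxpy on the karate club graph; higher orders would replace the incidence matrix with iterated difference operators, and the fast ADMM/Newton solvers mentioned in the abstract are what one would use at scale.

```python
import networkx as nx
import numpy as np
import cvxpy as cp

def graph_trend_filter(G, y, lam):
    """k = 0 graph trend filtering (graph fused lasso):
    minimize 0.5 * ||y - beta||^2 + lam * ||D beta||_1,
    where D is the oriented edge-incidence (difference) matrix of G."""
    D = nx.incidence_matrix(G, oriented=True).T.tocsr()   # edges x nodes
    beta = cp.Variable(len(G))
    obj = 0.5 * cp.sum_squares(y - beta) + lam * cp.norm1(D @ beta)
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value

rng = np.random.default_rng(0)
G = nx.karate_club_graph()
y = np.array([float(G.nodes[v]['club'] == 'Mr. Hi') for v in G]) \
    + 0.4 * rng.normal(size=len(G))
print(np.round(graph_trend_filter(G, y, lam=1.0), 2))
```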

NeurIPS Conference 2013 Conference Paper

Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

  • James Sharpnack
  • Akshay Krishnamurthy
  • Aarti Singh

The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, $k$-nearest neighbor graphs, and $\epsilon$-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
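
The generalized likelihood ratio test that LESS relaxes can at least be brute-forced on tiny graphs, which makes the computational barrier concrete. Under unit-variance Gaussian noise the GLRT reduces to the scan statistic $\max_C \sum_{v \in C} y_v / \sqrt{|C|}$ over connected clusters $C$; the sketch below enumerates them exhaustively (exponential time, and emphatically not LESS itself).

```python
from itertools import combinations
import networkx as nx
import numpy as np

def scan_statistic(G, y):
    """Brute-force GLRT scan over all connected vertex subsets:
    max_C sum_{v in C} y_v / sqrt(|C|). Exponential time; only for
    illustrating what LESS approximates in polynomial time."""
    nodes = list(G)
    best, best_C = -np.inf, None
    for k in range(1, len(nodes) + 1):
        for C in combinations(nodes, k):
            if nx.is_connected(G.subgraph(C)):
                stat = sum(y[v] for v in C) / np.sqrt(k)
                if stat > best:
                    best, best_C = stat, C
    return best, best_C

rng = np.random.default_rng(0)
G = nx.cycle_graph(10)
y = {v: rng.normal() + (2.0 if v in {3, 4, 5} else 0.0) for v in G}
print(scan_statistic(G, y))   # should recover a cluster near {3, 4, 5}
```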

NeurIPS Conference 2010 Conference Paper

Identifying graph-structured activation patterns in networks

  • James Sharpnack
  • Aarti Singh

We consider the problem of identifying an activation pattern in a complex, large-scale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task, especially if the network is large (the pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, there are typically statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of high-dimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size.
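
The estimator analyzed here projects the noisy measurements onto the low-frequency end of the graph Laplacian eigenbasis, where graph-structured patterns concentrate. A minimal NumPy/networkx sketch follows; the number of retained eigenvectors is a tuning choice that the paper's theory governs, here simply fixed by hand.

```python
import networkx as nx
import numpy as np

def laplacian_denoise(G, y, k):
    """Project noisy node observations y onto the k lowest-frequency
    eigenvectors of the graph Laplacian: smooth, graph-structured patterns
    concentrate there, while i.i.d. noise spreads over all frequencies."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    _, eigvecs = np.linalg.eigh(L)        # columns sorted by ascending eigenvalue
    U = eigvecs[:, :k]
    return U @ (U.T @ y)

rng = np.random.default_rng(0)
G = nx.watts_strogatz_graph(200, 6, 0.05)
L = nx.laplacian_matrix(G).toarray().astype(float)
pattern = np.linalg.eigh(L)[1][:, 1]      # a smooth (low-frequency) pattern
y = pattern + 0.5 * rng.normal(size=len(G))
print(np.linalg.norm(laplacian_denoise(G, y, k=10) - pattern))  # << noise norm
```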