Author name cluster

Chen Dan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

JMLR Journal 2022 Journal Article

Fundamental Limits and Tradeoffs in Invariant Representation Learning

Han Zhao
Chen Dan
Bryon Aragam
Tommi S. Jaakkola
Geoffrey J. Gordon
Pradeep Ravikumar

A wide range of machine learning applications such as privacy-preserving learning, algorithmic fairness, and domain adaptation/generalization among others, involve learning invariant representations of the data that aim to achieve two competing goals: (a) maximize information or accuracy with respect to a target response, and (b) maximize invariance or independence with respect to a set of protected features (e.g.\ for fairness, privacy, etc). Despite their wide applicability, theoretical understanding of the optimal tradeoffs --- with respect to accuracy, and invariance --- achievable by invariant representations is still severely lacking. In this paper, we provide an information theoretic analysis of such tradeoffs under both classification and regression settings. More precisely, we provide a geometric characterization of the accuracy and invariance achievable by any representation of the data; we term this feasible region the information plane. We provide an inner bound for this feasible region for the classification case, and an exact characterization for the regression case, which allows us to either bound or exactly characterize the Pareto optimal frontier between accuracy and invariance. Although our contributions are mainly theoretical, a key practical application of our results is in certifying the potential sub-optimality of any given representation learning algorithm for either classification or regression tasks. Our results shed new light on the fundamental interplay between accuracy and invariance, and may be useful in guiding the design of future representation learning algorithms. [abs] [ pdf ][ bib ] &copy JMLR 2022. ( edit, beta )

PDF Details

NeurIPS Conference 2021 Conference Paper

Boosted CVaR Classification

Runtian Zhai
Chen Dan
Arun Suggala
J. Zico Kolter
Pradeep Ravikumar

Many modern machine learning tasks require models with high tail performance, i. e. high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximize the model's tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which computes the average risk over the tails of the loss. However, for classification tasks where models are evaluated by the zero-one loss, we show that if the classifiers are deterministic, then the minimizer of the average zero-one loss also minimizes the CVaR zero-one loss, suggesting that CVaR loss minimization is not helpful without additional assumptions. We circumvent this negative result by minimizing the CVaR loss over randomized classifiers, for which the minimizers of the average zero-one loss and the CVaR zero-one loss are no longer the same, so minimizing the latter can lead to better tail performance. To learn such randomized classifiers, we propose the Boosted CVaR Classification framework which is motivated by a direct relationship between CVaR and a classical boosting algorithm called LPBoost. Based on this framework, we design an algorithm called $\alpha$-AdaLPBoost. We empirically evaluate our proposed algorithm on four benchmark datasets and show that it achieves higher tail performance than deterministic model training methods.

PDF Details

NeurIPS Conference 2019 Conference Paper

Optimal Analysis of Subset-Selection Based L_p Low-Rank Approximation

Chen Dan
Hong Wang
Hongyang Zhang
Yuchen Zhou
Pradeep Ravikumar

We show that for the problem of $\ell_p$ rank-$k$ approximation of any given matrix over $R^{n\times m}$ and $C^{n\times m}$, the algorithm of column subset selection enjoys approximation ratio $(k+1)^{1/p}$ for $1\le p\le 2$ and $(k+1)^{1-1/p}$ for $p\ge 2$. This improves upon the previous $O(k+1)$ bound (Chierichetti et al. ,2017) for $p\ge 1$. We complement our analysis with lower bounds; these bounds match our upper bounds up to constant 1 when $p\geq 2$. At the core of our techniques is an application of Riesz-Thorin interpolation theorem from harmonic analysis, which might be of independent interest to other algorithmic designs and analysis more broadly. Our analysis results in improvements on approximation guarantees of several other algorithms with various time complexity. For example, to make the algorithm of column subset selection computationally efficient, we analyze a polynomial time bi-criteria algorithm which selects $O(k\log m)$ number of columns. We show that this algorithm has an approximation ratio of $O((k+1)^{1/p})$ for $1\le p\le 2$ and $O((k+1)^{1-1/p})$ for $p\ge 2$. This improves over the bound in (Chierichetti et al. ,2017) with an $O(k+1)$ approximation ratio. Our bi-criteria algorithm also implies an exact-rank method in polynomial time with a slightly larger approximation ratio.

PDF Details

NeurIPS Conference 2018 Conference Paper

The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models

Chen Dan
Liu Leqi
Bryon Aragam
Pradeep Ravikumar
Eric Xing

We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an $\Omega(K\log K)$ labeled sample complexity bound without imposing parametric assumptions, where $K$ is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification ($K>2$), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier. Finally, we describe three algorithms for computing these estimators based on a connection to bipartite graph matching, and perform experiments to illustrate the superiority of the MLE over the majority vote estimator.

PDF Details