Arrow Research search

Author name cluster

Erik Vee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

ICLR 2023 · Conference Paper

Robust Active Distillation

  • Cenk Baykal
  • Khoa Trinh
  • Fotis Iliopoulos
  • Gaurav Menghani
  • Erik Vee

Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impair student performance. The sheer size of the teacher additionally constrains the number of soft-labels that can be queried due to prohibitive computational and/or financial costs. The difficulty of simultaneously achieving efficiency (i.e., minimizing soft-label queries) and robustness (i.e., avoiding student inaccuracies due to incorrect labels) hinders the widespread application of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. At the core of our work lies a game-theoretic formulation that explicitly considers the inherent trade-off between the informativeness and correctness of input instances. We establish bounds on the expected performance of our approach that hold even in worst-case distillation instances. We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to that of state-of-the-art active learning and active distillation methods.
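For intuition, the sketch below shows the query-selection step of a generic active-distillation loop in Python: only a budgeted subset of unlabeled points is sent to the teacher for soft-labels. The entropy-based rule is a common active-learning heuristic used purely for illustration; it is not the paper's game-theoretic, parameter-free criterion, which also accounts for teacher correctness.

```python
# Toy sketch: pick which unlabeled points to query the teacher on,
# using student predictive entropy as a hypothetical informativeness score.
import numpy as np

def select_queries(student_probs, budget):
    """Pick the `budget` unlabeled examples whose current student predictions
    are most uncertain (highest predictive entropy)."""
    entropy = -np.sum(student_probs * np.log(student_probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:budget]

# Example: student probabilities over 5 classes for 100 unlabeled points,
# with a budget of 10 teacher soft-label queries.
rng = np.random.default_rng(0)
student_probs = rng.dirichlet(np.ones(5), size=100)
query_indices = select_queries(student_probs, budget=10)
```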

NeurIPS 2023 · Conference Paper

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

  • Vasilis Kontonis
  • Fotis Iliopoulos
  • Khoa Trinh
  • Cenk Baykal
  • Gaurav Menghani
  • Erik Vee

Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates "soft" pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called "forward loss-adjustment" methods.
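As a rough illustration of the label-mixing idea suggested by the title, the hypothetical PyTorch sketch below trains the student against a convex combination of the teacher's (possibly noisy) soft label and the student's own detached prediction. The fixed coefficient alpha is a placeholder, not the paper's principled mixing scheme.

```python
# Hypothetical sketch of student-label mixing (inferred from the title), in PyTorch.
import torch
import torch.nn.functional as F

def mixed_label_loss(student_logits, teacher_probs, alpha=0.3):
    # Student's current belief, detached so it acts as a fixed target component.
    student_probs = F.softmax(student_logits, dim=-1).detach()
    # Mixed target: mostly the teacher's soft label, partly the student's own prediction.
    target = (1 - alpha) * teacher_probs + alpha * student_probs
    # Cross-entropy of the student against the mixed soft target.
    return -(target * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()

# Example on random logits: a batch of 8 unlabeled examples, 10 classes.
loss = mixed_label_loss(torch.randn(8, 10), F.softmax(torch.randn(8, 10), dim=-1))
```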

SODA 2022 · Conference Paper

Learning-Augmented Weighted Paging

  • Nikhil Bansal 0001
  • Christian Coester
  • Ravi Kumar 0001
  • Manish Purohit
  • Erik Vee

We consider a natural semi-online model for weighted paging, where at any time the algorithm is given predictions, possibly with errors, about the next arrival of each page. The model is inspired by Belady's classic optimal offline algorithm for unweighted paging, and extends the recently studied model for learning-augmented paging [45, 50, 52] to the weighted setting. For the case of perfect predictions, we provide an ℓ-competitive deterministic and an O(log ℓ)-competitive randomized algorithm, where ℓ is the number of distinct weight classes. Both these bounds are tight, and imply an O(log W)- and O(log log W)-competitive ratio, respectively, when the page weights lie between 1 and W. Previously, it was not known how to use these predictions in the weighted setting and only bounds of k and O(log k) were known, where k is the cache size. Our results also generalize to the interleaved paging setting and to the case of imperfect predictions, with the competitive ratios degrading smoothly from O(ℓ) and O(log ℓ) to O(k) and O(log k), respectively, as the prediction error increases. Our results are based on several insights on structural properties of Belady's algorithm and the sequence of page arrival predictions, and novel potential functions that incorporate these predictions. For the case of unweighted paging, the results imply a very simple potential-function-based proof of the optimality of Belady's algorithm, which may be of independent interest.
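For context, the sketch below implements the classic Belady rule for unweighted paging that the abstract builds on: on a miss with a full cache, evict the page whose next request is furthest in the future. It is a plain illustration of the offline baseline on a hypothetical request sequence, not the learning-augmented weighted algorithm of the paper.

```python
# Plain sketch of Belady's optimal offline algorithm for unweighted paging.

def belady_misses(requests, cache_size):
    cache = set()
    misses = 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            def next_use(p):
                # Index of the next request for p after position i, or infinity if none.
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")
            # Evict the cached page whose next use is furthest away (or never).
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

# Example: a 3-slot cache on a short hypothetical request sequence.
print(belady_misses(["a", "b", "c", "d", "a", "b", "e", "a", "b", "c"], 3))
```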

NeurIPS 2022 · Conference Paper

Weighted Distillation with Unlabeled Examples

  • Fotis Iliopoulos
  • Vasilis Kontonis
  • Cenk Baykal
  • Gaurav Menghani
  • Khoa Trinh
  • Erik Vee

Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: A large “teacher” neural network is trained on the labeled data available, and then it is used to generate labels on an unlabeled dataset (typically much larger in size). These labels are then utilized to train the smaller “student” model which will actually be deployed. Naturally, the success of the approach depends on the quality of the teacher’s labels, since the student could be confused if trained on inaccurate data. This paper proposes a principled approach for addressing this issue based on a “debiasing” reweighting of the student’s loss function tailored to the distillation training paradigm. Our method is hyper-parameter free, data-agnostic, and simple to implement. We demonstrate significant improvements on popular academic datasets and we accompany our results with a theoretical analysis which rigorously justifies the performance of our method in certain settings.
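As a rough sketch of what per-example loss reweighting looks like in a distillation loop (assuming PyTorch), the snippet below down-weights low-confidence teacher labels. The confidence-based weights are a hypothetical stand-in, not the paper's debiasing weights, which are derived in the paper itself.

```python
# Generic sketch of per-example loss reweighting for distillation with unlabeled examples.
import torch
import torch.nn.functional as F

def reweighted_distillation_loss(student_logits, teacher_logits):
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    # Per-example cross-entropy of the student against the teacher's soft labels.
    per_example = -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(dim=-1)
    # Hypothetical weights: trust confident teacher predictions more.
    weights = teacher_probs.max(dim=-1).values.detach()
    return (weights * per_example).sum() / weights.sum()

# Example on random logits: a batch of 8 unlabeled examples, 10 classes.
loss = reweighted_distillation_loss(torch.randn(8, 10), torch.randn(8, 10))
```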

NeurIPS 2019 · Conference Paper

Efficient Rematerialization for Deep Networks

  • Ravi Kumar
  • Manish Purohit
  • Zoya Svitkina
  • Erik Vee
  • Joshua Wang

When training complex neural networks, memory usage can be an important bottleneck. The question of when to rematerialize, i.e., to recompute intermediate values rather than retaining them in memory, becomes critical to achieving the best time and space efficiency. In this work we consider the rematerialization problem and devise efficient algorithms that use structural characterizations of computation graphs (treewidth and pathwidth) to obtain provably efficient rematerialization schedules. Our experiments demonstrate the performance of these algorithms on many common deep learning models.
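To make the store-versus-recompute tradeoff concrete, the toy sketch below checkpoints only every few activations of a simple layer chain and rematerializes the rest on demand. It is a minimal illustration of the general idea, not the paper's treewidth/pathwidth-based scheduling for general computation graphs.

```python
# Toy illustration of rematerialization on a chain: store only every `stride`-th
# activation, and recompute any other activation from the nearest checkpoint,
# trading extra compute for lower memory.

def forward_with_checkpoints(x, layers, stride=3):
    """Run the chain, storing only every `stride`-th activation."""
    checkpoints, a = {0: x}, x
    for i, f in enumerate(layers, start=1):
        a = f(a)
        if i % stride == 0:
            checkpoints[i] = a
    return checkpoints, a

def rematerialize(checkpoints, layers, i, stride=3):
    """Recover activation i by replaying from the nearest stored checkpoint."""
    start = (i // stride) * stride
    a = checkpoints[start]
    for f in layers[start:i]:
        a = f(a)
    return a

# Example: a hypothetical chain of 8 affine maps; activation 5 is recomputed
# from the checkpoint at position 3 rather than stored.
layers = [lambda v, k=k: 2 * v + k for k in range(8)]
checkpoints, _ = forward_with_checkpoints(1.0, layers)
full = [1.0]
for f in layers:
    full.append(f(full[-1]))
assert rematerialize(checkpoints, layers, 5) == full[5]
```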

FOCS 2006 · Conference Paper

Towards Secure and Scalable Computation in Peer-to-Peer Networks

  • Valerie King
  • Jared Saia
  • Vishal Sanwalani
  • Erik Vee

We consider the problems of Byzantine agreement and leader election, where a constant fraction b < 1/3 of processors are controlled by a malicious adversary. The first problem requires that all uncorrupted processors come to an agreement on a bit initially held by one of the uncorrupted processors; the second requires that the uncorrupted processors choose a leader who is uncorrupted. Motivated by the need for robust and scalable computation in peer-to-peer networks, we design the first scalable protocols for these problems for a network whose degree is polylogarithmic in its size. By scalable, we mean that each uncorrupted processor sends and processes a number of bits that is only polylogarithmic in n. (We assume no limit on the number of messages sent by corrupted processors.) With high probability, our Byzantine agreement protocol results in agreement among a 1 - O(1/ln n) fraction of the uncorrupted processors. With constant probability, our leader election protocol elects an uncorrupted leader and ensures that a 1 - O(1/ln n) fraction of the uncorrupted processors know this leader. We assume a full information model. Thus, the adversary is assumed to have unlimited computational power and has access to all communications, but does not have access to processors' private random bits.

FOCS 2000 · Conference Paper

Super-linear time-space tradeoff lower bounds for randomized computation

  • Paul Beame
  • Michael E. Saks
  • Xiaodong Sun
  • Erik Vee

We prove the first time-space lower bound tradeoffs for randomized computation of decision problems. The bounds hold even in the case that the computation is allowed to have arbitrary probability of error on a small fraction of inputs. Our techniques are an extension of those used by M. Ajtai (1999) in his time-space tradeoffs for deterministic RAM algorithms computing element distinctness and for deterministic Boolean branching programs computing an explicit function based on quadratic forms over GF(2). Our results also give a quantitative improvement over those given by Ajtai. Ajtai shows, for certain specific functions, that any branching program using space S = o(n) requires time T that is superlinear. The functional form of the superlinear bound is not given in his paper, but optimizing the parameters in his arguments gives T = Ω(n log log n / log log log n) for S = O(n^{1-ε}). For the same functions considered by Ajtai, we prove a time-space tradeoff of the form T = Ω(n √(log(n/S) / log log(n/S))). In particular, for space O(n^{1-ε}), this improves the lower bound on time to Ω(n √(log n / log log n)).