Arrow Research

Author name cluster

Grigory Yaroslavtsev

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers (15)

AAAI Conference 2024 · Conference Paper

Approximation Scheme for Weighted Metric Clustering via Sherali-Adams

  • Dmitrii Avdiukhin
  • Vaggos Chatziafratis
  • Konstantin Makarychev
  • Grigory Yaroslavtsev

Motivated by applications to classification problems on metric data, we study the Weighted Metric Clustering problem: given a metric d over n points and a k × k symmetric matrix A with non-negative entries, the goal is to find a k-partition of these points into clusters C1,...,Ck minimizing the sum of A[i,j] * d(u,v) over all pairs of clusters Ci and Cj and all pairs of points u from Ci and v from Cj. Specific choices of A lead to Weighted Metric Clustering capturing well-studied graph partitioning problems in metric spaces, such as Min-Uncut, Min-k-Sum, Min-k-Cut, and more. Our main result is that Weighted Metric Clustering admits a polynomial-time approximation scheme (PTAS). Our algorithm handles all the above problems using the Sherali-Adams linear programming relaxation. This subsumes several prior works, unifies many of the techniques for various metric clustering objectives, and yields a PTAS for several new problems, including metric clustering on manifolds and a new family of hierarchical clustering objectives. Our experiments on the hierarchical clustering objective show that it better captures ground-truth structural information than the popular Dasgupta objective.
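
As a concrete reading of this objective, here is a minimal Python sketch that evaluates it for a given k-partition; the distance matrix, weight matrix, and label vector are placeholders, and the sum is taken over unordered point pairs (the ordered vs. unordered convention only changes constants):

    import numpy as np

    def weighted_metric_clustering_cost(d, A, labels):
        """Sum of A[i, j] * d(u, v) over all pairs of points u, v, where i, j are
        the clusters containing u and v, as in the objective described above.
        d is an n x n distance matrix, A is a k x k symmetric weight matrix,
        and labels[u] is the cluster index of point u."""
        n = len(labels)
        cost = 0.0
        for u in range(n):
            for v in range(u + 1, n):
                cost += A[labels[u], labels[v]] * d[u, v]
        return cost

    # For instance, taking A to be the k x k identity matrix recovers a
    # Min-k-Sum-style objective (total within-cluster distance).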

NeurIPS Conference 2024 · Conference Paper

Embedding Dimension of Contrastive Learning and $k$-Nearest Neighbors

  • Dmitrii Avdiukhin
  • Vaggos Chatziafratis
  • Orr Fischer
  • Grigory Yaroslavtsev

We study the embedding dimension of distance comparison data in two settings: contrastive learning and $k$-nearest neighbors ($k$-NN). In both cases, the goal is to find the smallest dimension $d$ of an $\ell_p$-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used $\ell_2$-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given $m$ labeled samples of the form $(x_i, y_i^+, z_i^-)$ representing the fact that the positive example $y_i^+$ is closer to the anchor $x_i$ than the negative example $z_i^-$. We show that for representing such a dataset in:
- $\ell_2$: $d = \Theta(\sqrt{m})$ is necessary and sufficient.
- $\ell_p$ for $p \ge 1$: $d = O(m)$ is sufficient and $d = \tilde \Omega(\sqrt{m})$ is necessary.
- $\ell_\infty$: $d = O(m^{2/3})$ is sufficient and $d = \tilde \Omega(\sqrt{m})$ is necessary.
We also give results for the more general scenario when $t$ negatives are allowed. In $k$-NN, for each of the $n$ data points we are given an ordered set of the closest $k$ points. We show that for preserving the ordering of the $k$-NN for every point in:
- $\ell_2$: $d = \Theta(k)$ is necessary and sufficient.
- $\ell_p$ for $p \ge 1$: $d = \tilde O(k^2)$ is sufficient and $d = \tilde \Omega(k)$ is necessary.
- $\ell_\infty$: $d = \tilde \Omega(k)$ is necessary.
Furthermore, if the goal is not just to preserve the ordering of the $k$-NN but also to keep them as the nearest neighbors, then $d = \tilde O(\mathrm{poly}(k))$ suffices in $\ell_p$ for $p \ge 1$.
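
A small illustrative check, assuming nothing beyond the setup above, of whether a candidate $\ell_2$ embedding is consistent with a set of contrastive triples $(x_i, y_i^+, z_i^-)$; the item names and vectors are hypothetical:

    import numpy as np

    def satisfies_contrastive_triples(emb, triples):
        """emb maps item id -> d-dimensional vector; triples is a list of
        (anchor, positive, negative) ids. Returns True iff every triple
        satisfies ||x - y+||_2 < ||x - z-||_2, i.e. the embedding realizes
        the comparison labels."""
        for x, y_pos, z_neg in triples:
            if np.linalg.norm(emb[x] - emb[y_pos]) >= np.linalg.norm(emb[x] - emb[z_neg]):
                return False
        return True

    # e.g. emb = {"a": np.zeros(2), "b": np.array([1.0, 0.0]), "c": np.array([3.0, 0.0])}
    # satisfies_contrastive_triples(emb, [("a", "b", "c")])  # True: b is closer to a than c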

ICLR Conference 2024 · Conference Paper

Optimal Sample Complexity of Contrastive Learning

  • Noga Alon
  • Dmitrii Avdiukhin
  • Dor Elboim
  • Orr Fischer
  • Grigory Yaroslavtsev

Contrastive learning is a highly successful technique for learning representations of data from labeled tuples that specify the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e., the minimum number of labeled tuples sufficient for achieving high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$, we show that $\tilde \Theta(nd)$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via the VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between statistical learning theory and the practice of deep learning.
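
For concreteness, a hedged sketch of how a single labeled tuple arises from an $\ell_p$ distance; the tie-breaking rule and any sampling scheme around it are illustrative only, not the paper's setup:

    import numpy as np

    def label_tuple(x, y, z, p=2):
        """Label the tuple (x, y, z) by which of y, z is closer to the
        anchor x under the l_p distance (ties broken toward y)."""
        return "y" if np.linalg.norm(x - y, ord=p) <= np.linalg.norm(x - z, ord=p) else "z"

    # A learner observes m such labeled tuples and must output d-dimensional
    # representations of the n points that reproduce these comparisons on unseen
    # tuples; the bound above says roughly nd such tuples suffice and are needed.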

IJCAI Conference 2023 · Conference Paper

HOUDINI: Escaping from Moderately Constrained Saddles

  • Dmitrii Avdiukhin
  • Grigory Yaroslavtsev

We give polynomial-time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function, we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. While analogous results exist for unconstrained and equality-constrained problems, we make progress on the major open question of convergence to second-order stationary points in the case of inequality constraints, without reliance on NP-oracles or altering the definitions to only account for certain constraints. Our results hold for both regular and stochastic gradient descent.
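
A generic perturbed-gradient template of the kind alluded to above (not the paper's specific method or constants); the step size, noise radius, and projection below are placeholder assumptions:

    import numpy as np

    def noisy_projected_gd(grad, project, x0, eta=1e-2, noise_radius=1e-3, steps=10_000, seed=0):
        """Gradient descent with added isotropic noise, projected back onto the
        feasible set after every step. The injected noise is what lets such
        methods leave strict saddle points; `project` enforces the inequality
        constraints (e.g. clipping to a box)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - eta * grad(x)                                 # gradient step
            x = x + noise_radius * rng.standard_normal(x.shape)  # perturbation
            x = project(x)                                        # stay feasible
        return x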

AAAI Conference 2023 · Conference Paper

Tree Learning: Optimal Sample Complexity and Algorithms

  • Dmitrii Avdiukhin
  • Grigory Yaroslavtsev
  • Danny Vainstein
  • Orr Fischer
  • Sauman Das
  • Faraz Mirza

We study the problem of learning a hierarchical tree representation of data from labeled samples drawn from an arbitrary (and possibly adversarial) distribution. Consider a collection of data tuples labeled according to their hierarchical structure. The smallest number of such tuples required to accurately label subsequent tuples is of interest for data collection in machine learning. We present optimal sample complexity bounds for this problem in several learning settings, including (agnostic) PAC learning and online learning. Our results are based on tight bounds on the Natarajan and Littlestone dimensions of the associated problem. The corresponding tree classifiers can be constructed efficiently in near-linear time.
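
One natural instantiation of such labels (given purely as illustration; the exact tuple format studied in the paper may differ) labels a triple of leaves by the pair whose least common ancestor lies deepest in the tree:

    def lca_depth(parent, depth, u, v):
        """Depth of the least common ancestor of u and v in a rooted tree,
        given parent pointers and node depths (depth[root] = 0)."""
        while u != v:
            if depth[u] >= depth[v]:
                u = parent[u]
            else:
                v = parent[v]
        return depth[u]

    def triplet_label(parent, depth, a, b, c):
        """Label {a, b, c} by the pair grouped together lowest in the hierarchy."""
        return max([(a, b), (a, c), (b, c)],
                   key=lambda pair: lca_depth(parent, depth, *pair))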

NeurIPS Conference 2021 · Conference Paper

Escaping Saddle Points with Compressed SGD

  • Dmitrii Avdiukhin
  • Grigory Yaroslavtsev

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an $\varepsilon$-first-order stationary point. In this paper we extend these results to convergence to an $\varepsilon$-second-order stationary point ($\varepsilon$-SOSP), which is, to the best of our knowledge, the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with the RandomK compressor converges to an $\varepsilon$-SOSP with the same number of iterations as uncompressed SGD [Jin et al., 2021 (JACM)], while improving the total communication by a factor of $\tilde \Theta(\sqrt{d} \varepsilon^{-3/4})$, where $d$ is the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.
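
A minimal RandomK compressor sketch; whether the exact d/k rescaling below matches the paper's variant is an assumption, but it is the standard choice that makes the compressor unbiased:

    import numpy as np

    def random_k(g, k, rng):
        """Keep k uniformly random coordinates of the gradient g, zero the rest,
        and rescale by d/k so that the compressed vector equals g in expectation."""
        d = g.shape[0]
        out = np.zeros_like(g)
        idx = rng.choice(d, size=k, replace=False)
        out[idx] = g[idx] * (d / k)
        return out

    # In distributed SGD each worker would transmit only the k chosen indices and
    # values instead of all d coordinates, which is where the communication savings come from.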

AAAI Conference 2021 · Conference Paper

Objective-Based Hierarchical Clustering of Deep Embedding Vectors

  • Stanislav Naumov
  • Grigory Yaroslavtsev
  • Dmitrii Avdiukhin

We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g., ResNet, ResNext, Inception V3, SBERT). Our study includes datasets with up to 4.5 million entries with embedding dimensions up to 2048. In order to address the challenge of scaling up hierarchical clustering to such large datasets we propose a new practical hierarchical clustering algorithm B++&C. It gives a 5%/20% improvement on average for the popular Moseley-Wang (MW) / Cohen-Addad et al. (CKMM) objectives (normalized) compared to a wide range of classic methods and recent heuristics. We also introduce a theoretical algorithm B2SAT&C which achieves a 0.74-approximation for the CKMM objective in polynomial time. This is the first substantial improvement over the trivial 2/3-approximation achieved by a random binary tree. Prior to this work, the best poly-time approximation of ≈ 2/3 + 0.0004 was due to Charikar et al. (SODA '19).
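
For reference, the standard forms of the two objectives for a hierarchical clustering tree $T$ over $n$ points, with $T_{ij}$ denoting the subtree rooted at the least common ancestor of points $i$ and $j$ (stated here as a summary of the literature, not necessarily the paper's exact normalization):

$$\mathrm{MW}(T) = \sum_{i<j} w_{ij}\,\bigl(n - |\mathrm{leaves}(T_{ij})|\bigr) \quad \text{(similarities $w_{ij}$, maximized)},$$
$$\mathrm{CKMM}(T) = \sum_{i<j} d_{ij}\,|\mathrm{leaves}(T_{ij})| \quad \text{(dissimilarities $d_{ij}$, maximized)}.$$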

ICML Conference 2018 · Conference Paper

Massively Parallel Algorithms and Hardness for Single-Linkage Clustering under $\ell_p$ Distances

  • Grigory Yaroslavtsev
  • Adithya Vadapalli

We present the first massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1$, $\ell_2$, and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve a $(1+\epsilon)$-approximation for all distances (except Hamming, for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). We demonstrate the efficiency of our algorithms' implementation in Apache Spark through experiments on the largest available vector datasets from the UCI machine learning repository, exhibiting speedups of several orders of magnitude.
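
As a point of reference for the sequential problem being parallelized (not the MPC algorithm itself), single-linkage clustering under $\ell_2$ can be computed on a small array of vectors with SciPy; the data below is a placeholder:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Single-linkage repeatedly merges the two clusters with the smallest
    # minimum pairwise distance, which is equivalent to building a minimum
    # spanning tree of the points and cutting its heaviest edges.
    X = np.random.default_rng(0).normal(size=(100, 8))    # 100 points in 8 dimensions
    Z = linkage(X, method="single", metric="euclidean")   # hierarchical merge tree
    labels = fcluster(Z, t=5, criterion="maxclust")       # cut into 5 flat clusters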

SODA Conference 2016 · Conference Paper

Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model

  • Sepehr Assadi
  • Sanjeev Khanna
  • Yang Li 0025
  • Grigory Yaroslavtsev

We study the problem of finding an approximate maximum matching in two closely related computational models, namely, the dynamic graph streaming model and the simultaneous multi-party communication model. In the dynamic graph streaming model, the input graph is revealed as a stream of edge insertions and deletions, and the goal is to design a small-space algorithm to approximate the maximum matching. In the simultaneous model, the input graph is partitioned across k players, and the goal is to design a protocol where the k players simultaneously send a small-size message to a coordinator, and the coordinator computes an approximate matching. Dynamic graph streams: We resolve the space complexity of single-pass turnstile streaming algorithms for approximating matchings by showing that for any $\epsilon > 0$, $\tilde\Theta(n^{2-3\epsilon})$ space is both sufficient and necessary (up to polylogarithmic factors) to compute an $n^{\epsilon}$-approximate matching; here $n$ denotes the number of vertices in the input graph. The simultaneous communication model: Our results for dynamic graph streams also resolve the (per-player) simultaneous communication complexity of approximating matchings in the edge partition model. For the vertex partition model, we design new randomized and deterministic protocols for $k$ players to achieve an $\alpha$-approximation. Specifically, for a suitable range of $\alpha$, we provide a randomized protocol with total communication of $O(nk/\alpha^2)$ and a deterministic protocol with total communication of $O(nk/\alpha)$. Both these bounds are tight. Our work generalizes the results established by Dobzinski et al. (STOC 2014) for the special case of $k = n$. Finally, for the remaining range of parameters, we establish a new lower bound on the simultaneous communication complexity which is super-linear in $n$.

SODA Conference 2013 · Conference Paper

Learning pseudo-Boolean $k$-DNF and submodular functions

  • Sofya Raskhodnikova
  • Grigory Yaroslavtsev

We prove that any submodular function $f: \{0,1\}^n \to \{0, 1, \ldots, k\}$ can be represented as a pseudo-Boolean $2k$-DNF formula. Pseudo-Boolean DNFs are a natural generalization of the DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Håstad's switching lemma holds for pseudo-Boolean $k$-DNFs if all constants associated with the terms of the formula are bounded. This allows us to generalize Mansour's PAC-learning algorithm for $k$-DNFs to pseudo-Boolean $k$-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form $f: \{0,1\}^n \to \{0, 1, \ldots, k\}$. Our algorithm runs in time polynomial in $n$, $k^{O(k \log k/\varepsilon)}$, and $\log(1/\delta)$ and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only $n^{O(k)}$ query complexity for learning submodular functions in this setting, for fixed $\varepsilon$ and $\delta$. Our learning algorithm implies a property tester for submodularity of functions $f: \{0,1\}^n \to \{0, \ldots, k\}$ with query complexity polynomial in $n$ for $k = O((\log n / \log\log n)^{1/2})$ and constant proximity parameter $\varepsilon$.
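
On the usual reading of this representation, a pseudo-Boolean DNF evaluates to the largest constant attached to a satisfied term (and to $0$ if no term is satisfied); a toy example with hypothetical terms and constants:

$$f(x) = \max\bigl(2\cdot[x_1 \wedge x_2],\; 1\cdot[x_3],\; 3\cdot[\bar{x}_1 \wedge x_4]\bigr),$$

so, for instance, $f(1,1,1,0) = 2$ and $f(0,0,0,1) = 3$; every term has width at most 2, making this a pseudo-Boolean 2-DNF.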

SAT Conference 2009 · Conference Paper

Finding Efficient Circuits Using SAT-Solvers

  • Arist Kojevnikov
  • Alexander S. Kulikov
  • Grigory Yaroslavtsev

In this paper we report preliminary results of experiments with finding efficient circuits (over binary bases) using SAT-solvers. We present upper bounds for functions with a constant number of inputs as well as general upper bounds that were found automatically. We focus mainly on MOD-functions. Besides their theoretical interest, these functions are also interesting from a practical point of view as they are the core of the residue number system. In particular, we present a circuit of size $3n + c$ over the full binary basis computing $\mathrm{MOD}_3^n$.