Arrow Research search

Author name cluster

Euiwoong Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

38 papers
2 author rows

Possible papers

38

FOCS Conference 2025 Conference Paper

An Improved Greedy Approximation for (Metric) k-Means

  • Moses Charikar
  • Vincent Cohen-Addad
  • Ruiquan Gao 0001
  • Fabrizio Grandoni 0001
  • Euiwoong Lee
  • Ernest van Wijland

Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives gives rise to well-studied optimization problems; amongst these, the k-Means objective is arguably the best known. Given a collection of points in a metric space, the goal is to partition them into k clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $(3+2\sqrt{2}+\varepsilon \lt 5.83)$-approximation algorithm for k-Means in general metrics. This substantially improves on the current-best $(9+\varepsilon)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS’17, SICOMP’20], and even slightly improves on the 5.92-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC’22] for the Euclidean special case. A natural approach for k-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for k-Means build upon an adaptation of an LMP 3-approximation for facility location with metric connection costs in [Jain, Vazirani - J. ACM’01] based on a primal-dual method, rather than on the improved LMP greedy 2-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J. ACM’03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate the triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC’25] for the related (metric) k-Median problem.
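The k-Means objective described in the abstract above can be stated concretely. A minimal sketch (the toy 1-D metric, data, and helper name are our own illustration, not from the paper):

```python
def kmeans_cost(points, centers, dist):
    """k-Means objective: sum over points of the squared distance
    to the nearest of the k chosen centers."""
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)

# Toy 1-D metric (absolute difference), with k = 2 centers.
points = [0.0, 1.0, 9.0, 10.0]
centers = [0.5, 9.5]
cost = kmeans_cost(points, centers, lambda a, b: abs(a - b))
# Each point is at distance 0.5 from its nearest center, so cost = 4 * 0.25.
```

The hard part, of course, is choosing the centers; the paper's contribution is about approximating that choice, not evaluating the objective.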

NeurIPS Conference 2025 Conference Paper

Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering

  • Chenglin Fan
  • Dahoon Lee
  • Euiwoong Lee

Correlation Clustering (CC) is a foundational problem in unsupervised learning that models binary similarity relations using labeled graphs. While classical CC has been well studied, many real-world applications involve more nuanced relationships—either multi-class categorical interactions or varying confidence levels in edge labels. To address these, two natural generalizations have been proposed: Chromatic Correlation Clustering (CCC), which assigns semantic colors to edge labels, and pseudometric-weighted CC, which allows edge weights satisfying the triangle inequality. In this paper, we develop improved approximation algorithms for both settings. Our approach leverages LP-based pivoting techniques combined with problem-specific rounding functions. For the pseudometric-weighted correlation clustering problem, we present a tight $\frac{10}{3}$-approximation algorithm, matching the best possible bound achievable within the framework of standard LP relaxation combined with specialized rounding. For the Chromatic Correlation Clustering (CCC) problem, we improve the approximation ratio from the previous best of $2.5$ to $2.15$, and we establish a lower bound of $2.11$ within the same analytical framework, highlighting the near-optimality of our result.

FOCS Conference 2025 Conference Paper

Inapproximability of Finding Sparse Vectors in Codes, Subspaces, and Lattices

  • Vijay Bhattiprolu
  • Venkatesan Guruswami
  • Euiwoong Lee
  • Xuandi Ren

Finding sparse vectors is a fundamental problem that arises in several contexts including codes, subspaces, and lattices. In this work, we prove strong inapproximability results for all these variants using a novel approach that even bypasses the PCP theorem. Our main result is that it is NP-hard (under randomized reductions) to approximate the sparsest vector in a real subspace within any constant factor; the gap can be further amplified using tensoring. Our reduction has the property that there is a Boolean solution in the completeness case. As a corollary, this immediately recovers the state-of-the-art inapproximability factors for the shortest vector problem (SVP) on lattices. Our proof extends the range of $\ell_p$ (quasi-)norms for which hardness was previously known, from $p \geq 1$ to $p \geq 0$, answering a question raised by (Khot, JACM 2005). Previous hardness results for SVP, and the related minimum distance problem (MDP) for error-correcting codes, all use lattice/coding gadgets that have an abundance of codewords in a ball of radius smaller than the minimum distance. In contrast, our reduction only needs many codewords in a ball of radius slightly larger than the minimum distance. This enables an easy derandomization of our reduction for finite fields, giving a new elementary proof of deterministic hardness for MDP. We believe this weaker density requirement might offer a promising approach to showing deterministic hardness of SVP, a long elusive goal. The key technical ingredient underlying our result for real subspaces is a proof that in the kernel of a random Rademacher matrix, the supports of any two linearly independent vectors have very little overlap. A broader motivation behind this work is the development of inapproximability techniques for problems over the reals. Analytic variants of sparsest vector have connections to small set expansion, quantum separability, and polynomial maximization over convex sets, all of which appear to be out of reach of current PCP techniques. We hope that the approach we develop could enable progress on some of these problems.

STOC Conference 2025 Conference Paper

Solving the Correlation Cluster LP in Sublinear Time

  • Nairen Cao
  • Vincent Cohen-Addad
  • Euiwoong Lee
  • Shi Li 0001
  • David Rasmussen Lolck
  • Alantha Newman
  • Mikkel Thorup
  • Lukas Vogl

Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. Cao, Cohen-Addad, Lee, Li, Newman, and Vogl [STOC 2024] introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, they showed how to find a feasible solution for the cluster LP in time O(n^{poly(1/ε)}) with objective value at most (1+ε) times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a (1.437+ε)-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most (1+ε) times the optimum in time O(2^{poly(1/ε)} · n), where n is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast (1.437+ε)-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.

STOC Conference 2024 Conference Paper

Approximating Small Sparse Cuts

  • Aditya Anand 0001
  • Euiwoong Lee
  • Jason Li 0006
  • Thatchaphol Saranurak

We study polynomial-time approximation algorithms for edge and vertex Sparsest Cut and Small Set Expansion in terms of k, the number of edges or vertices cut in the optimal solution. Our main results are O(polylog k)-approximation algorithms for various versions in this setting. Our techniques involve an extension of the notion of sample sets (Feige and Mahdian, STOC’06), originally developed for small balanced cuts, to sparse cuts in general. We then show how to combine this notion of sample sets with two algorithms, one based on an existing framework of LP rounding and another new algorithm based on the cut-matching game, to get such approximation algorithms. Our cut-matching game algorithm can be viewed as a local version of the cut-matching game by Khandekar, Khot, Orecchia and Vishnoi and certifies an expansion of every vertex set of size s in O(log s) rounds. These techniques may be of independent interest. As corollaries of our results, we also obtain an O(log opt)-approximation for min-max graph partitioning, where opt is the min-max value of the optimal cut, and improve the bound on the size of multicut mimicking networks computable in polynomial time.

NeurIPS Conference 2024 Conference Paper

Clustering with Non-adaptive Subset Queries

  • Hadley Black
  • Euiwoong Lee
  • Arya Mazumdar
  • Barna Saha

Recovering the underlying clustering of a set $U$ of $n$ points by asking pair-wise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$, $|S|=2$, the oracle returns "yes" if the points are in the same cluster and "no" otherwise. We study a natural generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Our aim is to determine the minimum number of queries needed for exactly recovering an arbitrary $k$-clustering. We focus on non-adaptive schemes, where all the queries are asked in one round, thus allowing for the querying process to be parallelized, which is a highly desirable property. For adaptive algorithms with pair-wise queries, the complexity is known to be $\Theta(nk)$, where $k$ is the number of clusters. In contrast, non-adaptive pair-wise query algorithms are extremely limited: even for $k=3$, such algorithms require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points. Allowing for subset queries of unbounded size, $O(n)$ queries is possible with an adaptive scheme. However, the realm of non-adaptive algorithms remains completely unexplored. Is it possible to attain algorithms that are non-adaptive while still making a near-linear number of queries? In this paper, we give the first non-adaptive algorithms for clustering with subset queries. We provide: (i) a non-adaptive algorithm making $O(n \log^2 n \log k)$ queries, which improves to $O(n \log k)$ when the cluster sizes are within any constant factor of each other; (ii) for constant $k$, a non-adaptive algorithm making $O(n \log{\log{n}})$ queries. In addition to non-adaptivity, we take into account other practical considerations, such as enforcing a bound on query size. For constant $k$, we give an algorithm making $\smash{\widetilde{O}(n^2/s^2)}$ queries on subsets of size at most $s \leq \sqrt{n}$, which is optimal among all non-adaptive algorithms within a $\log n$-factor. For arbitrary $k$, the dependence varies as $\tilde{O}(n^2/s)$.
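The subset-query oracle described in the abstract above is easy to state concretely. A minimal sketch (the hidden labeling and helper name are our own toy illustration, not from the paper):

```python
def subset_query(labels, S):
    """Oracle for subset queries: returns the number of distinct
    clusters that the query set S intersects."""
    return len({labels[i] for i in S})

# A hidden 3-clustering of 5 points.
labels = [0, 0, 1, 2, 2]

pair = subset_query(labels, {0, 1})    # |S| = 2: the classic same-cluster query
big = subset_query(labels, {0, 2, 4})  # |S| > 2: a genuine subset query
# pair == 1 means "same cluster"; big == 3 reveals S meets three clusters at once.
```

A single subset query thus carries more information than a pair-wise query, which is what makes near-linear non-adaptive recovery plausible.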

NeurIPS Conference 2024 Conference Paper

Learning-Augmented Approximation Algorithms for Maximum Cut and Related Problems

  • Vincent Cohen-Addad
  • Tommaso d'Orsi
  • Anupam Gupta
  • Euiwoong Lee
  • Debmalya Panigrahi

In recent years, there has been a surge of interest in the use of machine-learned predictions to bypass worst-case lower bounds for classical problems in combinatorial optimization. So far, the focus has mostly been on online algorithms, where information-theoretic barriers are overcome using predictions about the unknown future. In this paper, we consider the complementary question of using learned information to overcome computational barriers in the form of approximation hardness of polynomial-time algorithms for NP-hard (offline) problems. We show that noisy predictions about the optimal solution can be used to break classical hardness results for maximization problems such as the max-cut problem and more generally, maximization versions of constraint satisfaction problems (CSPs).

FOCS Conference 2023 Conference Paper

Handling Correlated Rounding Error via Preclustering: A 1.73-approximation for Correlation Clustering

  • Vincent Cohen-Addad
  • Euiwoong Lee
  • Shi Li 0001
  • Alantha Newman

We consider the classic correlation clustering problem: Given a complete graph where edges are labelled either + or −, the goal is to find a partition of the vertices that minimizes the sum of the +edges across parts plus the sum of the −edges within parts. Recently, Cohen-Addad, Lee and Newman [CLN22] gave a 1.995-approximation for the problem using the Sherali-Adams hierarchy, hence beating the integrality gap of 2 of the classic linear program. We significantly improve upon this result by providing a 1.73-approximation for the problem. Our approach brings together a new preprocessing of correlation clustering instances that enables a new LP formulation which, combined with the algorithm from [CLN22], yields the improved bound.
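The correlation clustering objective in the abstract above can be computed directly. A minimal sketch (the toy instance and function name are our own, not from the paper):

```python
def cc_cost(n, plus_edges, labels):
    """Correlation clustering disagreements: '+' edges across clusters
    plus '-' edges within clusters. The graph is complete, so every
    pair not listed in plus_edges carries a '-' label."""
    plus = {frozenset(e) for e in plus_edges}
    cost = 0
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            # A disagreement is a '+' edge split apart or a '-' edge kept together.
            if (frozenset((i, j)) in plus) != same:
                cost += 1
    return cost

# Toy instance: 4 vertices, '+' edges on the path 0-1-2-3, clustering {0,1} | {2,3}.
disagreements = cc_cost(4, [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1])
# Only the '+' edge (1, 2) is cut, so there is exactly one disagreement.
```

Minimizing this count over all partitions is the NP-hard problem the 1.73-approximation addresses.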

FOCS Conference 2023 Conference Paper

On Lifting Integrality Gaps to SSEH Hardness for Globally Constrained CSPs

  • Suprovat Ghoshal
  • Euiwoong Lee

A $\mu$-constrained Boolean MAX-CSP $(\psi)$ instance is a Boolean Max-CSP instance on predicate $\psi: \{0, 1\}^{r} \rightarrow\{0, 1\}$ where the objective is to find a labeling of relative weight exactly $\mu$ that maximizes the fraction of satisfied constraints. In this work, we study the approximability of constrained Boolean Max-CSPs via SDP hierarchies by relating the integrality gap of $\operatorname{Max}-\operatorname{CSP}(\psi)$ to its $\mu$-dependent approximation curve. Formally, assuming the Small-Set Expansion Hypothesis, we show that it is NP-hard to approximate $\mu$-constrained instances of $\operatorname{MAX}-\operatorname{CSP}(\psi)$ up to factor $\operatorname{Gap}_{\ell, \mu}(\psi) / \log (1 / \mu)^{2}$ (ignoring factors depending on r) for any $\ell \geq \ell(\mu, r)$. Here, $\operatorname{Gap}_{\ell, \mu}(\psi)$ is the optimal integrality gap of $\ell$-round Lasserre relaxation for $\mu$-constrained $\operatorname{MAX}-\operatorname{CSP}(\psi)$ instances. Our results are derived by combining the framework of Raghavendra [STOC 2008] along with more recent advances in rounding Lasserre relaxations and reductions from the Small-Set Expansion (SSE) problem. A crucial component of our reduction is a novel way of composing generic bias-dependent dictatorship tests with SSE, which could be of independent interest.

STOC Conference 2022 Conference Paper

A characterization of approximability for biased CSPs

  • Euiwoong Lee
  • Suprovat Ghoshal

A µ-biased Max-CSP instance with predicate ψ:{0,1}^r → {0,1} is an instance of Constraint Satisfaction Problem (CSP) where the objective is to find a labeling of relative weight at most µ which satisfies the maximum fraction of constraints. Biased CSPs are versatile and express several well-studied problems such as Densest-k-Sub(Hyper)graph and SmallSetExpansion. In this work, we explore the role played by the bias parameter µ in the approximability of biased CSPs. We show that the approximability of such CSPs can be characterized (up to loss of factors in the arity r) using the bias-approximation curve of Densest-k-SubHypergraph (DkSH). In particular, this gives a tight characterization of predicates which admit approximation guarantees that are independent of the bias parameter µ. Motivated by the above, we give new approximation and hardness results for DkSH. In particular, assuming the Small Set Expansion Hypothesis (SSEH), we show that DkSH with arity r and k = µn is NP-hard to approximate to a factor of Ω(r^3 µ^{r−1} log(1/µ)) for every r ≥ 2 and µ < 2^{−r}. We also give an O(µ^{r−1} log(1/µ))-approximation algorithm for the same setting. Our upper and lower bounds are tight up to constant factors when the arity r is a constant, and, in particular, imply the first tight approximation bounds for the Densest-k-Subgraph problem in the linear bias regime. Furthermore, using the above characterization, our results also imply matching algorithms and hardness for every biased CSP of constant arity.

FOCS Conference 2022 Conference Paper

Correlation Clustering with Sherali-Adams

  • Vincent Cohen-Addad
  • Euiwoong Lee
  • Alantha Newman

Given a complete graph G=(V, E) where each edge is labeled + or −, the CORRELATION CLUSTERING problem asks to partition V into clusters to minimize the number of +edges between different clusters plus the number of −edges within the same cluster. CORRELATION CLUSTERING has been used to model a large number of clustering problems in practice, making it one of the most widely studied clustering formulations. The approximability of CORRELATION CLUSTERING has been actively investigated [BBC04], [CGW05], [ACN08], culminating in a 2.06-approximation algorithm [CMSY15], based on rounding the standard LP relaxation. Since the integrality gap for this formulation is 2, it has remained a major open question to determine if the approximation factor of 2 can be reached, or even breached. In this paper, we answer this question affirmatively by showing that there exists a $(1.994+\varepsilon)$-approximation algorithm based on $O(1/\varepsilon^{2})$ rounds of the Sherali-Adams hierarchy. In order to round a solution to the Sherali-Adams relaxation, we adapt the correlated rounding originally developed for CSPs [BRS11], [GS11], [RT12]. With this tool, we reach an approximation ratio of $2+\varepsilon$ for CORRELATION CLUSTERING. To breach this ratio, we go beyond the traditional triangle-based analysis by employing a global charging scheme that amortizes the total cost of the rounding across different triangles.

FOCS Conference 2022 Conference Paper

Fitting Metrics and Ultrametrics with Minimum Disagreements

  • Vincent Cohen-Addad
  • Chenglin Fan
  • Euiwoong Lee
  • Arnaud de Mesmay

Given $x\in(\mathbb{R}_{\geq 0})^{\binom{[n]}{2}}$ recording pairwise distances, the Metric Violation Distance problem asks to compute the $\ell_{0}$ distance between x and the metric cone; i.e., modify the minimum number of entries of x to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an $O(\log n)$-approximation algorithm for Metric Violation Distance, exponentially improving the previous best approximation ratio of $O(OPT^{1/3})$ of Fan, Raichel, and Van Buskirk [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of Ultrametric Violation Distance, where the goal is to compute the $\ell_{0}$ distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The Ultrametric Violation Distance problem can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [FOCS, 2021] from the $\ell_{1}$ norm to the $\ell_{0}$ norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new $O(1)$-approximation algorithm for correlation clustering that has the structural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an $O(\log n\log\log n)$-approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of x forming an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.
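The "violations" in Metric Violation Distance are failures of the triangle inequality among the given pairwise distances. A minimal sketch of detecting them by brute force (the toy matrix and function name are our own; the paper's algorithm is far more sophisticated than this check):

```python
def violated_triples(d):
    """Return triples (i, j, k) with d[i][j] > d[i][k] + d[k][j]:
    witnesses that the symmetric distance matrix d is not a metric."""
    n = len(d)
    bad = []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j):
                    continue
                if d[i][j] > d[i][k] + d[k][j] + 1e-12:
                    bad.append((i, j, k))
    return bad

# One inflated entry, d[0][1] = 10, breaks the triangle through vertex 2.
d = [[0, 10, 1],
     [10, 0, 1],
     [1, 1, 0]]
bad = violated_triples(d)
```

Metric Violation Distance asks for the minimum number of entry edits that remove all such witnesses, which is much harder than merely listing them.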

SODA Conference 2022 Conference Paper

Johnson Coverage Hypothesis: Inapproximability of k-means and k-median in ℓp-metrics

  • Vincent Cohen-Addad
  • Karthik C. S. 0001
  • Euiwoong Lee

k-median and k-means are the two most popular objectives for clustering algorithms. Despite intensive effort, a good understanding of the approximability of these objectives, particularly in ℓp-metrics, remains a major open problem. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives in ℓp-metrics. We introduce a new hypothesis called the Johnson Coverage Hypothesis (JCH), which roughly asserts that the well-studied Max k-Coverage problem on set systems is hard to approximate to a factor greater than (1 − 1/e), even when the membership graph of the set system is a subgraph of the Johnson graph. We then show that together with generalizations of the embedding techniques introduced by Cohen-Addad and Karthik (FOCS '19), JCH implies hardness of approximation results for k-median and k-means in ℓp-metrics for factors which are close to the ones obtained for general metrics. In particular, assuming JCH we show that it is hard to approximate the k-means objective: Discrete case: to a factor of 3.94 in the ℓ1-metric and to a factor of 1.73 in the ℓ2-metric; this improves upon the previous factors of 1.56 and 1.17 respectively, obtained under the Unique Games Conjecture (UGC). Continuous case: to a factor of 2.10 in the ℓ1-metric and to a factor of 1.36 in the ℓ2-metric; this improves upon the previous factor of 1.07 in the ℓ2-metric obtained under UGC (and to the best of our knowledge, the continuous case of k-means in the ℓ1-metric was not previously analyzed in the literature). We also obtain similar improvements under JCH for the k-median objective. Additionally, we prove a weak version of JCH using the work of Dinur et al. (SICOMP '05) on Hypergraph Vertex Cover, and recover all the results stated above of Cohen-Addad and Karthik (FOCS '19) to (nearly) the same inapproximability factors but now under the standard NP ≠ P assumption (instead of UGC). Finally, we establish a strong connection between JCH and the long-standing open problem of determining the Hypergraph Turán number. We then use this connection to prove improved SDP gaps (over the existing factors in the literature) for the k-means and k-median objectives.

SODA Conference 2021 Conference Paper

On Approximability of Clustering Problems Without Candidate Centers

  • Vincent Cohen-Addad
  • Karthik C. S. 0001
  • Euiwoong Lee

The k-means objective is arguably the most widely-used cost function for modeling clustering tasks in a metric space. In practice and historically, k-means is thought of in a continuous setting, namely where the centers can be located anywhere in the metric space. For example, the popular Lloyd's heuristic locates a center at the mean of each cluster. Despite persistent efforts on understanding the approximability of k-means, and other classic clustering problems such as k-median and k-minsum, our knowledge of the hardness of approximation factors of these problems remains quite poor. In this paper, we significantly improve upon the hardness of approximation factors known in the literature for these objectives. We show that if the input lies in a general metric space, it is NP-hard to approximate: • Continuous k-median to a factor of 2 − o(1); this improves upon the previous inapproximability factor of 1.36 shown by Guha and Khuller (J. Algorithms '99). • Continuous k-means to a factor of 4 − o(1); this improves upon the previous inapproximability factor of 2.10 shown by Guha and Khuller (J. Algorithms '99). • k-minsum to a factor of 1.415; this improves upon the APX-hardness shown by Guruswami and Indyk (SODA '03). Our results shed new and perhaps counter-intuitive light on the differences between clustering problems in the continuous setting versus the discrete setting (where the candidate centers are given as part of the input).

SODA Conference 2021 Conference Paper

The Connectivity Threshold for Dense Graphs

  • Anupam Gupta 0001
  • Euiwoong Lee
  • Jason Li 0006

Consider a random graph model where there is an underlying simple graph G = (V, E), and each edge is sampled independently with probability p ∈ [0, 1]. What is the smallest value of p such that the resulting graph G_p is connected with constant probability? This is a well-studied question for special classes of graphs, such as complete graphs and hypercubes. For instance, when G is the complete graph, we want the connectivity threshold for the Erdős-Rényi G(n, p) model, known to be p = ln n / n. However, the problem is not well-understood for more general graph classes. We first investigate this connectivity threshold problem for "somewhat dense" graphs: we show that for any δ-regular, δ-edge-connected graph G, the random graph G_p is connected with high probability once p exceeds a threshold generalizing the complete-graph case. Our proof also bounds the number of approximate mincuts in such a dense graph, which may be of independent interest. Next, for a general graph G with edge connectivity λ, we define an explicit parameter β_G ∈ (0, 2 ln n], based on the number of approximate mincuts, and show that there is a sharp transition in the connectivity of G_p at p = 1 − exp(−β_G/λ). Moreover, we show that the width of this transition is an additive O(ln λ/λ) term; this improves upon the width bound in Margulis' classical threshold result.

SODA Conference 2019 Conference Paper

A PTAS for ℓp-Low Rank Approximation

  • Frank Ban
  • Vijay Bhattiprolu
  • Karl Bringmann
  • Pavel Kolev
  • Euiwoong Lee
  • David P. Woodruff

A number of recent works have studied algorithms for entrywise ℓp-low rank approximation, namely algorithms which, given an n × d matrix A (with n ≥ d), output a rank-k matrix B minimizing ‖A − B‖_p^p = ∑_{i,j} |A_{i,j} − B_{i,j}|^p when p > 0, and ‖A − B‖_0 = ∑_{i,j} [A_{i,j} ≠ B_{i,j}] for p = 0, where [·] is the Iverson bracket; that is, ‖A − B‖_0 denotes the number of entries (i, j) for which A_{i,j} ≠ B_{i,j}. For p = 1, this is often considered more robust than the SVD, while for p = 0 this corresponds to minimizing the number of disagreements, or robust PCA. This problem is known to be NP-hard for p ∈ {0, 1}, already for k = 1, and while there are polynomial-time approximation algorithms, their approximation factor is at best poly(k). It was left open whether there is a polynomial-time approximation scheme (PTAS) for ℓp-approximation for any p ≥ 0. We show the following: 1. On the algorithmic side, for p ∈ (0, 2), we give the first n^{poly(k/ε)}-time (1 + ε)-approximation algorithm. For p = 0, there are various problem formulations, a common one being the binary setting in which A ∈ {0, 1}^{n×d} and B = U · V, where U ∈ {0, 1}^{n×k} and V ∈ {0, 1}^{k×d}. There are also various notions of the multiplication U · V, such as a matrix product over the reals, over a finite field, or over a Boolean semiring. We give the first almost-linear-time approximation scheme for what we call the Generalized Binary ℓ0-Rank-k problem, of which these variants are special cases. Our algorithm computes a (1 + ε)-approximation in time (1/ε)^{2^{O(k)}/ε^2} · nd^{1+o(1)}, where o(1) hides a factor (log log d)^{1.1}/log d. In addition, for the case of finite fields of constant size, we obtain an alternate PTAS running in time n · d^{poly(k/ε)}. 2. On the hardness front, for p ∈ (1, 2), we show under the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH) that there is no constant-factor approximation algorithm running in time 2^{k^δ} for a constant δ > 0, showing that an exponential dependence on k is necessary. For p = 0, we observe that there is no approximation algorithm for the Generalized Binary ℓ0-Rank-k problem running in time 2^{2^{δk}} for a constant δ > 0. We also show for finite fields of constant size, under the ETH, that any fixed constant-factor approximation algorithm requires 2^{k^δ} time for a constant δ > 0.

SODA Conference 2019 Conference Paper

Approximability of p → q Matrix Norms: Generalized Krivine Rounding and Hypercontractive Hardness

  • Vijay Bhattiprolu
  • Mrinalkanti Ghosh
  • Venkatesan Guruswami
  • Euiwoong Lee
  • Madhur Tulsiani

We study the problem of computing the p → q operator norm of a matrix A in ℝ^{m×n}, defined as ‖A‖_{p→q} := sup_{x ∈ ℝ^n \ {0}} ‖Ax‖_q / ‖x‖_p. This problem generalizes the spectral norm of a matrix (p = q = 2) and the Grothendieck problem (p = ∞, q = 1), and has been widely studied in various regimes. When p ≥ q, the problem exhibits a dichotomy: constant factor approximation algorithms are known if 2 is in [q, p], and the problem is hard to approximate within almost polynomial factors when 2 is not in [q, p]. For the case when 2 is in [q, p] we prove almost matching approximation and NP-hardness results. The regime when p < q, known as hypercontractive norms, is particularly significant for various applications but much less well understood. The case with p = 2 and q > 2 was studied by [Barak et al., STOC’12] who gave sub-exponential algorithms for a promise version of the problem (which captures small-set expansion) and also proved hardness of approximation results based on the Exponential Time Hypothesis. However, no NP-hardness of approximation is known for these problems for any p < q. We prove the first NP-hardness result for approximating hypercontractive norms. We show that for any 1 < p < q < ∞ with 2 not in [p, q], ‖A‖_{p→q} is hard to approximate within 2^{O(log^{1−ε} n)} assuming NP is not contained in BPTIME(2^{log^{O(1)} n}).

SODA Conference 2019 Conference Paper

Losing Treewidth by Separating Subsets

  • Anupam Gupta 0001
  • Euiwoong Lee
  • Jason Li 0006
  • Pasin Manurangsi
  • Michal Wlodarczyk 0001

We study the problem of deleting the smallest set S of vertices (resp. edges) from a given graph G such that the induced subgraph (resp. subgraph) G\S belongs to some class ℋ. We consider the case where graphs in ℋ have treewidth bounded by t, and give a general framework to obtain approximation algorithms for both vertex and edge-deletion settings from approximation algorithms for certain natural graph partitioning problems called k-Subset Vertex Separator and k-Subset Edge Separator, respectively. For the vertex deletion setting, our framework combined with the current best result for k-Subset Vertex Separator improves approximation ratios for basic problems such as k-Treewidth Vertex Deletion and Planar-ℱ Vertex Deletion. Our algorithms are simpler than previous works and give the first deterministic and uniform approximation algorithms under the natural parameterization. For the edge deletion setting, we give improved approximation algorithms for k-Subset Edge Separator combining ideas from LP relaxations and important separators. We present their applications in bounded-degree graphs, and also give an APX-hardness result for the edge deletion problems.

STOC Conference 2019 Conference Paper

The number of minimum k-cuts: improving the Karger-Stein bound

  • Anupam Gupta 0001
  • Euiwoong Lee
  • Jason Li 0006

Given an edge-weighted graph, how many minimum k-cuts can it have? This is a fundamental question in the intersection of algorithms, extremal combinatorics, and graph theory. It is particularly interesting in that the best known bounds are algorithmic: they stem from algorithms that compute the minimum k-cut. In 1994, Karger and Stein obtained a randomized contraction algorithm that finds a minimum k-cut in O(n^{(2−o(1))k}) time. It can also enumerate all such k-cuts in the same running time, establishing a corresponding extremal bound of O(n^{(2−o(1))k}). Since then, the algorithmic side of the minimum k-cut problem has seen much progress, leading to a deterministic algorithm based on a tree packing result of Thorup, which enumerates all minimum k-cuts in the same asymptotic running time, and gives an alternate proof of the O(n^{(2−o(1))k}) bound. However, beating the Karger–Stein bound, even for computing a single minimum k-cut, has remained out of reach. In this paper, we give an algorithm to enumerate all minimum k-cuts in O(n^{(1.981+o(1))k}) time, breaking the algorithmic and extremal barriers for enumerating minimum k-cuts. To obtain our result, we combine ideas from both the Karger–Stein and Thorup results, and draw a novel connection between minimum k-cut and extremal set theory. In particular, we give and use tighter bounds on the size of set systems with bounded dual VC-dimension, which may be of independent interest.

SODA Conference 2018 Conference Paper

An FPT Algorithm Beating 2-Approximation for k-Cut

  • Anupam Gupta 0001
  • Euiwoong Lee
  • Jason Li 0006

In the k-Cut problem, we are given an edge-weighted graph G and an integer k, and have to remove a set of edges with minimum total weight so that G has at least k connected components. Prior work on this problem gives, for all h ∊ [2, k], a (2 − h/k)-approximation algorithm for k-Cut that runs in time n^{O(h)}. Hence to get a (2 − ε)-approximation algorithm for some absolute constant ε, the best runtime using prior techniques is n^{O(kε)}. Moreover, it was recently shown that getting a (2 − ε)-approximation for general k is NP-hard, assuming the Small Set Expansion Hypothesis. If we use the size of the cut as the parameter, an FPT algorithm to find the exact k-Cut is known, but solving the k-Cut problem exactly is W[1]-hard if we parameterize only by the natural parameter k. An immediate question is: can we approximate k-Cut better in FPT time, using k as the parameter? We answer this question positively. We show that for some absolute constant ε > 0, there exists a (2 − ε)-approximation algorithm that runs in time 2^{O(k^6)} · Õ(n^4). This is the first FPT algorithm that is parameterized only by k and strictly improves on the 2-approximation.

FOCS Conference 2018 Conference Paper

Faster Exact and Approximate Algorithms for k-Cut

  • Anupam Gupta 0001
  • Euiwoong Lee
  • Jason Li 0006

In the k-cut problem, we are given an edge-weighted graph G and an integer k, and have to remove a set of edges with minimum total weight so that G has at least k connected components. The current best algorithms are an O(n^{(2−o(1))k})-time randomized algorithm due to Karger and Stein, and an Õ(n^{2k})-time deterministic algorithm due to Thorup. Moreover, several 2-approximation algorithms are known for the problem (due to Saran and Vazirani, Naor and Rabani, and Ravi and Sinha). It has remained an open problem to (a) improve the runtime of exact algorithms, and (b) get better approximation algorithms. In this paper we show an O(k^{O(k)} · n^{(2ω/3+o(1))k})-time algorithm for k-cut, where ω is the matrix multiplication exponent. Moreover, we show a (1+ε)-approximation algorithm that runs in time O((k/ε)^{O(k)} · n^{k+O(1)}), and a 1.81-approximation in fixed-parameter time O(2^{O(k^2)} · poly(n)).

FOCS Conference 2017 Conference Paper

Weak Decoupling, Polynomial Folds and Approximate Optimization over the Sphere

  • Vijay Bhattiprolu
  • Mrinalkanti Ghosh
  • Venkatesan Guruswami
  • Euiwoong Lee
  • Madhur Tulsiani

We consider the following basic problem: given an n-variate degree-d homogeneous polynomial f with real coefficients, compute a unit vector x in R^n that maximizes |f(x)|. Besides its fundamental nature, this problem arises in diverse contexts ranging from tensor and operator norms to graph expansion to quantum information theory. The homogeneous degree-2 case is efficiently solvable, as it corresponds to computing the spectral norm of an associated matrix, but the higher-degree case is NP-hard. We give approximation algorithms for this problem that offer a trade-off between the approximation ratio and running time: in n^{O(q)} time, we get an approximation within factor (O(n)/q)^{d/2−1} for arbitrary polynomials, (O(n)/q)^{d/4−1/2} for polynomials with non-negative coefficients, and (m/q)^{1/2} for sparse polynomials with m monomials. The approximation guarantees are with respect to the optimum of the level-q sum-of-squares (SoS) SDP relaxation of the problem (though our algorithms do not rely on actually solving the SDP). Known polynomial-time algorithms for this problem rely on “decoupling lemmas.” Such tools are not capable of offering a trade-off like our results, as they blow up the number of variables by a factor equal to the degree. We develop new decoupling tools that are more efficient in the number of variables at the expense of less structure in the output polynomials. This enables us to harness the benefits of higher-level SoS relaxations. Our decoupling methods also work with “folded polynomials,” which are polynomials with polynomials as coefficients. This allows us to exploit easy substructures (such as quadratics) by considering them as coefficients in our algorithms. We complement our algorithmic results with some polynomially large integrality gaps for d levels of the SoS relaxation. For general polynomials this follows from known results for random polynomials, which yield a gap of Ω(n)^{d/4−1/2}. For polynomials with non-negative coefficients, we prove an Ω(n^{1/6}/polylogs) gap for the degree-4 case, based on a novel distribution of 4-uniform hypergraphs. We establish an n^{Ω(d)} gap for general degree d, albeit for a slightly weaker (but still very natural) relaxation. Toward this, we give a method to lift a level-4 solution matrix M to a higher-level solution, under a mild technical condition on M. From a structural perspective, our work yields worst-case convergence results on the performance of the sum-of-squares hierarchy for polynomial optimization. Despite the popularity of SoS in this context, such results were previously only known for the case of q = Ω(n).
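The tractable degree-2 case mentioned above is concrete enough to compute directly: for f(x) = xᵀAx with A symmetric, the maximum of |f(x)| over the unit sphere is the spectral norm of A, i.e. its largest absolute eigenvalue. A minimal sketch for the 2×2 case, where the eigenvalues have a closed form (the function name is my own):

```python
import math

def sphere_max_quadratic(a, b, c):
    """max |x^T A x| over unit vectors x, for symmetric A = [[a, b], [b, c]].
    Equals the largest |eigenvalue|; 2x2 eigenvalues come from the
    characteristic polynomial via trace and determinant."""
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
    return max(abs(lam1), abs(lam2))
```

For A = [[2, 1], [1, -3]] this returns (1 + √29)/2 ≈ 3.19, attained at the unit eigenvector of the eigenvalue −(1 + √29)/2. For degree d ≥ 3 no such reduction to eigenvalues exists, which is where the hardness discussed in the abstract begins.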

IJCAI Conference 2017 Conference Paper

Why You Should Charge Your Friends for Borrowing Your Stuff

  • Kijung Shin
  • Euiwoong Lee
  • Dhivya Eswaran
  • Ariel D. Procaccia

We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability; and by simulations based on synthetic and real social networks.
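A toy version of such a model can be simulated directly. The sketch below is my own simplification, not the paper's model: in the zero-access-cost case with benefit exceeding price, each node's best response is to buy iff no one within k hops already owns the good, so best-response dynamics settle on a maximal independent set of the k-th power graph (the function names and the adjacency-dict representation are assumptions):

```python
from collections import deque

def khop_neighbors(adj, v, k):
    """All nodes within k hops of v (excluding v), via breadth-first search."""
    seen = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if seen[u] == k:          # do not expand past depth k
            continue
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return set(seen) - {v}

def best_response_equilibrium(adj, k):
    """Best-response dynamics in the zero-access-cost toy model: a node buys
    iff no node within k hops owns the good. Fixed points are maximal
    independent sets of the k-th power graph."""
    owners = set()
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            nearby_owner = bool(khop_neighbors(adj, v, k) & owners)
            if v in owners and nearby_owner:
                owners.discard(v)
                changed = True
            elif v not in owners and not nearby_owner:
                owners.add(v)
                changed = True
    return owners
```

On a 5-node path, larger k yields fewer owners, matching the abstract's second finding that inefficiency from excess purchase shrinks as goods are shared more widely.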

STOC Conference 2015 Conference Paper

Hardness of Graph Pricing Through Generalized Max-Dicut

  • Euiwoong Lee

The Graph Pricing problem is among the fundamental problems whose approximability is not well understood. While there is a simple combinatorial 1/4-approximation algorithm, the best hardness result remains at 1/2 assuming the Unique Games Conjecture (UGC). We show that it is NP-hard to approximate within a factor better than 1/4 under the UGC, so that the simple combinatorial algorithm might be the best possible. We also prove that for any ε > 0, there exists δ > 0 such that the integrality gap of n^δ rounds of the Sherali-Adams linear programming hierarchy for Graph Pricing is at most 1/4 + ε. This work is based on the effort to view the Graph Pricing problem as a Constraint Satisfaction Problem (CSP) simpler than the standard and complicated formulation. We propose a problem called Generalized Max-Dicut(T), which has a domain of size T + 1 for every T ≥ 1; Generalized Max-Dicut(1) is the well-known Max-Dicut. There is an approximation-preserving reduction from Generalized Max-Dicut on directed acyclic graphs (DAGs) to Graph Pricing, and both our results are achieved through this reduction. Besides its connection to Graph Pricing, the hardness of Generalized Max-Dicut is interesting in its own right, since in most arity-two CSPs studied in the literature, SDP-based algorithms perform better than LP-based or combinatorial algorithms; for this arity-two CSP, a simple combinatorial algorithm does the best.
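For intuition on why 1/4 is the natural combinatorial baseline in the Max-Dicut setting: a uniformly random 0/1 assignment satisfies each directed edge (tail in the set, head outside) with probability exactly 1/4, so a single random assignment achieves at least OPT/4 in expectation. A hedged sketch of this standard folklore argument, not the paper's reduction (function names are mine):

```python
import random

def maxdicut_value(edges, in_set):
    """A directed edge (u, v) is satisfied when u is in the set and v is not."""
    return sum(w for (u, v), w in edges.items() if in_set[u] and not in_set[v])

def random_quarter_approx(edges, trials=200, seed=0):
    """Try many uniformly random assignments and keep the best. Each trial is
    >= OPT/4 in expectation, since each edge is satisfied with probability 1/4."""
    rng = random.Random(seed)
    nodes = {x for e in edges for x in e}
    best = 0
    for _ in range(trials):
        assignment = {v: rng.random() < 0.5 for v in nodes}
        best = max(best, maxdicut_value(edges, assignment))
    return best
```

On a directed 3-cycle with unit weights, any set satisfies at most one edge, and the random baseline finds that optimum of 1 almost immediately.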

SODA Conference 2013 Conference Paper

Clustering Affine Subspaces: Hardness and Algorithms

  • Euiwoong Lee
  • Leonard J. Schulman

We study a generalization of the famous k-center problem where each object is an affine subspace of dimension Δ, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points (Δ = 0) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in ℝ^d can be modeled as affine subspaces. We give three algorithmic results for different values of k, under the assumption that all subspaces are axis-parallel, the main case of interest because of the correspondence to missing entries in data tables. 1) k = 1: two polynomial-time approximation schemes which run in poly(Δ, 1/∊) · nd time. 2) k = 2: an O(Δ^{1/4})-approximation algorithm which runs in poly(n, d, Δ) time. 3) General k: a polynomial-time approximation scheme. We also prove nearly matching hardness results: in both the general (not necessarily axis-parallel) case (for k ≥ 2) and in the axis-parallel case (for k ≥ 3), the running time of an approximation algorithm with any approximation ratio cannot be polynomial in even one of k and Δ, unless P = NP. Furthermore, assuming that the 3-SAT problem cannot be solved sub-exponentially, the dependence on both k and Δ must be exponential in the general case (in the axis-parallel case, only the dependence on k drops). The simplicity of the first and third algorithms suggests that they might actually be used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axis-parallel and general cases for k = 2, displays a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Helly-type theorem.
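For context on the base problem being generalized: in the classical point case (Δ = 0), the farthest-point greedy of Gonzalez gives a simple 2-approximation for k-center. A minimal sketch of that point-case heuristic, not the paper's subspace algorithms (points are assumed to be coordinate tuples; the function name is mine):

```python
def gonzalez_k_center(points, k):
    """Farthest-point greedy for point k-center: start from an arbitrary
    point and repeatedly add the point farthest from the current centers.
    Classic 2-approximation for the covering radius."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    centers = [points[0]]
    while len(centers) < k:
        # the point maximizing its distance to the nearest chosen center
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers
```

Replacing points by affine subspaces already breaks this greedy analysis, which is one way to see why the paper needs genuinely different techniques (and why hardness kicks in for k ≥ 2).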