Arrow Research search

Author name cluster

Christian Sohler

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

FOCS Conference 2022 Conference Paper

Motif Cut Sparsifiers

  • Michael Kapralov
  • Mikhail Makarov
  • Sandeep Silwal
  • Christian Sohler
  • Jakab Tardos

A motif is a frequently occurring subgraph of a given directed or undirected graph G (Milo et al.). Motifs capture higher order organizational structure of G beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few (Benson at al. , Tsourakakis at al.). In these applications, the cut structure of motifs plays a crucial role as vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif, as opposed to just the number of edges, crossing the cuts. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G^{\prime}$ with only $\widetilde{O}\left(n / \epsilon^{2}\right)$ edges such that for every cut, the weighted number of copies of M crossing the cut in $G^{\prime}$ is within a $1+\epsilon$ factor of the number of copies of M crossing the cut in G, for every constant size motif M. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal works of Karger and Benczúr et al. to the motif setting. The task of adapting the importance sampling framework common to efficient graph sparsification algorithms to the motif setting turns out to be nontrivial due to the fact that cut sizes in a random subgraph of G depend non-linearly on the sampled edges. To overcome this, we adopt the viewpoint of hypergraph sparsification to define edge sampling probabilities which are derived from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in G to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs 1. 1 The full version of the paper is found at https: //arxiv. org/abs/2204. 09951

NeurIPS Conference 2021 Conference Paper

Parallel and Efficient Hierarchical k-Median Clustering

  • Vincent Cohen-Addad
  • Silvio Lattanzi
  • Ashkan Norouzi-Fard
  • Christian Sohler
  • Ola Svensson

As a fundamental unsupervised learning task, hierarchical clustering has been extensively studied in the past decade. In particular, standard metric formulations as hierarchical $k$-center, $k$-means, and $k$-median received a lot of attention and the problems have been studied extensively in different models of computation. Despite all this interest, not many efficient parallel algorithms are known for these problems. In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution. Furthermore, we also get that for all $k$ simultanuously the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta \log (\Delta d n))$ factor bigger that the corresponding optimal solution. The algorithm requires in $O\left(\log_{s} (nd\log(n+\Delta))\right)$ rounds. Here $d$ is the dimension of the data set and $\Delta$ is the ratio between the maximum and minimum distance of two points in the input dataset. To the best of our knowledge, this is the first \emph{parallel} algorithm for the hierarchical $k$-median problem with theoretical guarantees. We further complement our theoretical results with an empirical study of our algorithm that shows its effectiveness in practice.

SODA Conference 2021 Conference Paper

Spectral Clustering Oracles in Sublinear Time

  • Grzegorz Gluch
  • Michael Kapralov
  • Silvio Lattanzi
  • Aida Mousavifar
  • Christian Sohler

Given a graph G that can be partitioned into k disjoint expanders with outer conductance upper bounded by ∊ « 1, can we efficiently construct a small space data structure that allows quickly classifying vertices of G according to the expander (cluster) they belong to? Formally, we would like an efficient local computation algorithm that misclassifies at most an O (∊) fraction of vertices in every expander. We refer to such a data structure as a spectral clustering oracle. Our main result is a spectral clustering oracle with query time O ∗( n 1/2+ O (∊) ) and preprocessing time that provides misclassification error O (∊ log k ) per cluster for any ∊ « 1/log k. More generally, query time can be reduced at the expense of increasing the preprocessing time appropriately (as long as the product is about n 1+ O (∊) ) – this in particular gives a nearly linear time spectral clustering primitive. The main technical contribution is a sublinear time oracle that provides dot product access to the spectral embedding of G by estimating distributions of short random walks from vertices in G. The distributions themselves provide a poor approximation to the spectral embedding, but we show that an appropriate linear transformation can be used to achieve high precision dot product access. We give an estimator for this linear transformation and analyze it using spectral perturbation bounds and a novel upper bound on the leverage scores of the spectral embedding matrix of a k -clusterable graph. We then show that dot product access to the spectral embedding is sufficient to design a clustering oracle. At a high level our approach amounts to hyperplane partitioning in the spectral embedding of G, but crucially operates on a nested sequence of carefully defined subspaces in the spectral embedding to achieve per cluster recovery guarantees.

NeurIPS Conference 2020 Conference Paper

Fast and Accurate $k$-means++ via Rejection Sampling

  • Vincent Cohen-Addad
  • Silvio Lattanzi
  • Ashkan Norouzi-Fard
  • Christian Sohler
  • Ola Svensson

$k$-means++ \cite{arthur2007k} is a widely used clustering algorithm that is easy to implement, has nice theoretical guarantees and strong empirical performance. Despite its wide adoption, $k$-means++ sometimes suffers from being slow on large data-sets so a natural question has been to obtain more efficient algorithms with similar guarantees. In this paper, we present such a near linear time algorithm for $k$-means++ seeding. Interestingly our algorithm obtains the same theoretical guarantees as $k$-means++ and significantly improves earlier results on fast $k$-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than $k$-means++ and obtains solutions of equivalent quality.

ICML Conference 2019 Conference Paper

A Better k-means++ Algorithm via Local Search

  • Silvio Lattanzi
  • Christian Sohler

In this paper, we develop a new variant of k-means++ seeding that in expectation achieves a constant approximation guarantee. We obtain this result by a simple combination of k-means++ sampling with a local search strategy. We evaluate our algorithm empirically and show that it also improves the quality of a solution in practice.

FOCS Conference 2019 Conference Paper

A Characterization of Graph Properties Testable for General Planar Graphs with one-Sided Error (It's all About Forbidden Subgraphs)

  • Artur Czumaj
  • Christian Sohler

The problem of characterizing testable graph properties (properties that can be tested with a number of queries independent of the input size) is a fundamental problem in the area of property testing. While there has been some extensive prior research characterizing testable graph properties in the dense graphs model and we have good understanding of the bounded degree graphs model, no similar characterization has been known for general graphs, with no degree bounds. In this paper we take on this major challenge and consider the problem of characterizing all testable graph properties in general planar graphs. We consider the model in which a general planar graph can be accessed by the random neighbor oracle that allows access to any given vertex and access to a random neighbor of a given vertex. We show that, informally, a graph property P is testable with one-sided error for general planar graphs if and only if testing P can be reduced to testing for a finite family of finite forbidden subgraphs. While our presentation focuses on planar graphs, our approach extends easily to general minor-free graphs. Our analysis of the necessary condition relies on a recent construction of canonical testers in the random neighbor oracle model that is applied here to the one-sided error model for testing in planar graphs. The sufficient condition in the characterization reduces the problem to the task of testing H-freeness in planar graphs, and is the main and most challenging technical contribution of the paper: we show that for planar graphs (with arbitrary degrees), the property of being H-free is testable with one-sided error for every finite graph H, in the random neighbor oracle model.

SODA Conference 2019 Conference Paper

Every Testable (Infinite) Property of Bounded-Degree Graphs Contains an Infinite Hyperfinite Subproperty

  • Hendrik Fichtenberger
  • Pan Peng 0001
  • Christian Sohler

One of the most fundamental questions in graph property testing is to characterize the combinatorial structure of properties that are testable with a constant number of queries. We work towards an answer to this question for the bounded-degree graph model introduced in [GR02], where the input graphs have maximum degree bounded by a constant d. In this model, it is known (among other results) that every hyperfinite property is constant-query testable [NS13], where, informally, a graph property is hyperfinite, if for every δ > 0 every graph in the property can be partitioned into small connected components by removing δn edges. In this paper we show that hyperfiniteness plays a role in every testable property, i. e. we show that every testable property is either finite (which trivially implies hyperfiniteness and testability) or contains an infinite hyperfinite subproperty. A simple consequence of our result is that no infinite graph property that only consists of expander graphs is constant-query testable. Based on the above findings, one could ask if every infinite testable non-hyperfinite property might contain an infinite family of expander (or near-expander) graphs. We show that this is not true. Motivated by our counterexample we develop a theorem that shows that we can partition the set of vertices of every bounded degree graph into a constant number of subsets and a separator set, such that the separator set is small and the distribution of k -discs on every subset of a partition class, is roughly the same as that of the partition class if the subset has small expansion.

IJCAI Conference 2018 Conference Paper

A Property Testing Framework for the Theoretical Expressivity of Graph Kernels

  • Nils M. Kriege
  • Christopher Morris
  • Anja Rey
  • Christian Sohler

Graph kernels are applied heavily for the classification of structured data. However, their expressivity is assessed almost exclusively from experimental studies and there is no theoretical justification why one kernel is in general preferable over another. We introduce a theoretical framework for investigating the expressive power of graph kernels, which is inspired by concepts from the area of property testing. We introduce the notion of distinguishability of a graph property by a graph kernel. For several established graph kernels we show that they cannot distinguish essential graph properties. In order to overcome this, we consider a kernel based on k-disc frequencies. We show that this efficiently computable kernel can distinguish fundamental graph properties. Finally, we obtain learning guarantees for nearest neighbor classifiers in our framework.

SODA Conference 2018 Conference Paper

Estimating Graph Parameters from Random Order Streams

  • Pan Peng 0001
  • Christian Sohler

We develop a new algorithmic technique that allows to transfer some constant time approximation algorithms for general graphs into random order streaming algorithms. We illustrate our technique by proving that in random order streams with probability at least 2/3, • the number of connected components of G can be approximated up to an additive error of εn using space, • the weight of a minimum spanning tree of a connected input graph with integer edges weights from {1, …, W } can be approximated within a multiplicative factor of 1 + ε using space, • the size of a maximum independent set in planar graphs can be approximated within a multiplicative factor of 1+ ε using space.

NeurIPS Conference 2018 Conference Paper

On Coresets for Logistic Regression

  • Alexander Munteanu
  • Chris Schwiegelshohn
  • Christian Sohler
  • David Woodruff

Coresets are one of the central methods to facilitate the analysis of large data. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure $\mu(X)$, which quantifies the hardness of compressing a data set for logistic regression. $\mu(X)$ has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded $\mu(X)$-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear $(1\pm\eps)$-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state of the art methods in the area. The experiments are conducted on real world benchmark data for logistic regression.

FOCS Conference 2018 Conference Paper

Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension

  • Christian Sohler
  • David P. Woodruff

We obtain the first strong coresets for the k-median and subspace approximation problems with sum of distances objective function, on n points in d dimensions, with a number of weighted points that is independent of both n and d; namely, our coresets have size poly(k/ε). A strong coreset (1+ε)-approximates the cost function for all possible sets of centers simultaneously. We also give efficient nnz(A) + (n+d) poly(k/ε) + exp(poly(k/ε)) time algorithms for computing these coresets. We obtain the result by introducing a new dimensionality reduction technique for coresets that significantly generalizes an earlier result of Feldman, Sohler and Schmidt [FSS13] for squared Euclidean distances to sums of P-th powers of Euclidean distances for constant p≥1.

ICML Conference 2017 Conference Paper

Clustering High Dimensional Dynamic Data Streams

  • Vladimir Braverman
  • Gereon Frahling
  • Harry Lang
  • Christian Sohler
  • Lin F. Yang

We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i. e. streams allowing both insertions and deletions of points from a discrete Euclidean space $\{1, 2, \ldots \Delta\}^d$. Our algorithms use $k \epsilon^{-2} \mathrm{poly}(d \log \Delta)$ space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of $k$ centers the cost of the coreset $(1+\epsilon)$-approximates the cost of the streamed point set. We also provide algorithms that guarantee only positive weights in the coreset with additional logarithmic factors in the space and time complexities. We can use this positively-weighted coreset to compute a $(1+\epsilon)$-approximation for the $k$-median problem by any efficient offline $k$-median algorithm. All previous algorithms for computing a $(1+\epsilon)$-approximation for the $k$-median problem over dynamic data streams required space and time exponential in $d$. Our algorithms can be generalized to metric spaces of bounded doubling dimension.

SODA Conference 2017 Conference Paper

Testing for Forbidden Order Patterns in an Array

  • Ilan Newman
  • Yuri Rabinovich
  • Deepak Rajendraprasad
  • Christian Sohler

In this paper, we study testing of sequence properties that are defined by forbidden order patterns. A sequence f: {1, …, n} → ℝ of length n contains a pattern is the group of permutations of k elements), iff there are indices i 1 < i 2 < · · · < i k, such that f (i x ) > f (i y ) whenever π(χ) > π( y ). If f does not contain π, we say f is π-free. For example, for π = (2, 1), the property of being π-free is equivalent to being non-decreasing, i. e. monotone. The property of being ( k, k — 1, …, 1)-free is equivalent to the property of having a partition into at most k - 1 non-decreasing subsequences. Let k constant, be a (forbidde n ) pattern. Assuming f is stored in an array, we consider the property testing problem of distinguishing the case that f is π-free from the case that f differs in more than en places from any π-free sequence. We show the following results: There is a clear dichotomy between the monotone patterns and the non-monotone ones: For monotone patterns of length k, i. e. , ( k, k - 1, …, 1) and (1, 2, …, k ), we design non-adaptive one-sided error ε-tests of (∊ −1 log n ) O ( k 2 ) query complexity. For non-monotone patterns, we show that for any size- k non-monotone π, any non-adaptive one-sided error ε-test requires at least Ω(γ / η) queries. This general lower bound can be further strengthened for specific non-monotone k -length patterns to Ω( n 1–2/( k +1) ). On the other hand, there always exists a non- adaptive one-sided error ε-test for with O(e −1/k n 1–1/k ) query complexity Again, this general upper bound can be further strengthened for specific non-monotone patterns. E. g. , for π = (1, 3, 2), we describe an ε-test with (almost tight) query complexity of Finally, we show that adaptivity can make a big difference in testing non-monotone patterns, and develop an adaptive algorithm that for any tests π-freeness by making (∊ −1 logn) O(1) queries. For all algorithms presented here, the running times are linear in their query complexity.

SODA Conference 2016 Conference Paper

Clustering time series under the Fréchet distance

  • Anne Driemel
  • Amer Krivosija
  • Christian Sohler

The Fréchet distance is a popular distance measure for curves. We study the problem of clustering time series under the Fréchet distance. In particular, we give (1 + ∊ )-approximation algorithms for variations of the following problem with parameters k and ℓ. Given n univariate time series P, each of complexity at most m, we find k time series, not necessarily from P, which we call cluster centers and which each have complexity at most ℓ, such that (a) the maximum distance of an element of P to its nearest cluster center or (b) the sum of these distances is minimized. Our algorithms have running time near-linear in the input size for constant ∊, k and ℓ. To the best of our knowledge, our algorithms are the first clustering algorithms for the Fréchet distance which achieve an approximation factor of (1 + ∊ ) or better.

STOC Conference 2016 Conference Paper

Relating two property testing models for bounded degree directed graphs

  • Artur Czumaj
  • Pan Peng 0001
  • Christian Sohler

We study property testing algorithms in directed graphs (digraphs) with maximum indegree and maximum outdegree upper bounded by d . For directed graphs with bounded degree, there are two different models in property testing introduced by Bender and Ron (2002). In the bidirectional model , one can access both incoming and outgoing edges while in the unidirectional model one can only access outgoing edges. In our paper we provide a new relation between the two models: we prove that if a property can be tested with constant query complexity in the bidirectional model, then it can be tested with sublinear query complexity in the unidirectional model. A corollary of this result is that in the unidirectional model (the model allowing only queries to the outgoing neighbors), every property in hyperfinite digraphs is testable with sublinear query complexity.

SODA Conference 2013 Conference Paper

(1+ Є)-approximation for facility location in data streams

  • Artur Czumaj
  • Christiane Lammersen
  • Morteza Monemizadeh
  • Christian Sohler

We consider the Euclidean facility location problem with uniform opening cost. In this problem, we are given a set of n points P sube ℝ 2 and an opening cost f ∊ ℝ +, and we want to find a set of facilities F ⊆ ℝ 2 that minimizes where d ( p, q ) is the Euclidean distance between p and q. We obtain two main results: A (1 + ε)-approximation algorithm with running time which is ( n log 2 n log log n ) for any constant ε. The first (1 + ε)-approximation algorithm for the cost of the facility location problem for dynamic geometric data streams, i. e. , when the stream consists of insert and delete operations of points from a discrete space {1, …, Δ} 2. The streaming algorithm uses space. Our PTAS is significantly faster than any previously known (1 + ε)-approximation algorithm for the problem, and is also relatively simple. Our algorithm for dynamic geometric data streams is the first (1 + ε)-approximation algorithm for the cost of the facility location problem with polylogarithmic space, and it resolves an open problem in the streaming area. Both algorithms are based on a novel and simple decomposition of an input point set P into small subsets P i, such that: the cost of solving the facility location problem for each P i is small (which means that for each P i one needs to open only a small, polylogarithmic number of facilities), Σ i OPT( P i ) ≤ (1 + ε) · OPT( P ), where for a point set P, OPT( P ) denotes the cost of an optimal solution for P. The decomposition can be used directly to obtain the PTAS by splitting the point set in the subsets and efficiently solve the problem for each subset independently. By combining our partitioning with techniques to process dynamic data streams of sampling from the cells of the partition and estimating the cost from the sample, we obtain our data streaming algorithm.

FOCS Conference 2012 Conference Paper

Almost Optimal Canonical Property Testers for Satisfiability

  • Christian Sohler

In the (k, d)-Function-SAT problem we are given a set of n variables {X 1, .. ., X n } that can take values from the set {1, .. ., d} and a set of Boolean constraints on these variables, where each constraint is of the form f: {1, .. ., d} k → {0, 1}, i. e. the constraint depends on exactly k of these variables. We will treat k and d as constants. The goal is to determine whether the set of constraints has a satisfying assignment, i. e. an assignment to the variables such that all constraints simultanuously map to 1. In this paper, we study (k, d)-Function-SAT in the property testing model for dense instances. We call an instance ε-far from satisfiable, if every assignment violates more than εn k constraints. A property testing algorithm is a randomized algorithm that, given oracle access to the set of constraints, must accept with probability at least 3/4 all satisfiable inputs and rejects with probability at least 3/4 all inputs, which are ε-far from satisfiable. We analyze the canonical non-adaptive property testing algorithm with one-sided error: Sample r variables and accept, if and only if the induced set of constraints has a satisfying assignment. The value of r will be called the sample commlexity of the algorithm. We show that there is an r 0 = O(1/ε) such that for any instance that is ε-far from satisfiable, the probability, that a random sample on r ≥ r0 variables is satisfiable, is at most 1/4. This implies that the above algorithm is a property tester. The obtained sample complexity is nearly optimal for canonical testers as a lower bound of Ω(1/ε) on the sample complexity is known. Previously, a tester with sample complexity o(1/ε 2 ) was only known for the very special case of testing bipartiteness in the dense graph model [3]. Our new general result improves the best previous result for testing satisfiability (and even for the special case of 3-colorability in graphs) from sample complexity Õ(1/ε2) to Õ(1/ε). It also slightly improves the sample complexity for the special case of bipartiteness. Improving the sample complexity for (k, d)-Function-SAT (or special cases of it) had been posed in several papers as an open problem [3], [4], [17]. This paper solves this problem nearly optimally for canonical testers and, in the case of k = 2, also for nonadaptive testers as there is a lower bound of Ω(1/ε 2 ) on the query complexity of non-adaptive testers for bipartiteness in the dense graph model [6], where the query complexity denotes the number of queries asked about the graph (for a canonical tester in graphs, the query complexity is the square of its sample complexity). As a byproduct, we obtain an algorithm, which, given a satisfiable set of constraints, computes in time O(n/ε O(1) + 2 Õ(1/ε) ) a solution, which violates at most εn k constraints.

FOCS Conference 2011 Conference Paper

Planar Graphs: Random Walks and Bipartiteness Testing

  • Artur Czumaj
  • Morteza Monemizadeh
  • Krzysztof Onak
  • Christian Sohler

We initiate the study of the testability of properties in arbitrary planar graphs. We prove that bipartiteness can be tested in constant time. The previous bound for this class of graphs was O(√n), and the constant-time testability was only known for planar graphs with bounded degree. Previously used transformations of unbounded-degree sparse graphs into bounded- degree sparse graphs cannot be used to reduce the problem to the testability of bounded-degree planar graphs. Our approach extends to arbitrary minor-free graphs. Our algorithm is based on random walks. The challenge here is to analyze random walks for a class of graphs that has good separators, i. e. , bad expansion. Standard techniques that use a fast convergence to a uniform distribution do not work in this case. Roughly speaking, our analysis technique self-reduces the problem of finding an odd-length cycle in a multigraph G induced by a collection of cycles to another multigraph G' induced by a set of shorter odd-length cycles, in such a way that when a random walks finds a cycle in G' with probability p >; 0, then it does so with probability λ(p) >; 0 in G. This reduction is applied until the cycles collapse to self-loops that can be easily detected.

SODA Conference 2010 Conference Paper

Coresets and Sketches for High Dimensional Subspace Approximation Problems

  • Dan Feldman
  • Morteza Monemizadeh
  • Christian Sohler
  • David P. Woodruff

We consider the problem of approximating a set P of n points in ℝ d by a j-dimensional subspace under the ℓ p measure, in which we wish to minimize the sum of ℓ p distances from each point of P to this subspace. More generally, the F q (ℓ p )-subspace approximation problem asks for a j-subspace that minimizes the sum of qth powers of ℓ p -distances to this subspace, up to a multiplicative factor of (1 + ε). We develop techniques for subspace approximation, regression, and matrix approximation that can be used to deal with massive data sets in high dimensional spaces. In particular, we develop coresets and sketches, i. e. small space representations that approximate the input point set P with respect to the subspace approximation problem. Our results are: A dimensionality reduction method that can be applied to F q (ℓ p )-clustering and shape fitting problems, such as those in [8, 15]. The first strong coreset for F 1 (ℓ 2 )-subspace approximation in high-dimensional spaces, i. e. of size polynomial in the dimension of the space. This coreset approximates the distances to any j-subspace (not just the optimal one). A (1 + ε)-approximation algorithm for the j-dimensional F 1 (ℓ 2 )-subspace approximation problem with running time nd(j/ε) O(1) + (n + d)2 poly(j/ε). A streaming algorithm that maintains a coreset for the F 1 (ℓ 2 )-subspace approximation problem and uses a space of (weighted) points. Streaming algorithms for the above problems with bounded precision in the turnstile model, i. e, when coordinates appear in an arbitrary order and undergo multiple updates. We show that bounded precision can lead to further improvements. We extend results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation, to error measures other than the Frobenius norm.

TCS Journal 2009 Journal Article

A sublinear-time approximation scheme for bin packing

  • Tuğkan Batu
  • Petra Berenbrink
  • Christian Sohler

The bin packing problem is defined as follows: given a set of n items with sizes 0 < w 1, w 2, …, w n ≤ 1, find a packing of these items into a minimum number of unit-size bins possible. We present a sublinear-time asymptotic approximation scheme for the bin packing problem; that is, for any ϵ > 0, we present an algorithm A ϵ that has sampling access to the input instance and outputs a value k such that C opt ≤ k ≤ ( 1 + ϵ ) ⋅ C opt + 1, where C opt is the cost of an optimal solution. It is clear that uniform sampling by itself will not allow a sublinear-time algorithm in this setting; a small number of items might constitute most of the total weight and uniform samples will not hit them. In this work we use weighted samples, where item i is sampled with probability proportional to its weight: that is, with probability w i / ∑ i w i. In the presence of weighted samples, the approximation algorithm runs in O ̃ ( n ⋅ poly ( 1 / ϵ ) ) + g ( 1 / ϵ ) time, where g ( x ) is an exponential function of x. When both weighted sampling and uniform sampling are allowed, O ̃ ( n 1 / 3 ⋅ poly ( 1 / ϵ ) ) + g ( 1 / ϵ ) time suffices. In addition to an approximate value to C opt, our algorithm can also output a constant-size “template” of a packing that can later be used to find a near-optimal packing in linear time.

FOCS Conference 2007 Conference Paper

Testing Expansion in Bounded-Degree Graphs

  • Artur Czumaj
  • Christian Sohler

We consider the problem of testing expansion in bounded degree graphs. We focus on the notion of vertex-expansion: an alpha-expander is a graph G = (V, E) in which even-subset U sube V of at most |V|/2 vertices has a neighborhood of size at least alphaldr|U|. Our main result is that one can distinguish good expanders from graphs that are far from being weak expanders in time O tilde(radicn). We prove that the property testing algorithm proposed by Goldreich and Ron (2000) with appropriately set parameters accepts every alpha-expander with probability at least 2/3 and rejects every graph that is epsiv-far from an alpha*-expander with probability at least 2/3, where alpha*=Theta(alpha 2 /(d 2 log (n/epsiv))) and d is the maximum degree of the graphs. The algorithm assumes the bounded-degree graphs model with adjacency list graph representation and its running time is O(d 2 (radicn log (n/epsiv))/alpha 2 epsiv 3 ).

TCS Journal 2005 Journal Article

Testing hypergraph colorability

  • Artur Czumaj
  • Christian Sohler

We study the problem of testing properties of hypergraphs. The goal of property testing is to distinguish between the case whether a given object has a certain property or is “far away” from the property. We prove that the fundamental problem of ℓ -colorability of k-uniform hypergraphs can be tested in time independent of the size of the hypergraph. We present a testing algorithm that examines only ( k ℓ / ε ) O ( k ) entries of the adjacency matrix of the input hypergraph, where ε is a distance parameter independent of the size of the hypergraph. The algorithm tests only a constant number of entries in the adjacency matrix provided that ℓ, k, and ε are constants. This result is a generalization of previous results about testing graph colorability.

FOCS Conference 2002 Conference Paper

Abstract Combinatorial Programs and Efficient Property Testers

  • Artur Czumaj
  • Christian Sohler

Property testing is a relaxation of classical decision problems which aims at distinguishing between functions having a predetermined property and functions being far from any function having the property. In this paper we present a novel framework for analyzing property testing algorithms with one-sided error. Our framework is based on a connection of property testing and a new class of problems which we call abstract combinatorial programs. We show that if the problem of testing a property can be reduced to an abstract combinatorial program of small dimension, then the property has an efficient tester. We apply our framework to a variety of classical combinatorial problems. Among others, we present efficient property testing algorithms for geometric clustering problems, the reversal distance problem, and graph and hypergraph coloring problems. We also prove that, informally, any hereditary graph property can be efficiently tested if and only if it can be reduced to an abstract combinatorial program of small size. Our framework allows us to analyze all our testers in a unified way and the obtained complexity bounds either match or improve the previously known bounds. We believe that our framework will help to better understand the structure of efficiently testable properties.