STOC Conference 2025 Conference Paper
Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time
- Talya Eden
- Reut Levi
- Dana Ron
- Ronitt Rubinfeld
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
STOC Conference 2025 Conference Paper
SODA Conference 2022 Conference Paper
We consider the problem of approximating the arboricity of a graph G = (V, E ), which we denote by arb( G ), in sublinear time, where the arboricity of a graph is the minimal number of forests required to cover its edge set. An algorithm for this problem may perform degree and neighbor queries, and is allowed a small error probability. We design an algorithm that outputs an estimate, such that with probability 1–1/poly(n), arb( G ) ≤ ≤ c log 2 n -arb( G ), where n = |V| and c is a constant. The expected query complexity and running time of the algorithm are O ( n /arb( G )) · poly(log n), and this upper bound also holds with high probability. This bound is optimal for such an approximation up to a poly (log n) factor. For the closely related problem of finding the densest subgraph, Bhattacharya et al. (STOC, 2015) showed that there exists a factor-2 approximation algorithm that runs in time O(n ) · poly (log n). In a follow up work, McGregor et al. (MFCS, 2015) improved the approximation factor to (1 + ∊ ) with the same complexity.
SODA Conference 2021 Conference Paper
A distance-approximation algorithm for a graph property P in the adjacency-matrix model is given an approximation parameter ∊ ∊ (0, 1) and query access to the adjacency matrix of a graph G = ( V, E ). It is required to output an estimate of the distance between G and the closest graph G′ = ( V, E′ ) that satisfies, where the distance between graphs is the size of the symmetric difference between their edge sets, normalized by | V| 2. In this work we introduce property covers, as a basis for a methodology that uses distance-approximation algorithms for “simple” properties to design distance-approximation algorithms for more “complex” properties. Applying this methodology we present distance-approximation algorithms with poly(1/ ∊ ) query complexity for induced P 3 -freeness, induced P 4 -freeness, and Chordality. For induced C 4 -freeness our algorithm has query complexity exp(poly(1/ ∊ )). These complexities essentially match the corresponding known results for testing these properties and provide an exponential improvement on previously known results.
SODA Conference 2021 Conference Paper
In this work, we study the problem of testing subsequence-freeness. For a given subsequence (word) w = w 1 … w k, a sequence (text) T = t 1 … t n is said to contain w if there exist indices 1 ≤ i 1 < ⃛ < i k ≤ n such that for every 1 ≤ j ≤ k. Otherwise, T is w -free. While a large majority of the research in property testing deals with algorithms that perform queries, here we consider sample-based testing (with one-sided error). In the “standard” sample-based model (i. e. , under the uniform distribution), the algorithm is given samples ( i, t i ) where i is distributed uniformly independently at random. The algorithm should distinguish between the case that T is w -free, and the case that T is ∊ -far from being w -free (i. e. , more than an ∊ -fraction of its symbols should be modified so as to make it w -free). Freitag, Price, and Swartworth (Proceedings of RANDOM, 2017) showed that O ( k 2 log k/∊ ) samples suffice for this testing task. We obtain the following results. • The number of samples sufficient for sample-based testing (under the uniform distribution) is O ( k/∊ ). This upper bound builds on a characterization that we present for the distance of a text T from w -freeness in terms of the maximum number of copies of w in T, where these copies should obey certain restrictions. • We prove a matching lower bound, which holds for every word w. This implies that the above upper bound is tight. • The same upper bound holds in the more general distribution-free sample-based model. In this model the algorithm receives samples ( i, t i ) where i is distributed according to an arbitrary distribution p (and the distance from w -freeness is measured with respect to p ). We highlight the fact that while we require that the testing algorithm work for every distribution and when only provided with samples, the complexity we get matches a known lower bound for a special case of the seemingly easier problem of testing subsequence-freeness under the uniform distribution and with queries (Canonne et al. , Theory of Computing, 2019).
SODA Conference 2020 Conference Paper
STOC Conference 2018 Conference Paper
We study the problem of approximating the number of k -cliques in a graph when given query access to the graph. We consider the standard query model for general graphs via (1) degree queries, (2) neighbor queries and (3) pair queries. Let n denote the number of vertices in the graph, m the number of edges, and C k the number of k -cliques. We design an algorithm that outputs a (1+ε)-approximation (with high probability) for C k , whose expected query complexity and running time are O ( n / C k 1/ k + m k /2 / C k )(log n , 1/ε, k ). Hence, the complexity of the algorithm is sublinear in the size of the graph for C k = ω( m k /2−1 ). Furthermore, we prove a lower bound showing that the query complexity of our algorithm is essentially optimal (up to the dependence on log n , 1/ε and k ). The previous results in this vein are by Feige (SICOMP 06) and by Goldreich and Ron (RSA 08) for edge counting ( k =2) and by Eden et al. (FOCS 2015) for triangle counting ( k =3). Our result matches the complexities of these results. The previous result by Eden et al. hinges on a certain amortization technique that works only for triangle counting, and does not generalize for larger cliques. We obtain a general algorithm that works for any k ≥ 3 by designing a procedure that samples each k -clique incident to a given set S of vertices with approximately equal probability. The primary difficulty is in finding cliques incident to purely high-degree vertices, since random sampling within neighbors has a low success probability. This is achieved by an algorithm that samples uniform random high degree vertices and a careful tradeoff between estimating cliques incident purely to high-degree vertices and those that include a low-degree vertex.
SODA Conference 2018 Conference Paper
FOCS Conference 2015 Conference Paper
We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph. The allowed queries are degree queries, vertex-pair queries and neighbor queries. We show that for any given approximation parameter 0<; epsilon<; 1, the algorithm provides an estimate hat{t} such that with high constant probability, (1-epsilon) t<; hat{t}κ(1+epsilon)t, where t is the number of triangles in the graph G. The expected query complexity of the algorithm is O(n/t̂{1/3} + min {m, m̂{3/2}/t}) poly(log n, 1/epsilon), where n is the number of vertices in the graph and m is the number of edges, and the expected running time is (n/t̂{1/3} + m̂{3/2}/t) poly(log n, 1/epsilon). We also prove that Omega(n/t̂{1/3} + min {m, m̂{3/2}/t}) queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in n (and the dependence on 1/epsilon).
FOCS Conference 2014 Conference Paper
We initiate a study of learning and testing dynamic environments, focusing on environment that evolve according to a fixed local rule. The (proper) learning task consists of obtaining the initial configuration of the environment, whereas for non-proper learning it suffices to predict its future values. The testing task consists of checking whether the environment has indeed evolved from some initial configuration according to the known evolution rule. We focus on the temporal aspect of these computational problems, which is reflected in the requirement that only a small portion of the environment is inspected in each time slot (i. e. , the time period between two consecutive applications of the evolution rule). We present some general observations, an extensive study of two special cases, two separation results, and a host of open problems. The two special cases that we study refer to linear rules of evolution and to rules of evolution that represent simple movement of objects. Specifically, we show that evolution according to any linear rule can be tested within a total number of queries that is sublinear in the size of the environment, and that evolution according to a simple one-dimensional movement can be tested within a total number of queries that is independent of the size of the environment.
SODA Conference 2014 Conference Paper
SODA Conference 2013 Conference Paper
SODA Conference 2012 Conference Paper
TCS Journal 2011 Journal Article
SODA Conference 2010 Conference Paper
FOCS Conference 2010 Conference Paper
We initiate the study of testing properties of images that correspond to sparse 0/1-valued matrices of size n × n. Our study is related to but different from the study initiated by Raskhodnikova (Proceedings of RANDOM, 2003), where the images correspond to dense 0/1-valued matrices. Specifically, while distance between images in the model studied by Raskhodnikova is the fraction of entries on which the images differ taken with respect to all n 2 entries, the distance measure in our model is defined by the fraction of such entries taken with respect to the actual number of 1's in the matrix. We study several natural properties: connectivity, convexity, monotonicity, and being a line. In all cases we give testing algorithms with sublinear complexity, and in some of the cases we also provide corresponding lower bounds.
STOC Conference 2009 Conference Paper
We initiate a systematic study of a special type of property testers. These testers consist of repeating a basic test for a number of times that depends on the proximity parameter, whereas the basic test is oblivious of the proximity parameter. We refer to such basic tests by the term Proximity-Oblivious Testers. While proximity-oblivious testers were studied before - most notably in the algebraic setting - the current study seems to be the first one to focus on graph properties. We provide a mix of positive and negative results, and in particular characterizations of the graph properties that have constant-query proximity-oblivious testers in the two standard models (i.e., the adjacency matrix and the bounded-degree models). Furthermore, we show that constant-query proximity-oblivious testers do not exist for many easily testable properties, and that even when proximity-oblivious testers exist, repeating them does not necessarily yield the best standard testers for the corresponding property.
SODA Conference 2008 Conference Paper
TCS Journal 2007 Journal Article
FOCS Conference 2007 Conference Paper
We consider the problem of approximating the support size of a distribution from a small number of samples, when each element in the distribution appears with probability at least 1/n. This problem is closely related to the problem of approximating the number of distinct elements in a sequence of length n. For both problems, we prove a nearly linear in n lower bound on the query complexity, applicable even for approximation with additive error. At the heart of the lower bound is a construction of two positive integer random variables. X 1 and X 2, with very different expectations and the following condition on the first k moments: E[X 1 ]/E[X 2 ] = E[X 1 2 ]/E[X 2 2 ] =. .. = E[X 1 k ]/E[X 2 k ]. Our lower bound method is also applicable to other problems. In particular, it gives new lower bounds for the sample complexity of (1) approximating the entropy of a distribution and (2) approximating how well a given string is compressed by the Lempel-Ziv scheme.
SODA Conference 2006 Conference Paper
FOCS Conference 2004 Conference Paper
In this work we fill in the knowledge gap concerning testing polynomials over finite fields. As previous works show, when the cardinality of the field, q, is sufficiently larger than the degree bound, d, then the number of queries sufficient for testing is polynomial or even linear in d. On the other hand, when q = 2 then the number of queries, both sufficient and necessary, grows exponentially with d. Here we study the intermediate case where 2 1). Then the number of queries performed by the test grows like /spl lscr/ /spl middot/ q/sup 2/spl lscr/+1/, where /spl lscr/ = /spl lceil/(d+1)/((q-q)/p)/spl rceil/. Furthermore, q/sup /spl Omega/(/spl lscr/)/ queries are necessary when q /spl les/ O(d). The test itself provides a unifying view of the two extremes: it considers random affine subspaces of dimension /spl lscr/ and verifies that the function restricted to the selected subspaces is a degree d polynomial. Viewed in the context of coding theory, our result shows that Reed-Muller codes over general fields (usually referred to as generalized Reed-Muller (GRM) codes) are locally testable. In the course of our analysis we provide a characterization of small-weight words that span the code. Such a characterization was previously known only when the field size is a prime or is sufficiently large, in which case the minimum weight words span the code.
FOCS Conference 2002 Conference Paper
Motivated by a frequency assignment problem in cellular networks, we introduce and study a new coloring problem called minimum conflict-free coloring (min-CF-coloring). In its general form, the input of the min-CF-coloring problem is a set system (X, S), where each S /spl isin/ S is a subset of X. The output is a coloring X of the sets in S that satisfies the following constraint: for every x /spl isin/ X there exists a color i and a unique set S /spl isin/ S, such that x /spl isin/ S and /spl chi/(S) = i. The goal is to minimize the number of colors used by the coloring X. Min-CF-coloring of general set systems is not easier than the classic graph coloring problem. However, in view of our motivation, we consider set systems induced by simple geometric regions in the plane. In particular, we study disks (both congruent and non-congruent), axis-parallel rectangles (with a constant ratio between the smallest and largest rectangle) regular hexagons (with a constant ratio between the smallest and largest hexagon), and general congruent centrally-symmetric convex regions in the plane. In all cases we have coloring algorithms that use O(log n) colors (where n is the number of regions). For rectangles and hexagons we obtain a constant-ratio approximation algorithm when the ratio between the largest and smallest rectangle (hexagon) is a constant. We also show that, even in the case of unit disks, /spl Theta/(log n) colors may be necessary.
FOCS Conference 2002 Conference Paper
We show that a Boolean function over n Boolean variables can be tested for the property of depending on only k of them, using a number of queries that depends only on k and the approximation parameter /spl epsi/. We present two tests, both non-adaptive, that require a number of queries that is polynomial k and linear in /spl epsi//sup -1/. The first test is stronger in that it has a 1-sided error, while the second test has a more compact analysis. We also present an adaptive version and a 2-sided error version of the first test, that have a somewhat better query complexity than the other algorithms. We then provide a lower bound of /spl Omega//spl tilde/(/spl radic/ k) on the number of queries required for the non-adaptive testing of the above property; a lower bound of /spl Omega/(log(k + 1)) for adaptive algorithms naturally follows from this. In providing this we also prove a result about random walks on the group Z/sub 2//sup q/ that may be interesting in its own right. We show that for some t(q) = O/spl tilde/(q/sup 2/), the distributions of the random walk at times t and t + 2 are close to each other, independently of the step distribution of the walk. We also discuss related questions. In particular, when given in advance a known k junta function h, we show how to test a function f for the property of being identical to h up to a permutation of the variables, in a number of queries that is polynomial in k and /spl epsi/.
STOC Conference 2001 Conference Paper
Finite metric spaces, and in particular tree metrics play an important role in various disciplines such as evolutionary biology and statistics. A natural family of problems concerning metrics is deciding, given a matrix M , whether or not it is a distance metric of a certain predetermined type. Here we consider the following relaxed version of such decision problems: For any given matrix M and parameter \eps , we are interested in determining, by probing M , whether M has a particular metric property P , or whether it is ε far from having the property. In ε far we mean that more than an ε-fraction of the entries of M must be modified so that it obtains the property. The algorithm may query the matrix on entries M[i,j] of its choice, and is allowed a constant probability of error. We describe algorithms for testing Euclidean metrics, tree metrics and ultrametrics. Furthermore, we present an algorithm that tests whether a matrix M is an approximate ultrametric. In all cases the query complexity and running time are polynomial in 1 ε and independent of the size of the matrix. Finally, our algorithms can be used to solve relaxed versions of the corresponding search problems in time that is sub-linear in the size of the matrix.
FOCS Conference 2000 Conference Paper
A set X of points in /spl Rfr//sup d/ is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is /spl epsiv/-far from being (k, b')-clusterable for any given 0</spl epsiv//spl les/1 and for b'/spl ges/b. In /spl epsiv/-far from being (k, b')-clusterable we mean that more than /spl epsiv/. |X| points should be removed from X so that it becomes (k, b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1//spl epsiv/. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an /spl epsiv/-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster any given point belongs to.
STOC Conference 1999 Conference Paper
STOC Conference 1998 Conference Paper
FOCS Conference 1998 Conference Paper
We present a (randomized) test for monotonicity of Boolean functions. Namely, given the ability to query an unknown function f: {0, 1}/sup n/-{0, 1} at arguments of its choice, the test always accepts a monotone f, and rejects f with high probability if it is /spl epsiv/-far from being monotone (i. e. , every monotone function differs from f on more than an /spl epsiv/ fraction of the domain). The complexity of the test is poly(n//spl epsiv/). The analysis of our algorithm relates two natural combinatorial quantities that can be measured with respect to a Boolean function; one being global to the function and the other being local to it. We also consider the problem of testing monotonicity based only on random examples labeled by the function. We show an /spl Omega/(/spl radic/2/sup n///spl epsiv/) lower bound on the number of required examples, and provide a matching upper bound (via an algorithm).
STOC Conference 1998 Conference Paper
STOC Conference 1997 Conference Paper
FOCS Conference 1996 Conference Paper
The authors study the question of determining whether an unknown function has a particular property or is /spl epsiv/-far from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the function on instances of its choice. First, they establish some connections between property testing and problems in learning theory. Next, they focus on testing graph properties, and devise algorithms to test whether a graph has properties such as being k-colorable or having a /spl rho/-clique (clique of density /spl rho/ w. r. t. the vertex set). The graph property testing algorithms are probabilistic and make assertions which are correct with high probability utilizing only poly(1//spl epsiv/) edge-queries into the graph, where /spl epsiv/ is the distance parameter. Moreover, the property testing algorithms can be used to efficiently (i. e. , in time linear in the number of vertices) construct partitions of the graph which correspond to the property being tested, if it holds for the input graph.
FOCS Conference 1995 Conference Paper
We examine the problem of learning to play various games optimally against resource-bounded adversaries, with an explicit emphasis on the computational efficiency of the learning algorithm. We are especially interested in providing efficient algorithms for games other than penny-matching (in which payoff is received for matching the adversary's action in the current round), and for adversaries other than the classically studied finite automata. In particular, we examine games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and therefore require carefully planned actions on the part of the learning algorithm. For example, in the game we call contract, both sides play O or 1 on each round, but our side receives payoff only if we play 1 in synchrony with the adversary; unlike penny-matching, playing O in synchrony with the adversary pays nothing. The name of the game is derived from the example of signing a contract, which becomes valid only if both parties sign (play 1).
STOC Conference 1995 Conference Paper
STOC Conference 1994 Conference Paper
STOC Conference 1993 Conference Paper
NeurIPS Conference 1993 Conference Paper
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multi characteristic time scales. On short scales it is characterized mostly by the dynamics that gen(cid: 173) erate the process, whereas on large scales, more syntactic and se(cid: 173) mantic information is carried. For that reason the conventionally used fixed memory Markov models cannot capture effectively the complexity of such structures. On the other hand using long mem(cid: 173) ory models uniformly is not practical even for as short memory as four. The algorithm we propose is based on minimizing the sta(cid: 173) tistical prediction error by extending the memory, or state length, adaptively, until the total prediction error is sufficiently small. We demonstrate the algorithm by learning the structure of natural En(cid: 173) glish text and applying the learned model to the correction of cor(cid: 173) rupted text. Using less than 3000 states the model's performance is far superior to that of fixed memory models with similar num(cid: 173) ber of states. We also show how the algorithm can be applied to intergenic E. coli DNA base prediction with results comparable to HMM based methods.