STOC Conference 2025 Conference Paper
Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time
- Talya Eden
- Reut Levi
- Dana Ron
- Ronitt Rubinfeld
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
STOC Conference 2025 Conference Paper
STOC Conference 2025 Conference Paper
NeurIPS Conference 2024 Conference Paper
We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor’s quality, measured by its total variation distance from $p$. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the estimation’s accuracy (i. e. they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i. e. they are also robust). We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach.
STOC Conference 2023 Conference Paper
NeurIPS Conference 2022 Conference Paper
Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the “combine” function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$. We present an improved simulation of the WL test on GNNs with {\em exponentially} lower complexity. In particular, the neural network implementing the combine function in each node has only $\mathrm{polylog}(n)$ parameters, and the feature vectors exchanged by the nodes of GNN consists of only $O(\log n)$ bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near)-optimality of our construction.
FOCS Conference 2022 Conference Paper
We give a $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$-time algorithm for properly learning monotone Boolean functions under the uniform distribution over $\{0, 1\}^{n}$. Our algorithm is robust to adversarial label noise and has a running time nearly matching that of the state-of-the-art improper learning algorithm of Bshouty and Tamon (JACM 96) and an information-theoretic lower bound of Blais et al (RANDOM ’15). Prior to this work, no proper learning algorithm with running time smaller than $2^{\Omega(n)}$ was known to exist. The core of our proper learner is a local computation algorithm for sorting binary labels on a poset. Our algorithm is built on a body of work on distributed greedy graph algorithms; specifically we rely on a recent work of Ghaffari (FOCS’22), which gives an efficient algorithm for computing maximal matchings in a graph in the LCA model of Rubinfeld et al and Alon et al (ICS’II, SODA’12). The applications of our local sorting algorithm extend beyond learning on the Boolean cube: we also give a tolerant tester for Boolean functions over general posets that distinguishes functions that are $\varepsilon$/3-close to monotone from those that are $\varepsilon-$far. Previous tolerant testers for the Boolean cube only distinguished between $\varepsilon/\Omega(\sqrt{n}$)-close and $\varepsilon-$far.
ICLR Conference 2022 Conference Paper
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, Hsu et al. (2019) and Jiang et al. (2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior “classical” algorithms that did not use oracles. In this paper, we explore the power of a “heavy edge” oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on “classical” streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.
ICLR Conference 2021 Conference Paper
We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $ \pm \varepsilon n$ from a sample of size $O(\log^2(1/\varepsilon) \cdot n/\log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to $$ \ \log (1/\varepsilon) \cdot n^{1-\Theta(1/\log(1/\varepsilon))}. $$ We evaluate the proposed algorithms on a collection of data sets, using the neural-network based estimators from {Hsu et al, ICLR'19} as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state of the art algorithm.
SODA Conference 2020 Conference Paper
NeurIPS Conference 2019 Conference Paper
Statistical tests are at the heart of many scientific tasks. To validate their hypothesis, researchers in medical and social sciences use individuals' data. The sensitivity of participants' data requires the design of statistical tests that ensure the privacy of the individuals in the most efficient way. In this paper, we use the framework of property testing to design algorithms to test the properties of the distribution that the data is drawn from with respect to differential privacy. In particular, we investigate testing two fundamental properties of distributions: (1) testing the equivalence of two distributions when we have unequal numbers of samples from the two distributions. (2) Testing independence of two random variables. In both cases, we show that our testers achieve near optimal sample complexity (up to logarithmic factors). Moreover, our dependence on the privacy parameter is an additive term, which indicates that differential privacy can be obtained in most regimes of parameters for free.
ICML Conference 2018 Conference Paper
We study the fundamental problems of identity and equivalence testing over a discrete population from random samples. Our goal is to develop efficient testers while guaranteeing differential privacy to the individuals of the population. We provide sample-efficient differentially private testers for these problems. Our theoretical results significantly improve over the best known algorithms for identity testing, and are the first results for private equivalence testing. The conceptual message of our work is that there exist private hypothesis testers that are nearly as sample-efficient as their non-private counterparts. We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size sublinear in the domain size of the underlying distributions.
SODA Conference 2018 Conference Paper
SODA Conference 2012 Conference Paper
SODA Conference 2012 Conference Paper
STOC Conference 2010 Conference Paper
We consider the problem of maintaining a large matching and a small vertex cover in a dynamically changing graph. Each update to the graph is either an edge deletion or an edge insertion. We give the first randomized data structure that simultaneously achieves a constant approximation factor and handles a sequence of K updates in K*polylog(n) time, where n is the number of vertices in the graph. Previous data structures require a polynomial amount of computation per update.
SODA Conference 2009 Conference Paper
FOCS Conference 2007 Conference Paper
We describe a general method for testing whether a function on n input variables has a concise representation. The approach combines ideas from the junta test of Fischer et al. 16 with ideas from learning theory, and yields property testers that make po! y(s/epsiv) queries (independent of n) for Boolean function classes such as s-term DNF formulas (answering a question posed by Parnas et al. [12]), sizes. decision trees, sizes Boolean formulas, and sizes Boolean circuits. The method can be applied to non-Boolean valued function classes as well. This is achieved via a generalization of the notion of van at ion/row Fischer et al. to non-Boolean functions. Using this generalization we extend the original junta test of Fischer et al. to work for non-Boolean functions, and give poly(s/e)-query testing algorithms for non-Boolean valued function classes such as sizes algebraic circuits and s-sparse polynomials over finite fields. We also prove an Omega(radic(s)) query lower bound for nonadaptively testing s-sparse polynomials over finite fields of constant size. This shows that in some instances, our general method yields a property tester with query complexity that is optimal (for nonadaptive algorithms) up to a polynomial factor.
STOC Conference 2007 Conference Paper
In this work, we consider the problems of testing whether adistribution over (0,1 n ) is k -wise (resp. (ε,k)-wise) independentusing samples drawn from that distribution. For the problem of distinguishing k -wise independent distributions from those that are δ-far from k -wise independence in statistical distance, we upper bound the number ofrequired samples by Õ(n k /δ 2 ) and lower bound it by Ω(n k-1/2 /δ) (these bounds hold for constant k , and essentially the same bounds hold for general k ). Toachieve these bounds, we use Fourier analysis to relate adistribution's distance from k -wise independence to its biases, a measure of the parity imbalance it induces on a setof variables. The relationships we derive are tighter than previouslyknown, and may be of independent interest. To distinguish (ε,k)-wise independent distributions from thosethat are δ-far from (ε,k)-wise independence in statistical distance, we upper bound thenumber of required samples by O(k log n / δ 2 ε 2 ) and lower bound it by Ω(√ k log n / 2 k (ε+δ)√ log 1/2 k (ε+δ)). Although these bounds are anexponential improvement (in terms of n and k ) over thecorresponding bounds for testing k -wise independence, we give evidence thatthe time complexity of testing (ε,k)-wise independence isunlikely to be poly(n,1/ε,1/δ) for k=Θ(log n),since this would disprove a plausible conjecture concerning the hardness offinding hidden cliques in random graphs. Under the conjecture, ourresult implies that for, say, k = log n and ε = 1 / n 0.99 ,there is a set of (ε,k)-wise independent distributions, and a set of distributions at distance δ=1/n 0.51 from (ε,k)-wiseindependence, which are indistinguishable by polynomial time algorithms.
STOC Conference 2005 Conference Paper
STOC Conference 2004 Conference Paper
The complexity of testing properties of monotone and unimodal distributions, when given access only to samples of the distribution, is investigated. Two kinds of sublinear-time algorithms---those for testing monotonicity and those that take advantage of monotonicity---are provided. The first algorithm tests if a given distribution on [n] is monotone or far away from any monotone distribution in L 1 -norm; this algorithm uses O(√n) samples and is shown to be nearly optimal. The next algorithm, given a joint distribution on [n] x [n], tests if it is monotone or is far away from any monotone distribution in L 1 -norm; this algorithm uses O(n 3/2 ) samples. The problems of testing if two monotone distributions are close in L 1 -norm and if two random variables with a monotone joint distribution are close to being independent in L 1 -norm are also considered. Algorithms for these problems that use only poly(log n) samples are presented. The closeness and independence testing algorithms for monotone distributions are significantly more efficient than the corresponding algorithms as well as the lower bounds for arbitrary distributions. Some of the above results are also extended to unimodal distributions.
SODA Conference 2004 Conference Paper
STOC Conference 2003 Conference Paper
SODA Conference 2003 Conference Paper
STOC Conference 2002 Conference Paper
The field of property testing studies algorithms that distinguish, using a small number of queries, between inputs which satisfy a given property, and those that are 'far' from satisfying the property. Testing properties that are defined in terms of monotonicity has been extensively investigated, primarily in the context of the monotonicity of a sequence of integers, or the monotonicity of a function over the n -dimensional hypercube {1,…, m } n . These works resulted in monotonicity testers whose query complexity is at most polylogarithmic in the size of the domain.We show that in its most general setting, testing that Boolean functions are close to monotone is equivalent, with respect to the number of required queries, to several other testing problems in logic and graph theory. These problems include: testing that a Boolean assignment of variables is close to an assignment that satisfies a specific 2 -CNF formula, testing that a set of vertices is close to one that is a vertex cover of a specific graph, and testing that a set of vertices is close to a clique.We then investigate the query complexity of monotonicity testing of both Boolean and integer functions over general partial orders. We give algorithms and lower bounds for the general problem, as well as for some interesting special cases. In proving a general lower bound, we construct graphs with combinatorial properties that may be of independent interest.
STOC Conference 2002 Conference Paper
(MATH) We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the i -th location is the probability of the i -th element, then linear time is both necessary and sufficient for approximating the entropy.We consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a λ-multiplicative approximation to the entropy can be obtained in O ( n (1+η) /λ 2 < poly(log n ) ) time for distributions with entropy Ω(λ η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω( n max(1/(2λ 2 ), 2/(5λ 2 —2)) .We next consider a hybrid model in which both the explicit distribution as well as independent samples are available. Here, significantly more efficient algorithms can be achieved: a λ-multiplicative approximation to the entropy can be obtained in O ( λ2 .Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
FOCS Conference 2001 Conference Paper
Given access to independent samples of a distribution A over [n] /spl times/ [m], we show how to test whether the distributions formed by projecting A to each coordinate are independent, i. e. , whether A is /spl epsi/-close in the L/sub 1/ norm to the product distribution A/sub 1//spl times/A/sub 2/ for some distributions A/sub 1/ over [n] and A/sub 2/ over [m]. The sample complexity of our test is O/spl tilde/(n/sup 2/3/m/sup 1/3/poly(/spl epsi//sup -1/)), assuming without loss of generality that m/spl les/n. We also give a matching lower bound, up to poly (log n, /spl epsi//sup -1/) factors. Furthermore, given access to samples of a distribution X over [n], we show how to test if X is /spl epsi/-close in L/sub 1/ norm to an explicitly specified distribution Y. Our test uses O/spl tilde/(n/sup 1/2/poly(/spl epsi//sup -1/)) samples, which nearly matches the known tight bounds for the case when Y is uniform.
STOC Conference 2000 Conference Paper
FOCS Conference 2000 Conference Paper
Given two distributions over an n element set, we wish to check whether these distributions are statistically close by only sampling. We give a sublinear algorithm which uses O(n/sup 2/3//spl epsiv//sup -4/ log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max(/spl epsiv//sup 2//32/sup 3//spl radic/n, /spl epsiv//4/spl radic/n=)) or large (more than /spl epsiv/) in L/sub 1/-distance. We also give an /spl Omega/(n/sup 2/3//spl epsiv//sup -2/3/) lower bound. Our algorithm has applications to the problem of checking whether a given Markov process is rapidly mixing. We develop sublinear algorithms for this problem as well.
STOC Conference 1999 Conference Paper
STOC Conference 1998 Conference Paper
FOCS Conference 1996 Conference Paper
The authors show how to check programs that compute polynomials and functions defined by addition theorems-in the realistic setting where the output of the program is approximate instead of exact. They present results showing how to perform approximate checking, self-testing, and self-correcting of polynomials, settling in the affirmative a question raised by Gemmell et al. (1991), and Rubinfeld and Sudan (1992, 1996). They then show how to perform approximate checking, self-testing, and self-correcting for those functions that satisfy addition theorems, settling a question raised by Rubinfeld (1994]) In both cases, they show that the properties used to test programs for these functions are both robust (in the approximate sense) and stable. Finally, they explore the use of reductions between functional equations in the context of approximate self-testing. Their results have implications to the stability theory of functional equations.
FOCS Conference 1996 Conference Paper
Graph expansion has proved to be a powerful general tool for analyzing the behavior of routing algorithms and the inter-connection networks on which they run. We develop new routing algorithms and structural results for bounded-degree expander graphs. Our results are unified by the fact that they are all based upon, and extend, a body of work: asserting that expanders are rich in short, disjoint paths. In particular, our work has consequences for the disjoint paths problem, multicommodity flow, and graph minor containment. We show: (i) A greedy algorithm for approximating the maximum disjoint paths problem achieves a polylogarithmic approximation ratio in bounded-degree expanders. Although our algorithm is both deterministic and on-line, its performance guarantee is an improvement over previous bounds in expanders. (ii) For a multicommodity flow problem with arbitrary demands on a bounded-degree expander there is a (1+/spl epsi/)-optimal solution using only flow paths of polylogarithmic length. It follows that the multicommodity flow algorithm of Awerbuch and Leighton runs in nearly linear time per commodity in expanders. Our analysis is based on establishing the following: given edge weights on an expander G, one can increase some of the weights very slightly so the resulting shortest-path metric is smooth-the min-weight path between any pair of nodes uses a polylogarithmic number of edges. (iii) Every bounded-degree expander on n nodes contains every graph with O(n/log/sup 0(1)/n) nodes and edges as a minor.
FOCS Conference 1995 Conference Paper
We examine the problem of learning to play various games optimally against resource-bounded adversaries, with an explicit emphasis on the computational efficiency of the learning algorithm. We are especially interested in providing efficient algorithms for games other than penny-matching (in which payoff is received for matching the adversary's action in the current round), and for adversaries other than the classically studied finite automata. In particular, we examine games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and therefore require carefully planned actions on the part of the learning algorithm. For example, in the game we call contract, both sides play O or 1 on each round, but our side receives payoff only if we play 1 in synchrony with the adversary; unlike penny-matching, playing O in synchrony with the adversary pays nothing. The name of the game is derived from the example of signing a contract, which becomes valid only if both parties sign (play 1).
FOCS Conference 1995 Conference Paper
Given a function f mappping n-variate inputs from a finite field F into F, we consider the task of reconstructing a list of all n-variate degree d polynomials which agree with f on a tiny but non-negligible fraction, /spl delta/, of the input space. We give a randomized algorithm for solving this task which accesses f as a black box and runs in time polynomial in 1//spl delta/, n and exponential in d, provided /spl delta/ is /spl Omega/(/spl radic/(d.
STOC Conference 1994 Conference Paper
FOCS Conference 1994 Conference Paper
Given a functional equation, such as /spl forall/x, y f(x)+f(y)=f(x+y), we study the following general question: When can the "for all" quantifiers be replaced by "for most" quantifiers without essentially changing the functions that are characterized by the property? When "for most" quantifiers are sufficient, we say that the functional equation is robust. We show conditions on functional equations of the form /spl forall/x, y F[f(x-y), f(x+y), f(x), f(y)]=0, where F is an algebraic function, that imply robustness. We then initiate a general study aimed at characterizing properties of functional equations that determine whether or not they are robust. Our results have applications to the area of self-testing/correcting programs-this paper provides results which show that the concept of self-testing/correcting has much broader applications than we previously understood. We show that self-testers and self-correctors can be found for many functions satisfying robust functional equations, including tan x, 1/1+cot x, Ax/1-Ax', cosh x. >
STOC Conference 1993 Conference Paper
FOCS Conference 1992 Conference Paper
The authors consider the task of reconstructing algebraic functions given by black boxes. Unlike traditional settings, they are interested in black boxes which represent several algebraic functions-f/sub 1/, .. ., f/sub k/, where at each input x, the box arbitarrily chooses a subset of f/sub 1/(x), .. ., f/sub k/(x) to output. They show how to reconstruct the functions f/sub 1/, .. ., f/sub k/ from the black box. This allows them to group the same points into sets, such that for each set, all outputs to points in the set are from the same algebraic function. The methods are robust in the presence of errors in the black box. The model and techniques can be applied in the areas of computer vision, machine learning, curve fitting and polynomial approximation, self-correcting programs and bivariate polynomial factorization. >
SODA Conference 1992 Conference Paper
STOC Conference 1991 Conference Paper
STOC Conference 1990 Conference Paper
ICRA Conference 1986 Conference Paper
Grip determination is essential to any task level planning process and to situations in which the pose of the part to be grasped is not known a priori. This paper presents the complete force/moment equations for grasping via a rigid two-fingered gripper. Surface contact is modeled as a linear pressure variation. The quality measure of a grip is taken to be the coefficient of friction needed to keep it from slipping under applied forces and moments, the lower the coefficient of friction, the better the grip. The incorporation of this evaluation into a general grip selection strategy is discussed and several examples given.