Arrow Research

Author name cluster

Ronitt Rubinfeld

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers

NeurIPS Conference 2024 Conference Paper

Optimal Algorithms for Augmented Testing of Discrete Distributions

  • Maryam Aliakbarpour
  • Piotr Indyk
  • Ronitt Rubinfeld
  • Sandeep Silwal

We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution $p$, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor's quality, measured by its total variation distance from $p$. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the estimation's accuracy (i.e., they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i.e., they are also robust). We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach.
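The quantity that governs the sample savings above is the total variation distance between $p$ and the prediction. A minimal sketch of that measure (illustrative only; `tv_distance` is our name, not the paper's):

```python
def tv_distance(p, p_hat):
    """TV(p, p_hat) = (1/2) * sum_i |p_i - p_hat_i| over explicit
    probability vectors. The closer the predictor p_hat is to p in this
    metric, the fewer fresh samples an augmented tester needs."""
    assert abs(sum(p) - 1.0) < 1e-9 and abs(sum(p_hat) - 1.0) < 1e-9
    return 0.5 * sum(abs(a - b) for a, b in zip(p, p_hat))
```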

NeurIPS Conference 2022 Conference Paper

Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks

  • Anders Aamand
  • Justin Chen
  • Piotr Indyk
  • Shyam Narayanan
  • Ronitt Rubinfeld
  • Nicholas Schiefer
  • Sandeep Silwal
  • Tal Wagner

Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing non-isomorphic graphs is exactly the same as that of the Weisfeiler-Lehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the "combine" function of size polynomial or even exponential in the number of graph nodes $n$, as well as feature vectors of length linear in $n$. We present an improved simulation of the WL test on GNNs with {\em exponentially} lower complexity. In particular, the neural network implementing the combine function in each node has only $\mathrm{polylog}(n)$ parameters, and the feature vectors exchanged by the nodes of the GNN consist of only $O(\log n)$ bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near-)optimality of our construction.
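The WL test being simulated is classical color refinement. A compact reference implementation (a sketch of the test itself, not of the paper's GNN construction; `wl_colors` is our name):

```python
from collections import Counter

def wl_colors(adj, rounds):
    """One-dimensional Weisfeiler-Lehman (color refinement).
    adj: dict mapping node -> list of neighbours.
    Returns the multiset of node colors after the given number of
    rounds; two graphs with different multisets are non-isomorphic."""
    color = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # new color = (own color, sorted multiset of neighbour colors)
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: palette[sig[v]] for v in adj}
    return Counter(color.values())
```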

FOCS Conference 2022 Conference Paper

Properly learning monotone functions via local correction

  • Jane Lange
  • Ronitt Rubinfeld
  • Arsen Vasilyan

We give a $2^{\tilde{O}(\sqrt{n}/\varepsilon)}$-time algorithm for properly learning monotone Boolean functions under the uniform distribution over $\{0, 1\}^{n}$. Our algorithm is robust to adversarial label noise and has a running time nearly matching that of the state-of-the-art improper learning algorithm of Bshouty and Tamon (JACM '96) and an information-theoretic lower bound of Blais et al. (RANDOM '15). Prior to this work, no proper learning algorithm with running time smaller than $2^{\Omega(n)}$ was known to exist. The core of our proper learner is a local computation algorithm for sorting binary labels on a poset. Our algorithm is built on a body of work on distributed greedy graph algorithms; specifically, we rely on a recent work of Ghaffari (FOCS '22), which gives an efficient algorithm for computing maximal matchings in a graph in the LCA model of Rubinfeld et al. and Alon et al. (ICS '11, SODA '12). The applications of our local sorting algorithm extend beyond learning on the Boolean cube: we also give a tolerant tester for Boolean functions over general posets that distinguishes functions that are $\varepsilon/3$-close to monotone from those that are $\varepsilon$-far. Previous tolerant testers for the Boolean cube only distinguished between $\varepsilon/\Omega(\sqrt{n})$-close and $\varepsilon$-far.
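For intuition, the classical (non-tolerant) edge tester on the hypercube is a far simpler relative of the tolerant tester described above: sample random hypercube edges and report the violation rate. A sketch under that framing (`monotone_edge_tester` is our illustrative name, not the paper's algorithm):

```python
import random

def monotone_edge_tester(f, n, trials=200, rng=None):
    """Sample random hypercube edges (x with bit i = 0 vs bit i = 1)
    and return the fraction on which f decreases, i.e. violates
    monotonicity. Monotone functions score exactly 0."""
    rng = rng or random.Random(0)
    bad = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            bad += 1
    return bad / trials
```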

ICLR Conference 2022 Conference Paper

Triangle and Four Cycle Counting with Predictions in Graph Streams

  • Justin Y. Chen
  • Talya Eden
  • Piotr Indyk
  • Honghao Lin
  • Shyam Narayanan
  • Ronitt Rubinfeld
  • Sandeep Silwal
  • Tal Wagner

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, Hsu et al. (2019) and Jiang et al. (2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior “classical” algorithms that did not use oracles. In this paper, we explore the power of a “heavy edge” oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on “classical” streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.
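The "heavy edge" idea can be made concrete in a toy one-pass counter: store only the edges the oracle flags as heavy, and close triangles against the stored edges. This is an illustrative sketch, not the paper's space-bounded estimator, and `count_heavy_triangles` is our name:

```python
def count_heavy_triangles(stream, is_heavy):
    """One pass over an edge stream. Edges flagged heavy by the oracle
    are stored; a triangle is counted at the arrival of its last edge
    when both earlier sides were stored. With a perfect all-heavy
    oracle this counts every triangle exactly."""
    nbrs = {}  # adjacency restricted to stored (heavy) edges
    triangles = 0
    for u, v in stream:
        # close triangles whose other two sides are already stored
        triangles += len(nbrs.get(u, set()) & nbrs.get(v, set()))
        if is_heavy(u, v):
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return triangles
```

The space saving in the real algorithms comes from storing only a small set of (predicted) heavy edges and sampling the rest.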

ICLR Conference 2021 Conference Paper

Learning-based Support Estimation in Sublinear Time

  • Talya Eden
  • Piotr Indyk
  • Shyam Narayanan
  • Ronitt Rubinfeld
  • Sandeep Silwal
  • Tal Wagner

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $\pm \varepsilon n$ from a sample of size $O(\log^2(1/\varepsilon) \cdot n/\log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to $$\log(1/\varepsilon) \cdot n^{1-\Theta(1/\log(1/\varepsilon))}.$$ We evaluate the proposed algorithms on a collection of data sets, using the neural-network-based estimators from Hsu et al. (ICLR '19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state-of-the-art algorithm.
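To see how a frequency predictor can help at all, consider a Horvitz-Thompson-style estimator that inversely weights each observed element by its predicted chance of appearing in the sample. This is an illustrative sketch that trusts the predictor completely (so it has none of the paper's robustness guarantees), and `predicted_support` is our name:

```python
def predicted_support(sample, s, freq_hat):
    """Estimate support size from a sample of s i.i.d. draws.
    freq_hat(x) is the predicted probability of element x. Each
    distinct observed element contributes 1 / P[x appears in the
    sample], where P = 1 - (1 - freq_hat(x))**s."""
    est = 0.0
    for x in set(sample):
        p_appear = 1.0 - (1.0 - freq_hat(x)) ** s
        est += 1.0 / p_appear
    return est
```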

NeurIPS Conference 2019 Conference Paper

Private Testing of Distributions via Sample Permutations

  • Maryam Aliakbarpour
  • Ilias Diakonikolas
  • Daniel Kane
  • Ronitt Rubinfeld

Statistical tests are at the heart of many scientific tasks. To validate their hypotheses, researchers in medical and social sciences use individuals' data. The sensitivity of participants' data requires the design of statistical tests that ensure the privacy of the individuals in the most efficient way. In this paper, we use the framework of property testing to design algorithms that test properties of the distribution the data is drawn from, subject to differential privacy. In particular, we investigate testing two fundamental properties of distributions: (1) testing the equivalence of two distributions when we have unequal numbers of samples from the two distributions, and (2) testing the independence of two random variables. In both cases, we show that our testers achieve near-optimal sample complexity (up to logarithmic factors). Moreover, our dependence on the privacy parameter is an additive term, which indicates that differential privacy can be obtained in most regimes of parameters for free.

ICML Conference 2018 Conference Paper

Differentially Private Identity and Equivalence Testing of Discrete Distributions

  • Maryam Aliakbarpour
  • Ilias Diakonikolas
  • Ronitt Rubinfeld

We study the fundamental problems of identity and equivalence testing over a discrete population from random samples. Our goal is to develop efficient testers while guaranteeing differential privacy to the individuals of the population. We provide sample-efficient differentially private testers for these problems. Our theoretical results significantly improve over the best known algorithms for identity testing, and are the first results for private equivalence testing. The conceptual message of our work is that there exist private hypothesis testers that are nearly as sample-efficient as their non-private counterparts. We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size sublinear in the domain size of the underlying distributions.
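A standard building block for private testers of this kind is the Laplace mechanism: perturb a bounded-sensitivity test statistic before thresholding it. The sketch below shows that generic mechanism only, not the paper's specific testers; `private_statistic` and its parameters are our illustrative names:

```python
import math
import random

def private_statistic(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(sensitivity / epsilon) noise, which makes
    the released statistic epsilon-differentially private; a private
    accept/reject decision thresholds the noisy value."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # inverse-CDF sampling of a zero-mean Laplace variate
    return value - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
```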

STOC Conference 2010 Conference Paper

Maintaining a large matching and a small vertex cover

  • Krzysztof Onak
  • Ronitt Rubinfeld

We consider the problem of maintaining a large matching and a small vertex cover in a dynamically changing graph. Each update to the graph is either an edge deletion or an edge insertion. We give the first randomized data structure that simultaneously achieves a constant approximation factor and handles a sequence of K updates in K*polylog(n) time, where n is the number of vertices in the graph. Previous data structures require a polynomial amount of computation per update.
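A naive baseline clarifies what the data structure must beat: maintain a maximal matching (a 2-approximation to maximum matching, whose matched endpoints form a factor-2 vertex cover), rematching endpoints on deletion. This sketch may spend O(degree) per update, whereas the paper achieves polylog(n) amortized with randomization; `DynamicMatching` is our illustrative name:

```python
class DynamicMatching:
    """Naive dynamic maximal matching under edge insertions/deletions."""
    def __init__(self):
        self.adj = {}   # node -> set of neighbours
        self.mate = {}  # node -> matched partner

    def _try_match(self, u):
        if u in self.mate:
            return
        for v in self.adj.get(u, ()):  # scan for any free neighbour
            if v not in self.mate:
                self.mate[u], self.mate[v] = v, u
                return

    def insert(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)
        if u not in self.mate and v not in self.mate:
            self.mate[u], self.mate[v] = v, u

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if self.mate.get(u) == v:  # matched edge removed: rematch locally
            del self.mate[u], self.mate[v]
            self._try_match(u)
            self._try_match(v)

    def matching_size(self):
        return len(self.mate) // 2
```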

FOCS Conference 2007 Conference Paper

Testing for Concise Representations

  • Ilias Diakonikolas
  • Homin K. Lee
  • Kevin Matulef
  • Krzysztof Onak
  • Ronitt Rubinfeld
  • Rocco A. Servedio
  • Andrew Wan

We describe a general method for testing whether a function on n input variables has a concise representation. The approach combines ideas from the junta test of Fischer et al. [16] with ideas from learning theory, and yields property testers that make poly(s/ε) queries (independent of n) for Boolean function classes such as s-term DNF formulas (answering a question posed by Parnas et al. [12]), size-s decision trees, size-s Boolean formulas, and size-s Boolean circuits. The method can be applied to non-Boolean-valued function classes as well. This is achieved via a generalization of the notion of variation from Fischer et al. to non-Boolean functions. Using this generalization we extend the original junta test of Fischer et al. to work for non-Boolean functions, and give poly(s/ε)-query testing algorithms for non-Boolean-valued function classes such as size-s algebraic circuits and s-sparse polynomials over finite fields. We also prove an Ω(√s) query lower bound for nonadaptively testing s-sparse polynomials over finite fields of constant size. This shows that in some instances, our general method yields a property tester with query complexity that is optimal (for nonadaptive algorithms) up to a polynomial factor.

STOC Conference 2007 Conference Paper

Testing k-wise and almost k-wise independence

  • Noga Alon
  • Alexandr Andoni
  • Tali Kaufman
  • Kevin Matulef
  • Ronitt Rubinfeld
  • Ning Xie 0002

In this work, we consider the problems of testing whether a distribution over {0,1}^n is k-wise (resp. (ε,k)-wise) independent using samples drawn from that distribution. For the problem of distinguishing k-wise independent distributions from those that are δ-far from k-wise independence in statistical distance, we upper bound the number of required samples by Õ(n^k/δ^2) and lower bound it by Ω(n^{(k-1)/2}/δ) (these bounds hold for constant k, and essentially the same bounds hold for general k). To achieve these bounds, we use Fourier analysis to relate a distribution's distance from k-wise independence to its biases, a measure of the parity imbalance it induces on a set of variables. The relationships we derive are tighter than previously known, and may be of independent interest. To distinguish (ε,k)-wise independent distributions from those that are δ-far from (ε,k)-wise independence in statistical distance, we upper bound the number of required samples by O(k log n / (δ^2 ε^2)) and lower bound it by Ω(√(k log n) / (2^k (ε+δ) √(log 1/(2^k (ε+δ))))). Although these bounds are an exponential improvement (in terms of n and k) over the corresponding bounds for testing k-wise independence, we give evidence that the time complexity of testing (ε,k)-wise independence is unlikely to be poly(n, 1/ε, 1/δ) for k = Θ(log n), since this would disprove a plausible conjecture concerning the hardness of finding hidden cliques in random graphs. Under the conjecture, our result implies that for, say, k = log n and ε = 1/n^{0.99}, there is a set of (ε,k)-wise independent distributions, and a set of distributions at distance δ = 1/n^{0.51} from (ε,k)-wise independence, which are indistinguishable by polynomial-time algorithms.
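The biases the abstract refers to are the parity imbalances over variable subsets; a distribution is k-wise independent exactly when every non-empty subset of at most k variables has zero bias. An empirical version is easy to sketch (`empirical_biases` is our illustrative name):

```python
from itertools import combinations

def empirical_biases(samples, n, k):
    """For every non-empty subset S of at most k of the n bit
    positions, compute |average over samples of (-1)^(parity of the
    bits in S)|. All-zero biases characterize k-wise independence."""
    m = len(samples)
    out = {}
    for r in range(1, k + 1):
        for S in combinations(range(n), r):
            tot = sum((-1) ** sum(x[i] for i in S) for x in samples)
            out[S] = abs(tot) / m
    return out
```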

STOC Conference 2004 Conference Paper

Sublinear algorithms for testing monotone and unimodal distributions

  • Tugkan Batu
  • Ravi Kumar 0001
  • Ronitt Rubinfeld

The complexity of testing properties of monotone and unimodal distributions, when given access only to samples of the distribution, is investigated. Two kinds of sublinear-time algorithms, those for testing monotonicity and those that take advantage of monotonicity, are provided. The first algorithm tests if a given distribution on [n] is monotone or far away from any monotone distribution in L_1-norm; this algorithm uses O(√n) samples and is shown to be nearly optimal. The next algorithm, given a joint distribution on [n] × [n], tests if it is monotone or is far away from any monotone distribution in L_1-norm; this algorithm uses O(n^{3/2}) samples. The problems of testing if two monotone distributions are close in L_1-norm and if two random variables with a monotone joint distribution are close to being independent in L_1-norm are also considered. Algorithms for these problems that use only poly(log n) samples are presented. The closeness and independence testing algorithms for monotone distributions are significantly more efficient than the corresponding algorithms, as well as the lower bounds, for arbitrary distributions. Some of the above results are also extended to unimodal distributions.

STOC Conference 2002 Conference Paper

Monotonicity testing over general poset domains

  • Eldar Fischer
  • Eric Lehman
  • Ilan Newman
  • Sofya Raskhodnikova
  • Ronitt Rubinfeld
  • Alex Samorodnitsky

The field of property testing studies algorithms that distinguish, using a small number of queries, between inputs which satisfy a given property and those that are 'far' from satisfying the property. Testing properties that are defined in terms of monotonicity has been extensively investigated, primarily in the context of the monotonicity of a sequence of integers, or the monotonicity of a function over the n-dimensional hypercube {1,…,m}^n. These works resulted in monotonicity testers whose query complexity is at most polylogarithmic in the size of the domain. We show that in its most general setting, testing that Boolean functions are close to monotone is equivalent, with respect to the number of required queries, to several other testing problems in logic and graph theory. These problems include: testing that a Boolean assignment of variables is close to an assignment that satisfies a specific 2-CNF formula, testing that a set of vertices is close to one that is a vertex cover of a specific graph, and testing that a set of vertices is close to a clique. We then investigate the query complexity of monotonicity testing of both Boolean and integer functions over general partial orders. We give algorithms and lower bounds for the general problem, as well as for some interesting special cases. In proving a general lower bound, we construct graphs with combinatorial properties that may be of independent interest.

STOC Conference 2002 Conference Paper

The complexity of approximating entropy

  • Tugkan Batu
  • Sanjoy Dasgupta
  • Ravi Kumar 0001
  • Ronitt Rubinfeld

We consider the problem of approximating the entropy of a discrete distribution under several models. If the distribution is given explicitly as an array where the i-th location is the probability of the i-th element, then linear time is both necessary and sufficient for approximating the entropy. We then consider a model in which the algorithm is given access only to independent samples from the distribution. Here, we show that a λ-multiplicative approximation to the entropy can be obtained in O(n^{(1+η)/λ^2} · poly(log n)) time for distributions with entropy Ω(λ/η), where n is the size of the domain of the distribution and η is an arbitrarily small positive constant. We show that one cannot get a multiplicative approximation to the entropy in general in this model. Even for the class of distributions to which our upper bound applies, we obtain a lower bound of Ω(n^{max(1/(2λ^2), 2/(5λ^2 − 2))}). We next consider a hybrid model in which both the explicit distribution and independent samples are available; here, significantly more efficient algorithms can be achieved. Finally, we consider two special families of distributions: those for which the probability of an element decreases monotonically in the label of the element, and those that are uniform over a subset of the domain. In each case, we give more efficient algorithms for approximating the entropy.
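The naive baseline these sublinear results improve on is the plug-in (maximum-likelihood) estimator, which needs roughly as many samples as the support size. A sketch (`plugin_entropy` is our illustrative name):

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy estimate in bits: compute empirical frequencies
    and evaluate -sum(q * log2(q)). Simple, but converges only once
    most of the support has been seen."""
    m = len(samples)
    counts = Counter(samples)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())
```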

FOCS Conference 2001 Conference Paper

Testing Random Variables for Independence and Identity

  • Tugkan Batu
  • Lance Fortnow
  • Eldar Fischer
  • Ravi Kumar 0001
  • Ronitt Rubinfeld
  • Patrick White

Given access to independent samples of a distribution A over [n] × [m], we show how to test whether the distributions formed by projecting A to each coordinate are independent, i.e., whether A is ε-close in the L_1 norm to the product distribution A_1 × A_2 for some distributions A_1 over [n] and A_2 over [m]. The sample complexity of our test is Õ(n^{2/3} m^{1/3} · poly(1/ε)), assuming without loss of generality that m ≤ n. We also give a matching lower bound, up to poly(log n, 1/ε) factors. Furthermore, given access to samples of a distribution X over [n], we show how to test if X is ε-close in L_1 norm to an explicitly specified distribution Y. Our test uses Õ(n^{1/2} · poly(1/ε)) samples, which nearly matches the known tight bounds for the case when Y is uniform.
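The quantity being tested is the L_1 distance between the joint distribution and the product of its marginals; an empirical version of that distance is straightforward to sketch (`l1_from_independence` is our illustrative name, and a real tester needs far fewer samples than this plug-in computation suggests):

```python
from collections import Counter

def l1_from_independence(pairs):
    """Empirical L1 distance between the joint distribution of (a, b)
    pairs and the product of its two marginals; 0 for a perfectly
    independent empirical sample."""
    m = len(pairs)
    joint = Counter(pairs)
    ma = Counter(a for a, _ in pairs)
    mb = Counter(b for _, b in pairs)
    support = {(a, b) for a in ma for b in mb}
    return sum(abs(joint.get(p, 0) / m - ma[p[0]] * mb[p[1]] / m ** 2)
               for p in support)
```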

FOCS Conference 2000 Conference Paper

Testing that distributions are close

  • Tugkan Batu
  • Lance Fortnow
  • Ronitt Rubinfeld
  • Warren D. Smith
  • Patrick White

Given two distributions over an n-element set, we wish to check whether these distributions are statistically close by only sampling. We give a sublinear algorithm which uses O(n^{2/3} ε^{-4} log n) independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max(ε^2/(32 n^{1/3}), ε/(4√n))) or large (more than ε) in L_1-distance. We also give an Ω(n^{2/3} ε^{-2/3}) lower bound. Our algorithm has applications to the problem of checking whether a given Markov process is rapidly mixing. We develop sublinear algorithms for this problem as well.
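The engine behind closeness testers in this line of work is collision counting: self-collision rates estimate ||p||^2 and ||q||^2, cross-collisions estimate ⟨p, q⟩, and ||p − q||^2 = ||p||^2 + ||q||^2 − 2⟨p, q⟩. A sketch of that statistic (illustrative; `collision_statistic` is our name, not the paper's algorithm):

```python
def collision_statistic(xs, ys):
    """Estimate ||p - q||^2 from samples xs ~ p and ys ~ q using
    within-sample and cross-sample collision rates. Can be negative
    for close distributions; large positive values indicate distance."""
    def self_coll(s):
        pairs = len(s) * (len(s) - 1) / 2
        return sum(a == b for i, a in enumerate(s) for b in s[i + 1:]) / pairs
    cross = sum(a == b for a in xs for b in ys) / (len(xs) * len(ys))
    return self_coll(xs) + self_coll(ys) - 2 * cross
```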

STOC Conference 1998 Conference Paper

Spot-Checkers

  • Funda Ergün
  • Sampath Kannan
  • Ravi Kumar 0001
  • Ronitt Rubinfeld
  • Mahesh Viswanathan 0001

FOCS Conference 1996 Conference Paper

Approximate Checking of Polynomials and Functional Equations (extended abstract)

  • Funda Ergün
  • Ravi Kumar 0001
  • Ronitt Rubinfeld

The authors show how to check programs that compute polynomials and functions defined by addition theorems, in the realistic setting where the output of the program is approximate instead of exact. They present results showing how to perform approximate checking, self-testing, and self-correcting of polynomials, settling in the affirmative a question raised by Gemmell et al. (1991) and Rubinfeld and Sudan (1992, 1996). They then show how to perform approximate checking, self-testing, and self-correcting for those functions that satisfy addition theorems, settling a question raised by Rubinfeld (1994). In both cases, they show that the properties used to test programs for these functions are both robust (in the approximate sense) and stable. Finally, they explore the use of reductions between functional equations in the context of approximate self-testing. Their results have implications for the stability theory of functional equations.

FOCS Conference 1996 Conference Paper

Short Paths in Expander Graphs

  • Jon M. Kleinberg
  • Ronitt Rubinfeld

Graph expansion has proved to be a powerful general tool for analyzing the behavior of routing algorithms and the interconnection networks on which they run. We develop new routing algorithms and structural results for bounded-degree expander graphs. Our results are unified by the fact that they are all based upon, and extend, a body of work asserting that expanders are rich in short, disjoint paths. In particular, our work has consequences for the disjoint paths problem, multicommodity flow, and graph minor containment. We show: (i) A greedy algorithm for approximating the maximum disjoint paths problem achieves a polylogarithmic approximation ratio in bounded-degree expanders. Although our algorithm is both deterministic and on-line, its performance guarantee is an improvement over previous bounds in expanders. (ii) For a multicommodity flow problem with arbitrary demands on a bounded-degree expander, there is a (1+ε)-optimal solution using only flow paths of polylogarithmic length. It follows that the multicommodity flow algorithm of Awerbuch and Leighton runs in nearly linear time per commodity in expanders. Our analysis is based on establishing the following: given edge weights on an expander G, one can increase some of the weights very slightly so the resulting shortest-path metric is smooth, i.e., the min-weight path between any pair of nodes uses a polylogarithmic number of edges. (iii) Every bounded-degree expander on n nodes contains every graph with O(n/log^{O(1)} n) nodes and edges as a minor.

FOCS Conference 1995 Conference Paper

Efficient Algorithms for Learning to Play Repeated Games Against Computationally Bounded Adversaries

  • Yoav Freund
  • Michael J. Kearns
  • Yishay Mansour
  • Dana Ron
  • Ronitt Rubinfeld
  • Robert E. Schapire

We examine the problem of learning to play various games optimally against resource-bounded adversaries, with an explicit emphasis on the computational efficiency of the learning algorithm. We are especially interested in providing efficient algorithms for games other than penny-matching (in which payoff is received for matching the adversary's action in the current round), and for adversaries other than the classically studied finite automata. In particular, we examine games and adversaries for which the learning algorithm's past actions may strongly affect the adversary's future willingness to "cooperate" (that is, permit high payoff), and therefore require carefully planned actions on the part of the learning algorithm. For example, in the game we call contract, both sides play 0 or 1 on each round, but our side receives payoff only if we play 1 in synchrony with the adversary; unlike penny-matching, playing 0 in synchrony with the adversary pays nothing. The name of the game is derived from the example of signing a contract, which becomes valid only if both parties sign (play 1).

FOCS Conference 1995 Conference Paper

Learning Polynomials with Queries: The Highly Noisy Case

  • Oded Goldreich 0001
  • Ronitt Rubinfeld
  • Madhu Sudan 0001

Given a function f mapping n-variate inputs from a finite field F into F, we consider the task of reconstructing a list of all n-variate degree-d polynomials which agree with f on a tiny but non-negligible fraction, δ, of the input space. We give a randomized algorithm for solving this task which accesses f as a black box and runs in time polynomial in 1/δ and n and exponential in d, provided δ is Ω(√(d/|F|)).

FOCS Conference 1994 Conference Paper

On the robustness of functional equations

  • Ronitt Rubinfeld

Given a functional equation, such as ∀x, y: f(x)+f(y)=f(x+y), we study the following general question: When can the "for all" quantifiers be replaced by "for most" quantifiers without essentially changing the functions that are characterized by the property? When "for most" quantifiers are sufficient, we say that the functional equation is robust. We show conditions on functional equations of the form ∀x, y: F[f(x−y), f(x+y), f(x), f(y)]=0, where F is an algebraic function, that imply robustness. We then initiate a general study aimed at characterizing properties of functional equations that determine whether or not they are robust. Our results have applications to the area of self-testing/correcting programs: this paper provides results which show that the concept of self-testing/correcting has much broader applications than we previously understood. We show that self-testers and self-correctors can be found for many functions satisfying robust functional equations, including tan x, 1/(1+cot x), A^x/(1−A^x), and cosh x.
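The "for most" version of the Cauchy equation above is easy to probe empirically: sample random pairs and measure the violation rate. Robustness says a small rate forces f to be close to a truly additive function, which is what makes self-testing possible. An illustrative sketch (`cauchy_violation_rate` is our name; integer inputs are an assumption for exact arithmetic):

```python
import random

def cauchy_violation_rate(f, trials=1000, rng=None, tol=1e-9,
                          lo=-100, hi=100):
    """Fraction of random integer pairs (x, y) on which
    f(x) + f(y) != f(x + y). Linear functions score exactly 0."""
    rng = rng or random.Random(0)
    bad = 0
    for _ in range(trials):
        x, y = rng.randint(lo, hi), rng.randint(lo, hi)
        if abs(f(x) + f(y) - f(x + y)) > tol:
            bad += 1
    return bad / trials
```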

FOCS Conference 1992 Conference Paper

Reconstructing Algebraic Functions from Mixed Data

  • Sigal Ar
  • Richard J. Lipton
  • Ronitt Rubinfeld
  • Madhu Sudan 0001

The authors consider the task of reconstructing algebraic functions given by black boxes. Unlike traditional settings, they are interested in black boxes which represent several algebraic functions f_1, …, f_k, where at each input x, the box arbitrarily chooses a subset of f_1(x), …, f_k(x) to output. They show how to reconstruct the functions f_1, …, f_k from the black box. This allows them to group the points into sets such that, for each set, all outputs to points in the set are from the same algebraic function. The methods are robust in the presence of errors in the black box. The model and techniques can be applied in the areas of computer vision, machine learning, curve fitting and polynomial approximation, self-correcting programs, and bivariate polynomial factorization.

ICRA Conference 1986 Conference Paper

Automatic two-fingered grip selection

  • James Barber
  • Richard A. Volz
  • Rajiv S. Desai
  • Ronitt Rubinfeld
  • Brian Schipper
  • Jan Wolter 0002

Grip determination is essential to any task-level planning process and to situations in which the pose of the part to be grasped is not known a priori. This paper presents the complete force/moment equations for grasping via a rigid two-fingered gripper. Surface contact is modeled as a linear pressure variation. The quality measure of a grip is taken to be the coefficient of friction needed to keep it from slipping under applied forces and moments; the lower the coefficient of friction, the better the grip. The incorporation of this evaluation into a general grip-selection strategy is discussed, and several examples are given.
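The grip-quality measure reduces, per contact, to the Coulomb condition: the grip holds only if the friction coefficient is at least the ratio of tangential to normal force. A minimal sketch of that single-contact check (`required_friction` is our illustrative name, not the paper's full force/moment model):

```python
def required_friction(normal_force, tangential_force):
    """Minimal Coulomb friction coefficient for a single contact not
    to slip: mu >= |tangential| / |normal|. A lower required mu means
    a better grip under the quality measure described above."""
    if normal_force <= 0:
        raise ValueError("contact must press into the surface")
    return abs(tangential_force) / normal_force
```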