Arrow Research search

Author name cluster

Or Zamir

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

TMLR Journal 2024 Journal Article

Undetectable Steganography for Language Models

  • Or Zamir

We introduce a cryptographic method to hide an arbitrary secret payload in the response of a Large Language Model (LLM). A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for LLMs.
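The flavor of the undetectability claim can be illustrated with a toy sketch. This is not the paper's construction (which works over real LLM token distributions); it assumes a keyed PRF (HMAC-SHA256 here) and a simplified "model" that is indifferent between two tokens at every step, so XOR-ing each payload bit with a key-derived pseudorandom bit yields output that is uniform to anyone without the key:

```python
import hmac
import hashlib

def prf_bit(key: bytes, index: int) -> int:
    """Pseudorandom bit derived from the secret key (HMAC-SHA256 as a PRF)."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[0] & 1

def embed(key: bytes, payload_bits: list) -> list:
    """Toy 'model' step: token 0 and token 1 are equally likely, so picking
    payload_bit XOR prf_bit costs nothing in text quality and looks uniform
    without the key."""
    return [b ^ prf_bit(key, i) for i, b in enumerate(payload_bits)]

def extract(key: bytes, tokens: list) -> list:
    """With the key, XOR-ing the same PRF stream back recovers the payload."""
    return [t ^ prf_bit(key, i) for i, t in enumerate(tokens)]
```

The one-time-pad-style argument is that each output bit is the payload masked by a pseudorandom bit, so distinguishing the stego output from ordinary sampling would break the PRF; the actual scheme achieves this over arbitrary token distributions.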

FOCS Conference 2022 Conference Paper

Planting Undetectable Backdoors in Machine Learning Models (Extended Abstract)

  • Shafi Goldwasser
  • Michael P. Kim
  • Vinod Vaikuntanathan
  • Or Zamir

Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns of trust. This work studies possible abuses of power by untrusted learners. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate “backdoor key,” the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees.

  • First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that, given query access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Moreover, even a distinguisher that can request backdoored inputs of its choice cannot backdoor a new input, a property we call non-replicability.
  • Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm (Rahimi, Recht; NeurIPS 2007). In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is “clean” or contains a backdoor. The backdooring algorithm executes the RFF algorithm faithfully on the given training data, tampering only with its random coins. We prove this strong guarantee under the hardness of the Continuous Learning With Errors problem (Bruna, Regev, Song, Tang; STOC 2021). We show a similar white-box undetectable backdoor for random ReLU networks based on the hardness of Sparse PCA (Berthet, Rigollet; COLT 2013).

Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, by constructing an undetectable backdoor for an “adversarially-robust” learning algorithm, we can produce a classifier that is indistinguishable from a robust classifier, but in which every input has an adversarial example! In this way, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
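A minimal sketch of the first framework's flavor, using HMAC as a stand-in for the digital signature scheme the paper actually uses (a real instantiation needs a public-key signature so that even someone holding the model cannot forge activating inputs); the classifier and key names here are illustrative:

```python
import hmac
import hashlib

BACKDOOR_KEY = b"attacker-secret"   # hypothetical signing key, held by the learner

def base_classifier(x: bytes) -> int:
    """Stand-in 'clean' model: classify by parity of the byte sum."""
    return sum(x) % 2

def tag(x: bytes) -> bytes:
    """Attacker-only: MAC tag whose presence activates the backdoor."""
    return hmac.new(BACKDOOR_KEY, x, hashlib.sha256).digest()[:8]

def backdoored_classifier(payload: bytes) -> int:
    """Input = message || 8-byte tag.  On a valid tag, flip the label;
    on anything else, behave exactly like the clean model."""
    x, t = payload[:-8], payload[-8:]
    y = base_classifier(x)
    if hmac.compare_digest(t, tag(x)):
        return 1 - y
    return y
```

Without the key, finding an input where the two classifiers differ amounts to forging a tag, which mirrors (in toy form) why black-box undetectability follows from unforgeability.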

SODA Conference 2022 Conference Paper

Simulating a stack using queues

  • Haim Kaplan
  • Robert Endre Tarjan
  • Or Zamir
  • Uri Zwick

It is well known that a queue can be simulated by two stacks using a constant number of stack operations per queue operation. In this paper we consider the forgotten converse problem of simulating a stack using several queues. We consider several variants of this problem. For the offline variant, we obtain tight upper and lower bounds on the worst-case number of queue operations needed to simulate a sequence of n stack operations using k queues. For the online variant, when the number of queues k is constant and n is the maximum number of items in the stack at any given time, we obtain tight Θ(n^{1/k}) upper and lower bounds on the worst-case and amortized number of queue operations needed to simulate one stack operation. When k is allowed to grow with n, we prove an upper bound of O(n^{1/k} + log_k n) and a lower bound on the amortized number of queue operations per stack operation. We also prove an upper bound of O(k·n^{1/k}) and a lower bound of Ω(n^{1/k} + log_k n) on the worst-case number of queue operations per stack operation. We also show that the specific but interesting sequence of n pushes followed by n pops can be implemented much faster, using a total of only Θ(n·log_k n) queue operations for every k ≥ 2, i.e., an amortized Θ(log_k n) queue operations per stack operation, and this bound is tight. On the other hand, we show that the same sequence requires at least Ω(n^{1/k}) queue operations per stack operation in the worst case.
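The base case k = 1 of the simulation can be sketched directly: a single FIFO queue simulates a stack if every push rotates the queue so the newest item sits at the front, at a cost of Θ(n) queue operations per push, consistent with the Θ(n^{1/k}) bound at k = 1 (the k-queue schemes in the paper are considerably more involved):

```python
from collections import deque

class StackViaQueue:
    """Stack simulated by a single FIFO queue.  After each push the queue
    is rotated so the newest item is at the front; pop is then a dequeue."""
    def __init__(self):
        self.q = deque()

    def push(self, x):
        self.q.append(x)                  # enqueue the new item at the back
        for _ in range(len(self.q) - 1):  # rotate the older items behind it
            self.q.append(self.q.popleft())

    def pop(self):
        return self.q.popleft()           # front of the queue is the stack top
```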

ICML Conference 2021 Conference Paper

Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

  • Shyam Narayanan
  • Sandeep Silwal
  • Piotr Indyk
  • Or Zamir

Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset $X$ onto a random $d = O(d_X)$-dimensional subspace (where $d_X$ is the doubling dimension of $X$), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension $d$ having an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating {\em solutions} instead of just their {\em costs}. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$.
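The projection step itself is a plain Gaussian random projection; here is a self-contained pure-Python sketch, with the target dimension d left as a free parameter rather than tied to the paper's O(d_X) bound:

```python
import math
import random

def random_projection(points, d, seed=0):
    """Project points (lists of floats) onto d random Gaussian directions,
    scaled by 1/sqrt(d) so squared distances are preserved in expectation."""
    rng = random.Random(seed)
    dim = len(points[0])
    # d x dim matrix of i.i.d. N(0, 1/d) entries
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(dim)]
         for _ in range(d)]
    return [[sum(r[j] * p[j] for j in range(dim)) for r in R] for p in points]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With d large enough, all pairwise distances concentrate around their originals; the paper's contribution is that for facility location and MST costs, d can depend only on the doubling dimension of the pointset rather than on n or the number of clusters.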

FOCS Conference 2021 Conference Paper

Tight Space Complexity of the Coin Problem

  • Mark Braverman
  • Sumegha Garg
  • Or Zamir

In the coin problem we are asked to distinguish, with probability at least 2/3, between $n$ i.i.d. coins which are heads with probability $\frac{1}{2}+\beta$ and coins which are heads with probability $\frac{1}{2}-\beta$. We are interested in the space complexity of the coin problem, corresponding to the width of a read-once branching program solving the problem. The coin problem becomes more difficult as $\beta$ becomes smaller. Statistically, it can be solved whenever $\beta = \Omega(n^{-1/2})$, using counting. It has been previously shown that for $\beta = O(n^{-1/2})$, counting is essentially optimal (equivalently, width $\mathrm{poly}(n)$ is necessary [Braverman-Garg-Woodruff FOCS'20]). On the other hand, the coin problem only requires $O(\log n)$ width for $\beta > n^{-c}$ for any constant $c < \log_{2}(\sqrt{5}-1) \approx 0.306$ (following the low-width simulation of the AND-OR tree of [Valiant Journal of Algorithms'84]). In this paper, we close the gap between the bounds, showing a tight threshold between the values of $\beta = n^{-c}$ where $O(\log n)$ width suffices and the regime where $\mathrm{poly}(n)$ width is needed, with a transition at $c = 1/3$. This gives a complete characterization (up to constant factors) of the memory complexity of solving the coin problem, for all values of bias $\beta$. We introduce new techniques in both bounds. For the upper bound, we give a construction based on recursive majority that does not require a memory stack of size $\log n$ bits. For the lower bound, we introduce new combinatorial techniques for analyzing progression of the success probabilities in read-once branching programs.
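The counting strategy in the β = Ω(n^{-1/2}) regime is easy to simulate. This is a Monte Carlo sketch, not a branching-program implementation; the memory cost of maintaining the counter is exactly what the width lower bound addresses:

```python
import random

def counting_solver(coins):
    """The poly(n)-width strategy: count heads and guess '+' iff a majority."""
    return sum(coins) > len(coins) / 2

def success_rate(n, beta, trials=2000, seed=0):
    """Empirical success probability of counting at bias +/- beta."""
    rng = random.Random(seed)
    ok = 0
    for t in range(trials):
        sign = 1 if t % 2 == 0 else -1        # alternate the true bias direction
        p = 0.5 + sign * beta
        coins = [rng.random() < p for _ in range(n)]
        ok += counting_solver(coins) == (sign == 1)
    return ok / trials
```

At n = 400 and β = 2·n^{-1/2} = 0.1 the head count is roughly 4 standard deviations from the threshold, so counting succeeds almost always, comfortably above the required 2/3.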

SODA Conference 2019 Conference Paper

A sort of an adversary

  • Haim Kaplan
  • Or Zamir
  • Uri Zwick

We describe an efficient deterministic adversary that forces any comparison-based sorting algorithm to perform at least n log n comparisons. This improves on the previous efficient adversaries of Atallah and Kosaraju (1981), Richards and Vaidya (1988), and Brodal et al. (1996), which force any sorting algorithm to perform at least cn log n comparisons for some constant c < 1.
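The classical information-theoretic adversary conveys the idea, though it is exponential-time rather than the efficient construction of the paper: answer each comparison so that the larger half of the still-consistent total orders survives, forcing any sorter to spend at least ⌈log₂ n!⌉ comparisons. A tiny sketch:

```python
from itertools import permutations
from math import ceil, log2

class Adversary:
    """Answers comparisons so that the majority of the still-consistent
    orders survives; each answer shrinks the set by at most half, so any
    correct sorter needs at least ceil(log2(n!)) comparisons."""
    def __init__(self, n):
        self.orders = set(permutations(range(n)))
        self.comparisons = 0

    def less(self, i, j):
        self.comparisons += 1
        yes = {p for p in self.orders if p.index(i) < p.index(j)}
        no = self.orders - yes
        if len(yes) >= len(no):
            self.orders = yes
            return True
        self.orders = no
        return False

def insertion_sort(items, less):
    """A comparison-based sorter driven entirely by the adversary's answers."""
    a = []
    for x in items:
        k = len(a)
        while k > 0 and less(x, a[k - 1]):
            k -= 1
        a.insert(k, x)
    return a
```

Running any sorter against this adversary witnesses the log₂(n!) information bound; the paper's contribution is an adversary achieving the stronger n log n guarantee while remaining efficient.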

STOC Conference 2019 Conference Paper

Faster k-SAT algorithms using biased-PPSZ

  • Thomas Dueholm Hansen
  • Haim Kaplan
  • Or Zamir
  • Uri Zwick

The PPSZ algorithm, due to Paturi, Pudlak, Saks and Zane, is currently the fastest known algorithm for the k-SAT problem for every k > 3. For 3-SAT, a tiny improvement over PPSZ was obtained by Hertli. We introduce a biased version of the PPSZ algorithm with which we obtain an improvement over PPSZ for every k ≥ 3. For k = 3 we also improve on Hertli's result and get a much more noticeable improvement over PPSZ, though still a relatively small one. In particular, for Unique 3-SAT, we improve the current bound from 1.308^n to 1.307^n.
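A hedged sketch of the plain PPSZ core loop, for intuition only (this is neither the biased variant nor the full algorithm, which also uses bounded-resolution preprocessing): process the variables in a random order; when a unit clause forces the current variable, obey it, otherwise guess uniformly; repeat over many rounds.

```python
import random

def ppsz_round(clauses, n, rng):
    """One PPSZ-style round.  Clauses are tuples of non-zero ints: literal v
    means variable v is True, literal -v means variable v is False."""
    assign = {}
    for v in rng.sample(range(1, n + 1), n):   # random variable order
        forced = None
        for clause in clauses:
            if any(assign.get(abs(l)) == (l > 0) for l in clause):
                continue                       # clause already satisfied
            live = [l for l in clause if assign.get(abs(l)) is None]
            if len(live) == 1 and abs(live[0]) == v:
                forced = live[0] > 0           # unit clause forces v
                break
        assign[v] = forced if forced is not None else rng.random() < 0.5
    return assign

def satisfies(clauses, assign):
    return all(any(assign[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

def ppsz(clauses, n, rounds=200, seed=0):
    """Repeat rounds until a satisfying assignment is found (or give up)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a = ppsz_round(clauses, n, rng)
        if satisfies(clauses, a):
            return a
    return None
```

The running-time analysis hinges on how often a variable is forced rather than guessed; the biased variant of the paper skews the guessing probabilities away from uniform to gain its improvement.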

FOCS Conference 2019 Conference Paper

Random k-out Subgraph Leaves only O(n/k) Inter-Component Edges

  • Jacob Holm
  • Valerie King
  • Mikkel Thorup
  • Or Zamir
  • Uri Zwick

Each vertex of an arbitrary simple graph on n vertices chooses k random incident edges. What is the expected number of edges in the original graph that connect different connected components of the sampled subgraph? We prove that the answer is O(n/k) when k ≥ c log n, for some large enough c. We conjecture that the same holds for smaller values of k, possibly for any k ≥ 2. Such a result is best possible for any k ≥ 2. As an application, we use this sampling result to obtain a one-way communication protocol with private randomness for finding a spanning forest of a graph in which each vertex sends only O(√n log n) bits to a referee.
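The sampling experiment is easy to reproduce; the following is an illustrative simulation (the paper's contribution is the O(n/k) bound itself, not this procedure):

```python
import random

def k_out_sample(adj, k, seed=0):
    """Each vertex keeps up to k random incident edges (edges are undirected).
    adj maps each vertex 0..n-1 to a list of its neighbors."""
    rng = random.Random(seed)
    kept = set()
    for v, nbrs in adj.items():
        for u in rng.sample(nbrs, min(k, len(nbrs))):
            kept.add(frozenset((u, v)))
    return kept

def components(n, edges):
    """Component label for every vertex of the sampled subgraph (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for e in edges:
        a, b = tuple(e)
        parent[find(a)] = find(b)
    return [find(v) for v in range(n)]

def inter_component_edges(adj, kept):
    """Original edges whose endpoints lie in different sampled components."""
    comp = components(len(adj), kept)
    return sum(1 for v in adj for u in adj[v] if v < u and comp[v] != comp[u])
```

On a cycle with k = 2 every vertex keeps both of its edges, so the sample is the whole graph and no inter-component edges remain; the interesting regime is sparse k on dense graphs.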

SODA Conference 2015 Conference Paper

The amortized cost of finding the minimum

  • Haim Kaplan
  • Or Zamir
  • Uri Zwick

We obtain an essentially optimal tradeoff between the amortized costs of the three basic priority queue operations insert, delete and find-min in the comparison model. More specifically, for any fixed ε > 0 we prove a tradeoff relating A(insert), A(delete) and A(find-min), where n is the number of items in the priority queue and A(insert), A(delete) and A(find-min) are the amortized costs of the insert, delete and find-min operations, respectively. In particular, if A(insert) + A(delete) = O(1), then A(find-min) = Ω(n), and A(find-min) = O(n^α), for some α < 1, only if A(insert) + A(delete) = Ω(log n). (We can, of course, have A(insert) = O(1) and A(delete) = O(log n), or vice versa, and A(find-min) = O(1).) Our lower bound holds even if randomization is allowed. Surprisingly, such fundamental bounds on the amortized cost of the operations were not known before. Brodal, Chaudhuri and Radhakrishnan obtained similar bounds for the worst-case complexity of find-min.
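One endpoint of this tradeoff is easy to realize concretely: with constant-time insert and delete, find-min is reduced to a linear scan, as in this minimal sketch:

```python
class LazyPQ:
    """Priority queue with O(1) insert and delete; find-min pays Theta(n)
    by scanning, illustrating the A(insert)+A(delete)=O(1) endpoint of the
    tradeoff (the lower bound says find-min must then cost Omega(n))."""
    def __init__(self):
        self.items = {}        # handle -> key, no ordering maintained
        self.next_id = 0

    def insert(self, key):
        i = self.next_id
        self.next_id += 1
        self.items[i] = key    # O(1): just record the key
        return i               # handle for later deletion

    def delete(self, handle):
        self.items.pop(handle, None)   # O(1): no structure to repair

    def find_min(self):
        return min(self.items.values())  # Theta(n) scan
```

A binary heap sits at the other endpoint, with A(insert) + A(delete) = O(log n) and A(find-min) = O(1); the paper pins down the whole curve between them.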