Arrow Research search

Author name cluster

Anindya De

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers
2 author rows

Possible papers


FOCS Conference 2025 Conference Paper

Stochastic Knapsack without Relaxing the Capacity

  • Anindya De
  • Sanjeev Khanna
  • Nathan White

We present the first polynomial-time approximation scheme (PTAS) for the stochastic knapsack problem that does not relax the knapsack’s capacity. Given n items with known arbitrary independent size distributions and fixed profits, an accuracy parameter $\varepsilon \in(0, 1)$, and an overflow probability bound $\alpha$, our algorithm computes a set of items with profit at least $(1-\varepsilon)$ times optimal, while ensuring the probability of exceeding the capacity is at most $4 \sqrt{\alpha}+\varepsilon$. Prior to our work, no PTAS was known without either allowing a ($1+\varepsilon$) capacity expansion or restricting to special distribution classes (such as Poisson or Gaussian). A key tool in our algorithm is an anti-concentration result that allows us to handle “low-profit” items by adapting a known PTAS result for the case when we are allowed to expand knapsack capacity by a ($1+\varepsilon$) factor. We then show that we are able to convert this solution into another solution with a similar profit which strictly obeys the knapsack capacity, but requires that we relax the overflow probability to a $4 \sqrt{\alpha}+\varepsilon$ factor. In the special case where the item sizes are scaled Bernoulli random variables (which have support on 0 and exactly one other value), we extend our approach to obtain an improved overflow probability guarantee of $\alpha+\varepsilon$. We make this improvement by exploiting the fact that these random variables are defined by only two parameters (the probability of being non-zero and the non-zero value in the support), which allows us to avoid some of the complexity and overhead of our algorithm for arbitrary distributions.
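As a quick illustration of the feasibility constraint in this model (this is not the paper's algorithm), the overflow probability of a candidate item set can be estimated by Monte Carlo simulation; the scaled-Bernoulli item sizes and the capacity below are hypothetical.

```python
import random

def overflow_probability(size_samplers, capacity, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[total size > capacity] for a fixed
    set of items, where each sampler draws one independent item size."""
    rng = random.Random(seed)
    overflows = sum(
        sum(sampler(rng) for sampler in size_samplers) > capacity
        for _ in range(trials)
    )
    return overflows / trials

# Hypothetical scaled-Bernoulli item sizes: value v with probability p, else 0.
items = [lambda rng, v=v, p=p: v if rng.random() < p else 0.0
         for (v, p) in [(0.6, 0.5), (0.5, 0.4), (0.9, 0.1)]]
est = overflow_probability(items, capacity=1.0)  # exact answer here is 0.25
```

Any single item fits, so the knapsack overflows exactly when at least two items realize their nonzero size, which for these parameters happens with probability 0.25.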

STOC Conference 2024 Conference Paper

Detecting Low-Degree Truncation

  • Anindya De
  • Huan Li 0002
  • Shivam Nadimpalli
  • Rocco A. Servedio

We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution D over ℝⁿ and a collection of data points in ℝⁿ, distinguish between the two possibilities that (i) the data was drawn from D, versus (ii) the data was drawn from D|S, i.e. from D subject to truncation by an unknown truncation set S ⊆ ℝⁿ. We study this problem in the setting where D is a high-dimensional i.i.d. product distribution and S is an unknown degree-d polynomial threshold function (one of the most well-studied types of Boolean-valued function over ℝⁿ). Our main results are an efficient algorithm when D is a hypercontractive distribution, and a matching lower bound: 1. For any constant d, we give a polynomial-time algorithm which successfully distinguishes D from D|S using O(n^{d/2}) samples (subject to mild technical conditions on D and S); 2. Even for the simplest case of D being the uniform distribution over {±1}ⁿ, we show that for any constant d, any distinguishing algorithm for degree-d polynomial threshold functions must use Ω(n^{d/2}) samples.
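The truncated-data model in this abstract is easy to simulate for small n: draw from D and keep only points that land in S. The sketch below is not the paper's distinguishing algorithm; it just generates truncated data, using D = uniform over {±1}ⁿ and a hypothetical degree-2 polynomial threshold function as S.

```python
import random

def sample_truncated(n, ptf, rng, max_tries=100000):
    """Rejection-sample x ~ D|S: draw x uniform over {-1,+1}^n and keep
    it only if it lies in S = {x : ptf(x) >= 0}."""
    for _ in range(max_tries):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        if ptf(x) >= 0:
            return x
    raise RuntimeError("S appears to have very small measure")

# Hypothetical truncation set: a degree-2 PTF, S = {x : x1*x2 + x3 >= 0.5}
ptf = lambda x: x[0] * x[1] + x[2] - 0.5
rng = random.Random(1)
truncated_data = [sample_truncated(5, ptf, rng) for _ in range(200)]
```

Rejection sampling is only viable when S has non-negligible measure under D, which is exactly the regime where the distinguishing problem is non-trivial.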

FOCS Conference 2024 Conference Paper

Gaussian Approximation of Convex Sets by Intersections of Halfspaces

  • Anindya De
  • Shivam Nadimpalli
  • Rocco A. Servedio

We study the approximability of general convex sets in $\mathbb{R}^{n}$ by intersections of halfspaces, where the approximation quality is measured with respect to the standard Gaussian distribution and the complexity of an approximation is the number of halfspaces used. While a large body of research has considered the approximation of convex sets by intersections of halfspaces under distance metrics such as the Lebesgue measure and Hausdorff distance, prior to our work there has not been a systematic study of convex approximation under the Gaussian distribution. We establish a range of upper and lower bounds, both for general convex sets and for specific natural convex sets that are of particular interest. Our results demonstrate that the landscape of approximation is intriguingly different under the Gaussian distribution versus previously studied distance measures. Our results are proved using techniques from many different areas. These include classical results on convex polyhedral approximation, Cramér-type bounds on large deviations from probability theory, and, perhaps surprisingly, a range of topics from computational complexity, including computational learning theory, unconditional pseudorandomness, and the study of influences and noise sensitivity in the analysis of Boolean functions.

SODA Conference 2022 Conference Paper

Approximating Sumset Size

  • Anindya De
  • Shivam Nadimpalli
  • Rocco A. Servedio

Given a subset A of the n-dimensional Boolean hypercube 𝔽₂ⁿ, the sumset A + A is the set {a + a′ : a, a′ ∈ A}, where addition is coordinate-wise mod 2. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field. The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more detail, our algorithm is given oracle access to (the indicator function of) an arbitrary A ⊆ 𝔽₂ⁿ and an accuracy parameter ∊ > 0, and with high probability it outputs a value 0 ≤ v ≤ 1 that is ±∊-close to Vol(A′ + A′) for some perturbation A′ ⊆ A of A satisfying Vol(A \ A′) ≤ ∊. It is easy to see that without the relaxation of dealing with A′ rather than A, any algorithm for estimating Vol(A + A) to any nontrivial accuracy must make 2^{Ω(n)} queries. In contrast, we give an algorithm whose query complexity depends only on ∊ and is completely independent of the ambient dimension n.
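For small n, the quantity being estimated can be computed exactly by brute force, which makes the definition concrete: representing points of the hypercube as n-bit integers, coordinate-wise addition mod 2 is bitwise XOR. The set A below is an arbitrary example, and this is of course not the paper's sublinear-time estimator.

```python
def sumset_volume(A, n):
    """Exact Vol(A + A) for A a set of n-bit integers, with addition
    over the hypercube realized as bitwise XOR. Brute force: small n only."""
    sumset = {a ^ b for a in A for b in A}
    return len(sumset) / 2 ** n

A = {0b000, 0b011, 0b101}     # an example subset of the 3-dimensional cube
vol = sumset_volume(A, 3)     # A + A = {000, 011, 101, 110}, so vol = 4/8
```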

NeurIPS Conference 2021 Conference Paper

Approximate optimization of convex functions with outlier noise

  • Anindya De
  • Sanjeev Khanna
  • Huan Li
  • MohammadHesam NikpeySalekde

We study the problem of minimizing a convex function given by a zeroth order oracle that is possibly corrupted by outlier noise. Specifically, we assume the function values at some points of the domain are corrupted arbitrarily by an adversary, with the only restriction being that the total volume of corrupted points is bounded. The goal then is to find a point close to the function's minimizer using access to the corrupted oracle. We first prove a lower bound result showing that, somewhat surprisingly, one cannot hope to approximate the minimizer nearly as well as one might expect, even if one is allowed an unbounded number of queries to the oracle. Complementing this negative result, we then develop an efficient algorithm that outputs a point close to the minimizer of the convex function, where the specific distance matches exactly, up to constant factors, the distance bound shown in our lower bound result.

SODA Conference 2021 Conference Paper

Polynomial-time trace reconstruction in the smoothed complexity model

  • Xi Chen 0001
  • Anindya De
  • Chin Ho Lee
  • Rocco A. Servedio
  • Sandip Sinha

In the trace reconstruction problem, an unknown source string x ∈ {0,1}ⁿ is sent through a probabilistic deletion channel which independently deletes each bit with probability δ and concatenates the surviving bits, yielding a trace of x. The problem is to reconstruct x given independent traces. This problem has received much attention in recent years, both in the worst-case setting where x may be an arbitrary string in {0,1}ⁿ [6, 19, 7, 8, 4] and in the average-case setting where x is drawn uniformly at random from {0,1}ⁿ [21, 9, 8, 4]. This paper studies trace reconstruction in the smoothed analysis setting, in which a "worst-case" string x_worst is chosen arbitrarily from {0,1}ⁿ, and then a perturbed version x of x_worst is formed by independently replacing each coordinate by a uniform random bit with probability σ. The problem is to reconstruct x given independent traces from it. Our main result is an algorithm which, for any constant perturbation rate 0 < σ < 1 and any constant deletion rate 0 < δ < 1, uses poly(n) running time and traces and succeeds with high probability in reconstructing the string x. This stands in contrast with the worst-case version of the problem, for which the best known sample complexity is exp(Õ(n^{1/5})) [5], a recent improvement on exp(O(n^{1/3})) [6, 19]. Our approach is based on reconstructing x from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of our work is a new poly(n)-time procedure for reconstructing the multiset of all O(log n)-length subwords of any source string x ∈ {0,1}ⁿ given access to traces of x.
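The deletion channel described above is simple to simulate, which is handy for experimenting with reconstruction heuristics; this sketch implements only the channel, not the paper's subword-based algorithm.

```python
import random

def trace(x, delta, rng):
    """One pass of bit-string x through the deletion channel: each bit is
    deleted independently with probability delta, survivors concatenated."""
    return [b for b in x if rng.random() >= delta]

rng = random.Random(2)
x = [1, 0, 1, 1, 0, 0, 1, 0]
traces = [trace(x, 0.2, rng) for _ in range(5)]
```

Each trace is a (random) subsequence of x, so the channel never reorders bits; all of the difficulty of the problem comes from not knowing which positions survived.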

JMLR Journal 2020 Journal Article

Learning Sums of Independent Random Variables with Sparse Collective Support

  • Anindya De
  • Philip M. Long
  • Rocco A. Servedio

We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbb{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$ (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutually independent (but not necessarily identically distributed) integer random variables with $\cup_i \mathrm{supp}(\mathbf{X}_i) \subseteq \mathcal{A}.$ We give two main algorithmic results for learning such distributions. First, for the case $|\mathcal{A}| = 3$, we give an algorithm for learning an unknown $\mathcal{A}$-sum to accuracy $\epsilon$ using $\mathrm{poly}(1/\epsilon)$ samples and running in time $\mathrm{poly}(1/\epsilon)$, independent of $N$ and of the elements of $\mathcal{A}$. Second, for an arbitrary constant $k \geq 4$, if $\mathcal{A} = \{a_1, \ldots, a_k\}$ with $0 \leq a_1 < \cdots < a_k$, we give an algorithm that uses $\mathrm{poly}(1/\epsilon) \cdot \log \log a_k$ samples (independent of $N$) and runs in time $\mathrm{poly}(1/\epsilon, \log a_k)$. We prove an essentially matching lower bound: if $|\mathcal{A}| = 4$, then any algorithm must use $\Omega(\log \log a_4)$ samples even for learning to constant accuracy. We also give similar-in-spirit (but quantitatively very different) algorithmic results, and essentially matching lower bounds, for the case in which $\mathcal{A}$ is not known to the learner. Finally, all our algorithms easily extend to the semi-agnostic learning model, in which training data is generated from a distribution that is only $c \cdot \epsilon$-close to some $\mathcal{A}$-sum for a constant $c > 0$.
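To make the $\mathcal{A}$-sum definition concrete, here is a small sampler for $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ with hypothetical toy distributions on a collective support $\mathcal{A} = \{0, 1, 4\}$; the learning algorithm itself is not reproduced here.

```python
import random

def sample_A_sum(dists, rng):
    """One draw from S = X_1 + ... + X_N, where dists[i] lists the
    (value, probability) pairs of the independent variable X_i."""
    total = 0
    for dist in dists:
        u, acc = rng.random(), 0.0
        chosen = dist[-1][0]       # default guards against float rounding
        for value, prob in dist:
            acc += prob
            if u < acc:
                chosen = value
                break
        total += chosen
    return total

# N = 3 independent variables whose supports all lie in A = {0, 1, 4}
dists = [[(0, 0.5), (1, 0.5)], [(0, 0.7), (4, 0.3)], [(1, 0.2), (4, 0.8)]]
rng = random.Random(3)
samples = [sample_A_sum(dists, rng) for _ in range(1000)]
```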

FOCS Conference 2019 Conference Paper

Junta Correlation is Testable

  • Anindya De
  • Elchanan Mossel
  • Joe Neeman

The problem of tolerant junta testing is a natural and challenging problem which asks if the property of a function having some specified correlation with a k-junta is testable. In this paper we give an affirmative answer to this question: there is an algorithm which, given distance parameters c, d and oracle access to a Boolean function f on the hypercube, has query complexity exp(k) · poly(1/(c−d)) and distinguishes between the following cases: 1) the distance of f from any k-junta is at least c; 2) there is a k-junta g which has distance at most d from f. This is the first non-trivial tester (i.e., one whose query complexity is independent of the ambient dimension n) which works for all c and d (bounded by 0.5). The best previously known results, by Blais et al., required c to be at least 16d. In fact, with the same query complexity, we accomplish the stronger goal of identifying the most correlated k-junta, up to permutations of the coordinates. We can further improve the query complexity to poly(k/(c−d)) for the (weaker) task of distinguishing between the following cases: 1) the distance of f from any k′-junta is at least c; 2) there is a k-junta g which is at distance at most d from f. Here k′ = poly(k/(c−d)). Our main tools are Fourier-analysis-based algorithms that simulate oracle access to influential coordinates of functions.

SODA Conference 2018 Conference Paper

Boolean function analysis meets stochastic optimization: An approximation scheme for stochastic knapsack

  • Anindya De

The stochastic knapsack problem is the stochastic variant of the classical knapsack problem, in which the algorithm designer is given a knapsack with a given capacity and a collection of items, where each item is associated with a profit and a probability distribution on its size. The goal is to select a subset of items with maximum profit that violates the capacity constraint with probability at most p (referred to as the overflow probability). While several approximation algorithms [27, 22, 4, 17, 30] have been developed for this problem, most of these algorithms relax the capacity constraint of the knapsack. In this paper, we design efficient approximation schemes for this problem without relaxing the capacity constraint. (i) Our first result is in the case when item sizes are Bernoulli random variables. In this case, we design a (nearly) fully polynomial time approximation scheme (FPTAS) which only relaxes the overflow probability. (ii) Our second result generalizes the first result to the case when all the item sizes are supported on a (common) set of constant size. In this case, we obtain a quasi-FPTAS. (iii) Our third result is in the case when item sizes are so-called "hypercontractive" random variables, i.e., random variables whose second and fourth moments are within constant factors of each other. In other words, the kurtosis of the random variable is upper bounded by a constant. This class has been widely studied in probability theory, and most natural random variables are hypercontractive, including well-known families such as Poisson, Gaussian, exponential and Laplace distributions. In this case, we design a polynomial time approximation scheme which relaxes both the overflow probability and the maximum profit. Crucially, all of our algorithms meet the capacity constraint exactly, a result which was previously known only when the item sizes were Poisson or Gaussian random variables [22, 24].
Our results rely on new connections between Boolean function analysis and stochastic optimization, and are obtained by an adaptation and extension of ideas such as (central) limit theorems, moment matching theorems and the influential critical index machinery of Servedio [43], developed in the context of complexity-theoretic analysis of halfspaces. We believe that these ideas and techniques may prove to be useful in other stochastic optimization problems as well.

FOCS Conference 2018 Conference Paper

Learning Sums of Independent Random Variables with Sparse Collective Support

  • Anindya De
  • Philip M. Long
  • Rocco A. Servedio

We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For a subset A ⊂ ℤ₊ of non-negative integers, a sum of independent random variables with collective support A (called an "A-sum" in this paper) is a distribution S = X_1 + ... + X_N where the X_i's are mutually independent (but not necessarily identically distributed) integer random variables all of whose supports are contained in A. We give two main algorithmic results for learning such distributions: 1) For the case |A| = 3, we give an algorithm for learning A-sums to accuracy ε that uses poly(1/ε) samples and runs in time poly(1/ε), independent of N and of the elements of A. 2) For an arbitrary constant k ≥ 4, if A = {a_1, ..., a_k} with 0 ≤ a_1 < ... < a_k, we give an algorithm that uses poly(1/ε) · log log a_k samples (independent of N) and runs in time poly(1/ε, log a_k). We prove an essentially matching lower bound: if |A| = 4, then any algorithm must use Ω(log log a_4) samples even for learning to constant accuracy. We also give similar-in-spirit (but quantitatively very different) algorithmic results, and essentially matching lower bounds, for the case in which A is not known to the learner. Our learning algorithms employ new limit theorems which may be of independent interest. Our algorithms and lower bounds together settle the question of how the sample complexity of learning sums of independent integer random variables scales with the elements in the union of their supports, both in the known-support and unknown-support settings. Finally, all our algorithms easily extend to the "semi-agnostic" learning model, in which training data is generated from a distribution that is only c·ε-close to some A-sum for a constant c > 0.

SODA Conference 2018 Conference Paper

Non interactive simulation of correlated distributions is decidable

  • Anindya De
  • Elchanan Mossel
  • Joe Neeman

A basic problem in information theory is the following: let P = (X, Y) be an arbitrary distribution where the marginals X and Y are (potentially) correlated. Let Alice and Bob be two players where Alice gets samples {x_i}_{i≥1} and Bob gets samples {y_i}_{i≥1}, and for all i, (x_i, y_i) ∼ P. What joint distributions Q can be simulated by Alice and Bob without any interaction? Classical works in information theory by Gács-Körner and Wyner answer this question when at least one of P or Q is the distribution Eq (Eq is defined as uniform over the points (0, 0) and (1, 1)). However, other than this special case, the answer to this question is understood in very few cases. Recently, Ghazi, Kamath and Sudan showed that this problem is decidable for Q supported on {0, 1} × {0, 1}. We extend their result to Q supported on any finite alphabet. Moreover, we show that if Q can be simulated, our algorithm also provides a (non-interactive) simulation protocol. We rely on recent results in Gaussian geometry (by the authors) as well as a new smoothing argument inspired by the method of boosting from learning theory and potential function arguments from complexity theory and additive combinatorics.

FOCS Conference 2016 Conference Paper

Noisy Population Recovery in Polynomial Time

  • Anindya De
  • Michael E. Saks
  • Sijian Tang

In the noisy population recovery problem of Dvir et al. [6], the goal is to learn an unknown distribution f on binary strings of length n from noisy samples. A noisy sample with parameter μ ∈ [0, 1] is generated by selecting a sample from f and independently flipping each coordinate of the sample with probability (1−μ)/2. We assume an upper bound k on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error ε. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We describe an algorithm that, for each μ > 0, provides the desired estimate of the distribution in time bounded by a polynomial in k, n and 1/ε, improving upon the previous best result of poly(k^{log log k}, n, 1/ε) due to Lovett and Zhang [9]. Our proof combines ideas from [9] with a noise-attenuated version of Möbius inversion. The latter crucially uses the robust local inverse construction of Moitra and Saks [11].
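The noisy-sample model is straightforward to simulate. The support strings and weights below are hypothetical, and this sketch shows only the sampling process, not the recovery algorithm.

```python
import random

def noisy_sample(support, weights, mu, rng):
    """Draw a string from the unknown distribution f (support/weights),
    then flip each coordinate independently with probability (1 - mu)/2."""
    x = rng.choices(support, weights=weights)[0]
    flip_p = (1 - mu) / 2
    return tuple(b ^ (rng.random() < flip_p) for b in x)

rng = random.Random(4)
support = [(0, 0, 1), (1, 1, 0)]   # k = 2 strings in the support of f
samples = [noisy_sample(support, [0.6, 0.4], mu=0.8, rng=rng)
           for _ in range(500)]
```

With μ = 0.8, each coordinate flips with probability 0.1, so most noisy samples still coincide with a support string; as μ → 0, the samples approach uniform noise and recovery becomes harder.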

FOCS Conference 2015 Conference Paper

Beyond the Central Limit theorem: Asymptotic Expansions and Pseudorandomness for Combinatorial Sums

  • Anindya De

We prove a new asymptotic expansion in the central limit theorem for sums of discrete independent random variables. The classical central limit theorem asserts that if {X_i}_{i=1}^n is a sequence of i.i.d. random variables, then S = Σ_{i=1}^n X_i converges to a Gaussian whose first two moments match those of S; further, the rate of convergence is O(n^{−1/2}). Roughly speaking, asymptotic expansions of the central limit theorem show that by considering a family of limiting distributions specified by k ≥ 2 moments (k = 2 corresponds to Gaussians) and matching the first k moments of S to such a limiting distribution, one can achieve a convergence rate of n^{−(k−1)/2}. While such asymptotic expansions have been known since Cramér [1], they did not apply to discrete and non-identical random variables. Further, the error bounds in nearly all cases were non-explicit (in their dependence on {X_i}), thus limiting their applicability. In this work, we prove a new asymptotic expansion of the central limit theorem which applies to discrete and non-identical random variables, with fully explicit error bounds. Given the wide applicability of the central limit theorem in probability theory and theoretical computer science, we believe that this new asymptotic expansion theorem will be applicable in several settings. As a main application in this paper, we give an application in derandomization: namely, we construct PRGs for the class of combinatorial sums, a class of functions first studied by [2] which generalizes many previously studied classes such as combinatorial rectangles [3], small-biased spaces [4] and modular sums [5], among others. A function f: [m]^n → {0, 1} is said to be a combinatorial sum if there exist functions f_1, ..., f_n: [m] → {0, 1} such that f(x_1, ..., x_n) = f_1(x_1) + ... + f_n(x_n). For this class, we give a seed length of O(log m + log^{3/2}(n/ε)), thus improving upon [2] whenever ε ≤ 2^{−(log n)^{3/4}}.
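The classical O(n^{−1/2}) rate in the central limit theorem can be checked numerically for Bernoulli sums, where the CDF of S is exactly computable; this is only a sanity check of the classical rate (with a continuity correction), not of the paper's higher-order expansion.

```python
import math

def clt_gap(n, p=0.3):
    """Sup-distance between the exact CDF of S = Binomial(n, p) and the
    CDF of the moment-matched Gaussian, with continuity correction."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    gauss = lambda t: 0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        gap = max(gap, abs(cdf - gauss(k + 0.5)))
    return gap

gaps = [clt_gap(n) for n in (50, 200, 800)]   # shrinks as n grows
```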

STOC Conference 2015 Conference Paper

Boolean Function Monotonicity Testing Requires (Almost) n 1/2 Non-adaptive Queries

  • Xi Chen 0001
  • Anindya De
  • Rocco A. Servedio
  • Li-Yang Tan

We prove a lower bound of Ω(n^{1/2−c}), for all c > 0, on the query complexity of (two-sided error) non-adaptive algorithms for testing whether an n-variable Boolean function is monotone versus constant-far from monotone. This improves an Ω̃(n^{1/5}) lower bound for the same problem that was obtained in [6], and is very close to the recent upper bound of Õ(n^{1/2}/ε²) by Khot et al. [13].

SODA Conference 2014 Conference Paper

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

  • Constantinos Daskalakis
  • Anindya De
  • Ilias Diakonikolas
  • Ankur Moitra
  • Rocco A. Servedio

We consider a problem which has received considerable attention in the systems literature because of its applications to routing in delay tolerant networks and replica placement in distributed storage systems. In abstract terms the problem can be stated as follows: Given a random variable X generated by a known product distribution over {0, 1}ⁿ and a target value 0 ≤ θ ≤ 1, output a non-negative vector w, with ‖w‖₁ ≤ 1, which maximizes the probability of the event w · X ≥ θ. This is a challenging non-convex optimization problem for which even computing the value Pr[w · X ≥ θ] of a proposed solution vector w is #P-hard. We provide an additive EPTAS for this problem which, for constant-bounded product distributions, runs in poly(n) · 2^{poly(1/∊)} time and outputs an ∊-approximately optimal solution vector w for this problem. Our approach is inspired by, and extends, recent structural results from the complexity-theoretic study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth, we give a unicriterion PTAS, while previous work for such objective functions has typically led to a bicriterion PTAS. We believe our techniques may be applicable to get unicriterion PTASes for other non-smooth objective functions.
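Although computing Pr[w · X ≥ θ] exactly is #P-hard, a Monte Carlo estimate of the objective for a candidate w is easy; the weights and coordinate probabilities below are hypothetical, and this sketch evaluates a solution rather than finding one.

```python
import random

def success_probability(w, ps, theta, trials=30000, seed=5):
    """Monte Carlo estimate of Pr[w . X >= theta], where X_i ~ Bernoulli(ps[i])
    independently (a product distribution over {0,1}^n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        dot = sum(wi for wi, p in zip(w, ps) if rng.random() < p)
        if dot >= theta:
            hits += 1
    return hits / trials

w = [0.5, 0.3, 0.2]      # feasible: non-negative with ||w||_1 <= 1
ps = [0.9, 0.5, 0.5]
est = success_probability(w, ps, theta=0.6)   # exact answer here is 0.675
```

For this toy instance the event w · X ≥ 0.6 occurs exactly when the first coordinate and at least one other are 1, which has probability 0.9 · (1 − 0.25) = 0.675.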

STOC Conference 2013 Conference Paper

Majority is stablest: discrete and SoS

  • Anindya De
  • Elchanan Mossel
  • Joe Neeman

The Majority is Stablest theorem has numerous applications in hardness of approximation and social choice theory. We give a new proof of the theorem by induction on the dimension of the discrete cube. Unlike the previous proof, it uses neither the "invariance principle" nor Borell's result in Gaussian space. The new proof is general enough to include all previous variants of Majority is Stablest, such as "it ain't over until it's over" and "Majority is most predictable". Moreover, the new proof allows us to derive a proof of Majority is Stablest in a constant level of the Sum of Squares hierarchy. This implies in particular that the Khot-Vishnoi instance of Max-Cut does not provide a gap instance for the Lasserre hierarchy.

STOC Conference 2010 Conference Paper

Near-optimal extractors against quantum storage

  • Anindya De
  • Thomas Vidick

We show that Trevisan's extractor and its variants [22, 19] are secure against bounded quantum storage adversaries. One instantiation gives the first such extractor to achieve an output length Θ(K − b), where K is the source's entropy and b the adversary's storage, together with a poly-logarithmic seed length. Another instantiation achieves a logarithmic seed length, with a slightly smaller output length Θ((K − b)/K^γ) for any γ > 0. In contrast, the previous best construction [21] could only extract (K/b)^{1/15} bits. Some of our constructions have the additional advantage that every bit of the output is a function of only a polylogarithmic number of bits from the source, which is crucial for some cryptographic applications. Our argument is based on bounds for a generalization of quantum random access codes, which we call quantum functional access codes. This is crucial as it lets us avoid the local list-decoding algorithm central to the approach in [21], which was the source of the multiplicative overhead.

STOC Conference 2008 Conference Paper

Fast integer multiplication using modular arithmetic

  • Anindya De
  • Piyush P. Kurur
  • Chandan Saha 0001
  • Ramprasad Saptharishi

We give an O(N · log N · 2^{O(log* N)}) algorithm for multiplying two N-bit integers that improves on the O(N · log N · log log N) algorithm of Schönhage and Strassen. Both of these algorithms use modular arithmetic. Recently, Fürer gave an O(N · log N · 2^{O(log* N)}) algorithm which, however, uses arithmetic over complex numbers as opposed to modular arithmetic. In this paper, we use multivariate polynomial multiplication along with ideas from Fürer's algorithm to achieve this improvement in the modular setting. Our algorithm can also be viewed as a p-adic version of Fürer's algorithm. Thus, we show that the two seemingly different approaches to integer multiplication, modular and complex arithmetic, are similar.