Arrow Research search

Author name cluster

T. S. Jayram

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
2 author rows

Possible papers

21

ICML Conference 2025 Conference Paper

Approximate Forest Completion and Learning-Augmented Algorithms for Metric Minimum Spanning Trees

  • Nate Veldt
  • Thomas Stanley
  • Benjamin W. Priest
  • Trevor Steil
  • Keita Iwabuchi
  • T. S. Jayram
  • Geoffrey Sanders

Finding a minimum spanning tree (MST) for $n$ points in an arbitrary metric space is a fundamental primitive for hierarchical clustering and many other ML tasks, but this takes $\Omega(n^2)$ time to even approximate. We introduce a framework for metric MSTs that first (1) finds a forest of trees using practical heuristics, and then (2) finds a small weight set of edges to connect disjoint components in the forest into a spanning tree. We prove that optimally solving step (2) still takes $\Omega(n^2)$ time, but we provide a subquadratic 2. 62-approximation algorithm. In the spirit of learning-augmented algorithms, we then show that if the heuristic forest found in step (1) overlaps with an optimal MST, we can approximate the original MST problem in subquadratic time, where the approximation factor depends on a measure of overlap. In practice, we find nearly optimal spanning trees for a wide range of metrics, while being orders of magnitude faster than exact algorithms.

SODA Conference 2018 Conference Paper

Resource-Efficient Common Randomness and Secret-Key Schemes

  • Badih Ghazi
  • T. S. Jayram

We study common randomness where two parties have access to i. i. d. samples from a known random source, and wish to generate a shared random key using limited (or no) communication with the largest possible probability of agreement. This problem is at the core of secret key generation in cryptography, with connections to communication under uncertainty and locality sensitive hashing. We take the approach of treating correlated sources as a critical resource, and ask whether common randomness can be generated resource-efficiently. We consider two notable sources in this setup arising from correlated bits and correlated Gaussians. We design the first explicit schemes that use only a polynomial number of samples (in the key length) so that the players can generate shared keys that agree with constant probability using optimal communication. The best previously known schemes were both non-constructive and used an exponential number of samples. In the amortized setting, we characterize the largest achievable ratio of key length to communication in terms of the external and internal information costs, two well-studied quantities in theoretical computer science. In the relaxed setting where the two parties merely wish to improve the correlation between the generated keys of length k, we show that there are no interactive protocols using o ( k ) bits of communication having agreement probability even as small as 2 – o ( k ). For the related communication problem where the players wish to compute a joint function f of their inputs using i. i. d samples from a known source, we give a simultaneous message passing protocol using 2 O ( c ) bits where c is the interactive randomized public-coin communication complexity of f. This matches the lower bound shown previously while the best previously known upper bound was doubly exponential in c. Our schemes reveal a new connection between common randomness and unbiased error-correcting codes, e. g. , dual-BCH codes and their analogues in Euclidean space.

FOCS Conference 2009 Conference Paper

The Data Stream Space Complexity of Cascaded Norms

  • T. S. Jayram
  • David P. Woodruff

We consider the problem of estimating cascaded aggregates over a matrix presented as a sequence of updates in a data stream. A cascaded aggregate P · Q is defined by evaluating aggregate Q repeatedly over each row of the matrix, and then evaluating aggregate P over the resulting vector of values. This problem was introduced by Cormode and Muthukrishnan, PODS, 2005 [CM]. We analyze the space complexity of estimating cascaded norms on an n × d matrix to within a small relative error. Let L p denote the p-th norm, where p is a non-negative integer. We abbreviate the cascaded norm L k · L p by L k, p. (1) For any constant k ¿ p ¿ 2, we obtain a 1-pass O¿(n 1-2/k d 1-2/p )-space algorithm for estimating L k, p. This is optimal up to polylogarithmic factors and resolves an open question of [CM] regarding the space complexity of L 4, 2. We also obtain 1-pass space-optimal algorithms for estimating L ¿, k and L k, ¿. (2) We prove a space lower bound of ¿(n 1-1/k ) on estimating L k, 0 and L k, 1, resolving an open question due to Indyk, IITK Data Streams Workshop (Problem 8), 2006. We also resolve two more questions of [CM] concerning L k, 2 estimation and block heavy hitter problems. Ganguly, Bansal and Dube (FAW, 2008) claimed an O(1)-space algorithm for estimating L k, p for any k, p ¿ [0, 2]. Our lower bounds show this claim is incorrect.

STOC Conference 2007 Conference Paper

Lower bounds for randomized read/write stream algorithms

  • Paul Beame
  • T. S. Jayram
  • Atri Rudra

Motivated by the capabilities of modern storage architectures, we consider the following generalization of the data stream model where the algorithm has sequential access to multiple streams. Unlike the data stream model, where the stream is read only, in this new model (introduced in [8,9]) the algorithms can also write onto streams. There is no limit on the size of the streams but the number of passes made on the streams is restricted. On the other hand, the amount of internal memory used by the algorithm is scarce, similar to data stream model. We resolve the main open problem in [7] of proving lower bounds in this model for algorithms that are allowed to have 2-sided error. Previously, such lower bounds were shown only for deterministic and 1-sided error randomized algorithms [9,7]. We consider the classical set disjointness problemthat has proved to be invaluable for deriving lower bounds for many other problems involving data streams and other randomized models of computation. For this problem, we show a near-linear lower bound on the size of the internal memory used by a randomized algorithm with 2-sided error that is allowed to have o(log N/log log N) passes over the streams. This bound is almost optimal sincethere is a simple algorithm that can solve this problem using logarithmic memory if the number of passes over the streams. Applications include near-linear lower bounds onthe internal memory for well-known problems in the literature:(1) approximately counting the number of distinct elements in the input (F 0 );(2) approximating the frequency of the mod of an input sequence(F* ∞ );(3) computing the join of two relations; and (4) deciding if some node of an XML document matches an XQuery (or XPath) query. Our techniques involve a novel direct-sum type of argument that yields lower bounds for many other problems. Our results asymptotically improve previously known bounds for any problem even in deterministic and 1-sided error models of computation.

FOCS Conference 2006 Conference Paper

Index Coding with Side Information

  • Ziv Bar-Yossef
  • Yitzhak Birk
  • T. S. Jayram
  • Tomer Kol

Motivated by a problem of transmitting data over broadcast channels (BirkandKol, INFOCOM1998), we study the following coding problem: a sender communicates with n receivers R l, .. , R n. He holds an input x isin {0, 1} n and wishes to broadcast a single message so that each receiver R i can recover the bit x i. Each R i has prior side information about x, induced by a directed graph G on n nodes; R i knows the bits of x in the positions {j | (i, j) is anedge of G}. We call encoding schemes that achieve this goal INDEX codes for {0, 1} n with side information graph G. In this paper we identify a measure on graphs, the minrank, which we conjecture to exactly characterize the minimum length of INDEX codes. We resolve the conjecture for certain natural classes of graphs. For arbitrary graphs, we show that the minrank bound is tight for both linear codes and certain classes of non-linear codes. For the general problem, we obtain a (weaker) lower bound that the length of an INDEX code for any graph G is at least the size of the maximum acyclic induced subgraph of G

FOCS Conference 2004 Conference Paper

Approximating Edit Distance Efficiently

  • Ziv Bar-Yossef
  • T. S. Jayram
  • Robert Krauthgamer
  • Ravi Kumar 0001

Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than /spl lscr/, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)/sup 2/3/ gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. /spl lscr/ gap problem, where /spl lscr/ can be as small as O(k/sup 2/) - still with a constant sketch - but works only for strings that are mildly "nonrepetitive". Finally, we develop an n/sup 3/7/-approximation quasilinear time algorithm for edit distance, improving the previous best factor of n/sup 3/4/ (Cole and Hariharan, 2002); if the input strings are assumed to be nonrepetitive, then the approximation factor can be strengthened to n/sup 1/3/.

STOC Conference 2004 Conference Paper

Exponential separation of quantum and classical one-way communication complexity

  • Ziv Bar-Yossef
  • T. S. Jayram
  • Iordanis Kerenidis

We give the first exponential separation between quantum and bounded-error randomized one-way communication complexity. Specifically, we define the Hidden Matching Problem HM n : Alice gets as input a string x ∈ (0, 1) n and Bob gets a perfect matching M on the n coordinates. Bob's goal is to output a tuple [i,j,b] such that the edge (i,j) belongs to the matching M and b = x i ⊕ x j . We prove that the quantum one-way communication complexity of HM n is O(log n), yet any randomized one-way protocol with bounded error must use Ω(√n) bits of communication. No asymptotic gap for one-way communication was previously known. Our bounds also hold in the model of Simultaneous Messages (SM) and hence we provide the first exponential separation between quantum SM and randomized SM with public coins.For a Boolean decision version of HM n , we show that the quantum one-way communication complexity remains O(log n) and that the 0-error randomized one-way communication complexity is Ω(n). We prove that any randomized linear one-way protocol with bounded error for this problem requires Ω(√[3] n log n) bits of communication.

FOCS Conference 2002 Conference Paper

An Information Statistics Approach to Data Stream and Communication Complexity

  • Ziv Bar-Yossef
  • T. S. Jayram
  • Ravi Kumar 0001
  • D. Sivakumar 0001

We present a new method for proving strong lower bounds in communication complexity. This method is based on the notion of the conditional information complexity of a function which is the minimum amount of information about the inputs that has to be revealed by a communication protocol for the function. While conditional information complexity is a lower bound on the communication complexity, we show that it also admits a direct sum theorem. Direct sum decomposition reduces our task to that of proving (conditional) information complexity lower bounds for simple problems (such as the AND of two bits). For the latter, we develop novel techniques based on Hellinger distance and its generalizations.

STOC Conference 2002 Conference Paper

Approximate counting of inversions in a data stream

  • Miklós Ajtai
  • T. S. Jayram
  • Ravi Kumar 0001
  • D. Sivakumar 0001

(MATH) Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model {14, 2] is a natural setting to design efficient algorithms.We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is $O(\log n \log \log n)$ through a deterministic algorithm. In contrast, we derive an $\Omega(n)$ lower bound for randomized exact computation for this problem; thus approximation is essential.(MATH) We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized $O(\sqrt{n} \log n)$-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized $O(\sqrt{n} \log^2 n)$-space two-pass algorithm. In contrast, we derive $\Omega(n)$-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential.All our algorithms use only O (log n ) time per data item.

STOC Conference 2001 Conference Paper

Online server allocation in a server farm via benefit task systems

  • T. S. Jayram
  • Tracy Kimbrel
  • Robert Krauthgamer
  • Baruch Schieber
  • Maxim Sviridenko

A web content hosting service provider needs to dynamically allocate servers in a server farm to its customers' web sites. Ideally, the allocation to a site should always suffice to handle its load. However, due to a limited number of servers and the overhead incurred in changing the allocation of a server from one site to another, the system may become overloaded. The problem faced by the web hosting service provider is how to allocate the available servers in the most profitable way. Adding to the complexity of this problem is the fact that future loads of the sites are either unknown or known only for the very near future. In this paper we model this server allocation problem , and consider both its offline and online versions. We give a polynomial time algorithm for computing the optimal offline allocation. In the online setting, we show almost optimal algorithms (both deterministic and randomized) for any positive lookahead. The quality of the solution improves as the lookahead increases. We also consider several special cases of practical interest. Finally, we present some experimental results using actual trace data that show that one of our online algorithm performs very close to optimal. Interestingly, the online server allocation problem can be cast as a more general benefit task system that we define. Our results extend to this task system, which captures also the benefit maximization variants of the k -server problem and the metrical task system problem. It follows that the benefit maximization variants of these problems are more tractable than their cost minimization variants.

FOCS Conference 1998 Conference Paper

Time-Space Tradeoffs for Branching Programs

  • Paul Beame
  • Michael E. Saks
  • T. S. Jayram

We obtain the first non-trivial time-space tradeoff lower bound for functions f: {0, 1}/sup n//spl rarr/{0, 1} on general branching programs by exhibiting a Boolean function f that requires exponential size to be computed by any branching program of length (1+/spl epsiv/)n, for some constant /spl epsiv/>0. We also give the first separation result between the syntactic and semantic read-k models for k>1 by showing that polynomial-size semantic read-twice branching programs can compute functions that require exponential size on any syntactic read-k branching program. We also show a time-space tradeoff result on the more general R-way branching program model: for any k, we give a function that requires exponential size to be computed by length kn q-way branching programs, for some q=q(k).

FOCS Conference 1994 Conference Paper

Efficient Oblivious Branching Programs for Threshold Functions

  • Rakesh K. Sinha
  • T. S. Jayram

In his survey paper on branching programs, A. A. Razborov (1991) asked the following question: Does every rectifier-switching network computing the majority of n bits have size n/sup 1+/spl Omega/(1/)? We answer this question in the negative by constructing a simple oblivious branching program of size O(n log/sup 3/ n/log log n log log log n) for computing any threshold function. This improves the previously best known upper bound of O(n/sup 3/2/) due to O. B. Lupanov (1965). >