Author name cluster

Guru Guruganesh

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Robust Contextual Pricing

Anupam Gupta
Guru Guruganesh
Renato Leme
Jon Schneider

We provide an algorithm with regret $O(C d \log \log T)$ for contextual pricing with $C$ corrupted rounds, improving over the previous bound of $O(d^3 C \log^2(T))$ of Krishnamurthy et al. The result is based on a reduction that calls the uncorrupted algorithm as a black-box, unlike the previous approach that modifies the inner workings of the uncorrupted algorithm. As a result, it leads to a conceptually simpler algorithm. Finally, we provide a lower bound ruling out a $O(C + d\log \log T)$ algorithm. This shows that robustifying contextual pricing is harder than robustifying contextual search with $\epsilon$-ball losses, for which it is possible to design algorithms where corruptions add only an extra additive term $C$ to the regret.

PDF Details

ICLR Conference 2024 Conference Paper

Functional Interpolation for Relative Positions improves Long Context Transformers

Shanda Li
Chong You
Guru Guruganesh
Joshua Ainslie
Santiago Ontañón
Manzil Zaheer
Sumit Sanghai
Yiming Yang 0002

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models. Though the Transformer architecture has fundamentally no limits on the input sequence lengths it can process, the choice of position encoding used during training can limit the performance of these models on longer inputs. We propose a novel functional relative position encoding with progressive interpolation, FIRE, to improve Transformer generalization to longer contexts. We theoretically prove that this can represent some of the popular relative position encodings, such as T5's RPE, Alibi, and Kerple. We next empirically show that FIRE models have better generalization to longer contexts on both zero-shot language modeling and long text benchmarks.

Details

SODA Conference 2024 Conference Paper

Prior-Independent Auctions for Heterogeneous Bidders

Guru Guruganesh
Aranyak Mehta
Di Wang 0005
Kangning Wang 0001

Details

ICML Conference 2023 Conference Paper

Optimal No-Regret Learning for One-Sided Lipschitz Functions

Paul Dütting
Guru Guruganesh
Jon Schneider
Joshua R. Wang

Inspired by applications in pricing and contract design, we study the maximization of one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow too quickly in one direction. We show that it is possible to learn a maximizer for such a function while incurring $O(\log \log T)$ total regret (with a universal constant independent of the number of discontinuities / complexity of the function). This regret bound is asymptotically optimal in $T$ due to a lower bound of Kleinberg and Leighton. By applying this algorithm, we show that one can sell digital goods to multiple buyers and learn the optimal linear contract in the principal-agent setting while incurring at most $O(\log \log T)$ regret.

Details

NeurIPS Conference 2022 Conference Paper

A Fourier Approach to Mixture Learning

Mingda Qiao
Guru Guruganesh
Ankit Rawat
Kumar Avinava Dubey
Manzil Zaheer

We revisit the problem of learning mixtures of spherical Gaussians. Given samples from a mixture $\frac{1}{k}\sum_{j=1}^{k}\mathcal{N}(\mu_j, I_d)$, the goal is to estimate the means $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^d$ up to a small error. The hardness of this learning problem can be measured by the \emph{separation} $\Delta$ defined as the minimum distance between all pairs of means. Regev and Vijayaraghavan (2017) showed that with $\Delta = \Omega(\sqrt{\log k})$ separation, the means can be learned using $\mathrm{poly}(k, d)$ samples, whereas super-polynomially many samples are required if $\Delta = o(\sqrt{\log k})$ and $d = \Omega(\log k)$. This leaves open the low-dimensional regime where $d = o(\log k)$. In this work, we give an algorithm that efficiently learns the means in $d = O(\log k/\log\log k)$ dimensions under separation $d/\sqrt{\log k}$ (modulo doubly logarithmic factors). This separation is strictly smaller than $\sqrt{\log k}$, and is also shown to be necessary. Along with the results of Regev and Vijayaraghavan (2017), our work almost pins down the critical separation threshold at which efficient parameter learning becomes possible for spherical Gaussian mixtures. More generally, our algorithm runs in time $\mathrm{poly}(k)\cdot f(d, \Delta, \epsilon)$, and is thus fixed-parameter tractable in parameters $d$, $\Delta$ and $\epsilon$. Our approach is based on estimating the Fourier transform of the mixture at carefully chosen frequencies, and both the algorithm and its analysis are simple and elementary. Our positive results can be easily extended to learning mixtures of non-Gaussian distributions, under a mild condition on the Fourier spectrum of the distribution.

PDF Details

STOC Conference 2021 Invited Paper

Chasing convex bodies with linear competitive ratio (invited paper)

C. J. Argue
Anupam Gupta 0001
Guru Guruganesh
Ziye Tang

Details

NeurIPS Conference 2021 Conference Paper

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

Sreenivas Gollapudi
Guru Guruganesh
Kostas Kollias
Pasin Manurangsi
Renato Leme
Jon Schneider

We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are presented with a subset $\mathcal{X}_t \subseteq \mathbb{R}^d$ of possible actions. If we choose (i. e. recommend to the user) action $x_t$, we obtain utility $\langle x_t, w^* \rangle$ but only learn the identity of the best action $\arg\max_{x \in \X_t} \langle x, w^* \rangle$. We design algorithms for this problem which achieve regret $O(d\log T)$ and $\exp(O(d \log d))$. To accomplish this, we design novel cutting-plane algorithms with low “regret” -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns. We also consider the variant where we are allowed to provide a list of several recommendations. In this variant, we give an algorithm with $O(d^2 \log d)$ regret and list size $\poly(d)$. Finally, we construct nearly tight algorithms for a weaker variant of this problem where the learner only learns the identity of an action that is better than the recommendation. Our results rely on new algorithmic techniques in convex geometry (including a variant of Steiner’s formula for the centroid of a convex set) which may be of independent interest.

PDF Details

AAAI Conference 2021 Conference Paper

Convergence Analysis of No-Regret Bidding Algorithms in Repeated Auctions

Zhe Feng
Guru Guruganesh
Christopher Liaw
Aranyak Mehta
Abhishek Sethi

The connection between games and no-regret algorithms has been widely studied in the literature. A fundamental result is that when all players play no-regret strategies, this produces a sequence of actions whose time-average is a coarsecorrelated equilibrium of the game. However, much less is known about equilibrium selection in the case that multiple equilibria exist. In this work, we study the convergence of no-regret bidding algorithms in auctions. Besides being of theoretical interest, bidding dynamics in auctions is an important question from a practical viewpoint as well. We study repeated game between bidders in which a single item is sold at each time step and the bidder’s value is drawn from an unknown distribution. We show that if the bidders use any mean-based learning rule then the bidders converge with high probability to the truthful pure Nash Equilibrium in a second price auction, in VCG auction in the multi-slot setting and to the Bayesian Nash equilibrium in a first price auction. We note mean-based algorithms cover a wide variety of known no-regret algorithms such as Exp3, UCB, ε-Greedy etc. Also, we analyze the convergence of the individual iterates produced by such learning algorithms, as opposed to the time-average of the sequence. Our experiments corroborate our theoretical findings and also find a similar convergence when we use other strategies such as Deep Q-Learning.

PDF Details

NeurIPS Conference 2021 Conference Paper

Margin-Independent Online Multiclass Learning via Convex Geometry

Guru Guruganesh
Allen Liu
Jon Schneider
Joshua Wang

We consider the problem of multi-class classification, where a stream of adversarially chosen queries arrive and must be assigned a label online. Unlike traditional bounds which seek to minimize the misclassification rate, we minimize the total distance from each query to the region corresponding to its assigned label. When the true labels are determined via a nearest neighbor partition -- i. e. the label of a point is given by which of $k$ centers it is closest to in Euclidean distance -- we show that one can achieve a loss that is independent of the total number of queries. We complement this result by showing that learning general convex sets requires an almost linear loss per query. Our results build off of regret guarantees for the problem of contextual search. In addition, we develop a novel reduction technique from multiclass classification to binary classification which may be of independent interest.

PDF Details

NeurIPS Conference 2020 Conference Paper

Big Bird: Transformers for Longer Sequences

Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
Santiago Ontanon
Philip Pham
Anirudh Ravula

Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.

PDF Details

SODA Conference 2020 Conference Paper

Chasing Convex Bodies with Linear Competitive Ratio

C. J. Argue
Anupam Gupta 0001
Guru Guruganesh
Ziye Tang

Details

SODA Conference 2020 Conference Paper

Sticky Brownian Rounding and its Applications to Constraint Satisfaction Problems

Sepehr Abbasi Zadeh
Nikhil Bansal 0001
Guru Guruganesh
Aleksandar Nikolov
Roy Schwartz 0002
Mohit Singh

Semi-definite programming is a powerful tool in the design and analysis of approximation algorithms for combinatorial optimization problems. In particular, the random hyperplane rounding method of Goemans and Williamson [23] has been extensively studied for more than two decades, resulting in various extensions to the original technique and beautiful algorithms for a wide range of applications. Despite the fact that this approach yields tight approximation guarantees for some problems, e. g. , M ax -C ut, for many others, e. g. , M ax -SAT and M ax -D i C ut, the tight approximation ratio is still unknown. One of the main reasons for this is the fact that very few techniques for rounding semi-definite relaxations are known. In this work, we present a new general and simple method for rounding semi-definite programs, based on Brownian motion. Our approach is inspired by recent results in algorithmic discrepancy theory. We develop and present tools for analyzing our new rounding algorithms, utilizing mathematical machinery from the theory of Brownian motion, complex analysis, and partial differential equations. Focusing on constraint satisfaction problems, we apply our method to several classical problems, including M ax -C ut, M ax -2SAT, and M ax -D i C ut, and derive new algorithms that are competitive with the best known results. To illustrate the versatility and general applicability of our approach, we give new approximation algorithms for the M ax -C ut problem with side constraints that crucially utilizes measure concentration results for the Sticky Brownian Motion, a feature missing from hyperplane rounding and its generalizations.

Details

SODA Conference 2015 Conference Paper

Improved Region-Growing and Combinatorial Algorithms for k -Route Cut Problems (Extended Abstract)

Guru Guruganesh
Laura Sanità
Chaitanya Swamy

We study the k-route generalizations of various cut problems, the most general of which is k-route multicut ( k -MC) problem, wherein we have r source-sink pairs and the goal is to delete a minimum-cost set of edges to reduce the edge-connectivity of every source-sink pair to below k. The k -route extensions of multiway cut ( k -MWC), and the minimum s-t cut problem ( k- ( s, t )-Cut), are similarly defined. We present various approximation and hardness results for k -MC, k -MWC, and k -( s, t )-Cut that improve the state-of-the-art for these problems in several cases. Our contributions are threefold. For k-route multiway cut, we devise simple, but surprisingly effective, combinatorial algorithms that yield bicriteria approximation guarantees that markedly improve upon the previous-best guarantees. For k-route multicut, we design algorithms that improve upon the previous-best approximation factors by roughly an -factor, when k = 2, and for general k and unit costs and any fixed violation of the connectivity threshold k. The main technical innovation is the definition of a new, powerful region growing lemma that allows us to perform region-growing in a recursive fashion even though the LP solution yields a different metric for each source-sink pair, and without incurring an O (log 2 r ) blow-up in the cost that is inherent in some previous applications of region growing to k -route cuts. We obtain the same benefits as [15] do in their divide-and-conquer algorithms, and thereby obtain an O (ln r ln ln r )-approximation to the cost. We also obtain some extensions to k -route node-multicut problems. We complement these results by showing that the k-route s-t cut problem is at least as hard to approximate as the densest-k-subgraph (D k S) problem on uniform hypergraphs. In particular, this implies that one cannot avoid a poly( k )-factor if one seeks a unicriterion approximation, without improving the state-of-the-art for D k S on graphs, and proving the existence of a family of one-way functions. Previously, only NP -hardness of k -( s; t )-Cut was known.

Details

STOC Conference 2015 Conference Paper

On the Lovász Theta function for Independent Sets in Sparse Graphs

Nikhil Bansal 0001
Anupam Gupta 0001
Guru Guruganesh

We consider the maximum independent set problem on graphs with maximum degree d. We show that the integrality gap of the Lovasz Theta function-based SDP has an integrality gap of O~(d/log 3/2 d). This improves on the previous best result of O~(d/log d), and narrows the gap of this basic SDP to the integrality gap of O~(d/log 2 d) recently shown for stronger SDPs, namely those obtained using poly log(d) levels of the SA+ semidefinite hierarchy. The improvement comes from an improved Ramsey-theoretic bound on the independence number of K r -free graphs for large values of r. We also show how to obtain an algorithmic version of the above-mentioned SAplus-based integrality gap result, via a coloring algorithm of Johansson. The resulting approximation guarantee of O~(d/log 2 d) matches the best unique-games-based hardness result up to lower-order poly (log log d) factors.

Details