Author name cluster

Vineet Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

1 author row

NeurIPS Conference 2023 Conference Paper

A Computationally Efficient Sparsified Online Newton Method

Fnu Devvrit
Sai Surya Duvvuri
Rohan Anil
Vineet Gupta
Cho-Jui Hsieh
Inderjit Dhillon

Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton~(SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large scale benchmarks of up to 1B parameters. We achieve up to $30\\%$ faster convergence, $3. 4\\%$ relative improvement in validation performance, and $80\\%$ relative improvement in training loss, in comparison to memory efficient optimizers including first order methods. Powering the method is a surprising fact -- imposing structured sparsity patterns, like tridiagonal and banded structure, requires little to no overhead, making it as efficient and parallelizable as first-order methods. In wall-clock time, tridiagonal SONew is only about $3\\%$ slower per step than first-order methods but gives overall gains due to much faster convergence. In contrast, one of the state-of-the-art (SOTA) memory-intensive second-order methods, Shampoo, is unable to scale to large benchmarks. Additionally, while Shampoo necessitates significant engineering efforts to scale to large benchmarks, SONew offers a more straightforward implementation, increasing its practical appeal. SONew code is available at: https: //github. com/devvrit/SONew

PDF Details

NeurIPS Conference 2019 Conference Paper

Memory Efficient Adaptive Optimization

Rohan Anil
Vineet Gupta
Tomer Koren
Yoram Singer

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter, thus introducing significant memory overheads that restrict the size of the model being used as well as the number of examples in a mini-batch. We describe an effective and flexible adaptive optimization method with greatly reduced memory overhead. Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes. We give convergence guarantees for our method, and demonstrate its effectiveness in training very large translation and language models with up to 2-fold speedups compared to the state-of-the-art.

PDF Details

I&C Journal 2010 Journal Article

Weak bisimulation is sound and complete for pCTL ∗

Josée Desharnais
Vineet Gupta
Radha Jagadeesan
Prakash Panangaden

We investigate weak bisimulation of probabilistic systems in the presence of nondeterminism, i. e. labelled concurrent Markov chains (LCMC) with silent transitions. We develop an approach based on allowing convex combinations of computations, similar to Segala and Lynch’s use of randomized schedulers. The definition of weak bisimulation destroys the additivity property of the probability distributions, yielding instead capacities. The mathematics behind capacities naturally captures the intuition that when we deal with nondeterminism we must work with bounds on the possible probabilities rather than with their exact values. Our analysis leads to three new developments: • We identify a characterization of “image finiteness” for countable-state systems and present a new definition of weak bisimulation for these LCMCs. We prove that our definition coincides with that of Philippou, Lee and Sokolsky for finite state systems. • We show that bisimilar states have matching computations. The notion of matching involves convex combinations of transitions. • We study a minor variant of the probabilistic logic pCTL ∗ – the variation arises from an extra path formula to address action labels. We show that bisimulation is sound and complete for this variant of pCTL ∗. This is an extended complete version of a paper that was presented at CONCUR 2002.

Details DOI

TCS Journal 2004 Journal Article

Metrics for labelled Markov processes

Josée Desharnais
Vineet Gupta
Radha Jagadeesan
Prakash Panangaden

The notion of process equivalence of probabilistic processes is sensitive to the exact probabilities of transitions. Thus, a slight change in the transition probabilities will result in two equivalent processes being deemed no longer equivalent. This instability is due to the quantitative nature of probabilistic processes. In a situation where the process behavior has a quantitative aspect there should be a more robust approach to process equivalence. This paper studies a metric between labelled Markov processes. This metric has the property that processes are at zero distance if and only if they are bisimilar. The metric is inspired by earlier work on logics for characterizing bisimulation and is related, in spirit, to the Hutchinson metric.

Details DOI

I&C Journal 2003 Journal Article

Approximating labelled Markov processes

Josée Desharnais
Vineet Gupta
Radha Jagadeesan
Prakash Panangaden

Labelled Markov processes are probabilistic versions of labelled transition systems. In general, the state space of a labelled Markov process may be a continuum. In this paper, we study approximation techniques for continuous-state labelled Markov processes. We show that the collection of labelled Markov processes carries a Polish-space structure with a countable basis given by finite-state Markov chains with rational probabilities; thus permitting the approximation of quantitative observations (e. g. , an integral of a continuous function) of a continuous-state labelled Markov process by the observations on finite-state Markov chains. The primary technical tools that we develop to reach these results are • A variant of a finite-model theorem for the modal logic used to characterize bisimulation, and • an isomorphism between the poset of Markov processes (ordered by simulation) with the ω-continuous dcpo Proc (defined as the solution of the recursive domain equation Proc=∏ L P Pr (Proc)). The isomorphism between labelled Markov processes and Proc can be independently viewed as a full-abstraction result relating an operational (labelled Markov process) and a denotational (Proc) model and yields a logic complete for reasoning about simulation for continuous-state processes.

Details DOI