Arrow Research

Author name cluster

Ananth Grama

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers (12)

NeurIPS 2025 · Conference Paper

Agnostic Continuous-Time Online Learning

  • Pramith Devulapalli
  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

We study agnostic online learning from continuous-time data streams, a setting that naturally arises in applications such as environmental monitoring, personalized recommendation, and high-frequency trading. Unlike classical discrete-time models, learners in this setting must interact with a continually evolving data stream while making queries and updating models only at sparse, strategically selected times. We develop a general theoretical framework for learning from both *oblivious* and *adaptive* data streams, which may be noisy and non-stationary. For oblivious streams, we present a black-box reduction to classical online learning that yields a regret bound of $T \cdot R(S)/S$ for any class with discrete-time regret $R(S)$, where $T$ is the time horizon and $S$ is the *query budget*. For adaptive streams, which can evolve in response to learner actions, we design a dynamic query strategy in conjunction with a novel importance weighting scheme that enables unbiased loss estimation. In particular, for hypothesis class $\mathcal{H}$ with a finite Littlestone dimension, we establish a tight regret bound of $\tilde{\Theta}(T \cdot \sqrt{\mathsf{Ldim}(\mathcal{H})/S})$ that holds in both settings. Our results provide the first *quantitative* characterization of agnostic learning in continuous-time online environments with limited interaction.
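To make the two bounds concrete, here is a hedged instantiation of the reduction (our gloss, not a formula quoted from the paper): for a finite class, the classical experts bound $R(S) = O(\sqrt{S \log |\mathcal{H}|})$, plugged into $T \cdot R(S)/S$, recovers the same $1/\sqrt{S}$ decay as the Littlestone-dimension rate.

```latex
% Illustrative instantiation (assumption: the classical experts bound
% R(S) = O(sqrt(S log|H|)) applies to the discrete-time learner).
\[
  T \cdot \frac{R(S)}{S}
  = T \cdot \frac{O\bigl(\sqrt{S \log |\mathcal{H}|}\bigr)}{S}
  = O\!\left( T \sqrt{\frac{\log |\mathcal{H}|}{S}} \right),
\]
% mirroring the stated \tilde{\Theta}(T \sqrt{\mathsf{Ldim}(\mathcal{H})/S})
% rate for classes of finite Littlestone dimension.
```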

ICML 2025 · Conference Paper

From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

  • Ajay Kumar Jaiswal
  • Yifan Wang
  • Lu Yin 0006
  • Shiwei Liu 0003
  • Runjin Chen
  • Jiawei Zhao
  • Ananth Grama
  • Yuandong Tian

Weight matrices in Large Language Models (LLMs) can often be expressed in low-rank format, with the potential to reduce memory and compute requirements. Unlike previous works, which pivot around developing novel matrix decomposition algorithms, in this work we study the emergent non-uniform low-rank properties across weight matrices in LLMs through the lens of stabilizing gradient subspaces. First, we provide a theoretical framework for understanding the stabilization of gradient subspaces through Hessian analysis. Second, we empirically establish a consequential relationship between gradient dynamics and the low-rank expressiveness of weight matrices. Our findings reveal that different LLM components exhibit varying levels of converged low-rank structure, necessitating non-uniform rank reduction across them to minimize the performance drop due to compression. In view of that, we present Weight Low-Rank Projection (WeLore), which unifies weight compression and memory-efficient fine-tuning as one, in a data-agnostic and one-shot way. Going beyond compression alone, WeLore categorizes weight matrices into Low-rank Components (LRCs) and Non-Low-rank Components (N-LRCs) based on how well they can be expressed in low-rank form. Our gradient-dynamics perspective shows that LRCs tend to have better fine-tuning capabilities, and that fine-tuning them alone can closely mimic (and sometimes outperform) the training loss trajectory and performance of full fine-tuning, with a notable reduction in memory and compute footprint. All code and checkpoints will be released.
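The core mechanical step, ranking matrices by how well a truncated SVD captures them, is easy to sketch. The following is a minimal illustration, not the released WeLore code; the energy threshold, the LRC/N-LRC cutoff ratio, and all names are our assumptions.

```python
import torch

def lowrank_factorize(W: torch.Tensor, energy: float = 0.90):
    """Factor W ~= (U * S) @ Vh at the smallest rank that captures
    `energy` of the squared singular-value mass."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    cum = torch.cumsum(S**2, dim=0) / torch.sum(S**2)
    r = int(torch.searchsorted(cum, energy).item()) + 1
    return U[:, :r] * S[:r], Vh[:r, :], r

def classify_matrices(weights: dict, energy=0.90, ratio=0.5):
    """Split weight matrices into low-rank components (LRCs), which
    compress well, and non-low-rank components (N-LRCs), kept dense."""
    lrcs, nlrcs = {}, {}
    for name, W in weights.items():
        A, B, r = lowrank_factorize(W, energy)
        if r <= ratio * min(W.shape):   # rank small enough: LRC
            lrcs[name] = (A, B)
        else:                           # poor low-rank fit: N-LRC
            nlrcs[name] = W
    return lrcs, nlrcs
```

A non-uniform scheme then assigns each LRC its own rank rather than imposing a single global compression ratio.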

NeurIPS 2025 · Conference Paper

GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow

  • Mengbo Wang
  • Shourya Verma
  • Aditya Malusare
  • Luopin Wang
  • Yiyang Lu
  • Vaneet Aggarwal
  • Mario Sola
  • Ananth Grama

Spatial transcriptomics technologies can be used to align transcriptomes with histopathological morphology, presenting exciting new opportunities for biomolecular discovery. Using spatial transcriptomic gene expression and corresponding histology data, we construct a novel framework, GeneFlow, to map single- and multi-cell gene expression onto paired cellular images. By combining an attention-based RNA encoder with a conditional UNet guided by rectified flow, we generate high-resolution images with different staining methods (e.g., H&E, DAPI) to highlight various cellular/tissue structures. Rectified flow with high-order ODE solvers creates a continuous, bijective mapping between expression and image manifolds, addressing the many-to-one relationship inherent in this problem. Our method enables the generation of realistic cellular morphology features and spatially resolved intercellular interactions under genetic or chemical perturbations. This enables minimally invasive disease diagnosis by revealing dysregulated patterns in imaging phenotypes. Our rectified flow-based method outperforms diffusion methods and baselines in all experiments. Code is available at https://github.com/wangmengbo/GeneFlow.
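To give a sense of what "rectified flow with a high-order ODE solver" means operationally, here is a minimal sampling sketch using Heun's second-order method; the `velocity` network and the conditioning interface are assumptions, not the released GeneFlow code.

```python
import torch

@torch.no_grad()
def rectified_flow_sample(velocity, cond, shape, steps=50, device="cpu"):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (image)
    with Heun's method. `cond` stands in for the attention-based RNA
    encoder's gene-expression embedding."""
    x = torch.randn(shape, device=device)        # x_0 ~ N(0, I)
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for i in range(steps):
        t0, t1 = ts[i], ts[i + 1]
        dt = t1 - t0
        v0 = velocity(x, t0, cond)               # Euler predictor
        v1 = velocity(x + dt * v0, t1, cond)     # corrector evaluation
        x = x + dt * 0.5 * (v0 + v1)             # second-order update
    return x
```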

ICLR 2025 · Conference Paper

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

Generative models have shown impressive capabilities in synthesizing high-quality outputs across various domains. However, a persistent challenge is the occurrence of "hallucinations," where the model produces outputs that are not grounded in the underlying facts. While empirical strategies have been explored to mitigate this issue, a rigorous theoretical understanding remains elusive. In this paper, we develop a theoretical framework to analyze the *learnability* of non-hallucinating generative models from a learning-theoretic perspective. Our results reveal that non-hallucinating learning is statistically *impossible* when relying solely on the training dataset, even for a hypothesis class of size two and when the entire training set is truthful. To overcome these limitations, we show that incorporating *inductive biases* aligned with the actual facts into the learning process is essential. We provide a systematic approach to achieve this by restricting the fact set to a concept class of finite VC-dimension and demonstrate its effectiveness under various learning paradigms. Although our findings are primarily conceptual, they represent a first step towards a principled approach to addressing hallucinations in learning generative models.

NeurIPS 2025 · Conference Paper

Robust Integrated Learning and Pauli Noise Mitigation for Parametrized Quantum Circuits

  • Md Mobasshir Arshed Naved
  • Wenbo Xie
  • Wojciech Szpankowski
  • Ananth Grama

We propose a novel gradient-based framework for learning parameterized quantum circuits (PQCs) in the presence of Pauli noise in gate operations. The key innovation in our framework is the simultaneous optimization of model parameters and learning of an inverse noise channel, specifically designed to mitigate Pauli noise. Our parameterized inverse noise model utilizes the Pauli-Lindblad equation and relies on the principle underlying the Probabilistic Error Cancellation (PEC) protocol to learn an effective and scalable mechanism for noise mitigation. In contrast to conventional approaches that apply predetermined inverse noise models during execution, our method systematically mitigates Pauli noise by dynamically updating the inverse noise parameters in conjunction with the model parameters, facilitating task-specific noise adaptation throughout the learning process. We employ proximal stochastic gradient descent (proximal SGD) to keep updates bounded within a feasible range, ensuring stability. This approach allows the model to converge efficiently to a stationary point, balancing the trade-off between noise mitigation and computational overhead, and resulting in a highly adaptable quantum model that performs robustly in noisy quantum environments. Our framework is well-suited to near-term quantum devices in the noisy intermediate-scale quantum (NISQ) era, where noise is a significant challenge.
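A minimal sketch of the joint update described above, with the proximal step rendered as a projection onto a feasible box; every interface and constant here is an assumption, not the authors' implementation.

```python
import torch

def joint_proximal_step(theta, lam, grad_theta, grad_lam,
                        lr=0.05, lam_max=0.1):
    """One proximal-SGD update of circuit parameters `theta` and
    inverse-noise (Pauli-Lindblad) coefficients `lam`. The proximal
    operator here is projection of `lam` onto [0, lam_max], standing
    in for the feasibility constraint in the abstract."""
    theta = theta - lr * grad_theta          # plain SGD on the model
    lam = lam - lr * grad_lam                # gradient step on noise params
    lam = lam.clamp(min=0.0, max=lam_max)    # proximal / projection step
    return theta, lam
```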

ICML 2024 · Conference Paper

A Theory of Fault-Tolerant Learning

  • Changlong Wu
  • Yifan Wang
  • Ananth Grama

Developing machine learning models that account for potential faults encountered in real-world environments presents a fundamental challenge for mission-critical applications. In this paper, we introduce a novel theoretical framework grounded in learning theory for dealing with faults. In particular, we propose a framework called fault-tolerant PAC learning, aimed at identifying the most fault-tolerant models from a given hypothesis class (such as neural networks). We show that if faults occur randomly, fault-tolerant learning is equivalent to regular PAC learning. However, for adversarial faults, we show that the sample complexity of fault-tolerant PAC learning can grow linearly w.r.t. the number of perturbing functions induced by the faults, even for a hypothesis class with VC-dimension 1. We then provide a matching upper bound by restricting the number of perturbing functions. Finally, we show that the linear dependency on the number of perturbing functions can be substantially improved for deletion faults in neural networks. Our work provides a powerful formal framework and avenues for a number of future investigations on the precise characterization of fault-tolerant learning.
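A schematic reading of the separation (our gloss, with an assumed $\epsilon$ dependence; the abstract states only the linear growth in the number of perturbing functions):

```latex
% With perturbing-function set P induced by adversarial faults:
\[
  m(\epsilon, \delta)
  = \Omega\!\left( \frac{|\mathcal{P}|}{\epsilon^{2}} \right)
  \quad \text{even when } \mathsf{VC}(\mathcal{H}) = 1,
\]
% whereas for random faults the sample complexity matches the usual
% PAC rate, driven by VC(H) alone.
```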

NeurIPS 2024 · Conference Paper

Information-theoretic Limits of Online Classification with Noisy Labels

  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

We study online classification with general hypothesis classes where the true labels are determined by some function within the class, but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using observed noisy labels and noiseless features, while the performance is measured via minimax risk when comparing against true labels. The noisy mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new conditional version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees that apply to the ground truth while addressing general noisy observations.
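For reference, the squared Hellinger distance underlying the characterization; this is a textbook definition rather than something specific to the paper.

```latex
\[
  H^{2}(P, Q) \;=\; \frac{1}{2} \sum_{y} \Bigl( \sqrt{P(y)} - \sqrt{Q(y)} \Bigr)^{2}.
\]
% The "Hellinger gap" of the noisy kernel measures how distinguishable
% the noisy label distributions induced by competing hypotheses remain;
% per the abstract, the minimax risk is governed by this gap up to a
% factor logarithmic in |H|.
```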

TMLR 2023 · Journal Article

Expected Worst Case Regret via Stochastic Sequential Covering

  • Changlong Wu
  • Mohsen Heidari
  • Ananth Grama
  • Wojciech Szpankowski

We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. In an online learning setting, Nature selects features and associates a true label with these features. A learner uses features to predict a label, which is compared to the true label, and a loss is incurred. The total loss over $T$ rounds, when compared to the loss incurred by a set of experts, is known as regret. We introduce the notion of *expected worst case minimax regret* that generalizes and encompasses prior known minimax regrets. For such minimax regrets, we establish tight upper bounds via a novel concept of *stochastic global sequential covering*. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and i.i.d. generated features over $T$ rounds, the cardinality of stochastic global sequential covering can be upper bounded with high probability (w.h.p.) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the *Star-Littlestone* dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real-valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools for fixed-design minimax regrets, we provide lower bounds for the expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
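A hedged sketch of why covering size controls regret under log-loss (the classical mixture argument, stated here as our gloss of how such bounds are typically used, not a quoted theorem):

```latex
% If the class admits a stochastic global sequential covering of size
% N_T w.h.p., a Bayesian mixture over the cover loses at most log N_T
% relative to the best element of the cover, so
\[
  N_T \le e^{O(\mathsf{VC} \cdot \log^2 T)}
  \;\Longrightarrow\;
  \mathrm{Regret}_T \;\lesssim\; \log N_T \;=\; O(\mathsf{VC} \cdot \log^2 T).
\]
```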

ICML 2023 · Conference Paper

Learning Functional Distributions with Private Labels

  • Changlong Wu
  • Yifan Wang
  • Ananth Grama
  • Wojciech Szpankowski

We study the problem of learning functional distributions in the presence of noise. A functional is a map from the space of features to distributions over a set of labels, and is often assumed to belong to a known class of hypotheses $\mathcal{F}$. Features are generated by a general random process and labels are sampled independently from feature-dependent distributions. In privacy-sensitive applications, labels are passed through a noisy kernel. We consider online learning, where at each time step, a predictor attempts to predict the actual (label) distribution given only the features and noisy labels in prior steps. The performance of the predictor is measured by the expected KL-risk that compares the predicted distributions to the underlying truth. We show that the minimax expected KL-risk is of order $\tilde{\Theta}(\sqrt{T\log|\mathcal{F}|})$ for finite hypothesis class $\mathcal{F}$ and any non-trivial noise level. We then extend this result to general infinite classes via the concept of stochastic sequential covering and provide matching lower and upper bounds for a wide range of natural classes.

NeurIPS 2022 · Conference Paper

Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm

  • Changlong Wu
  • Mohsen Heidari
  • Ananth Grama
  • Wojciech Szpankowski

We study sequential general online regression, also known as sequential probability assignment, under logarithmic loss when compared against a broad class of experts. We obtain tight, often matching, lower and upper bounds for sequential minimax regret, which is defined as the excess loss incurred by the predictor over the best expert in the class. After proving a general upper bound, we consider some specific classes of experts, from the Lipschitz class to the bounded-Hessian class, and derive matching lower and upper bounds with provably optimal constants. Our bounds hold for a wide range of values of the data dimension and the number of rounds. To derive lower bounds, we use tools from information theory (e.g., the Shtarkov sum), and for upper bounds, we resort to a new "smooth truncated covering" of the class of experts. This allows us to find constructive proofs by applying a simple and novel truncated Bayesian algorithm. Our proofs are substantially simpler than existing ones and yet provide tighter (and often optimal) bounds.
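The classical, untruncated Bayesian mixture that the paper's truncated algorithm refines is simple to state in code. This sketch is the textbook baseline (regret at most $\log K$ against the best of $K$ experts under log-loss), not the authors' algorithm.

```python
import numpy as np

def bayesian_mixture_predict(expert_probs, y_seq):
    """Bayesian mixture for binary log-loss. `expert_probs` has shape
    (T, K): expert k's probability of outcome 1 at round t; `y_seq`
    holds the binary outcomes. Returns the mixture's predictions."""
    T, K = expert_probs.shape
    log_w = np.zeros(K)                     # uniform prior, log-space
    preds = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())     # normalize stably
        w /= w.sum()
        preds[t] = w @ expert_probs[t]      # posterior-mean prediction
        p = np.clip(expert_probs[t], 1e-12, 1 - 1e-12)
        log_w += np.where(y_seq[t] == 1, np.log(p), np.log1p(-p))
    return preds
```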

AAAI 2022 · Conference Paper

Toward Physically Realizable Quantum Neural Networks

  • Mohsen Heidari
  • Ananth Grama
  • Wojciech Szpankowski

There has been significant recent interest in quantum neural networks (QNNs), along with their applications in diverse domains. Current solutions for QNNs pose significant challenges concerning their scalability, ensuring that the postulates of quantum mechanics are satisfied, and that the networks are physically realizable. The exponential state space of QNNs poses challenges for the scalability of training procedures. The no-cloning principle prohibits making multiple copies of training samples, and the measurement postulates lead to non-deterministic loss functions. Consequently, the physical realizability and efficiency of existing approaches that rely on repeated measurement of several copies of each sample for training QNNs are unclear. This paper presents a new model for QNNs that relies on band-limited Fourier expansions of transfer functions of quantum perceptrons (QPs) to design scalable training procedures. This training procedure is augmented with a randomized quantum stochastic gradient descent technique that eliminates the need for sample replication. We show that this training procedure converges to the true minima in expectation, even in the presence of non-determinism due to quantum measurement. Our solution has a number of important benefits: (i) using QPs with a concentrated Fourier power spectrum, the training procedure for QNNs can be made scalable; (ii) it eliminates the need for resampling, thus staying consistent with the no-cloning rule; and (iii) it enhances data efficiency for the overall training process, since each data sample is processed once per epoch. We present a detailed theoretical foundation for the scalability, accuracy, and data efficiency of our models and methods. We also validate the utility of our approach through a series of numerical experiments.

IJCAI 2020 · Conference Paper

Characterizing Similarity of Visual Stimulus from Associated Neuronal Response

  • Vikram Ravindra
  • Ananth Grama

The problem of characterizing brain functions such as memory, perception, and processing of stimuli has received significant attention in the neuroscience literature. These experiments rely on carefully calibrated, albeit complex, inputs to record brain response to signals. A major problem in analyzing brain response to common stimuli, such as audio-visual input from videos (e.g., movies) or story narration through audio books, is that observed neuronal responses are due to combinations of "pure" factors, many of which may be latent. In this paper, we present a novel methodological framework for deconvolving the brain's response to mixed stimuli into its constituent responses to underlying pure factors. This framework, based on archetypal analysis, is applied to the analysis of imaging data from an adult cohort watching the BBC show Sherlock. By focusing on visual stimulus, we show strong correlation between our observed deconvolved response and third-party textual video annotations, demonstrating the significant power of our analysis techniques. Building on these results, we show that our techniques can be used to predict neuronal responses in new subjects (how other individuals react to Sherlock), as well as to new visual content (how individuals react to other videos with known annotations). This paper reports on the first study that relates video features with neuronal responses in a rigorous algorithmic and statistical framework, based on deconvolution of observed mixed imaging signals using archetypal analysis.
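Archetypal analysis (Cutler & Breiman) is the workhorse here: each observed response is modeled as a convex combination of archetypes, which are themselves convex combinations of data points. A minimal alternating projected-gradient sketch follows, assuming a roughly unit-scaled signal matrix; it illustrates the technique, not the authors' pipeline.

```python
import numpy as np

def project_simplex(V):
    """Row-wise Euclidean projection onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (U - css / idx > 0).sum(axis=1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, k, iters=500, lr=1e-3, seed=0):
    """X ~= A @ Z with archetypes Z = B @ X; rows of A (loadings) and
    B (archetype weights) are constrained to the simplex."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_simplex(rng.random((n, k)))
    B = project_simplex(rng.random((k, n)))
    for _ in range(iters):
        Z = B @ X
        A = project_simplex(A + lr * (X - A @ Z) @ Z.T)        # update loadings
        Z = B @ X
        B = project_simplex(B + lr * A.T @ (X - A @ Z) @ X.T)  # update archetypes
    return A, B @ X                                            # loadings, archetypes
```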