Arrow Research search

Author name cluster

Changlong Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

NeurIPS Conference 2025 Conference Paper

Agnostic Continuous-Time Online Learning

  • Pramith Devulapalli
  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

We study agnostic online learning from continuous-time data streams, a setting that naturally arises in applications such as environmental monitoring, personalized recommendation, and high-frequency trading. Unlike classical discrete-time models, learners in this setting must interact with a continually evolving data stream while making queries and updating models only at sparse, strategically selected times. We develop a general theoretical framework for learning from both *oblivious* and *adaptive* data streams, which may be noisy and non-stationary. For oblivious streams, we present a black-box reduction to classical online learning that yields a regret bound of $T \cdot R(S)/S$ for any class with discrete-time regret $R(S)$, where $T$ is the time horizon and $S$ is the *query budget*. For adaptive streams, which can evolve in response to learner actions, we design a dynamic query strategy in conjunction with a novel importance weighting scheme that enables unbiased loss estimation. In particular, for hypothesis class $\mathcal{H}$ with a finite Littlestone dimension, we establish a tight regret bound of $\tilde{\Theta}(T \cdot \sqrt{\mathsf{Ldim}(\mathcal{H})/S})$ that holds in both settings. Our results provide the first *quantitative* characterization of agnostic learning in continuous-time online environments with limited interaction.

ICLR Conference 2025 Conference Paper

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

Generative models have shown impressive capabilities in synthesizing high-quality outputs across various domains. However, a persistent challenge is the occurrence of "hallucinations," where the model produces outputs that are not grounded in the underlying facts. While empirical strategies have been explored to mitigate this issue, a rigorous theoretical understanding remains elusive. In this paper, we develop a theoretical framework to analyze the *learnability* of non-hallucinating generative models from a learning-theoretic perspective. Our results reveal that non-hallucinating learning is statistically *impossible* when relying solely on the training dataset, even for a hypothesis class of size two and when the entire training set is truthful. To overcome these limitations, we show that incorporating *inductive biases* aligned with the actual facts into the learning process is essential. We provide a systematic approach to achieve this by restricting the fact set to a concept class of finite VC-dimension and demonstrate its effectiveness under various learning paradigms. Although our findings are primarily conceptual, they represent a first step towards a principled approach to addressing hallucinations in learning generative models.

ICML Conference 2024 Conference Paper

A Theory of Fault-Tolerant Learning

  • Changlong Wu
  • Yifan Wang
  • Ananth Grama

Developing machine learning models that account for potential faults encountered in real-world environments presents a fundamental challenge for mission-critical applications. In this paper, we introduce a novel theoretical framework grounded in learning theory for dealing with faults. In particular, we propose a framework called fault-tolerant PAC learning, aimed at identifying the most fault-tolerant models from a given hypothesis class (such as neural networks). We show that if faults occur randomly, fault-tolerant learning is equivalent to regular PAC learning. However, for adversarial faults, we show that the sample complexity of fault-tolerant PAC learning can grow linearly w. r. t. the number of perturbing functions induced by the faults, even for a hypothesis class with VC-dimension 1. We then provide a matching upper bound by restricting the number of perturbing functions. Finally, we show that the linear dependency on the number of perturbing functions can be substantially improved for deletion faults in neural networks. Our work provides a powerful formal framework and avenues for a number of future investigations on the precise characterization of fault-tolerant learning.

NeurIPS Conference 2024 Conference Paper

Information-theoretic Limits of Online Classification with Noisy Labels

  • Changlong Wu
  • Ananth Grama
  • Wojciech Szpankowski

We study online classification with general hypothesis classes where the true labels are determined by some function within the class, but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using observed noisy labels and noiseless features, while the performance is measured via minimax risk when comparing against true labels. The noisy mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new conditional version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees that apply to the ground truth while addressing general noisy observations.

TMLR Journal 2023 Journal Article

Expected Worst Case Regret via Stochastic Sequential Covering

  • Changlong Wu
  • Mohsen Heidari
  • Ananth Grama
  • Wojciech Szpankowski

We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. In an online learning setting, Nature selects features and associates a true label with these features. A learner uses features to predict a label, which is compared to the true label, and a loss is incurred. The total loss over $T$ rounds, when compared to a loss incurred by a set of experts, is known as a regret. We introduce the notion of *expected worst case minimax regret* that generalizes and encompasses prior known minimax regrets. For such minimax regrets, we establish tight upper bounds via a novel concept of *stochastic global sequential covering*. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$ generated features over $T$ rounds, the cardinality of stochastic global sequential covering can be upper bounded with high probability (w.h.p.) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the *Star-Littlestone* dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools for the fixed design minimax regrets, we provide lower bounds for expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.

ICML Conference 2023 Conference Paper

Learning Functional Distributions with Private Labels

  • Changlong Wu
  • Yifan Wang
  • Ananth Grama
  • Wojciech Szpankowski

We study the problem of learning functional distributions in the presence of noise. A functional is a map from the space of features to distributions over a set of labels, and is often assumed to belong to a known class of hypotheses $\mathcal{F}$. Features are generated by a general random process and labels are sampled independently from feature-dependent distributions. In privacy sensitive applications, labels are passed through a noisy kernel. We consider online learning, where at each time step, a predictor attempts to predict the actual (label) distribution given only the features and noisy labels in prior steps. The performance of the predictor is measured by the expected KL-risk that compares the predicted distributions to the underlying truth. We show that the minimax expected KL-risk is of order $\tilde{\Theta}(\sqrt{T\log|\mathcal{F}|})$ for finite hypothesis class $\mathcal{F}$ and any non-trivial noise level. We then extend this result to general infinite classes via the concept of stochastic sequential covering and provide matching lower and upper bounds for a wide range of natural classes.

NeurIPS Conference 2022 Conference Paper

Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm

  • Changlong Wu
  • Mohsen Heidari
  • Ananth Grama
  • Wojciech Szpankowski

We study sequential general online regression, known also as sequential probability assignments, under logarithmic loss when compared against a broad class of experts. We obtain tight, often matching, lower and upper bounds for sequential minimax regret, which is defined as the excess loss incurred by the predictor over the best expert in the class. After proving a general upper bound we consider some specific classes of experts from Lipschitz class to bounded Hessian class and derive matching lower and upper bounds with provably optimal constants. Our bounds work for a wide range of values of the data dimension and the number of rounds. To derive lower bounds, we use tools from information theory (e. g. , Shtarkov sum) and for upper bounds, we resort to new "smooth truncated covering" of the class of experts. This allows us to find constructive proofs by applying a simple and novel truncated Bayesian algorithm. Our proofs are substantially simpler than the existing ones and yet provide tighter (and often optimal) bounds.