Author name cluster

Yuheng Bu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

AAAI Conference 2026 Conference Paper

TrustEnergy: A Unified Framework for Accurate and Reliable User-level Energy Usage Prediction

Dahai Yu
Rongchao Xu
Dingyi Zhuang
Yuheng Bu
Shenhao Wang
Guang Wang

Energy usage prediction is important for various real-world applications, including grid management, infrastructure planning, and disaster response. Although a plethora of deep learning approaches have been proposed to perform this task, most of them either overlook the essential spatial correlations across households or fail to scale to individualized prediction, making them less effective for accurate fine-grained user-level prediction. In addition, due to the dynamic and uncertain nature of energy usage caused by various factors such as extreme weather events, quantifying uncertainty for reliable prediction is also significant, but it has not been fully explored in existing work. In this paper, we propose a unified framework called TrustEnergy for accurate and reliable user-level energy usage prediction. There are two key technical components in TrustEnergy, (i) a Hierarchical Spatiotemporal Representation module to efficiently capture both macro and micro energy usage patterns with a novel memory-augmented spatiotemporal graph neural network, and (ii) an innovative Sequential Conformalized Quantile Regression module to dynamically adjust uncertainty bounds to ensure valid prediction intervals over time, without making strong assumptions about the underlying data distribution. We implement and evaluate our TrustEnergy framework by working with an electricity provider in Florida, and the results show our TrustEnergy can achieve a 5.4% increase in prediction accuracy and 5.7% improvement in uncertainty quantification compared to state-of-the-art baselines.

PDF Details DOI

TMLR Journal 2025 Journal Article

Class-wise Generalization Error: an Information-Theoretic analysis

Firas Laakom
Moncef Gabbouj
Jürgen Schmidhuber
Yuheng Bu

Existing generalization theories for supervised learning typically take a holistic approach and provide bounds for the expected generalization over the whole data distribution, which implicitly assumes that the model generalizes similarly for all different classes. In practice, however, there are significant variations in generalization performance among different classes, which cannot be captured by the existing generalization bounds. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of the model for each individual class. We derive a novel information-theoretic bound for class-generalization error using the KL divergence, and we further obtain several tighter bounds using recent advances in conditional mutual information bound, which enables practical evaluation. We empirically validate our proposed bounds in various neural networks and show that they accurately capture the complex class-generalization behavior. Moreover, we demonstrate that the theoretical tools developed in this work can be applied in several other applications.

PDF Details

ICML Conference 2025 Conference Paper

Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective

Firas Laakom
Haobo Chen
Jürgen Schmidhuber
Yuheng Bu

Despite substantial progress in promoting fairness in high-stake applications using machine learning models, existing methods often modify the training process, such as through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been extensively studied, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on Efron–Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds with both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization.

Details

ICLR Conference 2025 Conference Paper

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

Yepeng Liu
Yiren Song
Hai Ci
Yu Zhang
Haofan Wang
Mike Zheng Shou
Yuheng Bu

Image watermark techniques provide an effective way to assert ownership, deter misuse, and trace content sources, which has become increasingly essential in the era of large generative models. A critical attribute of watermark techniques is their robustness against various manipulations. In this paper, we introduce a watermark removal approach capable of effectively nullifying state-of-the-art watermarking techniques. Our primary insight involves regenerating the watermarked image starting from a \textbf{clean Gaussian noise} via a controllable diffusion model, utilizing the extracted semantic and spatial features from the watermarked image. The semantic control adapter and the spatial control network are specifically trained to control the denoising process towards ensuring image quality and enhancing consistency between the cleaned image and the original watermarked image. To achieve a smooth trade-off between watermark removal performance and image consistency, we further propose an adjustable and controllable regeneration scheme. This scheme adds varying numbers of noise steps to the latent representation of the watermarked image, followed by a controlled denoising process starting from this noisy latent representation. As the number of noise steps increases, the latent representation progressively approaches clean Gaussian noise, facilitating the desired trade-off. We apply our watermark removal methods across various watermarking techniques, and the results demonstrate that our methods offer superior visual consistency/quality and enhanced watermark removal performance compared to existing regeneration approaches. Our code is available at \url{https://github.com/yepengliu/CtrlRegen}.

Details

NeurIPS Conference 2025 Conference Paper

Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach

Haiyun He
Yepeng Liu
Ziqiao Wang
Yongyi Mao
Yuheng Bu

Watermarking has emerged as a crucial method to distinguish AI-generated text from human-created text. Current watermarking approaches often lack formal optimality guarantees or address the scheme and detector design separately. In this paper, we introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs) that jointly optimizes both the watermarking scheme and detector. Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality. We derive closed-form optimal solutions for this joint design and characterize the fundamental trade-off between watermark detectability and distortion. Notably, we reveal that the optimal watermarking schemes should be adaptive to the LLM’s generative distribution. Building on our theoretical insights, we propose a distortion-free, distribution-adaptive watermarking algorithm (DAWA) that leverages a surrogate model for model-agnosticism and efficiency. Experiments on Llama2-13B and Mistral-8$\times$7B models confirm the effectiveness of our approach, particularly at ultra-low FPRs. Our code is available at \url{https: //github. com/yepengliu/DAWA}.

PDF Details

ICML Conference 2024 Conference Paper

Adaptive Text Watermark for Large Language Models

Yepeng Liu
Yuheng Bu

The advancement of Large Language Models (LLMs) has led to increasing concerns about the misuse of AI-generated text, and watermarking LLM-generated text has emerged as a potential solution. However, it is challenging to generate high-quality watermarked text while maintaining robustness, security, and the ability to detect watermarks without prior knowledge of the prompt and model. This paper proposes an adaptive text watermarking strategy to address such a challenge. To improve the text quality and maintain robustness, we adaptively add watermarking to token distributions with high entropy measured by an auxiliary model and keep the low-entropy token distributions untouched. For the sake of security and to further minimize the watermark’s impact on text quality, instead of using a fixed green/red list generated from a random secret key, which can be vulnerable to decryption and forgery, we adaptively scale up the output logits based on the semantic embedding of previously generated text using a well designed semantic mapping model. Our experiments involving various LLMs demonstrate that our approach achieves comparable robustness performance to existing watermark methods. Additionally, the text generated by our method has perplexity comparable to that of un-watermarked LLMs while maintaining sufficient security.

Details

NeurIPS Conference 2024 Conference Paper

Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?

Maohao Shen
J. J. Ryu
Soumya Ghosh
Yuheng Bu
Prasanna Sattigeri
Subhro Das
Gregory W. Wornell

This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies by Bengs et al. identify limitations of the existing methods to conclude their learned epistemic uncertainties are unreliable, e. g. , in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based-models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Chongyang Shi
Yuheng Bu
Jie Fu

The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: A planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. Modeling the stochastic system using a Markov decision process, two classes of opacity properties are considered---Last-state opacity is to ensure that the observer is uncertain if the last state is in a specific set and initial-state opacity is to ensure that the observer is unsure of the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy capturing the information about the secret revealed by the observable. Then, we develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables us to have stable and fast convergence. We demonstrate our solution of opacity-enforcement control through a grid world example.

PDF Details DOI

ICML Conference 2024 Conference Paper

Operator SVD with Neural Networks via Nested Low-Rank Approximation

Jongha Jon Ryu
Xiangxiang Xu 0001
Hasan Sabri Melihcan Erol
Yuheng Bu
Lizhong Zheng
Gregory W. Wornell

Computing eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific simulation problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered as a promising alternative to the classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.

Details

ICML Conference 2023 Conference Paper

On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation

Maohao Shen
Yuheng Bu
Gregory W. Wornell

Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models, which focus on improving the pseudo-labeling techniques or proposing new training objectives. Instead, we aim to analyze the fundamental limits of MSFDA. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model, which illustrates an inherent bias-variance trade-off. We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment, which leads to the design of novel algorithms. Experiments on multiple datasets validate our theoretical analysis and demonstrate the state-of-art performance of the proposed algorithm, especially on some of the most challenging datasets, including Office-Home and DomainNet.

Details

AAAI Conference 2023 Conference Paper

Post-hoc Uncertainty Learning Using a Dirichlet Meta-Model

Maohao Shen
Yuheng Bu
Prasanna Sattigeri
Soumya Ghosh
Subhro Das
Gregory Wornell

It is known that neural networks have the problem of being over-confident when directly using the output label distribution to generate uncertainty measures. Existing methods mainly resolve this issue by retraining the entire model to impose the uncertainty quantification capability so that the learned model can achieve desired performance in accuracy and uncertainty prediction simultaneously. However, training the model from scratch is computationally expensive, and a trade-off might exist between prediction accuracy and uncertainty quantification. To this end, we consider a more practical post-hoc uncertainty learning setting, where a well-trained base model is given, and we focus on the uncertainty quantification task at the second stage of training. We propose a novel Bayesian uncertainty learning approach using the Dirichlet meta-model, which is effective and computationally efficient. Our proposed method requires no additional training data and is flexible enough to quantify different uncertainties and easily adapt to different application settings, including out-of-domain data detection, misclassification detection, and trustworthy transfer learning. Finally, we demonstrate our proposed meta-model approach's flexibility and superior empirical performance on these applications over multiple representative image classification benchmarks.

PDF Details DOI

ICML Conference 2022 Conference Paper

Selective Regression under Fairness Criteria

Abhin Shah
Yuheng Bu
Joshua K. Lee
Subhro Das
Rameswar Panda
Prasanna Sattigeri
Gregory W. Wornell

Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i. e. , by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria is met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.

Details

NeurIPS Conference 2021 Conference Paper

An Exact Characterization of the Generalization Error for the Gibbs Algorithm

Gholamali Aminian
Yuheng Bu
Laura Toni
Miguel Rodrigues
Gregory Wornell

Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and lack of guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm (a. k. a. Gibbs posterior) using symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.

PDF Details

ICML Conference 2021 Conference Paper

Fair Selective Classification Via Sufficiency

Joshua K. Lee
Yuheng Bu
Deepta Rajan
Prasanna Sattigeri
Rameswar Panda
Subhro Das
Gregory W. Wornell

Selective classification is a powerful tool for decision-making in scenarios where mistakes are costly but abstentions are allowed. In general, by allowing a classifier to abstain, one can improve the performance of a model at the cost of reducing coverage and classifying fewer samples. However, recent work has shown, in some cases, that selective classification can magnify disparities between groups, and has illustrated this phenomenon on multiple real-world datasets. We prove that the sufficiency criterion can be used to mitigate these disparities by ensuring that selective classification increases performance on all groups, and introduce a method for mitigating the disparity in precision across the entire coverage scale based on this criterion. We then provide an upper bound on the conditional mutual information between the class label and sensitive attribute, conditioned on the learned features, which can be used as a regularizer to achieve fairer selective classification. The effectiveness of the method is demonstrated on the Adult, CelebA, Civil Comments, and CheXpert datasets.

Details

AAAI Conference 2020 Conference Paper

Information-Theoretic Understanding of Population Risk Improvement with Model Compression

Yuheng Bu
Weihao Gao
Shaofeng Zou
Venugopal Veeravalli

We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We ﬁrst prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overﬁtting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted K-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions.

PDF Details