Arrow Research search

Author name cluster

Guanghui Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

JBHI Journal 2026 Journal Article

T-CACE: A Time-Conditioned Autoregressive Contrast Enhancement Multi-Task Framework for Contrast-Free Liver MRI Synthesis, Segmentation, and Diagnosis

  • Xiaojiao Xiao
  • Jianfeng Zhao
  • Qinmin Vivian Hu
  • Guanghui Wang

Magnetic resonance imaging (MRI) is a leading modality for the diagnosis of liver cancer, significantly improving lesion classification and patient outcomes. However, traditional MRI faces challenges including risks from contrast agent (CA) administration, time-consuming manual assessment, and limited annotated datasets. To address these limitations, we propose a Time-Conditioned Autoregressive Contrast Enhancement (T-CACE) framework for synthesizing multi-phase contrast-enhanced MRI (CEMRI) directly from non-contrast MRI (NCMRI). T-CACE introduces three core innovations: (1) a conditional token encoding (CTE) mechanism that unifies anatomical priors and temporal phase information into latent representations; (2) a dynamic time-aware attention mask (DTAM) that adaptively modulates inter-phase information flow using a Gaussian-decayed attention mechanism, ensuring smooth and physiologically plausible transitions across phases; and (3) a temporal classification consistency (TCC) constraint that aligns the lesion classification output with the evolution of the physiological signal, further enhancing diagnostic reliability. Extensive experiments on two independent liver MRI datasets demonstrate that T-CACE outperforms state-of-the-art methods in image synthesis, segmentation, and lesion classification. This framework offers a clinically relevant and efficient alternative to traditional contrast-enhanced imaging, improving safety, diagnostic efficiency, and reliability in the assessment of liver lesions.
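
A minimal sketch of the Gaussian-decayed attention idea described in this abstract, not the authors' DTAM (which is dynamic and learned): entries decay with the gap between contrast phases, so nearby phases attend to each other more strongly. `sigma` is a hypothetical fixed hyperparameter.

```python
import torch

def gaussian_decay_mask(num_phases: int, sigma: float = 1.0) -> torch.Tensor:
    """Attention bias that decays as a Gaussian in the phase gap |i - j|.

    Encourages smooth inter-phase transitions by down-weighting attention
    between temporally distant contrast phases.
    """
    idx = torch.arange(num_phases, dtype=torch.float32)
    gap = idx[:, None] - idx[None, :]
    return torch.exp(-gap.pow(2) / (2 * sigma**2))

# Hypothetical usage: additive log-bias inside softmax attention.
bias = gaussian_decay_mask(4).log()
```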

ICML Conference 2025 Conference Paper

Autoencoder-Based Hybrid Replay for Class-Incremental Learning

  • Milad Khademi Nori
  • Il-Min Kim 0001
  • Guanghui Wang

In class-incremental learning (CIL), effective incremental learning strategies are essential to mitigate task confusion and catastrophic forgetting, especially as the number of tasks $t$ increases. Current exemplar replay strategies impose $\mathcal{O}(t)$ memory/compute complexities. We propose an autoencoder-based hybrid replay (AHR) strategy that leverages our new hybrid autoencoder (HAE) as a compressor to alleviate the requirement for large memory, achieving $\mathcal{O}(0.1t)$ in the worst case with a computing complexity of $\mathcal{O}(t)$ while accomplishing state-of-the-art performance. The decoder later recovers the exemplar data stored in the latent space, rather than in raw format. Additionally, HAE is designed for both discriminative and generative modeling, enabling classification and replay capabilities, respectively. HAE adopts the charged-particle-system energy minimization equations and a repulsive force algorithm for the incremental embedding and distribution of new class centroids in its latent space. Our results demonstrate that AHR consistently outperforms recent baselines across multiple benchmarks while operating with the same memory/compute budgets. The source code is included in the supplementary material and will be open-sourced upon publication.
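
A minimal sketch of the latent-exemplar replay idea described above, not the authors' implementation: exemplars are stored as low-dimensional codes produced by an encoder, and the decoder reconstructs them for rehearsal. Class and method names here are hypothetical.

```python
import torch
import torch.nn as nn

class LatentReplayBuffer:
    """Store exemplars as latent codes; decode them back for replay.

    Compressing exemplars with an autoencoder shrinks the memory
    footprint relative to storing raw inputs.
    """

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        self.encoder, self.decoder = encoder, decoder
        self.codes, self.labels = [], []

    @torch.no_grad()
    def add(self, x: torch.Tensor, y: torch.Tensor) -> None:
        # Keep only the low-dimensional latent code, not the raw input.
        self.codes.append(self.encoder(x).cpu())
        self.labels.append(y.cpu())

    @torch.no_grad()
    def replay(self) -> tuple[torch.Tensor, torch.Tensor]:
        # Decode stored codes back to input space for rehearsal.
        z = torch.cat(self.codes)
        return self.decoder(z), torch.cat(self.labels)
```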

JBHI Journal 2025 Journal Article

Develop a Deep-Learning Model to Predict Cancer Immunotherapy Response Using In-Born Genomes

  • Kai Yan
  • Zhiheng Zhou
  • Sihao Liu
  • Guanghui Wang
  • Guiying Yan
  • Edwin Wang

The emergence of immune checkpoint inhibitors (ICIs) has significantly advanced cancer treatment. However, only 15-30% of cancer patients respond to ICI treatment, which stimulates and enhances host immunity to eliminate tumor cells. ICI treatment is very expensive and has potential adverse reactions; it is therefore crucial to develop a method that can accurately and rapidly assess a patient's suitability before ICI treatment. We compiled germline whole-exome sequencing (WES) data of 37 melanoma patients who had been treated with ICIs and sequenced in our lab previously, together with the WES data of another 700 ICI-treated cancer patients in the public domain. Using these data, we proposed a novel double-channel attention neural network (DANN) model to predict cancer ICI response and validated the predictions. DANN achieved a mean accuracy and AUC of 0.95 and 0.98, respectively, outperforming traditional machine learning methods. Enrichment analysis of the DANN-identified genes indicated that patients' in-born genomic variants might broadly affect the host immune system and thereby affect ICI response. Finally, we found that a set of 12 genes bearing genomic variants was significantly associated with cancer patient survival after ICI treatment.

ICLR Conference 2025 Conference Paper

Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting

  • Milad Khademi Nori
  • Il-Min Kim 0001
  • Guanghui Wang

Federated Class-Incremental Learning (FCIL) refers to a scenario where a dynamically changing number of clients collaboratively learn an ever-increasing number of incoming tasks. FCIL is known to suffer from local forgetting due to class imbalance at each client and global forgetting due to class imbalance across clients. We develop a mathematical framework for FCIL that formulates local and global forgetting. Then, we propose an approach called Hybrid Rehearsal (HR), which utilizes latent exemplars and data-free techniques to address local and global forgetting, respectively. HR employs a customized autoencoder designed for both data classification and the generation of synthetic data. To determine the embeddings of new tasks for all clients in the latent space of the encoder, the server uses the Lennard-Jones Potential formulations. Meanwhile, at the clients, the decoder decodes the stored low-dimensional latent-space exemplars back to the high-dimensional input space, where they are used to address local forgetting. To overcome global forgetting, the decoder generates synthetic data. Furthermore, our mathematical framework proves that our proposed approach HR can, in principle, tackle both the local and global forgetting challenges. In practice, extensive experiments demonstrate that while preserving privacy, our proposed approach outperforms the state-of-the-art baselines on multiple FCIL benchmarks with low compute and memory footprints.
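
A toy sketch of the Lennard-Jones device mentioned above, under the assumption that new class centroids are placed by minimizing total pairwise LJ energy in the latent space; the paper's exact formulation and constants are not reproduced here.

```python
import torch

def lennard_jones_energy(z: torch.Tensor, eps: float = 1.0, sigma: float = 1.0) -> torch.Tensor:
    """Total Lennard-Jones energy of centroids z with shape (n, d).

    Minimizing this energy pushes centroids toward a preferred pairwise
    distance: neither collapsed together nor overly dispersed.
    """
    dist = torch.cdist(z, z) + torch.eye(len(z)) * 1e9  # mask self-pairs
    r6 = (sigma / dist) ** 6
    return (4 * eps * (r6**2 - r6)).sum() / 2  # each pair counted once

# Hypothetical usage: spread 10 new centroids in a 32-d latent space.
z = torch.randn(10, 32, requires_grad=True)
opt = torch.optim.SGD([z], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    lennard_jones_energy(z).backward()
    opt.step()
```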

IJCAI Conference 2025 Conference Paper

Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation

  • Niaz Ahmad
  • Jawad Khan
  • Kang G. Shin
  • Youngmoon Lee
  • Guanghui Wang

The dynamic movement of the human body presents a fundamental challenge for human pose estimation and body segmentation. State-of-the-art approaches primarily rely on combining keypoint heatmaps with segmentation masks, but often struggle in scenarios involving overlapping joints during pose estimation or rapidly changing poses for instance-level segmentation. To address these limitations, we leverage Keypoints as Dynamic Centroid (KDC), a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for easily distinguishable and complex keypoints, and improves keypoint detection and confidence scores by introducing KeyCentroids using a keypoint disk. It leverages high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids, allowing for the swift clustering of pixels to specific human instances during rapid changes in human body movements in a live environment. Our experimental evaluations focus on crowded and occluded cases using the CrowdPose, OCHuman, and COCO benchmarks, demonstrating KDC’s effectiveness and generalizability in challenging scenarios in terms of both accuracy and runtime performance. Our implementation is available at https://sites.google.com/view/niazahmad/projects/kdc.
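
A heavily simplified sketch of the MaskCentroids assignment step described above, assuming high-confidence keypoints act as instance centroids in an embedding space; the KeyCentroids/keypoint-disk machinery is omitted and the shapes are hypothetical.

```python
import numpy as np

def assign_pixels_to_instances(pixel_emb: np.ndarray, centroid_emb: np.ndarray) -> np.ndarray:
    """Cluster pixel embeddings to their nearest keypoint centroid.

    pixel_emb: (H*W, d) per-pixel embeddings.
    centroid_emb: (num_instances, d) embeddings of high-confidence keypoints.
    Returns an instance id per pixel.
    """
    d2 = ((pixel_emb[:, None, :] - centroid_emb[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```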

ICML Conference 2025 Conference Paper

Learning Imbalanced Data with Beneficial Label Noise

  • Guangzheng Hu
  • Feng Liu 0003
  • Mingming Gong
  • Guanghui Wang
  • Liuhua Peng

Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of the imbalance ratio in binary classification, we find that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach for imbalanced learning, employing a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of information loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validated the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.
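
A toy sketch of asymmetric, beneficial label noise for binary imbalanced data, illustrating the direction of the idea only; the paper's noise model is more refined and theoretically calibrated.

```python
import numpy as np

def rebalance_with_label_noise(y: np.ndarray, flip_rate: float, seed=None) -> np.ndarray:
    """Flip a fraction of majority-class labels to the minority class.

    Injecting noise only in the majority class shifts the decision
    boundary toward the majority side, counteracting imbalance bias.
    """
    rng = np.random.default_rng(seed)
    y = y.copy()
    classes, counts = np.unique(y, return_counts=True)
    maj, mino = classes[np.argmax(counts)], classes[np.argmin(counts)]
    idx = np.flatnonzero(y == maj)
    flip = rng.choice(idx, size=int(flip_rate * len(idx)), replace=False)
    y[flip] = mino
    return y
```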

JMLR Journal 2025 Journal Article

Reliever: Relieving the Burden of Costly Model Fits for Changepoint Detection

  • Chengde Qian
  • Guanghui Wang
  • Changliang Zou

Changepoint detection typically relies on a grid-search strategy for optimal data segmentation. When model fitting itself is expensive, repeatedly fitting a model on every candidate segment dominates the computation. Existing approaches mitigate this by pruning the grid, thus reducing the number of segments (and model fits). We propose Reliever, which instead cuts the number of model fits directly and nests seamlessly within standard grid-search routines. Reliever fits a small, deterministic collection of proxy models and reuses them wherever they apply, making it compatible with a wide range of existing algorithms. For high-dimensional regression with changepoints, coupling Reliever with an optimal grid-search method yields changepoint and coefficient estimators that are rate-optimal up to a logarithmic factor. Extensive numerical experiments demonstrate that Reliever rapidly and accurately detects changepoints across a wide range of high-dimensional and nonparametric models.
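
A minimal sketch of the proxy-reuse idea described above, not the paper's algorithm: fit models at a small deterministic set of anchor splits, then score every candidate split with the nearest anchor's fits. `fit` and `loss` are hypothetical user-supplied callables.

```python
import numpy as np

def reliever_scan(X, y, fit, loss, grid, n_proxies: int = 10):
    """Single-changepoint grid scan that reuses a few proxy model fits.

    fit(X, y) -> model; loss(model, X, y) -> float. Instead of refitting
    at every candidate split tau, reuse the fits from the nearest anchor.
    """
    anchors = np.linspace(grid[0], grid[-1], n_proxies, dtype=int)
    left = {a: fit(X[:a], y[:a]) for a in anchors}
    right = {a: fit(X[a:], y[a:]) for a in anchors}
    best_tau, best = None, np.inf
    for tau in grid:
        a = anchors[np.argmin(np.abs(anchors - tau))]  # nearest proxy
        score = loss(left[a], X[:tau], y[:tau]) + loss(right[a], X[tau:], y[tau:])
        if score < best:
            best_tau, best = tau, score
    return best_tau
```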

JMLR Journal 2025 Journal Article

Universal Online Convex Optimization Meets Second-order Bounds

  • Lijun Zhang
  • Yibo Wang
  • Guanghui Wang
  • Jinfeng Yi
  • Tianbao Yang

Recently, several universal methods have been proposed for online convex optimization, which attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of function, making it difficult to exploit the structure of the problem and utilize existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses to aggregate predictions from the experts. Specifically, the meta-algorithm is required to yield a second-order bound with excess losses, so that it can leverage strong convexity and exponential concavity to control the meta-regret. In this way, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. As a result, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound. Furthermore, we extend our universal strategy to online composite optimization, where the loss function comprises a time-varying function and a fixed regularizer. To deal with the composite loss functions, we employ a meta-algorithm based on the optimistic online learning framework, which not only enjoys a second-order bound, but also can utilize estimations of upcoming loss functions. With suitable configurations, we show that the additional regularizer does not contribute to the meta-regret, thus ensuring universality in the composite setting.
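
A sketch of the expert-aggregation structure described above. Note the paper requires a meta-algorithm with a second-order bound; for brevity this sketch uses plain exponential weighting over linearized losses, so it illustrates the architecture, not the guarantee. The interface (`expert_preds`, `grad_fn`) is hypothetical.

```python
import numpy as np

def meta_aggregate(expert_preds, grad_fn, T: int, eta: float = 0.1) -> np.ndarray:
    """Weighted aggregation of black-box experts via linearized losses.

    expert_preds(t) -> (k, d) array of expert proposals at round t;
    grad_fn(x) -> gradient of the round's loss at the played point x.
    """
    k = len(expert_preds(0))
    w = np.ones(k) / k
    for t in range(T):
        xs = expert_preds(t)          # expert proposals
        x = w @ xs                    # play the weighted average
        g = grad_fn(x)                # single gradient feedback
        losses = xs @ g               # linearized expert losses
        w *= np.exp(-eta * (losses - losses.min()))
        w /= w.sum()                  # renormalize weights
    return w
```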

JMLR Journal 2024 Journal Article

Low-Rank Matrix Estimation in the Presence of Change-Points

  • Lei Shi
  • Guanghui Wang
  • Changliang Zou

We consider a general trace regression model with multiple structural changes and propose a universal approach for simultaneous exact or near-low-rank matrix recovery and change-point detection. It incorporates nuclear norm penalized least-squares minimization into a grid search scheme that determines the potential structural break. Under a set of general conditions, we establish the non-asymptotic error bounds with a nearly-oracle rate for the matrix estimators as well as the super-consistency rate for the change-point localization. We use concrete random design instances to justify the appropriateness of the proposed conditions. Numerical results demonstrate the validity and effectiveness of the proposed scheme.
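
A sketch of the standard computational primitive behind nuclear-norm penalized least squares: singular value thresholding, the proximal operator of the nuclear norm. This is generic machinery, not the paper's specific estimator.

```python
import numpy as np

def singular_value_threshold(M: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_* (the nuclear norm).

    Shrinks singular values toward zero, the workhorse step in solvers
    for low-rank matrix recovery on each candidate segment.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```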

AAAI Conference 2024 Conference Paper

Uncertainty Quantification for Data-Driven Change-Point Learning via Cross-Validation

  • Hui Chen
  • Yinxu Jia
  • Guanghui Wang
  • Changliang Zou

Accurately detecting multiple change-points is critical for various applications, but determining the optimal number of change-points remains a challenge. Existing approaches based on information criteria attempt to balance goodness-of-fit and model complexity, but their performance varies depending on the model. Recently, data-driven selection criteria based on cross-validation have been proposed, but these methods can be prone to slight overfitting in finite samples. In this paper, we introduce a method that controls the probability of overestimation and provides uncertainty quantification for learning multiple change-points via cross-validation. We frame this problem as a sequence of model comparison problems and leverage high-dimensional inferential procedures. We demonstrate the effectiveness of our approach through experiments on finite-sample data, showing superior uncertainty quantification for overestimation compared to existing methods. Our approach has broad applicability and can be used in diverse change-point models.

NeurIPS Conference 2024 Conference Paper

Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference

  • Geng Chen
  • Yinxu Jia
  • Guanghui Wang
  • Changliang Zou

The widespread use of black box prediction methods has sparked an increasing interest in algorithm/model-agnostic approaches for quantifying goodness-of-fit, with direct ties to specification testing, model selection and variable importance assessment. A commonly used framework involves defining a predictiveness criterion, applying a cross-fitting procedure to estimate the predictiveness, and utilizing the difference in estimated predictiveness between two models as the test statistic. However, even after standardization, the test statistic typically fails to converge to a non-degenerate distribution under the null hypothesis of equal goodness, leading to what is known as the degeneracy issue. To address this degeneracy issue, we present a simple yet effective device, Zipper. It draws inspiration from the strategy of additional splitting of testing data, but encourages an overlap between two testing data splits in predictiveness evaluation. Zipper binds together the two overlapping splits using a slider parameter that controls the proportion of overlap. Our proposed test statistic follows an asymptotically normal distribution under the null hypothesis for any fixed slider value, guaranteeing valid size control while enhancing power by effective data reuse. Finite-sample experiments demonstrate that our procedure, with a simple choice of the slider, works well across a wide range of settings.
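
A sketch of the overlapping-splits construction described above, under the assumption that the slider simply fixes the proportion of shared indices between the two testing halves; the statistic built on these splits is not reproduced here.

```python
import numpy as np

def zipper_splits(n: int, overlap: float, seed=None):
    """Two equal-size testing splits of {0, ..., n-1} sharing a fraction.

    overlap = 0 recovers disjoint splits; larger values bind the two
    evaluation sets together, reusing data to boost power.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    half = n // 2
    shared = int(overlap * half)
    common, rest = idx[:shared], idx[shared:]
    a = np.concatenate([common, rest[: half - shared]])
    b = np.concatenate([common, rest[half - shared : 2 * (half - shared)]])
    return a, b
```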

NeurIPS Conference 2023 Conference Paper

Faster Margin Maximization Rates for Generic Optimization Methods

  • Guanghui Wang
  • Zihao Hu
  • Vidya Muthukumar
  • Jacob D. Abernethy

First-order optimization methods tend to inherently favor certain solutions over others when minimizing a given training objective with multiple local optima. This phenomenon, known as \emph{implicit bias}, plays a critical role in understanding the generalization capabilities of optimization algorithms. Recent research has revealed that gradient-descent-based methods exhibit an implicit bias for the $\ell_2$-maximal margin classifier in the context of separable binary classification. In contrast, generic optimization methods, such as mirror descent and steepest descent, have been shown to converge to maximal margin classifiers defined by alternative geometries. However, while gradient-descent-based algorithms demonstrate fast implicit bias rates, the implicit bias rates of generic optimization methods have been relatively slow. To address this limitation, in this paper, we present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms. Our primary technique involves transforming a generic optimization algorithm into an online learning dynamic that solves a regularized bilinear game, providing a unified framework for analyzing the implicit bias of various optimization methods. The accelerated rates are derived leveraging the regret bounds of online learning algorithms within this game framework.

ICLR Conference 2023 Conference Paper

On Accelerated Perceptrons and Beyond

  • Guanghui Wang
  • Rafael Hanashiro
  • Etash Kumar Guha
  • Jacob D. Abernethy

The classical Perceptron algorithm of Rosenblatt can be used to find a linear threshold function to correctly classify $n$ linearly separable data points, assuming the classes are separated by some margin $\gamma > 0$. A foundational result is that Perceptron converges after $\Omega(1/\gamma^{2})$ iterations. There have been several recent works that managed to improve this rate by a quadratic factor, to $\Omega(\sqrt{\log n}/\gamma)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through \emph{optimistic} online learning. We then show that the proposed framework also leads to improved results for a series of problems beyond the standard Perceptron setting. Specifically, a) for the margin maximization problem, we improve the state-of-the-art result from $O(\log t/t^2)$ to $O(1/t^2)$, where $t$ is the number of iterations; b) we provide the first result on identifying the implicit bias property of the classical Nesterov's accelerated gradient descent (NAG) algorithm, and show NAG can maximize the margin with an $O(1/t^2)$ rate; c) for the classical $p$-norm Perceptron problem, we provide an algorithm with $\Omega(\sqrt{(p-1)\log n}/\gamma)$ convergence rate, while existing algorithms suffer the $\Omega({(p-1)}/\gamma^2)$ convergence rate.
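
For reference, a minimal implementation of the classical Perceptron baseline that this paper's accelerated methods improve upon; the accelerated min-max algorithms themselves are not sketched here.

```python
import numpy as np

def perceptron(X: np.ndarray, y: np.ndarray, max_epochs: int = 10_000) -> np.ndarray:
    """Rosenblatt's Perceptron for labels y in {-1, +1}.

    On data separable with margin gamma (and ||x|| <= 1), the classical
    analysis bounds the number of mistakes by 1/gamma^2, the baseline
    rate that the accelerated variants improve quadratically.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (w @ x) <= 0:  # misclassified (or on boundary)
                w += label * x        # additive update
                mistakes += 1
        if mistakes == 0:             # converged: all points correct
            break
    return w
```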

NeurIPS Conference 2023 Conference Paper

Riemannian Projection-free Online Learning

  • Zihao Hu
  • Guanghui Wang
  • Jacob D. Abernethy

The projection operation is a critical component in a wide range of optimization algorithms, such as online gradient descent (OGD), for enforcing constraints and achieving optimal regret bounds. However, it suffers from computational complexity limitations in high-dimensional settings or when dealing with ill-conditioned constraint sets. Projection-free algorithms address this issue by replacing the projection oracle with more efficient optimization subroutines. But to date, these methods have been developed primarily in the Euclidean setting, and while there has been growing interest in optimization on Riemannian manifolds, there has been essentially no work in trying to utilize projection-free tools here. An apparent issue is that non-trivial affine functions are generally non-convex in such domains. In this paper, we present methods for obtaining sub-linear regret guarantees in online geodesically convex optimization on curved spaces for two scenarios: when we have access to (a) a separation oracle or (b) a linear optimization oracle. For geodesically convex losses, and when a separation oracle is available, our algorithms achieve $O(T^{\frac{1}{2}})$, $O(T^{\frac{3}{4}})$ and $O(T^{\frac{1}{2}})$ adaptive regret guarantees in the full information setting, the bandit setting with one-point feedback and the bandit setting with two-point feedback, respectively. When a linear optimization oracle is available, we obtain regret rates of $O(T^{\frac{3}{4}})$ for geodesically convex losses and $O(T^{\frac{2}{3}}\log T)$ for strongly geodesically convex losses.

NeurIPS Conference 2022 Conference Paper

Adaptive Oracle-Efficient Online Learning

  • Guanghui Wang
  • Zihao Hu
  • Vidya Muthukumar
  • Jacob D. Abernethy

The classical algorithms for online learning and decision-making have the benefit of achieving the optimal performance guarantees, but suffer from computational complexity limitations when implemented at scale. More recent sophisticated techniques, which we refer to as $\textit{oracle-efficient}$ methods, address this problem by dispatching to an $\textit{offline optimization oracle}$ that can search through an exponentially-large (or even infinite) space of decisions and select that which performed the best on any dataset. But despite the benefits of computational feasibility, most oracle-efficient algorithms exhibit one major limitation: while performing well in worst-case settings, they do not adapt well to friendly environments. In this paper we consider two such friendly scenarios, (a) "small-loss" problems and (b) IID data. We provide a new framework for designing follow-the-perturbed-leader algorithms that are oracle-efficient and adapt well to the small-loss environment, under a particular condition which we call $\textit{approximability}$ (which is spiritually related to sufficient conditions provided in (Dudík et al., 2020)). We identify a series of real-world settings, including online auctions and transductive online classification, for which approximability holds. We also extend the algorithm to an IID data setting and establish a "best-of-both-worlds" bound in the oracle-efficient setting.

JMLR Journal 2022 Journal Article

Projection-free Distributed Online Learning with Sublinear Communication Complexity

  • Yuanyu Wan
  • Guanghui Wang
  • Wei-Wei Tu
  • Lijun Zhang

To deal with complicated constraints via locally light computations in distributed online learning, a recent study has presented a projection-free algorithm called distributed online conditional gradient (D-OCG), and achieved an $O(T^{3/4})$ regret bound for convex losses, where $T$ is the number of total rounds. However, it requires $T$ communication rounds, and cannot utilize the strong convexity of losses. In this paper, we propose an improved variant of D-OCG, namely D-BOCG, which can attain the same $O(T^{3/4})$ regret bound with only $O(\sqrt{T})$ communication rounds for convex losses, and a better regret bound of $O(T^{2/3}(\log T)^{1/3})$ with fewer $O(T^{1/3}(\log T)^{2/3})$ communication rounds for strongly convex losses. The key idea is to adopt a delayed update mechanism that reduces the communication complexity, and redefine the surrogate loss function in D-OCG for exploiting the strong convexity. Furthermore, we provide lower bounds to demonstrate that the $O(\sqrt{T})$ communication rounds required by D-BOCG are optimal (in terms of $T$) for achieving the $O(T^{3/4})$ regret with convex losses, and the $O(T^{1/3}(\log T)^{2/3})$ communication rounds required by D-BOCG are near-optimal (in terms of $T$) for achieving the $O(T^{2/3}(\log T)^{1/3})$ regret with strongly convex losses up to polylogarithmic factors. Finally, to handle the more challenging bandit setting, in which only the loss value is available, we incorporate the classical one-point gradient estimator into D-BOCG, and obtain similar theoretical guarantees.

JMLR Journal 2021 Journal Article

Bandit Convex Optimization in Non-stationary Environments

  • Peng Zhao
  • Guanghui Wang
  • Lijun Zhang
  • Zhi-Hua Zhou

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence that reflects the non-stationarity of environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret respectively for the one-point and two-point feedback models. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$ ahead of time, which is generally unknown. We further extend the algorithm to an anytime version that does not require knowing the time horizon $T$ in advance. Moreover, we study the adaptive regret, another widely used performance measure for online learning in non-stationary environments, and design an algorithm that provably enjoys the adaptive regret guarantees for BCO problems. Finally, we present empirical studies to validate the effectiveness of the proposed approach.
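
A sketch of the classical one-point gradient estimator that underlies one-point-feedback BCO (Flaxman et al. style): querying the loss at a single randomly perturbed point yields an unbiased gradient estimate of a smoothed version of the loss. This illustrates the feedback model only, not the paper's dynamic-regret algorithm.

```python
import numpy as np

def one_point_gradient(f, x: np.ndarray, delta: float, seed=None) -> np.ndarray:
    """One-point gradient estimate: (d / delta) * f(x + delta * u) * u.

    u is uniform on the unit sphere; the estimate is unbiased for the
    gradient of the delta-smoothed version of f.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)           # uniform direction on the sphere
    return (x.size / delta) * f(x + delta * u) * u
```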

NeurIPS Conference 2021 Conference Paper

Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

  • Lijun Zhang
  • Guanghui Wang
  • Wei-Wei Tu
  • Wei Jiang
  • Zhi-Hua Zhou

To deal with changing environments, a new performance measure, adaptive regret, defined as the maximum static regret over any interval, was proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality in the sense that they can only handle one type of convex functions and need a priori knowledge of parameters. By contrast, there exist universal algorithms, such as MetaGrad, that attain optimal static regret for multiple types of convex functions simultaneously. Along this line of research, this paper presents the first universal algorithm for minimizing the adaptive regret of convex functions. Specifically, we borrow the idea of maintaining multiple learning rates in MetaGrad to handle the uncertainty of functions, and utilize the technique of sleeping experts to capture changing environments. In this way, our algorithm automatically adapts to the property of functions (convex, exponentially concave, or strongly convex), as well as the nature of environments (stationary or changing). As a by-product, it also allows the type of functions to switch between rounds.

NeurIPS Conference 2021 Conference Paper

Online Convex Optimization with Continuous Switching Constraint

  • Guanghui Wang
  • Yuanyu Wan
  • Tianbao Yang
  • Lijun Zhang

In many sequential decision making applications, the change of decision would bring an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the \emph{overall} switching cost. We first investigate the hardness of the problem, and provide a lower bound of order $\Omega(\sqrt{T})$ when the switching cost budget $S=\Omega(\sqrt{T})$, and $\Omega(\min\{\frac{T}{S}, T\})$ when $S=O(\sqrt{T})$, where $T$ is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to the cumulative switching cost of the player incurred so far based on the orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound. Finally, we show that, for strongly convex functions, the regret bound can be improved to $O(\log T)$ for $S=\Omega(\log T)$, and $O(\min\{T/\exp(S)+S, T\})$ for $S=O(\log T)$.

AAAI Conference 2021 Conference Paper

Stochastic Graphical Bandits with Adversarial Corruptions

  • Shiyin Lu
  • Guanghui Wang
  • Lijun Zhang

We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully-designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an O(α ln K ln T + αC) regret, where α is the independence number of the feedback graph, K is the number of arms, T is the time horizon, and C quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.
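
A sketch of one ingredient named above: restricting exploration to an independent set of the feedback graph. Exploring an independent set suffices because each pull also reveals neighbors' rewards, which is where the independence number α enters the regret bound. The greedy heuristic below is an assumption for illustration, not the paper's construction.

```python
import numpy as np

def greedy_independent_set(adj: np.ndarray) -> list[int]:
    """Greedy independent set of a feedback graph.

    adj is a symmetric 0/1 adjacency matrix; repeatedly pick a
    lowest-degree vertex and discard it together with its neighbors.
    """
    remaining = set(range(len(adj)))
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: adj[u].sum())
        chosen.append(v)
        remaining -= {v} | set(np.flatnonzero(adj[v]))
    return chosen
```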

AAAI Conference 2020 Conference Paper

Adapting to Smoothness: A More Universal Algorithm for Online Convex Optimization

  • Guanghui Wang
  • Shiyin Lu
  • Yao Hu
  • Lijun Zhang

We aim to design universal algorithms for online convex optimization, which can handle multiple common types of loss functions simultaneously. The previous state-of-the-art universal method has achieved the minimax optimality for general convex, exponentially concave and strongly convex loss functions. However, it remains an open problem whether smoothness can be exploited to further improve the theoretical guarantees. In this paper, we provide an affirmative answer by developing a novel algorithm, namely UFO, which achieves O(√L∗), O(d log L∗) and O(log L∗) regret bounds for the three types of loss functions respectively under the assumption of smoothness, where L∗ is the cumulative loss of the best comparator in hindsight, and d is dimensionality. Thus, our regret bounds are much tighter when the comparator has a small loss, and ensure the minimax optimality in the worst case. In addition, it is worth pointing out that UFO is the first to achieve the O(log L∗) regret bound for strongly convex and smooth functions, which is tighter than the existing small-loss bound by an O(d) factor.

IJCAI Conference 2020 Conference Paper

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

  • Bo Xue
  • Guanghui Wang
  • Yimu Wang
  • Lijun Zhang

In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle this issue, we analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $(1+\epsilon)$-th moments for some $\epsilon \in (0, 1]$. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{1/2}T^{1/(1+\epsilon)})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Meanwhile, we provide an $\Omega(d^{\epsilon/(1+\epsilon)}T^{1/(1+\epsilon)})$ lower bound, which implies our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $\epsilon=1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms and the empirical results strongly support our theoretical guarantees.
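
A sketch of the first of the two robustification devices named above, median of means: split the samples into blocks, average each block, and take the median of the block means, which is robust to heavy tails. The truncation device and the bandit algorithms built on top are not reproduced.

```python
import numpy as np

def median_of_means(samples: np.ndarray, n_blocks: int) -> float:
    """Median-of-means estimator of the mean of heavy-tailed samples.

    Outlier-heavy blocks shift only their own block mean, so the median
    of block means stays stable where the plain sample mean does not.
    """
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```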

IJCAI Conference 2019 Conference Paper

Multi-Objective Generalized Linear Bandits

  • Shiyin Lu
  • Guanghui Wang
  • Yao Hu
  • Lijun Zhang

In this paper, we study the multi-objective bandits (MOB) problem, where a learner repeatedly selects one arm to play and then receives a reward vector consisting of multiple objectives. MOB has found many real-world applications as varied as online recommendation and network routing. On the other hand, these applications typically contain contextual information that can guide the learning process which, however, is ignored by most existing work. To utilize this information, we associate each arm with a context vector and assume the reward follows the generalized linear model (GLM). We adopt the notion of Pareto regret to evaluate the learner's performance and develop a novel algorithm for minimizing it. The essential idea is to apply a variant of the online Newton step to estimate model parameters, based on which we utilize the upper confidence bound (UCB) policy to construct an approximation of the Pareto front, and then uniformly at random choose one arm from the approximate Pareto front. Theoretical analysis shows that the proposed algorithm achieves an $\tilde{O}(d\sqrt{T})$ Pareto regret, where $T$ is the time horizon and $d$ is the dimension of contexts, which matches the optimal result for the single-objective contextual bandit problem. Numerical experiments demonstrate the effectiveness of our method.
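
A sketch of the Pareto-front selection step described above, assuming a matrix of per-arm UCB reward estimates (rows = arms, columns = objectives) has already been computed; the online-Newton-step estimation is omitted.

```python
import numpy as np

def approximate_pareto_front(ucb: np.ndarray) -> np.ndarray:
    """Indices of arms not dominated in every objective by another arm.

    Arm i is dominated if some arm j is at least as good everywhere and
    strictly better somewhere; the algorithm then plays uniformly at
    random among the non-dominated arms.
    """
    n = len(ucb)
    dominated = np.array([
        any(np.all(ucb[j] >= ucb[i]) and np.any(ucb[j] > ucb[i]) for j in range(n))
        for i in range(n)
    ])
    return np.flatnonzero(~dominated)

# Hypothetical usage: 5 arms, 3 objectives.
front = approximate_pareto_front(np.random.rand(5, 3))
```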

IJCAI Conference 2018 Conference Paper

Minimizing Adaptive Regret with One Gradient per Iteration

  • Guanghui Wang
  • Dakuan Zhao
  • Lijun Zhang

To cope with non-stationary environments, recent advances in online optimization have introduced the notion of adaptive regret, which measures the performance of an online learner against different comparators within different time intervals. Previous studies have proposed various algorithms to yield low adaptive regret under different scenarios. However, all existing algorithms need to query the gradient of the loss function at least O(log t) times in every iteration t, which hinders their applications to broad domains, especially when the evaluation of gradients is expensive. To address this limitation, we propose a series of computationally efficient algorithms for minimizing the adaptive regret of general convex, strongly convex and exponentially concave functions respectively. The key idea is to replace each loss function with a carefully designed surrogate loss, which bounds the original loss function from below. We show that the proposed algorithms only query the gradient once per iteration, and attain the same theoretical guarantees as previous optimal algorithms. Empirical results demonstrate the efficiency and effectiveness of our methods.
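
A worked instance of the surrogate idea for the general convex case, assuming convex $f_t$ and current iterate $x_t$: the single gradient queried at $x_t$ defines a linear lower bound on the loss, so downstream experts can run on $\ell_t$ without further gradient queries. (Strongly convex and exp-concave losses admit correspondingly tighter surrogates.)

```latex
% Linear surrogate built from one gradient query at x_t; by convexity
% of f_t it bounds the original loss from below:
\ell_t(x) \;=\; f_t(x_t) + \langle \nabla f_t(x_t),\, x - x_t \rangle
\;\le\; f_t(x) \quad \text{for all } x.
```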