Arrow Research search

Author name cluster

Peng Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers
1 author row

Possible papers (51)

AAAI Conference 2026 Conference Paper

Efficient Few-Step Solution Generation via Discrete Flow Matching for Combinatorial Optimization

  • Yuanshu Li
  • Di Wang
  • Wei Du
  • Xuan Wu
  • Peng Zhao
  • Yubin Xiao
  • You Zhou

Combinatorial optimization problems (COPs) are fundamental to many real-world applications where efficiently producing high-quality solutions is critical. Recent advances in diffusion-based non-autoregressive models have reformulated solving COPs as a generative process, achieving promising results. However, almost all of these methods still suffer from accumulated errors and high inference costs due to the multi-step stochastic denoising process. To address these issues, we propose EFLOCO, an efficient discrete flow matching method for solving COPs that learns structured and deterministic solution trajectories. EFLOCO replaces noise-driven updates with smooth and guided transitions, thereby improving inference stability and quality. Furthermore, we introduce an adaptive time-step scheduler that devotes more steps to critical transition regions, yielding strong performance under few-step constraints. Experiments on standard Traveling Salesman Problems (TSPs) and Asymmetric TSPs (ATSPs) show that our method consistently outperforms both learning-based and heuristic baselines in terms of solution quality and inference speed.
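
The abstract does not detail the scheduler, so the following is a hypothetical sketch of one way a few-step sampler could concentrate its time grid in a critical transition region: step endpoints are taken as quantiles of a density with a bump at an assumed critical location. The function name, the bump shape, and the default `critical=0.8` are illustrative assumptions, not the paper's method.

```python
import numpy as np

def adaptive_time_grid(num_steps: int, critical: float = 0.8, width: float = 0.1) -> np.ndarray:
    """Hypothetical non-uniform time grid: step endpoints are quantiles of a
    density mixing a uniform component with a Gaussian bump at `critical`,
    so more of the few-step budget lands in the assumed critical region."""
    xs = np.linspace(0.0, 1.0, 2049)
    density = 1.0 + 3.0 * np.exp(-0.5 * ((xs - critical) / width) ** 2)
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
    # Quantile lookup: uniform levels in [0, 1] mapped through the inverse CDF.
    levels = np.linspace(0.0, 1.0, num_steps + 1)
    return np.interp(levels, cdf, xs)

print(adaptive_time_grid(8))  # denser endpoints near t = 0.8
```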

EAAI Journal 2025 Journal Article

A neural network guided dual-space search evolutionary algorithm for large scale multi-objective optimization

  • Jie Cao
  • Chengzhi Liu
  • Zuohan Chen
  • Jianlin Zhang
  • Peng Zhao

The curse of dimensionality caused by the increase of decision variables in large-scale multi-objective optimization problems (LSMOPs) remains a key challenge. Although existing algorithms can handle simple LSMOPs, relying on a single search strategy might limit the quality of the obtained solutions. To solve this problem, a dual-space search evolutionary algorithm for large-scale multi-objective optimization is proposed. Firstly, in the decision space, a neural-network-assisted operator with an adaptive strategy is introduced. Specifically, when the number of non-dominated solutions is decreasing, the neural network is adopted to optimize the solutions with poor fitness so that they can break away from local optima. After that, the objective space of the population is divided into several sub-regions by a k-means clustering strategy. The solutions in these sub-regions are mapped onto the decision space through an inverse model, so that the population can obtain as many non-dominated solutions as possible. Finally, the proposed algorithm is tested on a real-life problem, Time-varying Ratio Error Estimation (TREE), and two benchmark suites, the large-scale multi-objective optimization problem (LSMOP) suite and the unconstrained front (UF) suite. The results show that the proposed algorithm exhibits competitive performance compared to other state-of-the-art algorithms on the Inverted Generational Distance (IGD) and Hypervolume (HV) indicators.

IJCAI Conference 2025 Conference Paper

DGL: Dynamic Global-Local Information Aggregation for Scalable VRP Generalization with Self-Improvement Learning

  • Yubin Xiao
  • Yuesong Wu
  • Rui Cao
  • Di Wang
  • Zhiguang Cao
  • Xuan Wu
  • Peng Zhao
  • Yuanshu Li

The Vehicle Routing Problem (VRP) is a critical combinatorial optimization problem with wide-reaching real-world applications, particularly in logistics and transportation. While neural network-based VRP solvers have shown impressive results on test instances similar to training data, their performance often degrades when faced with varying scales and unseen distributions, limiting their practical applicability. To overcome these limitations, we introduce DGL (Dynamic Global-Local Information Aggregation), a novel model that combines global and local information to effectively solve VRPs. DGL dynamically adjusts local node selections within a localized range, capturing local invariance across problems of different scales and distributions, thereby enhancing generalization. At the same time, DGL integrates global context into the decision-making process, providing richer information for more informed decisions. Additionally, we propose a replacement-based self-improvement learning framework that leverages data augmentation and random replacement techniques, further enhancing DGL's robustness. Extensive experiments on synthetic datasets, benchmark datasets, and real-world country map instances demonstrate that DGL achieves state-of-the-art performance, particularly in generalizing to large-scale VRPs and real-world scenarios. These results showcase DGL's effectiveness in solving complex, realistic optimization challenges and highlight its potential for practical applications.
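
As a rough, hypothetical illustration of local node selection (not DGL's actual mechanism, which the abstract does not specify in full), a decoder can restrict its candidates to the nearest unvisited nodes around the current position, so decisions depend on a neighborhood whose statistics stay stable across instance scales:

```python
import numpy as np

def local_candidates(coords: np.ndarray, current: int, visited: set, k: int = 10) -> np.ndarray:
    """Hypothetical local-selection step for a VRP decoder: among unvisited
    nodes, keep the k nearest to the current node."""
    unvisited = np.array([i for i in range(len(coords)) if i not in visited], dtype=int)
    dists = np.linalg.norm(coords[unvisited] - coords[current], axis=1)
    return unvisited[np.argsort(dists)[:k]]
```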

JMLR Journal 2025 Journal Article

Efficient Methods for Non-stationary Online Learning

  • Peng Zhao
  • Yan-Feng Xie
  • Lijun Zhang
  • Zhi-Hua Zhou

Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of non-stationarity, in which multiple base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises concerns about computational complexity --- such methods typically maintain $O(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret that reduce the number of projections per round from $O(\log T)$ to $1$. The proposed algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial modifications for non-stationary online methods. Furthermore, we study an even stronger measure, namely "interval dynamic regret", and reduce the number of projections per round from $O(\log^2 T)$ to $1$ for minimizing it. Our reduction demonstrates broad generality and applies to two important applications: online stochastic control and online principal component analysis, resulting in methods that are both efficient and optimal. Finally, empirical studies verify our theoretical findings.

JBHI Journal 2025 Journal Article

Few-Shot Class-Incremental Learning for Retinal Disease Recognition

  • Jinghua Zhang
  • Peng Zhao
  • Yongkun Zhao
  • Chen Li
  • Dewen Hu

Few-Shot Class-Incremental Learning (FSCIL) techniques are essential for developing Deep Learning (DL) models that can continuously learn new classes with limited samples while retaining existing knowledge. This capability is particularly crucial for DL-based retinal disease diagnosis systems, where acquiring large annotated datasets is challenging, and disease phenotypes evolve over time. This paper introduces Re-FSCIL, a novel framework for Few-Shot Class-Incremental Retinal Disease Recognition (FSCIRDR). Re-FSCIL integrates the RETFound model with a fine-grained module, employing a forward-compatible training strategy to improve adaptability, supervised contrastive learning to enhance feature discrimination, and feature fusion for robust representation quality. We convert existing datasets into the FSCIL format and reproduce numerous representative FSCIL methods to create two new benchmarks, RFMiD38 and JSIEC39, specifically for FSCIRDR. Our experimental results demonstrate that Re-FSCIL achieves state-of-the-art (SOTA) performance, significantly surpassing existing FSCIL methods on these benchmarks.

NeurIPS Conference 2025 Conference Paper

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

  • Yu-Jie Zhang
  • Sheng-An Xu
  • Peng Zhao
  • Masashi Sugiyama

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
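As a rough illustration of the one-pass idea (not the paper's exact OMD geometry or confidence set), a Bernoulli GLB estimator can be updated with a single gradient step per round, storing no history; the Euclidean mirror map below stands in for the tailored local norm described in the abstract:

```python
import numpy as np

def one_pass_logistic_update(theta: np.ndarray, x: np.ndarray, reward: float, lr: float = 0.1) -> np.ndarray:
    """Minimal sketch of a one-pass estimator update for a Bernoulli GLB:
    one gradient step on the logistic loss of the observed (x, reward)
    pair, so per-round time and space cost is O(d)."""
    pred = 1.0 / (1.0 + np.exp(-theta @ x))   # link function mu(theta^T x)
    grad = (pred - reward) * x                # gradient of the logistic loss
    return theta - lr * grad
```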

NeurIPS Conference 2025 Conference Paper

Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

  • Yuheng Zhao
  • Yu-Hu Yan
  • Kfir Y. Levy
  • Peng Zhao

Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are actually closely connected --- accelerated optimization can be understood through the lens of gradient-variation online learning. In this paper, we investigate online learning with Hölder functions, a general class encompassing both smooth and non-smooth (Lipschitz) functions, and explore its implications for offline optimization. For (strongly) convex online functions, we design the corresponding gradient-variation online learning algorithm whose regret smoothly interpolates between the optimal guarantees in smooth and non-smooth regimes. Notably, our algorithms do not require prior knowledge of the Hölder smoothness parameter, exhibiting strong adaptivity over existing methods. Through online-to-batch conversion, this gradient-variation online adaptivity yields an optimal universal method for stochastic convex optimization under Hölder smoothness. However, achieving universality in offline strongly convex optimization is more challenging. We address this by integrating online adaptivity with a detection-based guess-and-check procedure, which, for the first time, yields a universal offline method that achieves accelerated convergence in the smooth regime while maintaining near-optimal convergence in the non-smooth one.
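For reference, the Hölder smoothness condition underlying these results can be written as follows (a standard formulation; the exponent notation is ours): $\nu = 1$ recovers classical $L$-smoothness, while $\nu = 0$ corresponds to the non-smooth (Lipschitz) regime.

```latex
% Hölder smoothness of exponent \nu interpolates between the smooth
% (\nu = 1) and non-smooth Lipschitz (\nu = 0) regimes:
\| \nabla f(x) - \nabla f(y) \| \le L \, \| x - y \|^{\nu},
\qquad \nu \in [0, 1].
```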

NeurIPS Conference 2025 Conference Paper

Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling

  • Jiaqi Wang
  • Zhiguang Cao
  • Peng Zhao
  • Rui Cao
  • Yubin Xiao
  • Yuan Jiang
  • You Zhou

The rise of smart manufacturing under Industry 4.0 introduces mass customization and dynamic production, demanding more advanced and flexible scheduling techniques. The flexible job-shop scheduling problem (FJSP) has attracted significant attention due to its complex constraints and strong alignment with real-world production scenarios. Current deep reinforcement learning (DRL)-based approaches to FJSP predominantly employ constructive methods. While effective, they often fall short of reaching (near-)optimal solutions. In contrast, improvement-based methods iteratively explore the neighborhood of initial solutions and are more effective in approaching optimality. However, the flexible machine allocation in FJSP poses significant challenges to the application of this framework, including accurate state representation, effective policy learning, and efficient search strategies. To address these challenges, this paper proposes a $\textbf{M}$emory-enhanced $\textbf{I}$mprovement $\textbf{S}$earch framework with he$\textbf{t}$erogeneous gr$\textbf{a}$ph $\textbf{r}$epresentation—$\textit{MIStar}$. It employs a novel heterogeneous disjunctive graph that explicitly models the operation sequences on machines to accurately represent scheduling solutions. Moreover, a memory-enhanced heterogeneous graph neural network (MHGNN) is designed for feature extraction, leveraging historical trajectories to enhance the decision-making capability of the policy network. Finally, a parallel greedy search strategy is adopted to explore the solution space, enabling superior solutions with fewer iterations. Extensive experiments on synthetic data and public benchmarks demonstrate that $\textit{MIStar}$ significantly outperforms both traditional handcrafted improvement heuristics and state-of-the-art DRL-based constructive methods.

NeurIPS Conference 2025 Conference Paper

Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

In this work, we study offline convex optimization with smooth objectives, where Nesterov's classical Accelerated Gradient (NAG) method achieves the optimal accelerated convergence. Extensive research has aimed to understand NAG from various perspectives, and a recent line of work approaches this from the viewpoint of online learning and online-to-batch conversion, emphasizing the role of optimistic online algorithms for acceleration. In this work, we contribute to this perspective by proposing novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis, thereby significantly simplifying the online algorithm design while preserving the optimal convergence rates. Specifically, we demonstrate the effectiveness of our conversions through the following results: (i) when combined with simple online gradient descent, our optimistic conversion achieves the optimal accelerated convergence; (ii) our conversion also applies to strongly convex objectives, and by leveraging both optimistic online-to-batch conversion and optimistic online algorithms, we achieve the optimal accelerated convergence rate for strongly convex and smooth objectives, for the first time through the lens of online-to-batch conversion; (iii) our optimistic conversion can achieve universality to smoothness --- applicable to both smooth and non-smooth objectives without requiring knowledge of the smoothness coefficient --- and remains as efficient as non-universal methods by using only one gradient query in each iteration. Finally, we highlight the effectiveness of our optimistic online-to-batch conversions by a precise correspondence with NAG.
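
For context, the classical (non-optimistic) conversion that these optimistic variants refine reads as follows; running any online algorithm on the fixed objective $f$ (so $f_t \equiv f$) and averaging its iterates turns regret into a convergence rate:

```latex
% Classical online-to-batch conversion, by convexity and Jensen's inequality:
f(\bar{x}_T) - f(x^\star)
\;\le\; \frac{1}{T} \sum_{t=1}^{T} \big( f(x_t) - f(x^\star) \big)
\;=\; \frac{\mathrm{Regret}_T}{T},
\qquad \bar{x}_T = \frac{1}{T} \sum_{t=1}^{T} x_t .
```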

NeurIPS Conference 2025 Conference Paper

Parameter-free Algorithms for the Stochastically Extended Adversarial Model

  • Shuche Wang
  • Adarsh Barik
  • Peng Zhao
  • Vincent Tan

We develop the first parameter-free algorithms for the Stochastically Extended Adversarial (SEA) model, a framework that bridges adversarial and stochastic online convex optimization. Existing approaches for the SEA model require prior knowledge of problem-specific parameters, such as the diameter of the domain $D$ and the Lipschitz constant of the loss functions $G$, which limits their practical applicability. Addressing this, we develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters. We first establish a comparator-adaptive algorithm for the scenario with unknown domain diameter but known Lipschitz constant, achieving an expected regret bound of $\tilde{O}\big(\Vert u\Vert_2^2 + \Vert u\Vert_2(\sqrt{\sigma^2_{1:T}} + \sqrt{\Sigma^2_{1:T}})\big)$, where $u$ is the comparator vector and $\sigma^2_{1:T}$ and $\Sigma^2_{1:T}$ represent the cumulative stochastic variance and cumulative adversarial variation, respectively. We then extend this to the more general setting where both $D$ and $G$ are unknown, attaining a comparator- and Lipschitz-adaptive algorithm. Notably, the regret bound exhibits the same dependence on $\sigma^2_{1:T}$ and $\Sigma^2_{1:T}$, demonstrating the efficacy of our proposed methods even when both parameters are unknown in the SEA model.

NeurIPS Conference 2025 Conference Paper

Provably Efficient Online RLHF with One-Pass Reward Modeling

  • Long-Fei Li
  • Yu-Yang Qian
  • Peng Zhao
  • Zhi-Hua Zhou

Reinforcement Learning from Human Feedback (RLHF) has shown remarkable success in aligning Large Language Models (LLMs) with human preferences. Traditional RLHF methods rely on a fixed dataset, which often suffers from limited coverage. To this end, online RLHF has emerged as a promising direction, enabling iterative data collection and refinement. Despite its potential, this paradigm faces a key bottleneck: the requirement to continuously integrate new data into the dataset and re-optimize the model from scratch at each iteration, resulting in computational and storage costs that grow linearly with the number of iterations. In this work, we address this challenge by proposing a one-pass reward modeling method that eliminates the need to store historical data and achieves constant-time updates per iteration. Specifically, we first formalize RLHF as a contextual preference bandit and develop a new algorithm based on online mirror descent with a tailored local norm, replacing the standard maximum likelihood estimation for reward modeling. We then apply it to various online RLHF settings, including passive data collection, active data collection, and deployment-time adaptation. We provide theoretical guarantees showing that our method enhances both statistical and computational efficiency. Finally, we design practical algorithms for LLMs and conduct experiments with the Llama-3-8B-Instruct and Qwen2.5-7B-Instruct models on Ultrafeedback and Mixture2 datasets, validating the effectiveness of our approach.

NeurIPS Conference 2024 Conference Paper

A Simple and Optimal Approach for Universal Online Learning with Gradient Variations

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

We investigate the problem of universal online learning with gradient-variation regret. Universal online learning aims to achieve regret guarantees without prior knowledge of the curvature of the online functions. Moreover, we study the problem-dependent gradient-variation regret as it plays a crucial role in bridging stochastic and adversarial optimization as well as game theory. In this work, we design a universal approach with the *optimal* gradient-variation regret simultaneously for strongly convex, exp-concave, and convex functions, thus addressing an open problem highlighted by [Yan et al. [2023]](https://openreview.net/forum?id=AA1xrgAP5z). Our approach is *simple* since it is algorithmically efficient-to-implement with a two-layer online ensemble structure and only $1$ gradient query per round, and theoretically easy-to-analyze with a novel and alternative analysis to the gradient-variation regret. Concretely, previous works on gradient variations require controlling the algorithmic stability, which is challenging and leads to sub-optimal regret and less efficient algorithm design. Our analysis overcomes this issue by using a Bregman divergence negative term from linearization and a useful smoothness property.

JMLR Journal 2024 Journal Article

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

  • Peng Zhao
  • Yu-Jie Zhang
  • Lijun Zhang
  • Zhi-Hua Zhou

We investigate online convex optimization in non-stationary environments and choose dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path length that essentially reflects the non-stationarity of environments; the state-of-the-art dynamic regret is then $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper, we demonstrate that it is possible to further enhance the guarantee for some easy problem instances, particularly when online functions are smooth. Specifically, we introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of these two terms. These quantities are at most $\mathcal{O}(T)$ while could be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and meanwhile safeguard the same rate in the worst case. Notably, our proposed algorithms can achieve favorable dynamic regret with only one gradient per iteration, sharing the same gradient query complexity as static regret minimization methods. To accomplish this, we introduce the collaborative online ensemble framework. The proposed framework employs a two-layer online ensemble to handle non-stationarity, uses optimistic online learning, and introduces crucial correction terms to enable effective collaboration within the meta-base two layers, thereby attaining adaptivity. We believe the framework can be useful for broader problems.
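
For reference, the path length used as the non-stationarity measure here is standardly defined over the comparator sequence $u_1, \ldots, u_T$:

```latex
% Path length of the comparator sequence: the total movement of the
% comparators, which vanishes in the stationary (static regret) case.
P_T = \sum_{t=2}^{T} \| u_t - u_{t-1} \|_2 .
```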

JBHI Journal 2024 Journal Article

Biomarkers-Aware Asymmetric Bibranch GAN With Adaptive Memory Batch Normalization for Prediction of Anti-VEGF Treatment Response in Neovascular Age-Related Macular Degeneration

  • Peng Zhao
  • Xian Song
  • Xiaoming Xi
  • Xiushan Nie
  • Xianjing Meng
  • Yi Qu
  • Yilong Yin

The emergence of anti-vascular endothelial growth factor (anti-VEGF) therapy has revolutionized the treatment of neovascular age-related macular degeneration (nAMD). Post-therapeutic optical coherence tomography (OCT) imaging facilitates the prediction of therapeutic response to anti-VEGF therapy for nAMD. Although the generative adversarial network (GAN) is a popular generative model for post-therapeutic OCT image generation, it is realistically challenging to gather sufficient pre- and post-therapeutic OCT image pairs, resulting in overfitting. Moreover, the available GAN-based methods ignore local details, such as the biomarkers that are essential for nAMD treatment. To address these issues, a Biomarkers-aware Asymmetric Bibranch GAN (BAABGAN) is proposed to efficiently generate post-therapeutic OCT images. Specifically, one branch is developed to learn prior knowledge with a high degree of transferability from large-scale data, termed the source branch. The source branch then transfers knowledge to another branch, which is trained on small-scale paired data, termed the target branch. To boost the transferability, a novel Adaptive Memory Batch Normalization (AMBN) is introduced in the source branch, which learns more effective global knowledge that is impervious to noise via a memory mechanism. In addition, a novel Adaptive Biomarkers-aware Attention (ABA) module is proposed to encode biomarker information into the latent features of the target branch to learn finer local details of biomarkers. Experimental results show that the proposed method outperforms traditional GAN models and can produce high-quality post-therapeutic OCT images with limited datasets.

AAAI Conference 2024 Conference Paper

Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study reinforcement learning (RL) in episodic MDPs with adversarial full-information losses and the unknown transition. Instead of the classical static regret, we adopt dynamic regret as the performance measure which benchmarks the learner's performance with changing policies, making it more suitable for non-stationary environments. The primary challenge is to handle the uncertainties of the unknown transition and the unknown non-stationarity of environments simultaneously. We propose a general framework to decouple the two sources of uncertainties and show the dynamic regret bound naturally decomposes into two terms, one due to constructing confidence sets to handle the unknown transition and the other due to choosing sub-optimal policies under the unknown non-stationarity. To this end, we first employ the two-layer online ensemble structure to handle the adaptation error due to the unknown non-stationarity, which is model-agnostic. Subsequently, we instantiate the framework to three fundamental MDP models, including tabular MDPs, linear MDPs and linear mixture MDPs, and present corresponding approaches to control the exploration error due to the unknown transition. We provide dynamic regret guarantees respectively and show they are optimal in terms of the number of episodes $K$ and the non-stationarity $\bar{P}_K$ by establishing matching lower bounds. To the best of our knowledge, this is the first work that achieves dynamic regret exhibiting optimal dependence on $K$ and $\bar{P}_K$ without prior knowledge about the non-stationarity for adversarial MDPs with unknown transition.

AIJ Journal 2024 Journal Article

Exploratory machine learning with unknown unknowns

  • Peng Zhao
  • Jia-Wei Shan
  • Yu-Jie Zhang
  • Zhi-Hua Zhou

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to known labels. This paper studies a new problem setting in which there are unknown classes in the training data misperceived as other labels, and thus their existence appears unknown from the given supervision. We attribute the unknown unknowns to the fact that the training dataset is badly advised by the incompletely perceived label space due to the insufficient feature information. To this end, we propose the exploratory machine learning, which examines and investigates training data by actively augmenting the feature space to discover potentially hidden classes. Our method consists of three ingredients including rejection model, feature exploration, and model cascade. We provide theoretical analysis to justify its superiority, and validate the effectiveness on both synthetic and real datasets.
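
As a rough, hypothetical illustration of the rejection-model ingredient (the abstract does not specify its form), a reject-option classifier can accept a prediction only when confident, routing rejected instances to feature exploration:

```python
def predict_with_rejection(probs: list, threshold: float = 0.7):
    """Hypothetical reject-option step: accept the model's prediction only
    when its confidence clears the threshold; rejected instances (returned
    as None) become candidates for acquiring new features that may reveal
    hidden classes."""
    label = max(range(len(probs)), key=probs.__getitem__)
    return label if probs[label] >= threshold else None
```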

AAAI Conference 2024 Conference Paper

Generative Model-Based Feature Knowledge Distillation for Action Recognition

  • Guiqin Wang
  • Peng Zhao
  • Yanjiang Shi
  • Cong Zhao
  • Shusen Yang

Knowledge distillation (KD), a technique widely employed in computer vision, has emerged as a de facto standard for improving the performance of small neural networks. However, prevailing KD-based approaches in video tasks primarily focus on designing loss functions and fusing cross-modal information. This overlooks the spatial-temporal feature semantics, resulting in limited advancements in model compression. Addressing this gap, our paper introduces an innovative knowledge distillation framework, with the generative model for training a lightweight student model. In particular, the framework is organized into two steps: the initial phase is Feature Representation, wherein a generative model-based attention module is trained to represent feature semantics; Subsequently, the Generative-based Feature Distillation phase encompasses both Generative Distillation and Attention Distillation, with the objective of transferring attention-based feature semantics with the generative model. The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets, proving considerable enhancements in video action recognition task. Moreover, the effectiveness of our proposed framework is validated in the context of more intricate video action detection task. Our code is available at https://github.com/aaai-24/Generative-based-KD.
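
As a minimal stand-in for the feature-level transfer described above (omitting the generative and attention components, which the abstract names but does not specify), feature distillation at its simplest penalizes the distance between matched student and teacher feature maps:

```python
import numpy as np

def feature_distillation_loss(student_feat, teacher_feat) -> float:
    """Simplified feature-distillation term: mean squared error between
    student and teacher feature maps of matching shape. The paper's
    generative, attention-based variant replaces this plain MSE."""
    diff = np.asarray(student_feat) - np.asarray(teacher_feat)
    return float(np.mean(diff ** 2))
```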

NeurIPS Conference 2024 Conference Paper

Gradient-Variation Online Learning under Generalized Smoothness

  • Yan-Feng Xie
  • Peng Zhao
  • Zhi-Hua Zhou

Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions, which is crucial for attaining fast convergence in games and robustness in stochastic optimization, hence receiving increased attention. Existing results often require the smoothness condition by imposing a fixed bound on gradient Lipschitzness, which may be unrealistic in practice. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. In this paper, we systematically study gradient-variation online learning under generalized smoothness. We extend the classic optimistic mirror descent algorithm to derive gradient-variation regret by analyzing stability over the optimization trajectory and exploiting smoothness locally. Then, we explore universal online learning, designing a single algorithm with the optimal gradient-variation regrets for convex and strongly convex functions simultaneously, without requiring prior knowledge of curvature. This algorithm adopts a two-layer structure with a meta-algorithm running over a group of base-learners. To ensure favorable guarantees, we design a new Lipschitz-adaptive meta-algorithm, capable of handling potentially unbounded gradients while ensuring a second-order bound to effectively ensemble the base-learners. Finally, we provide the applications for fast-rate convergence in games and stochastic extended adversarial optimization.

NeurIPS Conference 2024 Conference Paper

Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading

  • Xueyi Zhang
  • Chengwei Zhang
  • Mingrui Lao
  • Peng Zhao
  • Jun Tang
  • Yanming Guo
  • Siqi Cai
  • Xianghu Yue

Lip reading aims at transforming videos of continuous lip movement into textual content, and has achieved significant progress over the past decade. It serves as a critical yet practical assistance for speech-impaired individuals, with more practicability than speech recognition in noisy environments. With the increasing interpersonal communications on social media owing to globalization, the existing monolingual datasets for lip reading may not be sufficient to meet the exponential proliferation of bilingual and even multilingual users. However, to the best of our knowledge, code-switching has only been explored in speech recognition, while attempts in lip reading have been largely neglected. To bridge this gap, we have collected a bilingual code-switching lip reading benchmark composed of Chinese and English, dubbed CSLR. As the pioneering work, we recruited 62 speakers with proficient foundations in both spoken Chinese and English to express sentences containing both involved languages. Through rigorous criteria in data selection, the CSLR benchmark has accumulated 85,560 video samples with a resolution of 1080x1920, totaling over 71.3 hours of high-quality code-switching lip movement data. To systematically evaluate the technical challenges in CSLR, we implement commonly-used lip reading backbones, as well as competitive solutions in code-switching speech, for benchmark testing. Experiments show CSLR to be a challenging and under-explored lip reading task. We hope our proposed benchmark will extend the applicability of code-switching lip reading, and further contribute to the communities of cross-lingual communication and collaboration. Our dataset and benchmark are accessible at https://github.com/cslr-lipreading/CSLR.

IJCAI Conference 2024 Conference Paper

Multi-Attention Based Visual-Semantic Interaction for Few-Shot Learning

  • Peng Zhao
  • Yin Wang
  • Wei Wang
  • Jie Mu
  • Huiting Liu
  • Cong Wang
  • Xiaochun Cao

Few-Shot Learning (FSL) aims to train a model that can generalize to recognize new classes, with each new class having only very limited training samples. Since extracting discriminative features for new classes with few samples is challenging, existing FSL methods leverage visual and semantic prior knowledge to guide discriminative feature learning. However, for meta-learning purposes, the semantic knowledge of the query set is unavailable, so their features lack discriminability. To address this problem, we propose a novel Multi-Attention based Visual-Semantic Interaction (MAVSI) approach for FSL. Specifically, we utilize spatial and channel attention mechanisms to effectively select discriminative visual features for the support set based on its ground-truth semantics while using all the support set semantics for each query set sample. Then, a relation module with class prototypes of the support set is employed to supervise and select discriminative visual features for the query set. To further enhance the discriminability of the support set, we introduce a visual-semantic contrastive learning module to promote the similarity between visual features and their corresponding semantic features. Extensive experiments on four benchmark datasets demonstrate that our proposed MAVSI outperforms existing state-of-the-art FSL methods.

NeurIPS Conference 2024 Conference Paper

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback, employing *dynamic regret* as the performance measure. We start with in-depth analyses of the strengths and limitations of the two most popular methods: occupancy-measure-based and policy-based methods. We observe that while the occupancy-measure-based method is effective in addressing non-stationary environments, it encounters difficulties with the unknown transition. In contrast, the policy-based method can deal with the unknown transition effectively but faces challenges in handling non-stationary environments. Building on this, we propose a novel algorithm that combines the benefits of both methods. Specifically, it employs (i) an *occupancy-measure-based global optimization* with a two-layer structure to handle non-stationary environments; and (ii) a *policy-based variance-aware value-targeted regression* to tackle the unknown transition. We bridge these two parts by a novel conversion. Our algorithm enjoys an $\widetilde{\mathcal{O}}(d \sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature mapping dimension, $H$ is the episode length, $K$ is the number of episodes, $\bar{P}_K$ is the non-stationarity measure. We show it is minimax optimal up to logarithmic factors by establishing a matching lower bound. To the best of our knowledge, this is the **first** work that achieves **near-optimal** dynamic regret for adversarial linear mixture MDPs with the unknown transition without prior knowledge of the non-stationarity measure.

JMLR Journal 2024 Journal Article

Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization

  • Sijia Chen
  • Yu-Jie Zhang
  • Wei-Wei Tu
  • Peng Zhao
  • Lijun Zhang

The stochastically extended adversarial (SEA) model, introduced by Sachs et al. (2022), serves as an interpolation between stochastic and adversarial online convex optimization. Under the smoothness condition on expected loss functions, it is shown that the expected static regret of optimistic follow-the-regularized-leader (FTRL) depends on the cumulative stochastic variance $\sigma_{1:T}^2$ and the cumulative adversarial variation $\Sigma_{1:T}^2$ for convex functions. Sachs et al. (2022) also provide a regret bound based on the maximal stochastic variance $\sigma_{\max}^2$ and the maximal adversarial variation $\Sigma_{\max}^2$ for strongly convex functions. Inspired by their work, we investigate the theoretical guarantees of optimistic online mirror descent (OMD) for the SEA model with smooth expected loss functions. For convex and smooth functions, we obtain the same $\mathcal{O}(\sqrt{\sigma_{1:T}^2}+\sqrt{\Sigma_{1:T}^2})$ regret bound, but with a relaxation of the convexity requirement from individual functions to expected functions. For strongly convex and smooth functions, we establish an $\mathcal{O}\left(\frac{1}{\lambda}\left(\sigma_{\max}^2+\Sigma_{\max}^2\right)\log \left(\left(\sigma_{1:T}^2 + \Sigma_{1:T}^2\right)/\left(\sigma_{\max}^2+\Sigma_{\max}^2\right)\right)\right)$ bound, better than their $\mathcal{O}((\sigma_{\max}^2$ $ + \Sigma_{\max}^2) \log T)$ result. For exp-concave and smooth functions, our approach yields a new $\mathcal{O}(d\log(\sigma_{1:T}^2+\Sigma_{1:T}^2))$ bound. Moreover, we introduce the first expected dynamic regret guarantee for the SEA model with convex and smooth expected functions, which is more favorable than static regret bounds in non-stationary environments. Furthermore, we expand our investigation to scenarios with non-smooth expected loss functions and propose novel algorithms built upon optimistic OMD with an implicit update, successfully attaining both static and dynamic regret guarantees.
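
For reference, the standard optimistic OMD template analyzed here alternates a decision step using a hint vector $m_t$ (e.g., the previous gradient) and an update step using the observed gradient $g_t$, with Bregman divergence $D_\psi$:

```latex
% Optimistic online mirror descent: predict with the hint m_t,
% then update the auxiliary iterate with the true gradient g_t.
x_t = \operatorname*{arg\,min}_{x \in \mathcal{X}} \ \eta \langle m_t, x \rangle + D_\psi(x, \hat{x}_t),
\qquad
\hat{x}_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \ \eta \langle g_t, x \rangle + D_\psi(x, \hat{x}_t).
```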

NeurIPS Conference 2024 Conference Paper

Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

  • Long-Fei Li
  • Yu-Jie Zhang
  • Peng Zhao
  • Zhi-Hua Zhou

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both *statistical* and *computational* efficiency. The best-known result of Hwang and Oh [2023] has achieved an $\widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K})$ regret upper bound, where $\kappa$ is a problem-dependent quantity, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. However, we observe that $\kappa^{-1}$ exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data and the time complexity scales linearly with the episode count, which is computationally expensive. In this work, we propose a statistically efficient algorithm that achieves a regret of $\widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2)$, eliminating the dependence on $\kappa^{-1}$ in the dominant term for the first time. We then address the computational challenges by introducing an enhanced algorithm that achieves the same regret guarantee but with only constant cost. Finally, we establish the first lower bound for this problem, justifying the optimality of our results in $d$ and $K$.

JMLR Journal 2024 Journal Article

Structured Optimal Variational Inference for Dynamic Latent Space Models

  • Peng Zhao
  • Anirban Bhattacharya
  • Debdeep Pati
  • Bani K. Mallick

We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions. To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework, where the time-dependent properties of the dynamic networks are exploited to facilitate computation and inference. Additionally, an easy-to-implement block coordinate ascent algorithm is developed with message-passing type updates in each block, whereas the complexity per iteration is linear with the number of nodes and time points. To certify the optimality, we demonstrate that the variational risk of the proposed variational inference approach attains the minimax optimal rate up to a logarithmic factor under certain conditions. To this end, we first derive the minimax lower bound, which might be of independent interest. In addition, we show that the posterior under commonly adopted Gaussian random walk priors can achieve the minimax lower bound up to a logarithmic factor. To the best of our knowledge, this is the first thorough theoretical analysis of Bayesian dynamic latent space models. Simulations and real data analysis demonstrate the efficacy of our methodology and the efficiency of our algorithm.

NeurIPS Conference 2024 Conference Paper

Universal Online Convex Optimization with $1$ Projection per Round

  • Wenhao Yang
  • Yibo Wang
  • Peng Zhao
  • Lijun Zhang

To address the uncertainty in function types, recent progress in online convex optimization (OCO) has spurred the development of universal algorithms that simultaneously attain minimax rates for multiple types of convex functions. However, for a $T$-round online problem, state-of-the-art methods typically conduct $O(\log T)$ projections onto the domain in each round, a process potentially time-consuming with complicated feasible sets. In this paper, inspired by the black-box reduction of Cutkosky and Orabona [2018], we employ a surrogate loss defined over simpler domains to develop universal OCO algorithms that only require $1$ projection. Embracing the framework of prediction with expert advice, we maintain a set of experts for each type of functions and aggregate their predictions via a meta-algorithm. The crux of our approach lies in a uniquely designed expert-loss for strongly convex functions, stemming from an innovative decomposition of the regret into the meta-regret and the expert-regret. Our analysis sheds new light on the surrogate loss, facilitating a rigorous examination of the discrepancy between the regret of the original loss and that of the surrogate loss, and carefully controlling meta-regret under the strong convexity condition. With only $1$ projection per round, we establish optimal regret bounds for general convex, exponentially concave, and strongly convex functions simultaneously. Furthermore, we enhance the expert-loss to exploit the smoothness property, and demonstrate that our algorithm can attain small-loss regret for multiple types of convex and smooth functions.

NeurIPS Conference 2023 Conference Paper

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

  • Yu-Jie Zhang
  • Zhen-Yu Zhang
  • Peng Zhao
  • Masashi Sugiyama

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from the training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario --- continuous covariate shift --- in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we theoretically show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online density ratio estimation method that can appropriately reuse historical information. Our method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.
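
For reference, the importance-weighted learning the method starts from reweights training losses by the input density ratio; a minimal sketch, assuming the ratios are given (the paper's contribution is estimating them online):

```python
import numpy as np

def importance_weighted_risk(losses: np.ndarray, ratios: np.ndarray) -> float:
    """Sketch of importance-weighted risk under covariate shift: `losses[i]`
    is the predictor's loss on training input x_i and `ratios[i]` approximates
    the density ratio p_test(x_i) / p_train(x_i). Reweighting makes the
    training average an estimate of the test risk."""
    return float(np.mean(ratios * losses))
```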

NeurIPS Conference 2023 Conference Paper

Dynamic Regret of Adversarial Linear Mixture MDPs

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and the unknown transition kernel. We consider the linear mixture MDPs whose transition kernel is a linear mixture model and choose the \emph{dynamic regret} as the performance measure. Denote by $d$ the dimension of the feature mapping, $H$ the horizon, $K$ the number of episodes, $P_T$ the non-stationary measure, we propose a novel algorithm that enjoys an $\widetilde{\mathcal{O}}\big(\sqrt{d^2 H^3K} + \sqrt{H^4(K+P_T)(1+P_T)}\big)$ dynamic regret under the condition that $P_T$ is known, which improves previously best-known dynamic regret for adversarial linear mixture MDP and adversarial tabular MDPs. We also establish an $\Omega\big(\sqrt{d^2 H^3 K} + \sqrt{H K (H+P_T)}\big)$ lower bound, indicating our algorithm is \emph{optimal} in $K$ and $P_T$. Furthermore, when the non-stationary measure $P_T$ is unknown, we design an online ensemble algorithm with a meta-base structure, which is proved to achieve an $\widetilde{\mathcal{O}}\big(\sqrt{d^2 H^3K} + \sqrt{H^4(K+P_T)(1+P_T) + H^2 S_T^2}\big)$ dynamic regret and here $S_T$ is the expected switching number of the best base-learner. The result can be optimal under certain regimes.

IS Journal 2023 Journal Article

ECCVideo: A Scalable Edge Cloud Collaborative Video Analysis System

  • Qing Han
  • Xuebin Ren
  • Peng Zhao
  • Yimeng Wang
  • Luhui Wang
  • Cong Zhao
  • Xinyu Yang

Video analysis drives a wide range of applications in the fields of public safety, autonomous vehicles, etc., with the great potential to impact society. Traditional cloud-based approaches are not applicable because of prohibitive bandwidth consumption and high response latency, while purely edge-based video analysis suffers from large computation delays, considering the restricted computing capacity of edge servers. Therefore, in this article, we focus on low-latency edge-cloud collaborative video analytic applications (ECCVApps) by making full use of resources at both the edge and cloud. Particularly, we present an edge-cloud collaborative video analysis system called ECCVideo, to support the unified management of heterogeneous servers and facilitate the development and deployment of large-scale ECCVApps. Under ECCVideo, we design the application architecture of ECCVApps, including the presentation paradigm, transparent communication services, and full lifecycle management. To validate the proposed system, a real-time object detection application is deployed on the ECCVideo prototype.

JMLR Journal 2023 Journal Article

Non-stationary Online Learning with Memory and Non-stochastic Control

  • Peng Zhao
  • Yu-Hu Yan
  • Yu-Xiang Wang
  • Zhi-Hua Zhou

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions and thus captures temporal effects of learning problems. In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments, which competes algorithms' decisions with a sequence of changing comparators. We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length. The key technical challenge is how to control the switching cost, the cumulative movements of player's decisions, which is neatly addressed by a novel switching-cost-aware online ensemble approach equipped with a new meta-base decomposition of dynamic policy regret and a careful design of meta-learner and base-learner that explicitly regularizes the switching cost. The results are further applied to tackle non-stationarity in online non-stochastic control (Agarwal et al., 2019), i.e., controlling a linear dynamical system with adversarial disturbance and convex cost functions. We derive a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller provably competitive to a sequence of changing policies for online non-stochastic control.

JMLR Journal 2023 Journal Article

Online Non-stochastic Control with Partial Feedback

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

Online control with non-stochastic disturbances and adversarially chosen convex cost functions, referred to as online non-stochastic control, has recently attracted increasing attention. We study online non-stochastic control with partial feedback, where learners can only access partially observed states and partially informed (bandit) costs. The problem setting arises naturally in real-world decision-making applications and strictly generalizes exceptional cases studied disparately by previous works. We propose the first online algorithm for this problem, with an $\tilde{O}(T^{3/4})$ regret competing with the best policy in hindsight, where $T$ denotes the time horizon and the $\tilde{O}(\cdot)$-notation omits the poly-logarithmic factors in $T$. To further enhance the algorithms' robustness to changing environments, we then design a novel method with a two-layer structure to optimize the dynamic regret, a more challenging measure that competes with time-varying policies. Our method is based on the online ensemble framework by treating the controller above as the base learner. On top of that, we design two different meta-combiners to simultaneously handle the unknown variation of environments and the memory issue arising from the online control. We prove that the two resulting algorithms enjoy $\tilde{O}(T^{3/4}(1+P_T)^{1/2})$ and $\tilde{O}(T^{3/4}(1+P_T)^{1/4}+T^{5/6})$ dynamic regret respectively, where $P_T$ measures the environmental non-stationarity. Our results are further extended to unknown transition matrices. Finally, empirical studies in both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings.

NeurIPS Conference 2023 Conference Paper

Stochastic Approximation Approaches to Group Distributionally Robust Optimization

  • Lijun Zhang
  • Peng Zhao
  • Zhen-Hua Zhuang
  • Tianbao Yang
  • Zhi-Hua Zhou

This paper investigates group distributionally robust optimization (GDRO), with the purpose to learn a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, and demonstrate that stochastic mirror descent (SMD), using $m$ samples in each iteration, achieves an $O(m (\log m)/\epsilon^2)$ sample complexity for finding an $\epsilon$-optimal solution, which matches the $\Omega(m/\epsilon^2)$ lower bound up to a logarithmic factor. Then, we make use of techniques from online learning to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity. Specifically, we cast GDRO as a two-players game where one player simply performs SMD and the other executes an online algorithm for non-oblivious multi-armed bandits. Next, we consider a more practical scenario where the number of samples that can be drawn from each distribution is different, and propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates. Denote by $n_i$ the sample budget for the $i$-th distribution, and assume $n_1 \geq n_2 \geq \cdots \geq n_m$. In the first approach, we incorporate non-uniform sampling into SMD such that the sample budget is satisfied in expectation, and prove that the excess risk of the $i$-th distribution decreases at an $O(\sqrt{n_1 \log m}/n_i)$ rate. In the second approach, we use mini-batches to meet the budget exactly and also reduce the variance in stochastic gradients, and then leverage stochastic mirror-prox algorithm, which can exploit small variances, to optimize a carefully designed weighted GDRO problem. Under appropriate conditions, it attains an $O((\log m)/\sqrt{n_i})$ convergence rate, which almost matches the optimal $O(\sqrt{1/n_i})$ rate of only learning from the $i$-th distribution with $n_i$ samples.
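
The first formulation above is directly implementable; a minimal sketch of GDRO as a saddle-point problem solved by stochastic mirror descent follows, with gradient descent on the model $w$ and exponentiated-gradient ascent on the group weights $q$ (the simplex player). The oracle `sample_losses_and_grads` is an assumed interface, not from the paper:

```python
import numpy as np

def gdro_smd(sample_losses_and_grads, dim: int, m: int, rounds: int,
             eta_w: float = 0.1, eta_q: float = 0.1):
    """Minimal GDRO sketch: min over w, max over q in the simplex of the
    weighted loss sum_i q_i L_i(w). `sample_losses_and_grads(w)` is assumed
    to return, for one fresh sample from each of the m distributions, the
    losses (shape [m]) and loss gradients (shape [m, dim])."""
    w = np.zeros(dim)
    q = np.ones(m) / m
    for _ in range(rounds):
        losses, grads = sample_losses_and_grads(w)
        w -= eta_w * (q @ grads)        # descent step on the q-weighted loss
        q *= np.exp(eta_q * losses)     # EG ascent: upweight the worst groups
        q /= q.sum()                    # project back onto the simplex
    return w, q
```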

NeurIPS Conference 2023 Conference Paper

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Specifically, we obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and the $\hat{\mathcal{O}}(\cdot)$-notation omits $\log V_T$ factors. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Notably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with carefully designed surrogate losses.
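
For reference, the gradient-variation quantity $V_T$ in these bounds is typically defined as the cumulative change between consecutive gradients:

```latex
% Gradient variation over a T-round game; small V_T certifies a
% slowly changing (benign) environment.
V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}}
\big\| \nabla f_t(x) - \nabla f_{t-1}(x) \big\|_2^2 .
```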

NeurIPS Conference 2022 Conference Paper

Adapting to Online Label Shift with Provable Guarantees

  • Yong Bai
  • Yu-Jie Zhang
  • Peng Zhao
  • Masashi Sugiyama
  • Zhi-Hua Zhou

The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this stationary assumption is often violated in real-world applications, especially when testing data appear in an online fashion. In this paper, we formulate and investigate the problem of \emph{online label shift} (OLaS): the learner trains an initial model from the labeled offline data and then deploys it to an unlabeled online environment where the underlying label distribution changes over time but the label-conditional density does not. The non-stationary nature and the lack of supervision make the problem challenging to tackle. To address the difficulty, we construct a new unbiased risk estimator that utilizes the unlabeled data, which exhibits many benign properties albeit with potential non-convexity. Building upon that, we propose novel online ensemble algorithms to deal with the non-stationarity of the environments. Our approach enjoys optimal \emph{dynamic regret}, indicating that the performance is competitive with a clairvoyant who knows the online environments in hindsight and then chooses the best decision for each round. The obtained dynamic regret bound scales with the intensity and pattern of label distribution shift, hence exhibiting the adaptivity in the OLaS problem. Extensive experiments are conducted to validate the effectiveness and support our theoretical findings.
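
For intuition, a sketch of the classic label-shift reweighting idea behind such estimators: since the label-conditional density is fixed under OLaS, per-class weights $p_{\text{test}}(y)/p_{\text{train}}(y)$ make the labeled offline risk match the current test risk. Here the test label distribution is assumed given; the paper's contribution is estimating it from unlabeled online data:

```python
import numpy as np

def label_shift_weights(train_label_dist, test_label_dist) -> np.ndarray:
    """Per-class importance weights p_test(y) / p_train(y)."""
    return np.asarray(test_label_dist) / np.asarray(train_label_dist)

def reweighted_risk(losses, labels, weights) -> float:
    """Offline losses reweighted by the label-shift weights; `labels`
    are integer class indices into `weights`."""
    return float(np.mean(weights[np.asarray(labels)] * np.asarray(losses)))
```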

EAAI Journal 2022 Journal Article

Combining feature importance and neighbor node interactions for cold start recommendation

  • Jinjin Zhang
  • Chenhui Ma
  • Chengliang Zhong
  • Peng Zhao
  • Xiaodong Mu

Cold start recommendation usually views preference embedding as a missing-value problem because there are no historical interactions. Existing graph neural network approaches for cold start users/items build the attribute embedding of each node by simply concatenating multiple features equally, and then reconstruct the node preference embedding from its attribute embedding through a mapping function learned from warm users/items. However, these approaches do not consider the different contributions of features when building the attribute embedding. In addition, they assume the neighbors of a target node are independent and ignore interactions between the neighbor nodes when building the mapping function between the attribute embedding and the preference embedding. These two limitations reduce their effectiveness. To overcome these limitations, we propose a novel framework called Feature Importance and Neighbor node Interactions graph neural network (FINI) that exploits feature weights and interactions between neighbor nodes. The core ideas of the proposed method are as follows. First, it designs a global–local contexts attention mechanism in the attribute encoding layer, which can dynamically learn the importance of the attributes of each node and improve the expression of the feature embeddings. Second, it proposes a mixed interaction mechanism to augment the weighted information of neighbor node interactions in the neighbor interaction layer, which can strengthen the expressive capability of the user/item embeddings and further improve the quality of the mapping function for cold start users/items. Additionally, we combine the rating prediction loss and the mimic loss as the total loss for training the network in the prediction layer. To assess the performance of FINI, both cold start user and cold start item recommendation are considered. The results demonstrate that FINI outperforms state-of-the-art approaches for cold start recommendation and gains significant improvements in terms of evaluation metrics.

IJCAI Conference 2022 Conference Paper

Contrastive Multi-view Hyperbolic Hierarchical Clustering

  • Fangfei Lin
  • Bing Bai
  • Kun Bai
  • Yazhou Ren
  • Peng Zhao
  • Zenglin Xu

Hierarchical clustering recursively partitions data at an increasingly finer granularity. In real-world applications, multi-view data have become increasingly important. This raises a less investigated problem, i.e., multi-view hierarchical clustering, to better understand the hierarchical structure of multi-view data. To this end, we propose a novel neural network-based model, namely Contrastive Multi-view Hyperbolic Hierarchical Clustering (CMHHC). It consists of three components, i.e., multi-view alignment learning, aligned feature similarity learning, and continuous hyperbolic hierarchical clustering. First, we align sample-level representations across multiple views in a contrastive way to capture view-invariant information. Next, we utilize both manifold and Euclidean similarities to improve the metric property. Then, we embed the representations into a hyperbolic space and optimize the hyperbolic embeddings via a continuous relaxation of the hierarchical clustering loss. Finally, a binary clustering tree is decoded from the optimized hyperbolic embeddings. Experimental results on five real-world datasets demonstrate the effectiveness of the proposed method and its components.
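The abstract does not name the hyperbolic model; the Poincaré ball is the usual choice in this line of work, where the distance between embeddings $x$ and $y$ (with $\lVert x\rVert, \lVert y\rVert < 1$) is

$$ d_{\mathbb{B}}(x, y) = \operatorname{arcosh}\!\left( 1 + 2\,\frac{\lVert x - y\rVert^2}{(1 - \lVert x\rVert^2)(1 - \lVert y\rVert^2)} \right), $$

whose rapid growth near the boundary is what lets tree-like (hierarchical) structure be embedded with low distortion.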

NeurIPS Conference 2022 Conference Paper

Efficient Methods for Non-stationary Online Learning

  • Peng Zhao
  • Yan-Feng Xie
  • Lijun Zhang
  • Zhi-Hua Zhou

Non-stationary online learning has drawn much attention in recent years. In particular, \emph{dynamic regret} and \emph{adaptive regret} are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises concerns about computational complexity: those methods typically maintain $O(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $O(\log T)$ to $1$. Moreover, our algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.
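For readers unfamiliar with the two-layer structure the abstract refers to, here is a minimal sketch of a standard online ensemble: OGD base-learners on a geometric step-size grid, tracked by a Hedge-style meta-algorithm. It deliberately shows the naive version that projects once per base-learner per round; the paper's single-projection reduction is not reproduced, and all names and the step-size grid are illustrative.

```python
import numpy as np

def two_layer_ensemble(grad, project, x0, T, D, G):
    """Two-layer online ensemble for non-stationary OCO (schematic).

    Base-learners: online gradient descent over a geometric grid of
    step sizes. Meta: Hedge on the linearized losses <g_t, x_i>.
    """
    N = int(np.ceil(np.log2(max(T, 2)))) + 1           # number of base-learners
    etas = [(D / (G * np.sqrt(T))) * 2 ** i for i in range(N)]
    xs = [np.array(x0, dtype=float) for _ in range(N)]
    w = np.ones(N) / N                                 # meta weights
    lr = np.sqrt(np.log(N) / T)                        # Hedge learning rate
    for t in range(T):
        x = sum(wi * xi for wi, xi in zip(w, xs))      # combined decision
        g = grad(x, t)                                 # one gradient query
        surrogate = np.array([g @ xi for xi in xs])    # linearized losses
        w = w * np.exp(-lr * (surrogate - surrogate.min()))
        w /= w.sum()
        # naive step: one projection per base-learner per round
        xs = [project(xi - eta * g) for xi, eta in zip(xs, etas)]
    return x
```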

JMLR Journal 2021 Journal Article

Bandit Convex Optimization in Non-stationary Environments

  • Peng Zhao
  • Guanghui Wang
  • Lijun Zhang
  • Zhi-Hua Zhou

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is one-point or two-point function evaluations. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence, which reflects the non-stationarity of the environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret for the one-point and two-point feedback models, respectively. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$, which is generally unknown ahead of time. We further extend the algorithm to an anytime version that does not require knowing the time horizon $T$ in advance. Moreover, we study the adaptive regret, another widely used performance measure for online learning in non-stationary environments, and design an algorithm that provably enjoys adaptive regret guarantees for BCO problems. Finally, we present empirical studies to validate the effectiveness of the proposed approach.
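The one-point feedback model rests on a classical trick: a single function evaluation at a randomly perturbed point yields an unbiased gradient estimate of a smoothed surrogate of the loss. A minimal sketch of that standard estimator (not specific to this paper's algorithm):

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point gradient estimator for bandit convex optimization.

    Queries f once at a perturbed point; (d / delta) * f(x + delta*u) * u
    is an unbiased estimate of the gradient of a delta-smoothed
    version of f, where u is uniform on the unit sphere.
    """
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)              # uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u
```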

AAAI Conference 2021 Conference Paper

Exploratory Machine Learning with Unknown Unknowns

  • Peng Zhao
  • Yu-Jie Zhang
  • Zhi-Hua Zhou

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model classifies unseen instances into known labels. In real situations, when learned models do not work well, learners generally attribute the failure to an inadequate choice of learning algorithm or a lack of labeled training samples. In this paper, we point out an important category of failure that owes to unknown classes in the training data being misperceived as other labels, so that their existence is hidden from the given supervision. Such problems of unknown unknown classes can hardly be addressed by re-selecting algorithms or accumulating more training samples. For this purpose, we propose exploratory machine learning: in this paradigm, once the learner encounters unsatisfactory performance, she can examine whether unknown unknowns exist and, if they do, deploy the optimal strategy of feature-space augmentation to make the unknown classes observable and learnable. Theoretical analysis and empirical studies on both synthetic and real datasets validate the efficacy of our proposal.

EAAI Journal 2021 Journal Article

Improving current interest with item and review sequential patterns for sequential recommendation

  • Jinjin Zhang
  • Xiaodong Mu
  • Peng Zhao
  • Kai Kang
  • Chenhui Ma

Sequential recommendation (SR) aims to recommend items based on user information and behavior sequences. Almost all existing works for SR construct short-term and long-term preferences based only on the user–item interactions or only on the reviews, rather than considering the two types of information simultaneously. In fact, interaction items and reviews both reflect the user's semantic information and play significant roles in modeling user preference. In this paper, we propose a novel model named Parallel Item sequential pattern and Review Sequential Pattern (PIRSP) for sequential recommendation. Specifically, PIRSP first learns two sequential patterns from item and review information, respectively: (1) an item sequential pattern, which uses a gated recurrent unit with an item-attention mechanism to model historical behavior sequences; and (2) a review sequential pattern, which applies a convolutional neural network with a target-attention mechanism to the reviews associated with the interaction items. A fusion gating mechanism then selectively combines the two sequential patterns to learn the short-term preference. Second, we employ a convolutional neural network with aspect information to learn the long-term preference. Finally, we apply a linear fusion of the long-term and short-term preferences to model user preference and make the final recommendation. The experimental results indicate that our model outperforms other state-of-the-art methods on the Amazon dataset. Our analysis of PIRSP's recommendation process shows the positive effect of the two types of information and of the fusion gating mechanism on sequential recommendation performance.
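The abstract names a fusion gating mechanism without giving its form; a standard gating layer of the kind commonly used for this purpose looks as follows (`W` and `b` are hypothetical learned parameters, not the paper's exact layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_patterns(h_item, h_review, W, b):
    """Gated fusion of item and review sequential patterns.

    A learned gate decides, per dimension, how much of each pattern
    enters the short-term preference. h_item, h_review: (d,) vectors;
    W: (2d, d); b: (d,).
    """
    g = sigmoid(np.concatenate([h_item, h_review]) @ W + b)
    return g * h_item + (1.0 - g) * h_review
```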

AAAI Conference 2021 Conference Paper

Large Motion Video Super-Resolution with Dual Subnet and Multi-Stage Communicated Upsampling

  • Hongying Liu
  • Peng Zhao
  • Zhubo Ruan
  • Fanhua Shang
  • Yuanyuan Liu

Video super-resolution (VSR) aims at restoring a low-resolution (LR) video at higher resolution (HR). Due to the characteristics of video tasks, it is very important that motion information among frames be carefully modeled, summarized, and utilized for guidance in a VSR algorithm. In particular, when a video contains large motion, conventional methods easily produce incoherent results or artifacts. In this paper, we propose a novel deep neural network with Dual Subnet and Multi-stage Communicated Upsampling (DSMC) for super-resolution of videos with large motion. We design a new module named U-shaped residual dense network with 3D convolution (U3D-RDN) for fine implicit motion estimation and motion compensation (MEMC) as well as coarse spatial feature extraction. We also present a new Multi-Stage Communicated Upsampling (MSCU) module to make full use of the intermediate results of upsampling for guiding the VSR. Moreover, a novel dual subnet is devised to aid the training of our DSMC; its dual loss helps to reduce the solution space as well as enhance the generalization ability. Our experimental results confirm that our method achieves superior performance on videos with large motion compared to state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Storage Fit Learning with Feature Evolvable Streams

  • Bo-Jian Hou
  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

Feature evolvable learning has been widely studied in recent years, where old features vanish and new features emerge when learning with streams. Conventional methods usually assume that a label is revealed after prediction at each time step. In practice, however, this assumption may not hold: no label is given at most time steps. A good solution is to leverage manifold regularization and use previous similar data to assist the refinement of the online model. Nevertheless, this approach needs to store all previous data, which is impossible when learning with streams that arrive sequentially in large volume, so a buffer is needed to store part of them. Considering that different devices may have different storage budgets, learning approaches should be flexible subject to the storage budget limit. In this paper, we propose a new setting: Storage-Fit Feature-Evolvable streaming Learning (SF2EL), which incorporates the issue of rarely provided labels into feature evolution. Our framework can fit its behavior to different storage budgets when learning with feature evolvable streams with unlabeled data. Besides, both theoretical and empirical results validate that our approach preserves the merit of the original feature evolvable learning, i.e., it can always track the best baseline and thus perform well at any time step.
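The abstract does not specify the buffering rule; reservoir sampling is one standard budget-respecting choice that keeps a uniform sample of everything seen so far, sketched below purely for illustration.

```python
import random

class BudgetBuffer:
    """Fixed-budget buffer for streaming data (reservoir sampling).

    Keeps at most `budget` items; at any point, the buffer holds a
    uniform random sample of the stream seen so far, which makes it
    a natural fit for a device-specific storage limit.
    """
    def __init__(self, budget):
        self.budget = budget
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.budget:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)   # replace with prob budget/seen
            if j < self.budget:
                self.items[j] = item
```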

AAAI Conference 2021 Conference Paper

Towards Enabling Learnware to Handle Unseen Jobs

  • Yu-Jie Zhang
  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

The learnware paradigm attempts to change the current style of machine learning deployment, i.e., each user builds her own machine learning application almost from scratch, to a style where the previous efforts of other users can be reused, given a publicly available pool of machine learning models constructed by previous users for various tasks. Each learnware is a high-quality pre-trained model associated with its specification. Although there are many models in the learnware market, only a few, or even none, may be potentially helpful for the current job. Therefore, how to identify and deploy useful models becomes a main concern, which particularly matters when the user's job involves certain unseen parts not covered by the current learnware market. The problem becomes more challenging because, due to privacy considerations, the raw data used for training the models in the learnware market are inaccessible. In this paper, we develop a novel scheme that can effectively reuse learnwares even when the user's job involves unseen parts. Although the raw training data are inaccessible, our approach can provably identify samples from the unseen parts while assigning the rest to appropriate models in the market for prediction, under a certain condition. Empirical studies also validate the efficacy of our approach.

NeurIPS Conference 2020 Conference Paper

An Unbiased Risk Estimator for Learning with Augmented Classes

  • Yu-Jie Zhang
  • Peng Zhao
  • Lanjihong Ma
  • Zhi-Hua Zhou

This paper studies the problem of learning with augmented classes (LAC), where augmented classes unobserved in the training data might emerge in the testing phase. Previous studies generally attempt to discover augmented classes by exploiting geometric properties, achieving inspiring empirical performance yet lacking theoretical understanding, particularly of the generalization ability. In this paper we show that, by using unlabeled training data to approximate the potential distribution of augmented classes, an unbiased risk estimator of the testing distribution can be established for the LAC problem under mild assumptions, which paves the way to developing a sound approach with theoretical guarantees. Moreover, the proposed approach can adapt to complex changing environments where augmented classes may appear and the prior of known classes may change simultaneously. Extensive experiments confirm the effectiveness of our proposed approach.
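One way to see how unlabeled data enter such a construction, under the assumption that the class-conditionals $p(x \mid y)$ of the $k$ known classes are unchanged at test time (the notation here is ours, not the paper's): the testing density is a mixture, so the augmented-class component can be isolated as

$$ \theta_{ac}\, p_{ac}(x) = p_{te}(x) - \sum_{y=1}^{k} \theta_y\, p(x \mid y), $$

where $\theta_y$ and $\theta_{ac}$ are the test-time class priors. Every expectation over the augmented class can then be rewritten in terms of the unlabeled test marginal $p_{te}$ and the labeled class-conditionals, which is what makes an unbiased risk estimator possible without ever observing augmented-class labels.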

IJCAI Conference 2020 Conference Paper

CDC: Classification Driven Compression for Bandwidth Efficient Edge-Cloud Collaborative Deep Learning

  • Yuanrui Dong
  • Peng Zhao
  • Hanqiao Yu
  • Cong Zhao
  • Shusen Yang

The emerging edge-cloud collaborative Deep Learning (DL) paradigm aims at improving the performance of practical DL implementations in terms of cloud bandwidth consumption, response latency, and data privacy preservation. Focusing on bandwidth-efficient edge-cloud collaborative training of DNN-based classifiers, we present CDC, a Classification Driven Compression framework that reduces bandwidth consumption while preserving the classification accuracy of edge-cloud collaborative DL. Specifically, to reduce bandwidth consumption on resource-limited edge servers, we develop a lightweight autoencoder with classification guidance, which compresses data while preserving classification-driven features and allows edges to upload only the latent code of raw data for accurate global training on the cloud. Additionally, we design an adjustable quantization scheme that adaptively pursues the tradeoff between bandwidth consumption and classification accuracy under different network conditions, where only fine-tuning is required for rapid compression-ratio adjustment. Results of extensive experiments demonstrate that, compared with DNN training on raw data, CDC consumes 14.9× less bandwidth with an accuracy loss of no more than 1.06%, and compared with DNN training on data compressed by an autoencoder without guidance, CDC introduces at least 100% lower accuracy loss.
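The abstract does not state the training objective; an autoencoder with classification guidance is typically trained on a weighted sum of reconstruction and classification losses. The sketch below shows that generic pattern only; the weight `lam` and all names are hypothetical, not CDC's exact loss.

```python
import torch
import torch.nn.functional as F

def guided_compression_loss(decoder_out, x, logits, y, lam):
    """Autoencoder objective with classification guidance (schematic).

    The reconstruction term keeps the latent code informative enough
    to rebuild the input; the classification term steers compression
    toward preserving label-relevant features. `lam` trades them off.
    """
    rec = F.mse_loss(decoder_out, x)      # compression fidelity
    cls = F.cross_entropy(logits, y)      # classification guidance
    return rec + lam * cls
```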

NeurIPS Conference 2020 Conference Paper

Dynamic Regret of Convex and Smooth Functions

  • Peng Zhao
  • Yu-Jie Zhang
  • Lijun Zhang
  • Zhi-Hua Zhou

We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length that essentially reflects the non-stationarity of the environments; the state-of-the-art dynamic regret bound is $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper we demonstrate that it is possible to further enhance the dynamic regret by exploiting the smoothness condition. Specifically, we propose novel online algorithms that are capable of leveraging smoothness and replace the dependence on $T$ in the dynamic regret by problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of the previous two terms. These quantities are at most $\mathcal{O}(T)$ but can be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem: the bounds are tighter than existing results for easy problems while guaranteeing the same rate in the worst case.
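For concreteness, the two quantities the abstract refers to are, for any comparator sequence $u_1, \ldots, u_T$ in the feasible domain,

$$ \mathrm{Reg}_T^{\mathrm{d}} = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t), \qquad P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert_2, $$

so a bound of $\mathcal{O}(\sqrt{T(1+P_T)})$ degrades gracefully from the stationary case ($P_T = 0$) to drifting comparators.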

AAAI Conference 2020 Conference Paper

Optimal Margin Distribution Learning in Dynamic Environments

  • Teng Zhang
  • Peng Zhao
  • Hai Jin

Recently a promising research direction of statistical learning has been advocated, i.e., optimal margin distribution learning, with the central idea that the margin distribution, rather than the minimal margin, is crucial to generalization performance. Although the superiority of this new learning paradigm has been verified under batch learning settings, it remains open for online learning settings, in particular for dynamic environments in which the underlying decision function varies over time. In this paper, we propose the dynamic optimal margin distribution machine and theoretically analyze its regret. Although the obtained bound has the same order as the best known one, our method significantly relaxes the restrictive assumption that the function variation must be given ahead of time, resulting in better applicability in practical scenarios. We also derive an excess risk bound for the special case where the underlying decision function evolves through several discrete changes rather than varying continuously. Extensive experiments on both synthetic and real data sets demonstrate the superiority of our method.

AAAI Conference 2018 Conference Paper

Dual Set Multi-Label Learning

  • Chong Liu
  • Peng Zhao
  • Sheng-Jun Huang
  • Yuan Jiang
  • Zhi-Hua Zhou

In this paper, we propose a new learning framework named dual set multi-label learning, where there are two sets of labels and an object has one and only one positive label in each set. Compared to general multi-label learning, the exclusive relationship among labels within the same set and the pairwise inter-set label relationship are much more explicit and more likely to be fully exploited. To handle such problems, a novel boosting-style algorithm with model-reuse and distribution-adjusting mechanisms is proposed to make the two label sets help each other. In addition, theoretical analyses are presented to show the superiority of learning from dual label sets over learning directly from all labels. To empirically evaluate the performance of our approach, we conduct experiments on two manually collected real-world datasets along with an adapted dataset. Experimental results validate the effectiveness of our approach for dual set multi-label learning.

AAAI Conference 2018 Conference Paper

Label Distribution Learning by Optimal Transport

  • Peng Zhao
  • Zhi-Hua Zhou

Label distribution learning (LDL) is a novel learning paradigm for real-world applications in which we care about the relative importance of different labels in the description of an instance. Although some approaches have been proposed to learn the label distribution, they cannot explicitly learn and leverage label correlation, which plays an important role in LDL. In this paper, we propose an approach that learns the label distribution and exploits label correlations simultaneously based on Optimal Transport (OT) theory. The problem is solved by alternately learning the transportation (hypothesis) and the ground metric (label correlations). Besides, we provide perhaps the first data-dependent risk bound analysis for label distribution learning via the Sinkhorn distance, a commonly used relaxation of the OT distance. Experimental results on real-world datasets, in comparison with several state-of-the-art methods, validate the effectiveness of our approach.
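For reference, the Sinkhorn distance mentioned in the abstract is computed by a simple matrix-scaling iteration; a minimal sketch of that standard algorithm (not the paper's full alternating procedure):

```python
import numpy as np

def sinkhorn_plan(a, b, M, eps, n_iter=200):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    a, b: source/target distributions (nonnegative, summing to 1);
    M: cost (ground metric) matrix; eps: regularization strength.
    Returns the transport plan.
    """
    K = np.exp(-M / eps)                # Gibbs kernel from the ground metric
    u = np.ones_like(a)
    for _ in range(n_iter):             # alternate row/column scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

The regularized OT cost is then `(plan * M).sum()`; in the paper's setting the ground metric `M` itself encodes label correlations and is learned in alternation with the hypothesis.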

IJCAI Conference 2017 Conference Paper

Robust Softmax Regression for Multi-class Classification with Self-Paced Learning

  • Yazhou Ren
  • Peng Zhao
  • Yongpan Sheng
  • Dezhong Yao
  • Zenglin Xu

Softmax regression, a generalization of logistic regression (LR) to multi-class classification, has been widely used in many machine learning applications. However, its performance is extremely sensitive to the presence of noisy data and outliers. To address this issue, we propose a model of robust softmax regression (RoSR), originating from the self-paced learning (SPL) paradigm, for multi-class classification. Concretely, RoSR is equipped with a soft weighting scheme that evaluates the importance of each data instance; instances then participate in the classification problem according to their weights. In this way, the influence of noisy data and outliers (which typically receive small weights) can be significantly reduced. However, standard SPL may suffer from an imbalanced class-influence problem, where some classes have little influence on the training process if their instances are insensitive to the loss. To alleviate this problem, we design two novel soft weighting schemes that assign weights and select instances locally for each class. Experimental results demonstrate the effectiveness of the proposed methods.
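The weighting idea follows the usual SPL template: instances are admitted or discounted according to their current loss relative to an age parameter. A minimal sketch of the standard hard and soft schemes (the paper's class-local variants apply such weighting per class, which is omitted here; `lam` is the SPL age parameter):

```python
import numpy as np

def spl_weights(losses, lam, soft=True):
    """Self-paced learning weights from per-instance losses.

    Hard scheme: admit an instance iff its loss is below `lam`.
    Soft scheme: linearly discount harder instances, so noisy data
    and outliers (large loss) receive small or zero weight.
    As `lam` grows over training, harder instances are admitted.
    """
    if soft:
        return np.clip(1.0 - losses / lam, 0.0, 1.0)
    return (losses < lam).astype(float)
```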

JMLR Journal 2007 Journal Article

Stagewise Lasso

  • Peng Zhao
  • Bin Yu

Many statistical machine learning algorithms minimize either an empirical loss function, as in AdaBoost, or a penalized empirical loss, as in Lasso or SVM. A single regularization tuning parameter controls the trade-off between fidelity to the data and generalizability, or equivalently between bias and variance. As this tuning parameter changes, a regularization "path" of solutions to the minimization problem is generated, and the whole path is needed to select a tuning parameter that optimizes prediction or interpretation performance. Algorithms such as homotopy-Lasso or LARS-Lasso and Forward Stagewise Fitting (FSF) (aka e-Boosting) are of great interest because of the sparse models they produce for interpretation in addition to prediction. In this paper, we propose the BLasso algorithm, which ties the FSF (e-Boosting) algorithm to the Lasso method that minimizes the $L_1$-penalized $L_2$ loss. BLasso is derived as a coordinate descent method with a fixed stepsize applied to the general Lasso loss function ($L_1$-penalized convex loss). It consists of both a forward step and a backward step. The forward step is similar to e-Boosting or FSF, but the backward step is new and revises the FSF (or e-Boosting) path to approximate the Lasso path. In the case of a finite number of base learners and a bounded Hessian of the loss function, the BLasso path is shown to converge to the Lasso path as the stepsize goes to zero. For cases with more base learners than the sample size and a sparse true model, our simulations indicate that the BLasso model estimates are sparser than those from FSF with comparable or slightly better prediction performance, and that the discrete stepsize of BLasso and FSF has an additional regularization effect in terms of prediction and sparsity. Moreover, we introduce the Generalized BLasso algorithm to minimize a general convex loss penalized by a general convex function. Since (Generalized) BLasso relies only on differences, not derivatives, it provides a class of simple and easy-to-implement algorithms for tracing the regularization or solution paths of penalized minimization problems.
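A minimal sketch of the forward/backward mechanics for the squared-loss case, taking the coordinates of the design matrix as base learners. This is a schematic reading of the abstract, not the paper's exact algorithm; in particular the tolerance `xi` and the stopping rule are simplifications.

```python
import numpy as np

def blasso(X, y, lam, eps=1e-3, n_steps=1000, xi=1e-7):
    """BLasso for L1-penalized squared loss (schematic).

    Each iteration takes a backward step (shrink one active coordinate
    by eps) if that lowers the penalized loss by at least xi; otherwise
    it takes a forward step (grow the steepest coordinate by eps),
    mimicking FSF / e-Boosting.
    """
    n, p = X.shape
    beta = np.zeros(p)

    def pen_loss(b):
        r = y - X @ b
        return 0.5 * r @ r + lam * np.abs(b).sum()

    for _ in range(n_steps):
        current = pen_loss(beta)
        # backward step: try shrinking each active coordinate by eps
        best_b, best_val = None, current - xi
        for j in np.flatnonzero(beta):
            cand = beta.copy()
            cand[j] -= eps * np.sign(beta[j])
            val = pen_loss(cand)
            if val < best_val:
                best_b, best_val = cand, val
        if best_b is not None:
            beta = best_b
            continue
        # forward step: steepest coordinate of the unpenalized loss
        g = X.T @ (X @ beta - y)
        j = int(np.argmax(np.abs(g)))
        beta[j] -= eps * np.sign(g[j])
    return beta
```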

JMLR Journal 2006 Journal Article

On Model Selection Consistency of Lasso

  • Peng Zhao
  • Bin Yu

Sparsity or parsimony of statistical models is crucial for their proper interpretation, as in the sciences and social sciences. Model selection is a commonly used method to find such models, but it usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. It is therefore important to study Lasso for model selection purposes. In this paper, we prove that a single condition, which we call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model, both in the classical fixed $p$ setting and in the large $p$ setting, as the sample size $n$ gets large. Based on these results, sufficient conditions that are verifiable in practice are given to relate to previous works and to help applications of Lasso for feature selection and sparse representation. The Irrepresentable Condition, which depends mainly on the covariance of the predictor variables, states that Lasso selects the true model consistently if and (almost) only if the predictors that are not in the true model are "irrepresentable" (in a sense to be clarified) by the predictors that are in the true model. Furthermore, simulations are carried out to provide insight into and understanding of this result.
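For reference, with the sample covariance $C = \frac{1}{n} X^\top X$ partitioned into blocks over the relevant predictors (block 1, with true coefficient vector $\beta_{(1)}$) and the irrelevant ones (block 2), the strong form of the Irrepresentable Condition can be written as

$$ \left\lVert C_{21}\, C_{11}^{-1}\, \operatorname{sign}\!\big(\beta_{(1)}\big) \right\rVert_\infty \le 1 - \eta $$

for some constant $\eta > 0$: regressing the irrelevant predictors on the relevant ones must not reproduce the sign pattern of the true coefficients too well.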