Arrow Research search

Author name cluster

Peng Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

51 papers
1 author row

Possible papers (51)

AAAI Conference 2026 Conference Paper

Efficient Few-Step Solution Generation via Discrete Flow Matching for Combinatorial Optimization

  • Yuanshu Li
  • Di Wang
  • Wei Du
  • Xuan Wu
  • Peng Zhao
  • Yubin Xiao
  • You Zhou

Combinatorial optimization problems (COPs) are fundamental to many real-world applications where efficiently producing high-quality solutions is critical. Recent advances in diffusion-based non-autoregressive models have reformulated solving COPs as a generative process, achieving promising results. However, almost all of these methods still suffer from accumulated errors and high inference costs due to the multi-step stochastic denoising process. To address these issues, we propose EFLOCO, an efficient discrete flow matching method for solving COPs that learns structured and deterministic solution trajectories. EFLOCO replaces noise-driven updates with smooth and guided transitions, thereby improving inference stability and quality. Furthermore, we introduce an adaptive time-step scheduler that devotes more steps to critical transition regions, yielding strong performance under few-step constraints. Experiments on standard Traveling Salesman Problems (TSPs) and Asymmetric TSPs (ATSPs) show that our method consistently outperforms both learning-based and heuristic baselines in terms of solution quality and inference speed.
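
The abstract does not detail the scheduler, so the following is a hypothetical sketch of one way a few-step sampler could concentrate its time grid in a critical transition region: step endpoints are taken as quantiles of a density with a bump at an assumed critical location. The function name, the bump shape, and the default `critical=0.8` are illustrative assumptions, not the paper's method.

```python
import numpy as np

def adaptive_time_grid(num_steps: int, critical: float = 0.8, width: float = 0.1) -> np.ndarray:
    """Hypothetical non-uniform time grid: step endpoints are quantiles of a
    density mixing a uniform component with a Gaussian bump at `critical`,
    so more of the few-step budget lands in the assumed critical region."""
    xs = np.linspace(0.0, 1.0, 2049)
    density = 1.0 + 3.0 * np.exp(-0.5 * ((xs - critical) / width) ** 2)
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
    # Quantile lookup: uniform levels in [0, 1] mapped through the inverse CDF.
    levels = np.linspace(0.0, 1.0, num_steps + 1)
    return np.interp(levels, cdf, xs)

print(adaptive_time_grid(8))  # denser endpoints near t = 0.8
```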

EAAI Journal 2025 Journal Article

A neural network guided dual-space search evolutionary algorithm for large scale multi-objective optimization

  • Jie Cao
  • Chengzhi Liu
  • Zuohan Chen
  • Jianlin Zhang
  • Peng Zhao

The curse of dimensionality caused by the increase of decision variables in large-scale multi-objective optimization problems (LSMOPs) remains a key challenge. Although existing algorithms can handle simple LSMOPs, relying on a single search strategy might limit the quality of the obtained solutions. To solve this problem, a dual-space search evolutionary algorithm for large-scale multi-objective optimization is proposed. Firstly, in the decision space, a neural-network-assisted operator with an adaptive strategy is introduced. Specifically, when the number of non-dominated solutions is decreasing, the neural network is adopted to optimize the solutions with poor fitness so that they can break away from local optima. After that, the objective space of the population is divided into several sub-regions by a k-means clustering strategy. The solutions in these sub-regions are mapped onto the decision space through an inverse model, so that the population can obtain as many non-dominated solutions as possible. Finally, the proposed algorithm is tested on a real-life problem, Time-varying Ratio Error Estimation (TREE), and two benchmark suites, the large-scale multi-objective optimization problem (LSMOP) suite and the unconstrained front (UF) suite. The results show that the proposed algorithm exhibits competitive performance compared to other state-of-the-art algorithms on the Inverted Generational Distance (IGD) and Hypervolume (HV) indicators.

IJCAI Conference 2025 Conference Paper

DGL: Dynamic Global-Local Information Aggregation for Scalable VRP Generalization with Self-Improvement Learning

  • Yubin Xiao
  • Yuesong Wu
  • Rui Cao
  • Di Wang
  • Zhiguang Cao
  • Xuan Wu
  • Peng Zhao
  • Yuanshu Li

The Vehicle Routing Problem (VRP) is a critical combinatorial optimization problem with wide-reaching real-world applications, particularly in logistics and transportation. While neural network-based VRP solvers have shown impressive results on test instances similar to training data, their performance often degrades when faced with varying scales and unseen distributions, limiting their practical applicability. To overcome these limitations, we introduce DGL (Dynamic Global-Local Information Aggregation), a novel model that combines global and local information to effectively solve VRPs. DGL dynamically adjusts local node selections within a localized range, capturing local invariance across problems of different scales and distributions, thereby enhancing generalization. At the same time, DGL integrates global context into the decision-making process, providing richer information for more informed decisions. Additionally, we propose a replacement-based self-improvement learning framework that leverages data augmentation and random replacement techniques, further enhancing DGL's robustness. Extensive experiments on synthetic datasets, benchmark datasets, and real-world country map instances demonstrate that DGL achieves state-of-the-art performance, particularly in generalizing to large-scale VRPs and real-world scenarios. These results showcase DGL's effectiveness in solving complex, realistic optimization challenges and highlight its potential for practical applications.
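
As a rough, hypothetical illustration of local node selection (not DGL's actual mechanism, which the abstract does not specify in full), a decoder can restrict its candidates to the nearest unvisited nodes around the current position, so decisions depend on a neighborhood whose statistics stay stable across instance scales:

```python
import numpy as np

def local_candidates(coords: np.ndarray, current: int, visited: set, k: int = 10) -> np.ndarray:
    """Hypothetical local-selection step for a VRP decoder: among unvisited
    nodes, keep the k nearest to the current node."""
    unvisited = np.array([i for i in range(len(coords)) if i not in visited], dtype=int)
    dists = np.linalg.norm(coords[unvisited] - coords[current], axis=1)
    return unvisited[np.argsort(dists)[:k]]
```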

JMLR Journal 2025 Journal Article

Efficient Methods for Non-stationary Online Learning

  • Peng Zhao
  • Yan-Feng Xie
  • Lijun Zhang
  • Zhi-Hua Zhou

Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of non-stationarity, in which multiple base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises concerns about computational complexity --- such methods typically maintain $O(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret that reduce the number of projections per round from $O(\log T)$ to $1$. The proposed algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial modifications for non-stationary online methods. Furthermore, we study an even stronger measure, namely "interval dynamic regret", and reduce the number of projections per round from $O(\log^2 T)$ to $1$ for minimizing it. Our reduction demonstrates broad generality and applies to two important applications: online stochastic control and online principal component analysis, resulting in methods that are both efficient and optimal. Finally, empirical studies verify our theoretical findings.

JBHI Journal 2025 Journal Article

Few-Shot Class-Incremental Learning for Retinal Disease Recognition

  • Jinghua Zhang
  • Peng Zhao
  • Yongkun Zhao
  • Chen Li
  • Dewen Hu

Few-Shot Class-Incremental Learning (FSCIL) techniques are essential for developing Deep Learning (DL) models that can continuously learn new classes with limited samples while retaining existing knowledge. This capability is particularly crucial for DL-based retinal disease diagnosis systems, where acquiring large annotated datasets is challenging, and disease phenotypes evolve over time. This paper introduces Re-FSCIL, a novel framework for Few-Shot Class-Incremental Retinal Disease Recognition (FSCIRDR). Re-FSCIL integrates the RETFound model with a fine-grained module, employing a forward-compatible training strategy to improve adaptability, supervised contrastive learning to enhance feature discrimination, and feature fusion for robust representation quality. We convert existing datasets into the FSCIL format and reproduce numerous representative FSCIL methods to create two new benchmarks, RFMiD38 and JSIEC39, specifically for FSCIRDR. Our experimental results demonstrate that Re-FSCIL achieves state-of-the-art (SOTA) performance, significantly surpassing existing FSCIL methods on these benchmarks.

NeurIPS Conference 2025 Conference Paper

Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

  • Yu-Jie Zhang
  • Sheng-An Xu
  • Peng Zhao
  • Masashi Sugiyama

We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function, thereby modeling a broad class of reward distributions such as Bernoulli and Poisson. While GLBs are widely applicable to real-world scenarios, their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. Existing methods typically trade off between two objectives, either incurring high per-round costs for optimal regret guarantees or compromising statistical efficiency to enable constant-time updates. In this paper, we propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round. The core of our method is a tight confidence set for the online mirror descent (OMD) estimator, which is derived through a novel analysis that leverages the notion of mix loss from online prediction. The analysis shows that our OMD estimator, even with its one-pass updates, achieves statistical efficiency comparable to maximum likelihood estimation, thereby leading to a jointly efficient optimistic method.
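As a rough illustration of the one-pass idea (not the paper's exact OMD geometry or confidence set), a Bernoulli GLB estimator can be updated with a single gradient step per round, storing no history; the Euclidean mirror map below stands in for the tailored local norm described in the abstract:

```python
import numpy as np

def one_pass_logistic_update(theta: np.ndarray, x: np.ndarray, reward: float, lr: float = 0.1) -> np.ndarray:
    """Minimal sketch of a one-pass estimator update for a Bernoulli GLB:
    one gradient step on the logistic loss of the observed (x, reward)
    pair, so per-round time and space cost is O(d)."""
    pred = 1.0 / (1.0 + np.exp(-theta @ x))   # link function mu(theta^T x)
    grad = (pred - reward) * x                # gradient of the logistic loss
    return theta - lr * grad
```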

NeurIPS Conference 2025 Conference Paper

Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness

  • Yuheng Zhao
  • Yu-Hu Yan
  • Kfir Y. Levy
  • Peng Zhao

Smoothness is known to be crucial for acceleration in offline optimization, and for gradient-variation regret minimization in online learning. Interestingly, these two problems are actually closely connected --- accelerated optimization can be understood through the lens of gradient-variation online learning. In this paper, we investigate online learning with Hölder functions, a general class encompassing both smooth and non-smooth (Lipschitz) functions, and explore its implications for offline optimization. For (strongly) convex online functions, we design the corresponding gradient-variation online learning algorithm whose regret smoothly interpolates between the optimal guarantees in smooth and non-smooth regimes. Notably, our algorithms do not require prior knowledge of the Hölder smoothness parameter, exhibiting strong adaptivity over existing methods. Through online-to-batch conversion, this gradient-variation online adaptivity yields an optimal universal method for stochastic convex optimization under Hölder smoothness. However, achieving universality in offline strongly convex optimization is more challenging. We address this by integrating online adaptivity with a detection-based guess-and-check procedure, which, for the first time, yields a universal offline method that achieves accelerated convergence in the smooth regime while maintaining near-optimal convergence in the non-smooth one.
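For reference, the Hölder smoothness condition underlying these results can be written as follows (a standard formulation; the exponent notation is ours): $\nu = 1$ recovers classical $L$-smoothness, while $\nu = 0$ corresponds to the non-smooth (Lipschitz) regime.

```latex
% Hölder smoothness of exponent \nu interpolates between the smooth
% (\nu = 1) and non-smooth Lipschitz (\nu = 0) regimes:
\| \nabla f(x) - \nabla f(y) \| \le L \, \| x - y \|^{\nu},
\qquad \nu \in [0, 1].
```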

NeurIPS Conference 2025 Conference Paper

Learning Memory-Enhanced Improvement Heuristics for Flexible Job Shop Scheduling

  • Jiaqi Wang
  • Zhiguang Cao
  • Peng Zhao
  • Rui Cao
  • Yubin Xiao
  • Yuan Jiang
  • You Zhou

The rise of smart manufacturing under Industry 4.0 introduces mass customization and dynamic production, demanding more advanced and flexible scheduling techniques. The flexible job-shop scheduling problem (FJSP) has attracted significant attention due to its complex constraints and strong alignment with real-world production scenarios. Current deep reinforcement learning (DRL)-based approaches to FJSP predominantly employ constructive methods. While effective, they often fall short of reaching (near-)optimal solutions. In contrast, improvement-based methods iteratively explore the neighborhood of initial solutions and are more effective in approaching optimality. However, the flexible machine allocation in FJSP poses significant challenges to the application of this framework, including accurate state representation, effective policy learning, and efficient search strategies. To address these challenges, this paper proposes a $\textbf{M}$emory-enhanced $\textbf{I}$mprovement $\textbf{S}$earch framework with he$\textbf{t}$erogeneous gr$\textbf{a}$ph $\textbf{r}$epresentation—$\textit{MIStar}$. It employs a novel heterogeneous disjunctive graph that explicitly models the operation sequences on machines to accurately represent scheduling solutions. Moreover, a memory-enhanced heterogeneous graph neural network (MHGNN) is designed for feature extraction, leveraging historical trajectories to enhance the decision-making capability of the policy network. Finally, a parallel greedy search strategy is adopted to explore the solution space, enabling superior solutions with fewer iterations. Extensive experiments on synthetic data and public benchmarks demonstrate that $\textit{MIStar}$ significantly outperforms both traditional handcrafted improvement heuristics and state-of-the-art DRL-based constructive methods.

NeurIPS Conference 2025 Conference Paper

Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

In this work, we study offline convex optimization with smooth objectives, where Nesterov's classical Accelerated Gradient (NAG) method achieves the optimal accelerated convergence. Extensive research has aimed to understand NAG from various perspectives, and a recent line of work approaches this from the viewpoint of online learning and online-to-batch conversion, emphasizing the role of optimistic online algorithms for acceleration. In this work, we contribute to this perspective by proposing novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis, thereby significantly simplifying the online algorithm design while preserving the optimal convergence rates. Specifically, we demonstrate the effectiveness of our conversions through the following results: (i) when combined with simple online gradient descent, our optimistic conversion achieves the optimal accelerated convergence; (ii) our conversion also applies to strongly convex objectives, and by leveraging both optimistic online-to-batch conversion and optimistic online algorithms, we achieve the optimal accelerated convergence rate for strongly convex and smooth objectives, for the first time through the lens of online-to-batch conversion; (iii) our optimistic conversion can achieve universality to smoothness --- applicable to both smooth and non-smooth objectives without requiring knowledge of the smoothness coefficient --- and remains as efficient as non-universal methods by using only one gradient query in each iteration. Finally, we highlight the effectiveness of our optimistic online-to-batch conversions by a precise correspondence with NAG.
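
For context, the classical (non-optimistic) conversion that these optimistic variants refine reads as follows; running any online algorithm on the fixed objective $f$ (so $f_t \equiv f$) and averaging its iterates turns regret into a convergence rate:

```latex
% Classical online-to-batch conversion, by convexity and Jensen's inequality:
f(\bar{x}_T) - f(x^\star)
\;\le\; \frac{1}{T} \sum_{t=1}^{T} \big( f(x_t) - f(x^\star) \big)
\;=\; \frac{\mathrm{Regret}_T}{T},
\qquad \bar{x}_T = \frac{1}{T} \sum_{t=1}^{T} x_t .
```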

NeurIPS Conference 2025 Conference Paper

Parameter-free Algorithms for the Stochastically Extended Adversarial Model

  • Shuche Wang
  • Adarsh Barik
  • Peng Zhao
  • Vincent Tan

We develop the first parameter-free algorithms for the Stochastically Extended Adversarial (SEA) model, a framework that bridges adversarial and stochastic online convex optimization. Existing approaches for the SEA model require prior knowledge of problem-specific parameters, such as the diameter of the domain $D$ and the Lipschitz constant of the loss functions $G$, which limits their practical applicability. Addressing this, we develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters. We first establish a comparator-adaptive algorithm for the scenario with unknown domain diameter but known Lipschitz constant, achieving an expected regret bound of $\tilde{O}\big(\Vert u\Vert_2^2 + \Vert u\Vert_2(\sqrt{\sigma^2_{1:T}} + \sqrt{\Sigma^2_{1:T}})\big)$, where $u$ is the comparator vector and $\sigma^2_{1:T}$ and $\Sigma^2_{1:T}$ represent the cumulative stochastic variance and cumulative adversarial variation, respectively. We then extend this to the more general setting where both $D$ and $G$ are unknown, attaining a comparator- and Lipschitz-adaptive algorithm. Notably, the regret bound exhibits the same dependence on $\sigma^2_{1:T}$ and $\Sigma^2_{1:T}$, demonstrating the efficacy of our proposed methods even when both parameters are unknown in the SEA model.

NeurIPS Conference 2025 Conference Paper

Provably Efficient Online RLHF with One-Pass Reward Modeling

  • Long-Fei Li
  • Yu-Yang Qian
  • Peng Zhao
  • Zhi-Hua Zhou

Reinforcement Learning from Human Feedback (RLHF) has shown remarkable success in aligning Large Language Models (LLMs) with human preferences. Traditional RLHF methods rely on a fixed dataset, which often suffers from limited coverage. To this end, online RLHF has emerged as a promising direction, enabling iterative data collection and refinement. Despite its potential, this paradigm faces a key bottleneck: the requirement to continuously integrate new data into the dataset and re-optimize the model from scratch at each iteration, resulting in computational and storage costs that grow linearly with the number of iterations. In this work, we address this challenge by proposing a one-pass reward modeling method that eliminates the need to store historical data and achieves constant-time updates per iteration. Specifically, we first formalize RLHF as a contextual preference bandit and develop a new algorithm based on online mirror descent with a tailored local norm, replacing the standard maximum likelihood estimation for reward modeling. We then apply it to various online RLHF settings, including passive data collection, active data collection, and deployment-time adaptation. We provide theoretical guarantees showing that our method enhances both statistical and computational efficiency. Finally, we design practical algorithms for LLMs and conduct experiments with the Llama-3-8B-Instruct and Qwen2.5-7B-Instruct models on Ultrafeedback and Mixture2 datasets, validating the effectiveness of our approach.

NeurIPS Conference 2024 Conference Paper

A Simple and Optimal Approach for Universal Online Learning with Gradient Variations

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

We investigate the problem of universal online learning with gradient-variation regret. Universal online learning aims to achieve regret guarantees without prior knowledge of the curvature of the online functions. Moreover, we study the problem-dependent gradient-variation regret as it plays a crucial role in bridging stochastic and adversarial optimization as well as game theory. In this work, we design a universal approach with the *optimal* gradient-variation regret simultaneously for strongly convex, exp-concave, and convex functions, thus addressing an open problem highlighted by [Yan et al. [2023]](https://openreview.net/forum?id=AA1xrgAP5z). Our approach is *simple* since it is algorithmically efficient-to-implement with a two-layer online ensemble structure and only $1$ gradient query per round, and theoretically easy-to-analyze with a novel and alternative analysis to the gradient-variation regret. Concretely, previous works on gradient variations require controlling the algorithmic stability, which is challenging and leads to sub-optimal regret and less efficient algorithm design. Our analysis overcomes this issue by using a Bregman divergence negative term from linearization and a useful smoothness property.

JMLR Journal 2024 Journal Article

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

  • Peng Zhao
  • Yu-Jie Zhang
  • Lijun Zhang
  • Zhi-Hua Zhou

We investigate online convex optimization in non-stationary environments and choose dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path length that essentially reflects the non-stationarity of environments; the state-of-the-art dynamic regret is then $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper, we demonstrate that it is possible to further enhance the guarantee for some easy problem instances, particularly when online functions are smooth. Specifically, we introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of these two terms. These quantities are at most $\mathcal{O}(T)$ while could be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and meanwhile safeguard the same rate in the worst case. Notably, our proposed algorithms can achieve favorable dynamic regret with only one gradient per iteration, sharing the same gradient query complexity as static regret minimization methods. To accomplish this, we introduce the collaborative online ensemble framework. The proposed framework employs a two-layer online ensemble to handle non-stationarity, uses optimistic online learning, and introduces crucial correction terms to enable effective collaboration within the meta-base two layers, thereby attaining adaptivity. We believe the framework can be useful for broader problems.
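
For reference, the path length used as the non-stationarity measure here is standardly defined over the comparator sequence $u_1, \ldots, u_T$:

```latex
% Path length of the comparator sequence: the total movement of the
% comparators, which vanishes in the stationary (static regret) case.
P_T = \sum_{t=2}^{T} \| u_t - u_{t-1} \|_2 .
```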

JBHI Journal 2024 Journal Article

Biomarkers-Aware Asymmetric Bibranch GAN With Adaptive Memory Batch Normalization for Prediction of Anti-VEGF Treatment Response in Neovascular Age-Related Macular Degeneration

  • Peng Zhao
  • Xian Song
  • Xiaoming Xi
  • Xiushan Nie
  • Xianjing Meng
  • Yi Qu
  • Yilong Yin

The emergence of anti-vascular endothelial growth factor (anti-VEGF) therapy has revolutionized the treatment of neovascular age-related macular degeneration (nAMD). Post-therapeutic optical coherence tomography (OCT) imaging facilitates the prediction of therapeutic response to anti-VEGF therapy for nAMD. Although the generative adversarial network (GAN) is a popular generative model for post-therapeutic OCT image generation, it is realistically challenging to gather sufficient pre- and post-therapeutic OCT image pairs, resulting in overfitting. Moreover, the available GAN-based methods ignore local details, such as the biomarkers that are essential for nAMD treatment. To address these issues, a Biomarkers-aware Asymmetric Bibranch GAN (BAABGAN) is proposed to efficiently generate post-therapeutic OCT images. Specifically, one branch is developed to learn prior knowledge with a high degree of transferability from large-scale data, termed the source branch. The source branch then transfers knowledge to another branch, which is trained on small-scale paired data, termed the target branch. To boost the transferability, a novel Adaptive Memory Batch Normalization (AMBN) is introduced in the source branch, which learns more effective global knowledge that is impervious to noise via a memory mechanism. In addition, a novel Adaptive Biomarkers-aware Attention (ABA) module is proposed to encode biomarker information into the latent features of the target branch to learn finer local details of biomarkers. Experimental results show that the proposed method outperforms traditional GAN models and can produce high-quality post-therapeutic OCT images with limited datasets.

AAAI Conference 2024 Conference Paper

Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study reinforcement learning (RL) in episodic MDPs with adversarial full-information losses and the unknown transition. Instead of the classical static regret, we adopt dynamic regret as the performance measure which benchmarks the learner's performance with changing policies, making it more suitable for non-stationary environments. The primary challenge is to handle the uncertainties of the unknown transition and the unknown non-stationarity of environments simultaneously. We propose a general framework to decouple the two sources of uncertainties and show the dynamic regret bound naturally decomposes into two terms, one due to constructing confidence sets to handle the unknown transition and the other due to choosing sub-optimal policies under the unknown non-stationarity. To this end, we first employ the two-layer online ensemble structure to handle the adaptation error due to the unknown non-stationarity, which is model-agnostic. Subsequently, we instantiate the framework to three fundamental MDP models, including tabular MDPs, linear MDPs and linear mixture MDPs, and present corresponding approaches to control the exploration error due to the unknown transition. We provide dynamic regret guarantees respectively and show they are optimal in terms of the number of episodes $K$ and the non-stationarity $\bar{P}_K$ by establishing matching lower bounds. To the best of our knowledge, this is the first work that achieves dynamic regret exhibiting optimal dependence on $K$ and $\bar{P}_K$ without prior knowledge about the non-stationarity for adversarial MDPs with unknown transition.

AIJ Journal 2024 Journal Article

Exploratory machine learning with unknown unknowns

  • Peng Zhao
  • Jia-Wei Shan
  • Yu-Jie Zhang
  • Zhi-Hua Zhou

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to known labels. This paper studies a new problem setting in which there are unknown classes in the training data misperceived as other labels, and thus their existence appears unknown from the given supervision. We attribute the unknown unknowns to the fact that the training dataset is badly advised by the incompletely perceived label space due to the insufficient feature information. To this end, we propose the exploratory machine learning, which examines and investigates training data by actively augmenting the feature space to discover potentially hidden classes. Our method consists of three ingredients including rejection model, feature exploration, and model cascade. We provide theoretical analysis to justify its superiority, and validate the effectiveness on both synthetic and real datasets.
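
As a rough, hypothetical illustration of the rejection-model ingredient (the abstract does not specify its form), a reject-option classifier can accept a prediction only when confident, routing rejected instances to feature exploration:

```python
def predict_with_rejection(probs: list, threshold: float = 0.7):
    """Hypothetical reject-option step: accept the model's prediction only
    when its confidence clears the threshold; rejected instances (returned
    as None) become candidates for acquiring new features that may reveal
    hidden classes."""
    label = max(range(len(probs)), key=probs.__getitem__)
    return label if probs[label] >= threshold else None
```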

AAAI Conference 2024 Conference Paper

Generative Model-Based Feature Knowledge Distillation for Action Recognition

  • Guiqin Wang
  • Peng Zhao
  • Yanjiang Shi
  • Cong Zhao
  • Shusen Yang

Knowledge distillation (KD), a technique widely employed in computer vision, has emerged as a de facto standard for improving the performance of small neural networks. However, prevailing KD-based approaches in video tasks primarily focus on designing loss functions and fusing cross-modal information. This overlooks the spatial-temporal feature semantics, resulting in limited advancements in model compression. Addressing this gap, our paper introduces an innovative knowledge distillation framework, with the generative model for training a lightweight student model. In particular, the framework is organized into two steps: the initial phase is Feature Representation, wherein a generative model-based attention module is trained to represent feature semantics; Subsequently, the Generative-based Feature Distillation phase encompasses both Generative Distillation and Attention Distillation, with the objective of transferring attention-based feature semantics with the generative model. The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets, proving considerable enhancements in video action recognition task. Moreover, the effectiveness of our proposed framework is validated in the context of more intricate video action detection task. Our code is available at https://github.com/aaai-24/Generative-based-KD.
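
As a minimal stand-in for the feature-level transfer described above (omitting the generative and attention components, which the abstract names but does not specify), feature distillation at its simplest penalizes the distance between matched student and teacher feature maps:

```python
import numpy as np

def feature_distillation_loss(student_feat, teacher_feat) -> float:
    """Simplified feature-distillation term: mean squared error between
    student and teacher feature maps of matching shape. The paper's
    generative, attention-based variant replaces this plain MSE."""
    diff = np.asarray(student_feat) - np.asarray(teacher_feat)
    return float(np.mean(diff ** 2))
```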

NeurIPS Conference 2024 Conference Paper

Gradient-Variation Online Learning under Generalized Smoothness

  • Yan-Feng Xie
  • Peng Zhao
  • Zhi-Hua Zhou

Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions, which is crucial for attaining fast convergence in games and robustness in stochastic optimization, hence receiving increased attention. Existing results often require the smoothness condition by imposing a fixed bound on gradient Lipschitzness, which may be unrealistic in practice. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. In this paper, we systematically study gradient-variation online learning under generalized smoothness. We extend the classic optimistic mirror descent algorithm to derive gradient-variation regret by analyzing stability over the optimization trajectory and exploiting smoothness locally. Then, we explore universal online learning, designing a single algorithm with the optimal gradient-variation regrets for convex and strongly convex functions simultaneously, without requiring prior knowledge of curvature. This algorithm adopts a two-layer structure with a meta-algorithm running over a group of base-learners. To ensure favorable guarantees, we design a new Lipschitz-adaptive meta-algorithm, capable of handling potentially unbounded gradients while ensuring a second-order bound to effectively ensemble the base-learners. Finally, we provide the applications for fast-rate convergence in games and stochastic extended adversarial optimization.

NeurIPS Conference 2024 Conference Paper

Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading

  • Xueyi Zhang
  • Chengwei Zhang
  • Mingrui Lao
  • Peng Zhao
  • Jun Tang
  • Yanming Guo
  • Siqi Cai
  • Xianghu Yue

Lip reading aims at transforming videos of continuous lip movement into textual content, and has achieved significant progress over the past decade. It serves as a critical yet practical assistance for speech-impaired individuals, with more practicability than speech recognition in noisy environments. With the increasing interpersonal communications on social media owing to globalization, the existing monolingual datasets for lip reading may not be sufficient to meet the exponential proliferation of bilingual and even multilingual users. However, to the best of our knowledge, code-switching has only been explored in speech recognition, while attempts in lip reading have been largely neglected. To bridge this gap, we have collected a bilingual code-switching lip reading benchmark composed of Chinese and English, dubbed CSLR. As the pioneering work, we recruited 62 speakers with proficient foundations in both spoken Chinese and English to express sentences containing both involved languages. Through rigorous criteria in data selection, the CSLR benchmark has accumulated 85,560 video samples with a resolution of 1080x1920, totaling over 71.3 hours of high-quality code-switching lip movement data. To systematically evaluate the technical challenges in CSLR, we implement commonly-used lip reading backbones, as well as competitive solutions in code-switching speech, for benchmark testing. Experiments show CSLR to be a challenging and under-explored lip reading task. We hope our proposed benchmark will extend the applicability of code-switching lip reading, and further contribute to the communities of cross-lingual communication and collaboration. Our dataset and benchmark are accessible at https://github.com/cslr-lipreading/CSLR.

IJCAI Conference 2024 Conference Paper

Multi-Attention Based Visual-Semantic Interaction for Few-Shot Learning

  • Peng Zhao
  • Yin Wang
  • Wei Wang
  • Jie Mu
  • Huiting Liu
  • Cong Wang
  • Xiaochun Cao

Few-Shot Learning (FSL) aims to train a model that can generalize to recognize new classes, with each new class having only very limited training samples. Since extracting discriminative features for new classes with few samples is challenging, existing FSL methods leverage visual and semantic prior knowledge to guide discriminative feature learning. However, for meta-learning purposes, the semantic knowledge of the query set is unavailable, so their features lack discriminability. To address this problem, we propose a novel Multi-Attention based Visual-Semantic Interaction (MAVSI) approach for FSL. Specifically, we utilize spatial and channel attention mechanisms to effectively select discriminative visual features for the support set based on its ground-truth semantics while using all the support set semantics for each query set sample. Then, a relation module with class prototypes of the support set is employed to supervise and select discriminative visual features for the query set. To further enhance the discriminability of the support set, we introduce a visual-semantic contrastive learning module to promote the similarity between visual features and their corresponding semantic features. Extensive experiments on four benchmark datasets demonstrate that our proposed MAVSI outperforms existing state-of-the-art FSL methods.

NeurIPS Conference 2024 Conference Paper

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback, employing *dynamic regret* as the performance measure. We start with in-depth analyses of the strengths and limitations of the two most popular methods: occupancy-measure-based and policy-based methods. We observe that while the occupancy-measure-based method is effective in addressing non-stationary environments, it encounters difficulties with the unknown transition. In contrast, the policy-based method can deal with the unknown transition effectively but faces challenges in handling non-stationary environments. Building on this, we propose a novel algorithm that combines the benefits of both methods. Specifically, it employs (i) an *occupancy-measure-based global optimization* with a two-layer structure to handle non-stationary environments; and (ii) a *policy-based variance-aware value-targeted regression* to tackle the unknown transition. We bridge these two parts by a novel conversion. Our algorithm enjoys an $\widetilde{\mathcal{O}}(d \sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature mapping dimension, $H$ is the episode length, $K$ is the number of episodes, $\bar{P}_K$ is the non-stationarity measure. We show it is minimax optimal up to logarithmic factors by establishing a matching lower bound. To the best of our knowledge, this is the **first** work that achieves **near-optimal** dynamic regret for adversarial linear mixture MDPs with the unknown transition without prior knowledge of the non-stationarity measure.

JMLR Journal 2024 Journal Article

Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization

  • Sijia Chen
  • Yu-Jie Zhang
  • Wei-Wei Tu
  • Peng Zhao
  • Lijun Zhang

The stochastically extended adversarial (SEA) model, introduced by Sachs et al. (2022), serves as an interpolation between stochastic and adversarial online convex optimization. Under the smoothness condition on expected loss functions, it is shown that the expected static regret of optimistic follow-the-regularized-leader (FTRL) depends on the cumulative stochastic variance $\sigma_{1:T}^2$ and the cumulative adversarial variation $\Sigma_{1:T}^2$ for convex functions. Sachs et al. (2022) also provide a regret bound based on the maximal stochastic variance $\sigma_{\max}^2$ and the maximal adversarial variation $\Sigma_{\max}^2$ for strongly convex functions. Inspired by their work, we investigate the theoretical guarantees of optimistic online mirror descent (OMD) for the SEA model with smooth expected loss functions. For convex and smooth functions, we obtain the same $\mathcal{O}(\sqrt{\sigma_{1:T}^2}+\sqrt{\Sigma_{1:T}^2})$ regret bound, but with a relaxation of the convexity requirement from individual functions to expected functions. For strongly convex and smooth functions, we establish an $\mathcal{O}\left(\frac{1}{\lambda}\left(\sigma_{\max}^2+\Sigma_{\max}^2\right)\log \left(\left(\sigma_{1:T}^2 + \Sigma_{1:T}^2\right)/\left(\sigma_{\max}^2+\Sigma_{\max}^2\right)\right)\right)$ bound, better than their $\mathcal{O}((\sigma_{\max}^2$ $ + \Sigma_{\max}^2) \log T)$ result. For exp-concave and smooth functions, our approach yields a new $\mathcal{O}(d\log(\sigma_{1:T}^2+\Sigma_{1:T}^2))$ bound. Moreover, we introduce the first expected dynamic regret guarantee for the SEA model with convex and smooth expected functions, which is more favorable than static regret bounds in non-stationary environments. Furthermore, we expand our investigation to scenarios with non-smooth expected loss functions and propose novel algorithms built upon optimistic OMD with an implicit update, successfully attaining both static and dynamic regret guarantees.
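
For reference, the standard optimistic OMD template analyzed here alternates a decision step using a hint vector $m_t$ (e.g., the previous gradient) and an update step using the observed gradient $g_t$, with Bregman divergence $D_\psi$:

```latex
% Optimistic online mirror descent: predict with the hint m_t,
% then update the auxiliary iterate with the true gradient g_t.
x_t = \operatorname*{arg\,min}_{x \in \mathcal{X}} \ \eta \langle m_t, x \rangle + D_\psi(x, \hat{x}_t),
\qquad
\hat{x}_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \ \eta \langle g_t, x \rangle + D_\psi(x, \hat{x}_t).
```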

NeurIPS Conference 2024 Conference Paper

Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

  • Long-Fei Li
  • Yu-Jie Zhang
  • Peng Zhao
  • Zhi-Hua Zhou

We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its significant benefits, incorporating the non-linear function raises substantial challenges in both *statistical* and *computational* efficiency. The best-known result of Hwang and Oh [2023] has achieved an $\widetilde{\mathcal{O}}(\kappa^{-1}dH^2\sqrt{K})$ regret upper bound, where $\kappa$ is a problem-dependent quantity, $d$ is the feature dimension, $H$ is the episode length, and $K$ is the number of episodes. However, we observe that $\kappa^{-1}$ exhibits polynomial dependence on the number of reachable states, which can be as large as the state space size in the worst case and thus undermines the motivation for function approximation. Additionally, their method requires storing all historical data and the time complexity scales linearly with the episode count, which is computationally expensive. In this work, we propose a statistically efficient algorithm that achieves a regret of $\widetilde{\mathcal{O}}(dH^2\sqrt{K} + \kappa^{-1}d^2H^2)$, eliminating the dependence on $\kappa^{-1}$ in the dominant term for the first time. We then address the computational challenges by introducing an enhanced algorithm that achieves the same regret guarantee but with only constant cost. Finally, we establish the first lower bound for this problem, justifying the optimality of our results in $d$ and $K$.

JMLR Journal 2024 Journal Article

Structured Optimal Variational Inference for Dynamic Latent Space Models

  • Peng Zhao
  • Anirban Bhattacharya
  • Debdeep Pati
  • Bani K. Mallick

We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions. To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework, where the time-dependent properties of the dynamic networks are exploited to facilitate computation and inference. Additionally, an easy-to-implement block coordinate ascent algorithm is developed with message-passing type updates in each block, whereas the complexity per iteration is linear with the number of nodes and time points. To certify the optimality, we demonstrate that the variational risk of the proposed variational inference approach attains the minimax optimal rate up to a logarithmic factor under certain conditions. To this end, we first derive the minimax lower bound, which might be of independent interest. In addition, we show that the posterior under commonly adopted Gaussian random walk priors can achieve the minimax lower bound up to a logarithmic factor. To the best of our knowledge, this is the first thorough theoretical analysis of Bayesian dynamic latent space models. Simulations and real data analysis demonstrate the efficacy of our methodology and the efficiency of our algorithm.

NeurIPS Conference 2024 Conference Paper

Universal Online Convex Optimization with $1$ Projection per Round

  • Wenhao Yang
  • Yibo Wang
  • Peng Zhao
  • Lijun Zhang

To address the uncertainty in function types, recent progress in online convex optimization (OCO) has spurred the development of universal algorithms that simultaneously attain minimax rates for multiple types of convex functions. However, for a $T$-round online problem, state-of-the-art methods typically conduct $O(\log T)$ projections onto the domain in each round, a process potentially time-consuming with complicated feasible sets. In this paper, inspired by the black-box reduction of Cutkosky and Orabona [2018], we employ a surrogate loss defined over simpler domains to develop universal OCO algorithms that only require $1$ projection. Embracing the framework of prediction with expert advice, we maintain a set of experts for each type of functions and aggregate their predictions via a meta-algorithm. The crux of our approach lies in a uniquely designed expert-loss for strongly convex functions, stemming from an innovative decomposition of the regret into the meta-regret and the expert-regret. Our analysis sheds new light on the surrogate loss, facilitating a rigorous examination of the discrepancy between the regret of the original loss and that of the surrogate loss, and carefully controlling meta-regret under the strong convexity condition. With only $1$ projection per round, we establish optimal regret bounds for general convex, exponentially concave, and strongly convex functions simultaneously. Furthermore, we enhance the expert-loss to exploit the smoothness property, and demonstrate that our algorithm can attain small-loss regret for multiple types of convex and smooth functions.

NeurIPS Conference 2023 Conference Paper

Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

  • Yu-Jie Zhang
  • Zhen-Yu Zhang
  • Peng Zhao
  • Masashi Sugiyama

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from the training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario --- continuous covariate shift --- in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we theoretically show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online density ratio estimation method that can appropriately reuse historical information. Our method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.
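
For reference, the importance-weighted learning the method starts from reweights training losses by the input density ratio; a minimal sketch, assuming the ratios are given (the paper's contribution is estimating them online):

```python
import numpy as np

def importance_weighted_risk(losses: np.ndarray, ratios: np.ndarray) -> float:
    """Sketch of importance-weighted risk under covariate shift: `losses[i]`
    is the predictor's loss on training input x_i and `ratios[i]` approximates
    the density ratio p_test(x_i) / p_train(x_i). Reweighting makes the
    training average an estimate of the test risk."""
    return float(np.mean(ratios * losses))
```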

NeurIPS Conference 2023 Conference Paper

Dynamic Regret of Adversarial Linear Mixture MDPs

  • Long-Fei Li
  • Peng Zhao
  • Zhi-Hua Zhou

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and the unknown transition kernel. We consider the linear mixture MDPs whose transition kernel is a linear mixture model and choose the \emph{dynamic regret} as the performance measure. Denote by $d$ the dimension of the feature mapping, $H$ the horizon, $K$ the number of episodes, $P_T$ the non-stationary measure, we propose a novel algorithm that enjoys an $\widetilde{\mathcal{O}}\big(\sqrt{d^2 H^3K} + \sqrt{H^4(K+P_T)(1+P_T)}\big)$ dynamic regret under the condition that $P_T$ is known, which improves previously best-known dynamic regret for adversarial linear mixture MDP and adversarial tabular MDPs. We also establish an $\Omega\big(\sqrt{d^2 H^3 K} + \sqrt{H K (H+P_T)}\big)$ lower bound, indicating our algorithm is \emph{optimal} in $K$ and $P_T$. Furthermore, when the non-stationary measure $P_T$ is unknown, we design an online ensemble algorithm with a meta-base structure, which is proved to achieve an $\widetilde{\mathcal{O}}\big(\sqrt{d^2 H^3K} + \sqrt{H^4(K+P_T)(1+P_T) + H^2 S_T^2}\big)$ dynamic regret and here $S_T$ is the expected switching number of the best base-learner. The result can be optimal under certain regimes.

IS Journal 2023 Journal Article

ECCVideo: A Scalable Edge Cloud Collaborative Video Analysis System

  • Qing Han
  • Xuebin Ren
  • Peng Zhao
  • Yimeng Wang
  • Luhui Wang
  • Cong Zhao
  • Xinyu Yang

Video analysis drives a wide range of applications in the fields of public safety, autonomous vehicles, etc., with the great potential to impact society. Traditional cloud-based approaches are not applicable because of prohibitive bandwidth consumption and high response latency, while purely edge-based video analysis suffers from large computation delays, considering the restricted computing capacity of edge servers. Therefore, in this article, we focus on low-latency edge-cloud collaborative video analytic applications (ECCVApps) by making full use of resources at both the edge and cloud. Particularly, we present an edge-cloud collaborative video analysis system called ECCVideo, to support the unified management of heterogeneous servers and facilitate the development and deployment of large-scale ECCVApps. Under ECCVideo, we design the application architecture of ECCVApps, including the presentation paradigm, transparent communication services, and full lifecycle management. To validate the proposed system, a real-time object detection application is deployed on the ECCVideo prototype.

JMLR Journal 2023 Journal Article

Non-stationary Online Learning with Memory and Non-stochastic Control

  • Peng Zhao
  • Yu-Hu Yan
  • Yu-Xiang Wang
  • Zhi-Hua Zhou

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions and thus captures temporal effects of learning problems. In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments, which competes algorithms' decisions with a sequence of changing comparators. We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length. The key technical challenge is how to control the switching cost, the cumulative movements of player's decisions, which is neatly addressed by a novel switching-cost-aware online ensemble approach equipped with a new meta-base decomposition of dynamic policy regret and a careful design of meta-learner and base-learner that explicitly regularizes the switching cost. The results are further applied to tackle non-stationarity in online non-stochastic control (Agarwal et al., 2019), i.e., controlling a linear dynamical system with adversarial disturbance and convex cost functions. We derive a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller provably competitive to a sequence of changing policies for online non-stochastic control.

JMLR Journal 2023 Journal Article

Online Non-stochastic Control with Partial Feedback

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

Online control with non-stochastic disturbances and adversarially chosen convex cost functions, referred to as online non-stochastic control, has recently attracted increasing attention. We study online non-stochastic control with partial feedback, where learners can only access partially observed states and partially informed (bandit) costs. The problem setting arises naturally in real-world decision-making applications and strictly generalizes exceptional cases studied disparately by previous works. We propose the first online algorithm for this problem, with an $\tilde{O}(T^{3/4})$ regret competing with the best policy in hindsight, where $T$ denotes the time horizon and the $\tilde{O}(\cdot)$-notation omits the poly-logarithmic factors in $T$. To further enhance the algorithms' robustness to changing environments, we then design a novel method with a two-layer structure to optimize the dynamic regret, a more challenging measure that competes with time-varying policies. Our method is based on the online ensemble framework by treating the controller above as the base learner. On top of that, we design two different meta-combiners to simultaneously handle the unknown variation of environments and the memory issue arising from the online control. We prove that the two resulting algorithms enjoy $\tilde{O}(T^{3/4}(1+P_T)^{1/2})$ and $\tilde{O}(T^{3/4}(1+P_T)^{1/4}+T^{5/6})$ dynamic regret respectively, where $P_T$ measures the environmental non-stationarity. Our results are further extended to unknown transition matrices. Finally, empirical studies in both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings.

NeurIPS Conference 2023 Conference Paper

Stochastic Approximation Approaches to Group Distributionally Robust Optimization

  • Lijun Zhang
  • Peng Zhao
  • Zhen-Hua Zhuang
  • Tianbao Yang
  • Zhi-Hua Zhou

This paper investigates group distributionally robust optimization (GDRO), with the purpose to learn a model that performs well over $m$ different distributions. First, we formulate GDRO as a stochastic convex-concave saddle-point problem, and demonstrate that stochastic mirror descent (SMD), using $m$ samples in each iteration, achieves an $O(m (\log m)/\epsilon^2)$ sample complexity for finding an $\epsilon$-optimal solution, which matches the $\Omega(m/\epsilon^2)$ lower bound up to a logarithmic factor. Then, we make use of techniques from online learning to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity. Specifically, we cast GDRO as a two-players game where one player simply performs SMD and the other executes an online algorithm for non-oblivious multi-armed bandits. Next, we consider a more practical scenario where the number of samples that can be drawn from each distribution is different, and propose a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates. Denote by $n_i$ the sample budget for the $i$-th distribution, and assume $n_1 \geq n_2 \geq \cdots \geq n_m$. In the first approach, we incorporate non-uniform sampling into SMD such that the sample budget is satisfied in expectation, and prove that the excess risk of the $i$-th distribution decreases at an $O(\sqrt{n_1 \log m}/n_i)$ rate. In the second approach, we use mini-batches to meet the budget exactly and also reduce the variance in stochastic gradients, and then leverage stochastic mirror-prox algorithm, which can exploit small variances, to optimize a carefully designed weighted GDRO problem. Under appropriate conditions, it attains an $O((\log m)/\sqrt{n_i})$ convergence rate, which almost matches the optimal $O(\sqrt{1/n_i})$ rate of only learning from the $i$-th distribution with $n_i$ samples.
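
The first formulation above is directly implementable; a minimal sketch of GDRO as a saddle-point problem solved by stochastic mirror descent follows, with gradient descent on the model $w$ and exponentiated-gradient ascent on the group weights $q$ (the simplex player). The oracle `sample_losses_and_grads` is an assumed interface, not from the paper:

```python
import numpy as np

def gdro_smd(sample_losses_and_grads, dim: int, m: int, rounds: int,
             eta_w: float = 0.1, eta_q: float = 0.1):
    """Minimal GDRO sketch: min over w, max over q in the simplex of the
    weighted loss sum_i q_i L_i(w). `sample_losses_and_grads(w)` is assumed
    to return, for one fresh sample from each of the m distributions, the
    losses (shape [m]) and loss gradients (shape [m, dim])."""
    w = np.zeros(dim)
    q = np.ones(m) / m
    for _ in range(rounds):
        losses, grads = sample_losses_and_grads(w)
        w -= eta_w * (q @ grads)        # descent step on the q-weighted loss
        q *= np.exp(eta_q * losses)     # EG ascent: upweight the worst groups
        q /= q.sum()                    # project back onto the simplex
    return w, q
```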

NeurIPS Conference 2023 Conference Paper

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

In this paper, we propose an online convex optimization approach with two different levels of adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures of the online functions, while at a lower level, it can exploit the unknown niceness of the environments and attain problem-dependent guarantees. Specifically, we obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and the $\hat{\mathcal{O}}(\cdot)$-notation omits $\log V_T$ factors. Our result not only safeguards the worst-case guarantees but also directly implies the small-loss bounds in analysis. Moreover, when applied to adversarial/stochastic convex optimization and game theory problems, our result enhances the existing universal guarantees. Our approach is based on a multi-layer online ensemble framework incorporating novel ingredients, including a carefully designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Notably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with carefully designed surrogate losses.
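
For reference, the gradient-variation quantity $V_T$ in these bounds is typically defined as the cumulative change between consecutive gradients:

```latex
% Gradient variation over a T-round game; small V_T certifies a
% slowly changing (benign) environment.
V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}}
\big\| \nabla f_t(x) - \nabla f_{t-1}(x) \big\|_2^2 .
```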

NeurIPS Conference 2022 Conference Paper

Adapting to Online Label Shift with Provable Guarantees

  • Yong Bai
  • Yu-Jie Zhang
  • Peng Zhao
  • Masashi Sugiyama
  • Zhi-Hua Zhou

The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this stationary assumption is often violated in real-world applications, especially when testing data appear in an online fashion. In this paper, we formulate and investigate the problem of \emph{online label shift} (OLaS): the learner trains an initial model from the labeled offline data and then deploys it to an unlabeled online environment where the underlying label distribution changes over time but the label-conditional density does not. The non-stationary nature and the lack of supervision make the problem challenging to tackle. To address the difficulty, we construct a new unbiased risk estimator that utilizes the unlabeled data, which exhibits many benign properties albeit with potential non-convexity. Building upon that, we propose novel online ensemble algorithms to deal with the non-stationarity of the environments. Our approach enjoys optimal \emph{dynamic regret}, indicating that the performance is competitive with a clairvoyant who knows the online environments in hindsight and then chooses the best decision for each round. The obtained dynamic regret bound scales with the intensity and pattern of label distribution shift, hence exhibiting the adaptivity in the OLaS problem. Extensive experiments are conducted to validate the effectiveness and support our theoretical findings.
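
For intuition, a sketch of the classic label-shift reweighting idea behind such estimators: since the label-conditional density is fixed under OLaS, per-class weights $p_{\text{test}}(y)/p_{\text{train}}(y)$ make the labeled offline risk match the current test risk. Here the test label distribution is assumed given; the paper's contribution is estimating it from unlabeled online data:

```python
import numpy as np

def label_shift_weights(train_label_dist, test_label_dist) -> np.ndarray:
    """Per-class importance weights p_test(y) / p_train(y)."""
    return np.asarray(test_label_dist) / np.asarray(train_label_dist)

def reweighted_risk(losses, labels, weights) -> float:
    """Offline losses reweighted by the label-shift weights; `labels`
    are integer class indices into `weights`."""
    return float(np.mean(weights[np.asarray(labels)] * np.asarray(losses)))
```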

EAAI Journal 2022 Journal Article

Combining feature importance and neighbor node interactions for cold start recommendation

  • Jinjin Zhang
  • Chenhui Ma
  • Chengliang Zhong
  • Peng Zhao
  • Xiaodong Mu

Cold start recommendation usually views preference embedding as a missing-value problem because there are no historical interactions. Existing graph neural network approaches for cold start users/items build the attribute embedding of each node by simply concatenating multiple features equally, and then reconstruct the node preference embedding from its attribute embedding through a mapping function learned from warm users/items. However, these approaches do not consider the different contributions of features when building the attribute embedding. In addition, they assume the neighbors of a target node are independent and ignore interactions between the neighbor nodes when building the mapping function between the attribute embedding and the preference embedding. These two limitations reduce their effectiveness. To overcome these limitations, we propose a novel framework called Feature Importance and Neighbor node Interactions graph neural network (FINI) that exploits feature weights and interactions between neighbor nodes. The core ideas of the proposed method are as follows. First, it designs a global–local contexts attention mechanism in the attribute encoding layer, which can dynamically learn the importance of the attributes of each node and improve the expression of the feature embeddings. Second, it proposes a mixed interaction mechanism to augment the weighted information of neighbor node interactions in the neighbor interaction layer, which can strengthen the expressive capability of the user/item embeddings and further improve the quality of the mapping function for cold start users/items. Additionally, we combine the rating prediction loss and the mimic loss as the total loss for training the network in the prediction layer. To assess the performance of FINI, both cold start user and cold start item recommendation are considered. The results demonstrate that FINI outperforms state-of-the-art approaches for cold start recommendation and gains significant improvements in terms of evaluation metrics.

IJCAI Conference 2022 Conference Paper

Contrastive Multi-view Hyperbolic Hierarchical Clustering

  • Fangfei Lin
  • Bing Bai
  • Kun Bai
  • Yazhou Ren
  • Peng Zhao
  • Zenglin Xu

Hierarchical clustering recursively partitions data at an increasingly finer granularity. In real-world applications, multi-view data have become increasingly important. This raises a less investigated problem, i.e., multi-view hierarchical clustering, to better understand the hierarchical structure of multi-view data. To this end, we propose a novel neural network-based model, namely Contrastive Multi-view Hyperbolic Hierarchical Clustering (CMHHC). It consists of three components, i.e., multi-view alignment learning, aligned feature similarity learning, and continuous hyperbolic hierarchical clustering. First, we align sample-level representations across multiple views in a contrastive way to capture view-invariant information. Next, we utilize both manifold and Euclidean similarities to improve the metric property. Then, we embed the representations into a hyperbolic space and optimize the hyperbolic embeddings via a continuous relaxation of the hierarchical clustering loss. Finally, a binary clustering tree is decoded from the optimized hyperbolic embeddings. Experimental results on five real-world datasets demonstrate the effectiveness of the proposed method and its components.
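The abstract does not name the hyperbolic model; the Poincaré ball is the usual choice in this line of work, where the distance between embeddings $x$ and $y$ (with $\lVert x\rVert, \lVert y\rVert < 1$) is

$$ d_{\mathbb{B}}(x, y) = \operatorname{arcosh}\!\left( 1 + 2\,\frac{\lVert x - y\rVert^2}{(1 - \lVert x\rVert^2)(1 - \lVert y\rVert^2)} \right), $$

whose rapid growth near the boundary is what lets tree-like (hierarchical) structure be embedded with low distortion.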

NeurIPS Conference 2022 Conference Paper

Efficient Methods for Non-stationary Online Learning

  • Peng Zhao
  • Yan-Feng Xie
  • Lijun Zhang
  • Zhi-Hua Zhou

Non-stationary online learning has drawn much attention in recent years. In particular, \emph{dynamic regret} and \emph{adaptive regret} are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises concerns about computational complexity: those methods typically maintain $O(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $O(\log T)$ to $1$. Moreover, our algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.
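For readers unfamiliar with the two-layer structure the abstract refers to, here is a minimal sketch of a standard online ensemble: OGD base-learners on a geometric step-size grid, tracked by a Hedge-style meta-algorithm. It deliberately shows the naive version that projects once per base-learner per round; the paper's single-projection reduction is not reproduced, and all names and the step-size grid are illustrative.

```python
import numpy as np

def two_layer_ensemble(grad, project, x0, T, D, G):
    """Two-layer online ensemble for non-stationary OCO (schematic).

    Base-learners: online gradient descent over a geometric grid of
    step sizes. Meta: Hedge on the linearized losses <g_t, x_i>.
    """
    N = int(np.ceil(np.log2(max(T, 2)))) + 1           # number of base-learners
    etas = [(D / (G * np.sqrt(T))) * 2 ** i for i in range(N)]
    xs = [np.array(x0, dtype=float) for _ in range(N)]
    w = np.ones(N) / N                                 # meta weights
    lr = np.sqrt(np.log(N) / T)                        # Hedge learning rate
    for t in range(T):
        x = sum(wi * xi for wi, xi in zip(w, xs))      # combined decision
        g = grad(x, t)                                 # one gradient query
        surrogate = np.array([g @ xi for xi in xs])    # linearized losses
        w = w * np.exp(-lr * (surrogate - surrogate.min()))
        w /= w.sum()
        # naive step: one projection per base-learner per round
        xs = [project(xi - eta * g) for xi, eta in zip(xs, etas)]
    return x
```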

JMLR Journal 2021 Journal Article

Bandit Convex Optimization in Non-stationary Environments

  • Peng Zhao
  • Guanghui Wang
  • Lijun Zhang
  • Zhi-Hua Zhou

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is one-point or two-point function evaluations. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence, which reflects the non-stationarity of the environments. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret for the one-point and two-point feedback models, respectively. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$, which is generally unknown ahead of time. We further extend the algorithm to an anytime version that does not require knowing the time horizon $T$ in advance. Moreover, we study the adaptive regret, another widely used performance measure for online learning in non-stationary environments, and design an algorithm that provably enjoys adaptive regret guarantees for BCO problems. Finally, we present empirical studies to validate the effectiveness of the proposed approach.
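The one-point feedback model rests on a classical trick: a single function evaluation at a randomly perturbed point yields an unbiased gradient estimate of a smoothed surrogate of the loss. A minimal sketch of that standard estimator (not specific to this paper's algorithm):

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point gradient estimator for bandit convex optimization.

    Queries f once at a perturbed point; (d / delta) * f(x + delta*u) * u
    is an unbiased estimate of the gradient of a delta-smoothed
    version of f, where u is uniform on the unit sphere.
    """
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)              # uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u
```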

AAAI Conference 2021 Conference Paper

Exploratory Machine Learning with Unknown Unknowns

  • Peng Zhao
  • Yu-Jie Zhang
  • Zhi-Hua Zhou

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model classifies unseen instances into known labels. In real situations, when learned models do not work well, learners generally attribute the failure to an inadequate choice of learning algorithm or a lack of labeled training samples. In this paper, we point out an important category of failure that owes to unknown classes in the training data being misperceived as other labels, so that their existence is hidden from the given supervision. Such problems of unknown unknown classes can hardly be addressed by re-selecting algorithms or accumulating more training samples. For this purpose, we propose exploratory machine learning: in this paradigm, once the learner encounters unsatisfactory performance, she can examine whether unknown unknowns exist and, if they do, deploy the optimal strategy of feature-space augmentation to make the unknown classes observable and learnable. Theoretical analysis and empirical studies on both synthetic and real datasets validate the efficacy of our proposal.

EAAI Journal 2021 Journal Article

Improving current interest with item and review sequential patterns for sequential recommendation

  • Jinjin Zhang
  • Xiaodong Mu
  • Peng Zhao
  • Kai Kang
  • Chenhui Ma

Sequential recommendation (SR) aims to recommend items based on user information and behavior sequences. Almost all existing works for SR construct short-term and long-term preferences based only on the user–item interactions or only on the reviews, rather than considering the two types of information simultaneously. In fact, interaction items and reviews both reflect the user's semantic information and play significant roles in modeling user preference. In this paper, we propose a novel model named Parallel Item sequential pattern and Review Sequential Pattern (PIRSP) for sequential recommendation. Specifically, PIRSP first learns two sequential patterns from item and review information, respectively: (1) an item sequential pattern, which uses a gated recurrent unit with an item-attention mechanism to model historical behavior sequences; and (2) a review sequential pattern, which applies a convolutional neural network with a target-attention mechanism to the reviews associated with the interaction items. A fusion gating mechanism then selectively combines the two sequential patterns to learn the short-term preference. Second, we employ a convolutional neural network with aspect information to learn the long-term preference. Finally, we apply a linear fusion of the long-term and short-term preferences to model user preference and make the final recommendation. The experimental results indicate that our model outperforms other state-of-the-art methods on the Amazon dataset. Our analysis of PIRSP's recommendation process shows the positive effect of the two types of information and of the fusion gating mechanism on sequential recommendation performance.
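The abstract names a fusion gating mechanism without giving its form; a standard gating layer of the kind commonly used for this purpose looks as follows (`W` and `b` are hypothetical learned parameters, not the paper's exact layer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_patterns(h_item, h_review, W, b):
    """Gated fusion of item and review sequential patterns.

    A learned gate decides, per dimension, how much of each pattern
    enters the short-term preference. h_item, h_review: (d,) vectors;
    W: (2d, d); b: (d,).
    """
    g = sigmoid(np.concatenate([h_item, h_review]) @ W + b)
    return g * h_item + (1.0 - g) * h_review
```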

AAAI Conference 2021 Conference Paper

Large Motion Video Super-Resolution with Dual Subnet and Multi-Stage Communicated Upsampling

  • Hongying Liu
  • Peng Zhao
  • Zhubo Ruan
  • Fanhua Shang
  • Yuanyuan Liu

Video super-resolution (VSR) aims at restoring a low-resolution (LR) video at higher resolution (HR). Due to the characteristics of video tasks, it is very important that motion information among frames be carefully modeled, summarized, and utilized for guidance in a VSR algorithm. In particular, when a video contains large motion, conventional methods easily produce incoherent results or artifacts. In this paper, we propose a novel deep neural network with Dual Subnet and Multi-stage Communicated Upsampling (DSMC) for super-resolution of videos with large motion. We design a new module named U-shaped residual dense network with 3D convolution (U3D-RDN) for fine implicit motion estimation and motion compensation (MEMC) as well as coarse spatial feature extraction. We also present a new Multi-Stage Communicated Upsampling (MSCU) module to make full use of the intermediate results of upsampling for guiding the VSR. Moreover, a novel dual subnet is devised to aid the training of our DSMC; its dual loss helps to reduce the solution space as well as enhance the generalization ability. Our experimental results confirm that our method achieves superior performance on videos with large motion compared to state-of-the-art methods.

AAAI Conference 2021 Conference Paper

Storage Fit Learning with Feature Evolvable Streams

  • Bo-Jian Hou
  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

Feature evolvable learning has been widely studied in recent years, where old features vanish and new features emerge when learning with streams. Conventional methods usually assume that a label is revealed after prediction at each time step. In practice, however, this assumption may not hold: no label is given at most time steps. A good solution is to leverage manifold regularization and use previous similar data to assist the refinement of the online model. Nevertheless, this approach needs to store all previous data, which is impossible when learning with streams that arrive sequentially in large volume, so a buffer is needed to store part of them. Considering that different devices may have different storage budgets, learning approaches should be flexible subject to the storage budget limit. In this paper, we propose a new setting: Storage-Fit Feature-Evolvable streaming Learning (SF2EL), which incorporates the issue of rarely provided labels into feature evolution. Our framework can fit its behavior to different storage budgets when learning with feature evolvable streams with unlabeled data. Besides, both theoretical and empirical results validate that our approach preserves the merit of the original feature evolvable learning, i.e., it can always track the best baseline and thus perform well at any time step.
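The abstract does not specify the buffering rule; reservoir sampling is one standard budget-respecting choice that keeps a uniform sample of everything seen so far, sketched below purely for illustration.

```python
import random

class BudgetBuffer:
    """Fixed-budget buffer for streaming data (reservoir sampling).

    Keeps at most `budget` items; at any point, the buffer holds a
    uniform random sample of the stream seen so far, which makes it
    a natural fit for a device-specific storage limit.
    """
    def __init__(self, budget):
        self.budget = budget
        self.items = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.budget:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)   # replace with prob budget/seen
            if j < self.budget:
                self.items[j] = item
```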

AAAI Conference 2021 Conference Paper

Towards Enabling Learnware to Handle Unseen Jobs

  • Yu-Jie Zhang
  • Yu-Hu Yan
  • Peng Zhao
  • Zhi-Hua Zhou

The learnware paradigm attempts to change the current style of machine learning deployment, i.e., each user builds her own machine learning application almost from scratch, to a style where the previous efforts of other users can be reused, given a publicly available pool of machine learning models constructed by previous users for various tasks. Each learnware is a high-quality pre-trained model associated with its specification. Although there are many models in the learnware market, only a few, or even none, may be potentially helpful for the current job. Therefore, how to identify and deploy useful models becomes a main concern, which particularly matters when the user's job involves certain unseen parts not covered by the current learnware market. The problem becomes more challenging because, due to privacy considerations, the raw data used for training the models in the learnware market are inaccessible. In this paper, we develop a novel scheme that can effectively reuse learnwares even when the user's job involves unseen parts. Although the raw training data are inaccessible, our approach can provably identify samples from the unseen parts while assigning the rest to appropriate models in the market for prediction, under a certain condition. Empirical studies also validate the efficacy of our approach.

NeurIPS Conference 2020 Conference Paper

An Unbiased Risk Estimator for Learning with Augmented Classes

  • Yu-Jie Zhang
  • Peng Zhao
  • Lanjihong Ma
  • Zhi-Hua Zhou

This paper studies the problem of learning with augmented classes (LAC), where augmented classes unobserved in the training data might emerge in the testing phase. Previous studies generally attempt to discover augmented classes by exploiting geometric properties, achieving inspiring empirical performance yet lacking theoretical understanding, particularly of the generalization ability. In this paper we show that, by using unlabeled training data to approximate the potential distribution of augmented classes, an unbiased risk estimator of the testing distribution can be established for the LAC problem under mild assumptions, which paves the way to developing a sound approach with theoretical guarantees. Moreover, the proposed approach can adapt to complex changing environments where augmented classes may appear and the prior of known classes may change simultaneously. Extensive experiments confirm the effectiveness of our proposed approach.
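One way to see how unlabeled data enter such a construction, under the assumption that the class-conditionals $p(x \mid y)$ of the $k$ known classes are unchanged at test time (the notation here is ours, not the paper's): the testing density is a mixture, so the augmented-class component can be isolated as

$$ \theta_{ac}\, p_{ac}(x) = p_{te}(x) - \sum_{y=1}^{k} \theta_y\, p(x \mid y), $$

where $\theta_y$ and $\theta_{ac}$ are the test-time class priors. Every expectation over the augmented class can then be rewritten in terms of the unlabeled test marginal $p_{te}$ and the labeled class-conditionals, which is what makes an unbiased risk estimator possible without ever observing augmented-class labels.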

IJCAI Conference 2020 Conference Paper

CDC: Classification Driven Compression for Bandwidth Efficient Edge-Cloud Collaborative Deep Learning

  • Yuanrui Dong
  • Peng Zhao
  • Hanqiao Yu
  • Cong Zhao
  • Shusen Yang

The emerging edge-cloud collaborative Deep Learning (DL) paradigm aims at improving the performance of practical DL implementations in terms of cloud bandwidth consumption, response latency, and data privacy preservation. Focusing on bandwidth-efficient edge-cloud collaborative training of DNN-based classifiers, we present CDC, a Classification Driven Compression framework that reduces bandwidth consumption while preserving the classification accuracy of edge-cloud collaborative DL. Specifically, to reduce bandwidth consumption on resource-limited edge servers, we develop a lightweight autoencoder with classification guidance, which compresses data while preserving classification-driven features and allows edges to upload only the latent code of raw data for accurate global training on the cloud. Additionally, we design an adjustable quantization scheme that adaptively pursues the tradeoff between bandwidth consumption and classification accuracy under different network conditions, where only fine-tuning is required for rapid compression-ratio adjustment. Results of extensive experiments demonstrate that, compared with DNN training on raw data, CDC consumes 14.9× less bandwidth with an accuracy loss of no more than 1.06%, and compared with DNN training on data compressed by an autoencoder without guidance, CDC introduces at least 100% lower accuracy loss.
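The abstract does not state the training objective; an autoencoder with classification guidance is typically trained on a weighted sum of reconstruction and classification losses. The sketch below shows that generic pattern only; the weight `lam` and all names are hypothetical, not CDC's exact loss.

```python
import torch
import torch.nn.functional as F

def guided_compression_loss(decoder_out, x, logits, y, lam):
    """Autoencoder objective with classification guidance (schematic).

    The reconstruction term keeps the latent code informative enough
    to rebuild the input; the classification term steers compression
    toward preserving label-relevant features. `lam` trades them off.
    """
    rec = F.mse_loss(decoder_out, x)      # compression fidelity
    cls = F.cross_entropy(logits, y)      # classification guidance
    return rec + lam * cls
```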

NeurIPS Conference 2020 Conference Paper

Dynamic Regret of Convex and Smooth Functions

  • Peng Zhao
  • Yu-Jie Zhang
  • Lijun Zhang
  • Zhi-Hua Zhou

We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length that essentially reflects the non-stationarity of the environments; the state-of-the-art dynamic regret bound is $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper we demonstrate that it is possible to further enhance the dynamic regret by exploiting the smoothness condition. Specifically, we propose novel online algorithms that are capable of leveraging smoothness and replace the dependence on $T$ in the dynamic regret by problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of the previous two terms. These quantities are at most $\mathcal{O}(T)$ but can be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem: the bounds are tighter than existing results for easy problems while guaranteeing the same rate in the worst case.
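For concreteness, the two quantities the abstract refers to are, for any comparator sequence $u_1, \ldots, u_T$ in the feasible domain,

$$ \mathrm{Reg}_T^{\mathrm{d}} = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t), \qquad P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert_2, $$

so a bound of $\mathcal{O}(\sqrt{T(1+P_T)})$ degrades gracefully from the stationary case ($P_T = 0$) to drifting comparators.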

AAAI Conference 2020 Conference Paper

Optimal Margin Distribution Learning in Dynamic Environments

  • Teng Zhang
  • Peng Zhao
  • Hai Jin

Recently a promising research direction of statistical learning has been advocated, i.e., optimal margin distribution learning, with the central idea that the margin distribution, rather than the minimal margin, is crucial to generalization performance. Although the superiority of this new learning paradigm has been verified under batch learning settings, it remains open for online learning settings, in particular for dynamic environments in which the underlying decision function varies over time. In this paper, we propose the dynamic optimal margin distribution machine and theoretically analyze its regret. Although the obtained bound has the same order as the best known one, our method significantly relaxes the restrictive assumption that the function variation must be given ahead of time, resulting in better applicability in practical scenarios. We also derive an excess risk bound for the special case where the underlying decision function evolves through several discrete changes rather than varying continuously. Extensive experiments on both synthetic and real data sets demonstrate the superiority of our method.

AAAI Conference 2018 Conference Paper

Dual Set Multi-Label Learning

  • Chong Liu
  • Peng Zhao
  • Sheng-Jun Huang
  • Yuan Jiang
  • Zhi-Hua Zhou

In this paper, we propose a new learning framework named dual set multi-label learning, where there are two sets of labels and an object has one and only one positive label in each set. Compared to general multi-label learning, the exclusive relationship among labels within the same set and the pairwise inter-set label relationship are much more explicit and more likely to be fully exploited. To handle such problems, a novel boosting-style algorithm with model-reuse and distribution-adjusting mechanisms is proposed to make the two label sets help each other. In addition, theoretical analyses are presented to show the superiority of learning from dual label sets over learning directly from all labels. To empirically evaluate the performance of our approach, we conduct experiments on two manually collected real-world datasets along with an adapted dataset. Experimental results validate the effectiveness of our approach for dual set multi-label learning.

AAAI Conference 2018 Conference Paper

Label Distribution Learning by Optimal Transport

  • Peng Zhao
  • Zhi-Hua Zhou

Label distribution learning (LDL) is a novel learning paradigm for real-world applications in which we care about the relative importance of different labels in the description of an instance. Although some approaches have been proposed to learn the label distribution, they cannot explicitly learn and leverage label correlation, which plays an important role in LDL. In this paper, we propose an approach that learns the label distribution and exploits label correlations simultaneously based on Optimal Transport (OT) theory. The problem is solved by alternately learning the transportation (hypothesis) and the ground metric (label correlations). Besides, we provide perhaps the first data-dependent risk bound analysis for label distribution learning via the Sinkhorn distance, a commonly used relaxation of the OT distance. Experimental results on real-world datasets, in comparison with several state-of-the-art methods, validate the effectiveness of our approach.
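For reference, the Sinkhorn distance mentioned in the abstract is computed by a simple matrix-scaling iteration; a minimal sketch of that standard algorithm (not the paper's full alternating procedure):

```python
import numpy as np

def sinkhorn_plan(a, b, M, eps, n_iter=200):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    a, b: source/target distributions (nonnegative, summing to 1);
    M: cost (ground metric) matrix; eps: regularization strength.
    Returns the transport plan.
    """
    K = np.exp(-M / eps)                # Gibbs kernel from the ground metric
    u = np.ones_like(a)
    for _ in range(n_iter):             # alternate row/column scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```

The regularized OT cost is then `(plan * M).sum()`; in the paper's setting the ground metric `M` itself encodes label correlations and is learned in alternation with the hypothesis.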

IJCAI Conference 2017 Conference Paper

Robust Softmax Regression for Multi-class Classification with Self-Paced Learning

  • Yazhou Ren
  • Peng Zhao
  • Yongpan Sheng
  • Dezhong Yao
  • Zenglin Xu

Softmax regression, a generalization of logistic regression (LR) to multi-class classification, has been widely used in many machine learning applications. However, its performance is extremely sensitive to the presence of noisy data and outliers. To address this issue, we propose a model of robust softmax regression (RoSR), originating from the self-paced learning (SPL) paradigm, for multi-class classification. Concretely, RoSR is equipped with a soft weighting scheme that evaluates the importance of each data instance; instances then participate in the classification problem according to their weights. In this way, the influence of noisy data and outliers (which typically receive small weights) can be significantly reduced. However, standard SPL may suffer from an imbalanced class-influence problem, where some classes have little influence on the training process if their instances are insensitive to the loss. To alleviate this problem, we design two novel soft weighting schemes that assign weights and select instances locally for each class. Experimental results demonstrate the effectiveness of the proposed methods.
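The weighting idea follows the usual SPL template: instances are admitted or discounted according to their current loss relative to an age parameter. A minimal sketch of the standard hard and soft schemes (the paper's class-local variants apply such weighting per class, which is omitted here; `lam` is the SPL age parameter):

```python
import numpy as np

def spl_weights(losses, lam, soft=True):
    """Self-paced learning weights from per-instance losses.

    Hard scheme: admit an instance iff its loss is below `lam`.
    Soft scheme: linearly discount harder instances, so noisy data
    and outliers (large loss) receive small or zero weight.
    As `lam` grows over training, harder instances are admitted.
    """
    if soft:
        return np.clip(1.0 - losses / lam, 0.0, 1.0)
    return (losses < lam).astype(float)
```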

JMLR Journal 2007 Journal Article

Stagewise Lasso

  • Peng Zhao
  • Bin Yu

Many statistical machine learning algorithms minimize either an empirical loss function, as in AdaBoost, or a penalized empirical loss, as in Lasso or SVM. A single regularization tuning parameter controls the trade-off between fidelity to the data and generalizability, or equivalently between bias and variance. As this tuning parameter changes, a regularization "path" of solutions to the minimization problem is generated, and the whole path is needed to select a tuning parameter that optimizes prediction or interpretation performance. Algorithms such as homotopy-Lasso or LARS-Lasso and Forward Stagewise Fitting (FSF) (aka e-Boosting) are of great interest because of the sparse models they produce for interpretation in addition to prediction. In this paper, we propose the BLasso algorithm, which ties the FSF (e-Boosting) algorithm to the Lasso method that minimizes the $L_1$-penalized $L_2$ loss. BLasso is derived as a coordinate descent method with a fixed stepsize applied to the general Lasso loss function ($L_1$-penalized convex loss). It consists of both a forward step and a backward step. The forward step is similar to e-Boosting or FSF, but the backward step is new and revises the FSF (or e-Boosting) path to approximate the Lasso path. In the case of a finite number of base learners and a bounded Hessian of the loss function, the BLasso path is shown to converge to the Lasso path as the stepsize goes to zero. For cases with more base learners than the sample size and a sparse true model, our simulations indicate that the BLasso model estimates are sparser than those from FSF with comparable or slightly better prediction performance, and that the discrete stepsize of BLasso and FSF has an additional regularization effect in terms of prediction and sparsity. Moreover, we introduce the Generalized BLasso algorithm to minimize a general convex loss penalized by a general convex function. Since (Generalized) BLasso relies only on differences, not derivatives, it provides a class of simple and easy-to-implement algorithms for tracing the regularization or solution paths of penalized minimization problems.
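A minimal sketch of the forward/backward mechanics for the squared-loss case, taking the coordinates of the design matrix as base learners. This is a schematic reading of the abstract, not the paper's exact algorithm; in particular the tolerance `xi` and the stopping rule are simplifications.

```python
import numpy as np

def blasso(X, y, lam, eps=1e-3, n_steps=1000, xi=1e-7):
    """BLasso for L1-penalized squared loss (schematic).

    Each iteration takes a backward step (shrink one active coordinate
    by eps) if that lowers the penalized loss by at least xi; otherwise
    it takes a forward step (grow the steepest coordinate by eps),
    mimicking FSF / e-Boosting.
    """
    n, p = X.shape
    beta = np.zeros(p)

    def pen_loss(b):
        r = y - X @ b
        return 0.5 * r @ r + lam * np.abs(b).sum()

    for _ in range(n_steps):
        current = pen_loss(beta)
        # backward step: try shrinking each active coordinate by eps
        best_b, best_val = None, current - xi
        for j in np.flatnonzero(beta):
            cand = beta.copy()
            cand[j] -= eps * np.sign(beta[j])
            val = pen_loss(cand)
            if val < best_val:
                best_b, best_val = cand, val
        if best_b is not None:
            beta = best_b
            continue
        # forward step: steepest coordinate of the unpenalized loss
        g = X.T @ (X @ beta - y)
        j = int(np.argmax(np.abs(g)))
        beta[j] -= eps * np.sign(g[j])
    return beta
```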

JMLR Journal 2006 Journal Article

On Model Selection Consistency of Lasso

  • Peng Zhao
  • Bin Yu

Sparsity or parsimony of statistical models is crucial for their proper interpretation, as in the sciences and social sciences. Model selection is a commonly used method to find such models, but it usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. It is therefore important to study Lasso for model selection purposes. In this paper, we prove that a single condition, which we call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model, both in the classical fixed $p$ setting and in the large $p$ setting, as the sample size $n$ gets large. Based on these results, sufficient conditions that are verifiable in practice are given to relate to previous works and to help applications of Lasso for feature selection and sparse representation. The Irrepresentable Condition, which depends mainly on the covariance of the predictor variables, states that Lasso selects the true model consistently if and (almost) only if the predictors that are not in the true model are "irrepresentable" (in a sense to be clarified) by the predictors that are in the true model. Furthermore, simulations are carried out to provide insight into and understanding of this result.
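For reference, with the sample covariance $C = \frac{1}{n} X^\top X$ partitioned into blocks over the relevant predictors (block 1, with true coefficient vector $\beta_{(1)}$) and the irrelevant ones (block 2), the strong form of the Irrepresentable Condition can be written as

$$ \left\lVert C_{21}\, C_{11}^{-1}\, \operatorname{sign}\!\big(\beta_{(1)}\big) \right\rVert_\infty \le 1 - \eta $$

for some constant $\eta > 0$: regressing the irrelevant predictors on the relevant ones must not reproduce the sign pattern of the true coefficients too well.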