Arrow Research search

Author name cluster

Yueming Lyu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

ICML Conference 2025 Conference Paper

Diversifying Policy Behaviors with Extrinsic Behavioral Curiosity

  • Zhenglin Wan
  • Xingrui Yu
  • David Mark Bossens
  • Yueming Lyu
  • Qing Guo 0005
  • Flint Xiaofeng Fan
  • Yew-Soon Ong
  • Ivor W. Tsang

Imitation learning (IL) has shown promise in various applications (e.g., robot locomotion) but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, respectively, even surpassing expert performance by 20% on Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for learning behaviorally diverse policies. The source code of this work is available at https://github.com/vanzll/EBC.

ICLR Conference 2025 Conference Paper

Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation

  • Kim Yong Tan
  • Yueming Lyu
  • Ivor W. Tsang
  • Yew-Soon Ong

Guided diffusion-model generation is a promising direction for customizing the generation process of a pre-trained diffusion model to address specific downstream tasks. Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. However, for most real-world tasks, offline datasets are often unavailable, and the objective functions are often not differentiable, such as image generation with human preferences, molecular generation for drug discovery, and material design. Thus, we need an **online** algorithm capable of collecting data during runtime and supporting a **black-box** objective function. Moreover, the **query efficiency** of the algorithm is also critical because the objective evaluation of a query is often expensive in real-world scenarios. In this work, we propose a novel and simple algorithm, **Fast Direct**, for query-efficient online black-box target generation. Our Fast Direct builds a pseudo-target on the data manifold to update the noise sequence of the diffusion model with a universal direction, which is promising for query-efficient guided generation. Extensive experiments on twelve high-resolution ($\small {1024 \times 1024}$) image target generation tasks and six 3D-molecule target generation tasks show $\textbf{6}\times$ to $\textbf{10}\times$ and $\textbf{11}\times$ to $\textbf{44}\times$ query-efficiency improvements, respectively.

NeurIPS Conference 2025 Conference Paper

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

  • Xin Gao
  • Jiyao Liu
  • Guanghao Li
  • Yueming LYU
  • Jianxiong Gao
  • Weichen Yu
  • Ningsheng Xu
  • Liang Wang

Recent advancements have explored text-to-image diffusion models for synthesizing out-of-distribution (OOD) samples, substantially enhancing the performance of OOD detection. However, existing approaches typically rely on perturbing text-conditioned embeddings, resulting in semantic instability and insufficient shift diversity, which limit generalization to realistic OOD data. To address these challenges, we propose GOOD, a novel and flexible framework that directly guides diffusion sampling trajectories towards OOD regions using off-the-shelf in-distribution (ID) classifiers. GOOD incorporates dual-level guidance: (1) image-level guidance, based on the gradient of the log partition to reduce input likelihood, drives samples toward low-density regions in pixel space; (2) feature-level guidance, derived from the k-NN distance in the classifier's latent space, promotes sampling in feature-sparse regions. This dual-guidance design enables more controllable and diverse OOD sample generation. Additionally, we introduce a unified OOD score that adaptively combines image and feature discrepancies, enhancing detection robustness. We perform thorough quantitative and qualitative analyses to evaluate the effectiveness of GOOD, demonstrating that training with samples generated by GOOD can notably enhance OOD detection performance.

TMLR Journal 2025 Journal Article

Graph Potential Field Neural Network for Massive Agents Group-wise Path Planning

  • Yueming LYU
  • Xiaowei Zhou
  • Xingrui Yu
  • Ivor Tsang

Multi-agent path planning is important in both the multi-agent path finding and multi-agent reinforcement learning areas. However, continual group-wise multi-agent path planning, which requires the agents to perform as a team pursuing high team scores rather than acting individually, is less studied. To address this problem, we propose a novel graph potential field-based neural network (GPFNN), which models a valid potential field map for path planning. Our GPFNN unfolds the T-step iterative optimization of the potential field maps as a T-layer feedforward neural network. Thus, a deeper GPFNN leads to more precise potential field maps without the over-smoothing issue. A potential field map inherently provides a monotonic potential flow from any source node to the target nodes to construct the optimal path (w.r.t. the potential decay), equipping our GPFNN with an elegant planning ability. Moreover, we incorporate dynamically updated boundary conditions into our GPFNN to address group-wise multi-agent path planning with both static targets and dynamic moving targets. Empirically, experiments on three different-sized mazes (up to $1025 \times 1025$) with up to 1,000 agents demonstrate the planning ability of our GPFNN to handle both static and dynamic moving targets. Experiments on extensive graph node classification tasks on six graph datasets (up to millions of nodes) demonstrate the learning ability of our GPFNN.
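
The T-step potential-field iteration that GPFNN unfolds can be illustrated with a classic discrete relaxation on a grid (a generic sketch, not the authors' graph network): clamp obstacle cells to potential 1 and the target to 0, repeatedly average each free cell over its 4 neighbours, and greedy descent on the converged potential reaches the target with no spurious local minima.

```python
import numpy as np

def potential_field(free, target, T=500):
    """T-step Jacobi relaxation of a grid potential: obstacle cells are
    clamped to 1, the target to 0, and each free cell is replaced by the
    average of its 4 neighbours."""
    phi = np.ones(free.shape)
    for _ in range(T):
        up    = np.roll(phi,  1, axis=0)
        down  = np.roll(phi, -1, axis=0)
        left  = np.roll(phi,  1, axis=1)
        right = np.roll(phi, -1, axis=1)
        phi = np.where(free, (up + down + left + right) / 4.0, 1.0)
        phi[target] = 0.0  # boundary condition: potential sink at the target
    return phi

# 9x9 map whose border cells are obstacles; target at the centre.
free = np.ones((9, 9), dtype=bool)
free[0, :] = free[-1, :] = free[:, 0] = free[:, -1] = False
phi = potential_field(free, (4, 4))

# Greedy descent on the monotonic potential flow recovers a path.
pos, path = (1, 1), [(1, 1)]
for _ in range(60):
    if pos == (4, 4):
        break
    r, c = pos
    pos = min([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)],
              key=lambda p: phi[p])
    path.append(pos)
```

Here the clamped target and obstacle values play the role the abstract assigns to the dynamically updated boundary conditions; re-clamping a moved target between relaxations would handle a dynamic goal.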

ICLR Conference 2025 Conference Paper

Image-level Memorization Detection via Inversion-based Inference Perturbation

  • Yue Jiang
  • Haokun Lin
  • Yang Bai
  • Bo Peng 0002
  • Zhili Liu
  • Yueming Lyu
  • Yong Yang
  • Xing Zheng

Recent studies have discovered that widely used text-to-image diffusion models can replicate training samples during image generation, a phenomenon known as memorization. Existing detection methods primarily focus on identifying memorized prompts. However, in real-world scenarios, image owners may need to verify whether their proprietary or personal images have been memorized by the model, even in the absence of paired prompts or related metadata. We refer to this challenge as image-level memorization detection, where current methods relying on original prompts fall short. In this work, we uncover two characteristics of memorized images after perturbing the inference procedure: lower similarity of the original images and larger magnitudes of TCNP. Building on these insights, we propose Inversion-based Inference Perturbation (IIP), a new framework for image-level memorization detection. Our approach uses unconditional DDIM inversion to derive latent codes that contain the core semantic information of original images and optimizes random prompt embeddings to introduce effective perturbation. Memorized images exhibit distinct characteristics within the proposed pipeline, providing a robust basis for detection. To support this task, we construct a comprehensive setup for image-level memorization detection, carefully curating datasets to simulate realistic memorization scenarios. Using this setup, we evaluate our IIP framework across three different memorization settings, demonstrating state-of-the-art performance in identifying memorized images, even in the presence of data augmentation attacks.

AAMAS Conference 2025 Conference Paper

Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration

  • Xingrui Yu
  • Zhenglin Wan
  • David Mark Bossens
  • Yueming LYU
  • Qing Guo
  • Ivor W. Tsang

Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior, even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning, which bridge quality diversity optimization and imitation learning methods, are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on challenging continuous control tasks derived from MuJoCo environments.

NeurIPS Conference 2025 Conference Paper

InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

  • Haotian Chi
  • Zeyu Feng
  • Yueming LYU
  • Chengqi Zheng
  • Linbo Luo
  • Yew Soon Ong
  • Ivor Tsang
  • Hechang Chen

Long-horizon planning in robotic manipulation tasks requires translating underspecified, symbolic goals into executable control programs satisfying spatial, temporal, and physical constraints. However, language model-based planners often struggle with long-horizon task decomposition, robust constraint satisfaction, and adaptive failure recovery. We introduce InstructFlow, a multi-agent framework that establishes a symbolic, feedback-driven flow of information for code generation in robotic manipulation tasks. InstructFlow employs an InstructFlow Planner to construct and traverse a hierarchical instruction graph that decomposes goals into semantically meaningful subtasks, while a Code Generator produces executable code snippets conditioned on this graph. Crucially, when execution failures occur, a Constraint Generator analyzes feedback and induces symbolic constraints, which are propagated back into the instruction graph to guide targeted code refinement without regenerating from scratch. This dynamic, graph-guided flow enables structured, interpretable, and failure-resilient planning, significantly improving task success rates and robustness across diverse manipulation benchmarks, especially in constraint-sensitive and long-horizon scenarios.

ICLR Conference 2025 Conference Paper

Sharpness-Aware Black-Box Optimization

  • Feiyang Ye 0001
  • Yueming Lyu
  • Xuehao Wang
  • Masashi Sugiyama
  • Yu Zhang 0006
  • Ivor W. Tsang

Black-box optimization algorithms have been widely used in various machine learning problems, including reinforcement learning and prompt fine-tuning. However, directly optimizing the training loss value, as commonly done in existing black-box optimization methods, could lead to suboptimal model quality and generalization performance. To address those problems in black-box optimization, we propose a novel Sharpness-Aware Black-box Optimization (SABO) algorithm, which applies a sharpness-aware minimization strategy to improve the model generalization. Specifically, the proposed SABO method first reparameterizes the objective function by its expectation over a Gaussian distribution. Then it iteratively updates the parameterized distribution by approximated stochastic gradients of the maximum objective value within a small neighborhood around the current solution in the Gaussian distribution space. Theoretically, we prove the convergence rate and generalization bound of the proposed SABO algorithm. Empirically, extensive experiments on the black-box prompt fine-tuning tasks demonstrate the effectiveness of the proposed SABO method in improving model generalization performance.
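
The Gaussian reparameterization SABO starts from is a standard black-box primitive: $\mathbb{E}_{z \sim \mathcal{N}(0, I)} f(\mu + \sigma z)$ is differentiable in $\mu$ even when $f$ is not, and its gradient can be estimated from function queries alone. A minimal sketch of that estimator on a toy quadratic (without the sharpness-aware max step SABO adds on top; the step sizes and sample counts are illustrative choices):

```python
import numpy as np

def smoothed_grad(f, mu, sigma, z):
    """Monte Carlo estimate of the gradient w.r.t. mu of
    E_{z ~ N(0, I)} f(mu + sigma * z), from function queries only.
    Subtracting the mean of the f-values is a standard variance-reduction
    baseline that leaves the estimator unbiased."""
    fvals = np.array([f(mu + sigma * zi) for zi in z])
    return ((fvals - fvals.mean())[:, None] * z).mean(axis=0) / sigma

rng = np.random.default_rng(0)
f = lambda x: float(np.sum((x - 3.0) ** 2))  # black box: queries only
mu = np.zeros(5)
for _ in range(300):
    z = rng.standard_normal((64, mu.size))
    mu -= 0.1 * smoothed_grad(f, mu, 0.1, z)  # plain descent on mu
```

After the loop, `mu` sits near the minimizer at 3 in every coordinate, despite the optimizer never touching an analytic gradient of `f`.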

ICLR Conference 2024 Conference Paper

Adaptive Stochastic Gradient Algorithm for Black-box Multi-Objective Learning

  • Feiyang Ye 0001
  • Yueming Lyu
  • Xuehao Wang
  • Yu Zhang 0006
  • Ivor W. Tsang

Multi-objective optimization (MOO) has become an influential framework for various machine learning problems, including reinforcement learning and multi-task learning. In this paper, we study the black-box multi-objective optimization problem, where we aim to optimize multiple potentially conflicting objectives with function queries only. To address this challenging problem and find a Pareto optimal or Pareto stationary solution, we propose a novel adaptive stochastic gradient algorithm for black-box MOO, called ASMG. Specifically, we use the stochastic gradient approximation method to obtain the gradient for the distribution parameters of the Gaussian-smoothed MOO with function queries only. Subsequently, an adaptive weight is employed to aggregate all stochastic gradients to optimize all objective functions effectively. Theoretically, we explicitly provide the connection between the original MOO problem and the corresponding Gaussian-smoothed MOO problem and prove the convergence rate of the proposed ASMG algorithm in both convex and non-convex scenarios. Empirically, the proposed ASMG method achieves competitive performance on multiple numerical benchmark problems. Additionally, state-of-the-art performance on the black-box multi-task learning problem demonstrates the effectiveness of the proposed ASMG method.

ICML Conference 2024 Conference Paper

Diversified Batch Selection for Training Acceleration

  • Feng Hong 0004
  • Yueming Lyu
  • Jiangchao Yao
  • Ya Zhang 0002
  • Ivor W. Tsang
  • Yanfeng Wang 0001

The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance on additional reference models inherently limits their practical application when no such ideal models are available. On the other hand, vanilla reference-model-free methods independently score and select data in a sample-wise manner, which sacrifices diversity and induces redundancy. To tackle this dilemma, we propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples. Specifically, we define a novel selection objective that measures group-wise orthogonalized representativeness to combat the redundancy issue of previous sample-wise criteria, and provide a principled and efficient realization of the selection. Extensive experiments across various tasks demonstrate the significant superiority of DivBS in the performance-speedup trade-off. The code is publicly available.
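
The sample-wise redundancy issue can be made concrete with a toy greedy selector (an illustrative sketch, not DivBS's exact objective): score each candidate by the norm of its feature after projecting out the span of the samples already chosen, so a near-duplicate contributes nothing once one copy has been selected.

```python
import numpy as np

def orthogonalized_select(F, k):
    """Greedily pick k rows of feature matrix F, each time choosing the row
    whose residual (after removing the span of already-chosen rows) has the
    largest norm. Hypothetical helper illustrating group-wise,
    orthogonalization-based diversity; not the paper's criterion."""
    F = F.astype(float).copy()
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(F, axis=1)
        i = int(np.argmax(norms))
        chosen.append(i)
        q = F[i] / (norms[i] + 1e-12)
        F -= np.outer(F @ q, q)  # project the chosen direction out of all rows
    return chosen

# Two duplicated high-norm samples plus one dissimilar low-norm sample.
F = np.array([[10.0, 0.0],
              [10.0, 0.0],
              [0.0,  1.0]])
chosen = orthogonalized_select(F, 2)
```

A purely sample-wise scorer (top-$k$ feature norms) would pick both copies of the duplicated sample here; the orthogonalized score skips the second copy in favour of the dissimilar one.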

ICLR Conference 2024 Conference Paper

On Harmonizing Implicit Subpopulations

  • Feng Hong 0004
  • Jiangchao Yao
  • Yueming Lyu
  • Zhihan Zhou 0002
  • Ivor W. Tsang
  • Ya Zhang 0002
  • Yanfeng Wang 0001

Machine learning algorithms learned from data with skewed distributions usually suffer from poor generalization, especially when minority classes matter as much as, or even more than, majority ones. This is more challenging for class-balanced data that has hidden imbalanced subpopulations, since prevalent techniques mainly conduct class-level calibration and cannot perform subpopulation-level adjustments without subpopulation annotations. Regarding implicit subpopulation imbalance, we reveal that the key to alleviating the detrimental effect lies in effective subpopulation discovery with proper rebalancing. We then propose a novel subpopulation-imbalanced learning method called Scatter and HarmonizE (SHE). Our method is built upon the guiding principle of optimal data partition, which involves assigning data to subpopulations in a manner that maximizes the predictive information from inputs to labels. With theoretical guarantees and empirical evidence, SHE succeeds in identifying the hidden subpopulations and encourages subpopulation-balanced predictions. Extensive experiments on various benchmark datasets show the effectiveness of SHE.

ECAI Conference 2023 Conference Paper

Exploring Information Bottleneck for Weakly Supervised Semantic Segmentation

  • Jie Qin
  • Yueming Lyu
  • Xingang Wang

Image-level weakly supervised semantic segmentation (WSSS) has attracted much attention due to its easily acquired class labels. Most existing methods utilize Class Activation Maps (CAMs) obtained from the classification network to serve as the initial pseudo labels. However, the classifiers only focus on the most discriminative regions of the target objects, which is referred to as the information bottleneck from the perspective of information theory. To alleviate this limitation, we propose an Information Perturbation Module (IPM) to explicitly obtain the information difference maps, which provide the accurate direction and magnitude of the information compression in the classification network. After that, an information bottleneck breakthrough mechanism with three branches is proposed to overcome the information bottleneck in the classification network for segmentation. Additionally, a diversity regularization on the two generated information difference maps is proposed to improve the diversity of the output CAMs. Extensive experiments on the PASCAL VOC 2012 val and test sets demonstrate that the proposed method can effectively improve the weakly supervised semantic segmentation performance of advanced approaches.

NeurIPS Conference 2023 Conference Paper

Fast Rank-1 Lattice Targeted Sampling for Black-box Optimization

  • Yueming LYU

Black-box optimization has gained great attention for its success in recent applications. However, scaling up to high-dimensional problems with good query efficiency remains challenging. This paper proposes a novel Rank-1 Lattice Targeted Sampling (RLTS) technique to address this issue. Our RLTS benefits from random rank-1 lattice Quasi-Monte Carlo, which enables us to perform fast local exact Gaussian process (GP) training and inference with $O(n \log n)$ complexity w.r.t. $n$ batch samples. Furthermore, we develop a fast coordinate searching method with $O(n \log n)$ time complexity for fast targeted sampling. The fast computation enables us to plug our RLTS into the sampling phase of stochastic optimization methods, improving query efficiency while scaling up to higher-dimensional problems than Bayesian optimization. Moreover, to construct rank-1 lattices efficiently, we propose a closed-form construction. Extensive experiments on challenging benchmark test functions and black-box prompt fine-tuning for large language models demonstrate the query efficiency of our RLTS technique.

ICLR Conference 2020 Conference Paper

Curriculum Loss: Robust Learning and Generalization against Label Corruption

  • Yueming Lyu
  • Ivor W. Tsang

Deep neural networks (DNNs) have great expressive power, which can even memorize samples with wrong labels. It is vitally important to ensure robustness and generalization in DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with the empirical adversarial (reweighted) risk (Hu et al., 2018). Although the 0-1 loss is robust to outliers, it is also difficult to optimize. To efficiently optimize the 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e., the curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss than conventional summation-based surrogate losses. Moreover, CL can adaptively select samples for stagewise training. As a result, our loss can be deemed a novel curriculum sample selection strategy that bridges curriculum learning and robust learning. Experimental results on the noisy MNIST, CIFAR10, and CIFAR100 datasets validate the robustness of the proposed loss.
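
The bounding relation the paper builds on is easy to check numerically: a conventional summation-based surrogate such as the hinge loss upper-bounds the 0-1 loss pointwise, and a curriculum-style selector keeps the currently easy samples for stagewise training. A toy sketch (the hinge surrogate and the fixed threshold are illustrative stand-ins, not the paper's CL):

```python
import numpy as np

def zero_one(margins):
    """0-1 loss on margins y * f(x): 1 for a misclassified sample."""
    return (margins <= 0).astype(float)

def hinge(margins):
    """Conventional summation-based surrogate; pointwise >= the 0-1 loss."""
    return np.maximum(0.0, 1.0 - margins)

# Margins of 5 samples under the current model, from badly wrong to confident.
margins = np.array([-1.5, -0.2, 0.3, 0.9, 2.0])
l01, lh = zero_one(margins), hinge(margins)

# Curriculum-style selection: train this stage only on samples whose
# surrogate loss is small, tightening the bound on the selected subset.
selected = lh <= 1.0
```

On the selected subset the surrogate sum still upper-bounds the 0-1 sum, while the hardest (likely label-corrupted) samples are deferred rather than fitted immediately.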

ICML Conference 2020 Conference Paper

Intrinsic Reward Driven Imitation Learning via Generative Model

  • Xingrui Yu
  • Yueming Lyu
  • Ivor W. Tsang

Imitation learning in a high-dimensional environment is challenging. Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in such a high-dimensional environment, e.g., the Atari domain. To address this challenge, we propose a novel reward learning module to generate intrinsic reward signals via a generative model. Our generative method can perform better forward state transition and backward action encoding, which improves the module's dynamics modeling ability in the environment. Thus, our module provides the imitation agent with both the intrinsic intention of the demonstrator and a better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration. Remarkably, our method achieves up to 5 times the performance of the demonstration.

NeurIPS Conference 2020 Conference Paper

Subgroup-based Rank-1 Lattice Quasi-Monte Carlo

  • Yueming LYU
  • Yuan Yuan
  • Ivor Tsang

Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling for simulation in science. In the QMC area, the rank-1 lattice is important due to its simple operation and nice properties for point set construction. However, the construction of the generating vector of the rank-1 lattice is usually time-consuming through an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove lower and upper bounds on the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our methods can generate near-optimal rank-1 lattices compared with the Korobov exhaustive search regarding the $l_1$-norm and $l_2$-norm minimum distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.
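
A rank-1 lattice itself is a one-line construction, $x_i = \{ i \mathbf{z} / n \}$: the expensive part this paper removes is the search for a good generating vector $\mathbf{z}$. The sketch below uses a Korobov-style vector with an arbitrary illustrative multiplier, not the paper's group-theoretic closed form:

```python
import numpy as np

def rank1_lattice(n, z):
    """n-point rank-1 lattice in [0, 1)^d: x_i = (i * z mod n) / n."""
    i = np.arange(n)[:, None]
    return (i * np.asarray(z)[None, :] % n) / n

# Korobov-style generating vector z = (1, a, a^2, ...) mod n, with prime n;
# the multiplier a = 17 is an arbitrary choice, not an optimized one.
n, a, d = 101, 17, 3
z = [pow(a, k, n) for k in range(d)]
pts = rank1_lattice(n, z)

# QMC estimate of the integral of x1 + x2 + x3 over [0, 1)^3 (exact: 1.5).
estimate = pts.sum(axis=1).mean()
```

Because `n` is prime and each component of `z` is nonzero mod `n`, every coordinate of the point set is a permutation of `{0, 1/n, ..., (n-1)/n}`, which is what gives lattice rules their even one-dimensional projections.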

ICML Conference 2017 Conference Paper

Spherical Structured Feature Maps for Kernel Approximation

  • Yueming Lyu

We propose Spherical Structured Feature (SSF) maps to approximate shift- and rotation-invariant kernels as well as $b^{th}$-order arc-cosine kernels (Cho \& Saul, 2009). We construct SSF maps based on a point set on the $(d-1)$-dimensional sphere $\mathbb{S}^{d-1}$. We prove that the inner products of SSF maps are unbiased estimates of the above kernels if an asymptotically uniformly distributed point set on $\mathbb{S}^{d-1}$ is given. According to (Brauchart \& Grabner, 2015), optimizing the discrete Riesz s-energy can generate an asymptotically uniformly distributed point set on $\mathbb{S}^{d-1}$. Thus, we propose an efficient coordinate descent method to find a local optimum of the discrete Riesz s-energy for SSF map construction. Theoretically, SSF map construction achieves linear space complexity and log-linear time complexity. Empirically, SSF maps achieve superior performance compared with other methods.