Arrow Research search

Author name cluster

Wenjun Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2025 Conference Paper

Marginal Benefit Driven RL Teacher for Unsupervised Environment Design

  • Dexun Li
  • Wenjun Li
  • Pradeep Varakantham

Training generally capable agents in complex environments is a challenging task that involves identifying "right" environments at the training stage. Recent research has highlighted the potential of the Unsupervised Environment Design framework, which generates environment instances/levels adaptively at the frontier of the agent’s capabilities using regret measures. While regret approaches have shown great promise in generating feasible environments, they can produce difficult environments that are challenging for an RL agent to learn from. This is because regret represents the best-case (upper bound) learning potential and not the actual learning potential of an environment. To address this limitation, we propose an alternative mechanism that employs marginal benefit, focusing on the improvement (in terms of generalized performance) the agent policy gets for a given environment. The advantage of this new mechanism is that it is agent-focused (and not environment-focused) and generates the "right" environments depending on the agent's policy. Additionally, to improve the generalizability of the agent, we introduce a representative state diversity metric that aims to generate varied experiences for the agent. Finally, we provide detailed experimental results and ablation analysis to showcase the effectiveness of our new methods. We obtain state-of-the-art (SOTA) results among RL-based environment generation methods.
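The marginal-benefit mechanism described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: `train_on` and `eval_generalization` are hypothetical stand-ins for a short training burst and a generalized-performance evaluation, and the teacher step shown is a simple greedy argmax over candidate environments.

```python
# Illustrative sketch (assumptions noted above): score each candidate
# environment by the improvement in generalized performance it yields,
# then let the teacher pick the environment with the largest improvement.

def marginal_benefit(policy, env, train_on, eval_generalization):
    """Gain in generalized performance from training `policy` on `env`."""
    before = eval_generalization(policy)
    candidate = train_on(policy, env)      # hypothetical short training burst
    after = eval_generalization(candidate)
    return after - before, candidate

def pick_next_env(policy, envs, train_on, eval_generalization):
    """Greedy teacher step: choose the env with the largest marginal benefit."""
    scored = [(marginal_benefit(policy, e, train_on, eval_generalization)[0], e)
              for e in envs]
    return max(scored, key=lambda t: t[0])[1]
```

Note how this differs from regret: the score is the agent's *actual* measured improvement, not an upper bound on what it could learn.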

NeurIPS Conference 2024 Conference Paper

Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

  • Jayden Teoh
  • Wenjun Li
  • Pradeep Varakantham

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to guide curriculum design. Regret-driven methods generate curricula that progressively increase environment complexity for the student but overlook environment novelty — a critical element for enhancing an agent's generalizability. Measuring environment novelty is especially challenging due to the underspecified nature of environment parameters in UED, and existing approaches face significant limitations. To address this, this paper introduces the Coverage-based Evaluation of Novelty In Environment (CENIE) framework. CENIE proposes a scalable, domain-agnostic, and curriculum-aware approach to quantifying environment novelty by leveraging the student's state-action space coverage from previous curriculum experiences. We then propose an implementation of CENIE that models this coverage and measures environment novelty using Gaussian Mixture Models. By integrating both regret and novelty as complementary objectives for curriculum design, CENIE facilitates effective exploration across the state-action space while progressively increasing curriculum complexity. Empirical evaluations demonstrate that augmenting existing regret-based UED algorithms with CENIE achieves state-of-the-art performance across multiple benchmarks, underscoring the effectiveness of novelty-driven autocurricula for robust generalization.
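The coverage-based novelty idea in this abstract can be sketched in miniature. As a simplifying assumption, a single diagonal Gaussian stands in for the full Gaussian Mixture Model used by CENIE, and `past` / `candidate` are hypothetical lists of state-action feature vectors; the novelty of a candidate environment is the average negative log-likelihood of its visited points under a density fit to the student's past coverage.

```python
# Toy sketch of coverage-based novelty scoring (one diagonal Gaussian in
# place of a GMM). Points far from previously covered state-action regions
# get low likelihood, hence a high novelty score.
import math

def fit_diag_gaussian(points):
    """Per-dimension mean and variance of visited state-action points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    var = [max(sum((p[i] - mean[i]) ** 2 for p in points) / n, 1e-6)
           for i in range(d)]
    return mean, var

def novelty(past, candidate):
    """Average negative log-likelihood of candidate points under past coverage."""
    mean, var = fit_diag_gaussian(past)

    def nll(p):
        return sum(0.5 * (math.log(2 * math.pi * var[i])
                          + (p[i] - mean[i]) ** 2 / var[i])
                   for i in range(len(p)))

    return sum(nll(p) for p in candidate) / len(candidate)
```

In a fuller version, a multi-component mixture (e.g. fit by EM) would capture multi-modal coverage, which is the role the GMM plays in the framework.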

AAMAS Conference 2024 Conference Paper

Unifying Regret and State-Action Space Coverage for Effective Unsupervised Environment Design

  • Jayden Teoh
  • Wenjun Li
  • Pradeep Varakantham

Unsupervised Environment Design (UED) employs interactive training between a teacher agent and a student agent to train generally capable student agents. Existing UED methods primarily rely on regret to progressively introduce curriculum complexity for the student but often overlook the importance of environment novelty — a critical element for enhancing an agent’s exploration and generalization capabilities. The effects of environment novelty in UED remain largely uninvestigated. This paper addresses this gap by introducing the GMM-based Evaluation of Novelty In Environments (GENIE) framework. GENIE quantifies environment novelty within the UED paradigm by using Gaussian Mixture Models. To assess GENIE’s effectiveness in quantifying novelty and driving exploration, we integrate it with ACCEL, the state-of-the-art UED algorithm. Empirical results demonstrate the superior zero-shot performance of this extended approach over existing UED algorithms, including its predecessor. By providing a means to quantify environment novelty, GENIE lays the groundwork for future UED algorithms to unify novelty-driven exploration and regret-driven exploitation in curriculum generation.

AAAI Conference 2024 Conference Paper

Unsupervised Training Sequence Design: Efficient and Generalizable Agent Training

  • Wenjun Li
  • Pradeep Varakantham

To train generalizable Reinforcement Learning (RL) agents, researchers recently proposed the Unsupervised Environment Design (UED) framework, in which a teacher agent creates a very large number of training environments and a student agent trains on the experiences in these environments to be robust against unseen testing scenarios. For example, to train a student to master the “stepping over stumps” task, the teacher will create numerous training environments with varying stump heights and shapes. In this paper, we argue that UED neglects training efficiency and that its need for a very large number of environments (henceforth referred to as infinite horizon training) makes it less suitable for training robots and non-expert humans. In real-world applications where either creating new training scenarios is expensive or training efficiency is of critical importance, we want to maximize both the learning efficiency and learning outcome of the student. To achieve efficient finite horizon training, we propose a novel Markov Decision Process (MDP) formulation for the teacher agent, referred to as Unsupervised Training Sequence Design (UTSD). Specifically, we encode salient information from the student policy (e.g., behaviors and learning progress) into the teacher's state space, enabling the teacher to closely track the student's learning progress and consequently discover the optimal training sequences with finite lengths. Additionally, we explore the teacher's efficient adaptation to unseen students at test time by employing the context-based meta-learning approach, which leverages the teacher's past experiences with various students. Finally, we empirically demonstrate our teacher's capability to design efficient and effective training sequences for students with varying capabilities.

IJCAI Conference 2023 Conference Paper

Generalization through Diversity: Improving Unsupervised Environment Design

  • Wenjun Li
  • Pradeep Varakantham
  • Dexun Li

Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with a high potential to learn) thereby making agent training redundant on all but one of those environments. To that end, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in literature.
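The redundancy problem described here (many similar high-potential environments) is often handled with diversity-aware selection. The following is a hedged sketch, not the paper's method: `dist` is a hypothetical stand-in for the paper's environment distance measure (here, plain Euclidean distance between environment parameter vectors), and the selection rule is a generic farthest-point (max-min) greedy heuristic.

```python
# Greedy max-min diverse selection: repeatedly add the candidate environment
# farthest from everything already chosen, so near-duplicate environments
# (which would make training redundant) are skipped.

def select_diverse(envs, k, dist):
    """Pick k mutually distant environments from `envs` under `dist`."""
    chosen = [envs[0]]                      # seed with the first candidate
    while len(chosen) < k:
        # maximize distance to the *nearest* already-chosen environment
        best = max((e for e in envs if e not in chosen),
                   key=lambda e: min(dist(e, c) for c in chosen))
        chosen.append(best)
    return chosen
```

In practice such a diversity criterion would be combined with a learning-potential score (e.g. GAE-based), so the curriculum stays both informative and varied.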

IJCAI Conference 2019 Conference Paper

Resolution and Domination: An Improved Exact MaxSAT Algorithm

  • Chao Xu
  • Wenjun Li
  • Yongjie Yang
  • Jianer Chen
  • Jianxin Wang

We study the Maximum Satisfiability problem (MaxSAT). Particularly, we derive a branching algorithm of running time O*(1.2989^m) for the MaxSAT problem, where m denotes the number of clauses in the given CNF formula. Our algorithm considerably improves the previous best result O*(1.3248^m) by Chen and Kanj [2004] published 15 years ago. For our purpose, we derive improved branching strategies for variables of degrees 3, 4, and 5. The worst case of our branching algorithm is at variables of degree 4 which occur twice both positively and negatively in the given CNF formula. To serve the branching rules and shrink the size of the CNF formula, we also propose a variety of reduction rules which can be exhaustively applied in polynomial time and, moreover, some of them solve a bottleneck of the previous best algorithm.
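The branch-on-a-variable structure that the paper's refined rules improve can be shown with a deliberately naive sketch. This is a brute-force O*(2^n) recursion, nothing like the paper's O*(1.2989^m) algorithm, but it makes the branching skeleton concrete: pick an unassigned variable, try both truth values, keep the better outcome.

```python
# Naive MaxSAT by exhaustive branching. A clause is a tuple of int literals:
# a positive literal v means variable v is true; -v means it is false.

def max_sat(clauses, assignment=None):
    """Return the maximum number of clauses any assignment can satisfy."""
    assignment = assignment or {}
    unassigned = {abs(l) for c in clauses for l in c} - set(assignment)
    if not unassigned:
        # base case: count clauses satisfied by the complete assignment
        return sum(any(assignment[abs(l)] == (l > 0) for l in c)
                   for c in clauses)
    v = next(iter(unassigned))
    # branch on v: try both truth values, keep the better branch
    return max(max_sat(clauses, {**assignment, v: True}),
               max_sat(clauses, {**assignment, v: False}))
```

The paper's contribution sits exactly at this branching step: smarter case analysis for variables of degrees 3-5, plus polynomial-time reduction rules that shrink the formula before branching.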

ICRA Conference 2014 Conference Paper

Design optimization and comparison of magneto-rheological actuators

  • Wenjun Li
  • Peyman Yadmellat
  • Mehrdad R. Kermani

In this paper, an optimization method for designing MR clutches is studied. The proposed method optimizes the geometrical dimensions of an MR clutch, hence its mass, for given output torque and electrical input power. The main idea behind this optimization is that the input power and output torque are two parameters that are normally known to the designer prior to the design of an MR clutch and considering these parameters in the optimization as fixed values has a practical significance. Having presented the optimization method, we compare the characteristics of three different MR clutch configurations in order to demonstrate the effectiveness of the proposed method. A comparison between the drum, single-disk and multi-disk configurations of MR clutches is performed. Using the proposed method one can select a suitable configuration as well as the geometrical dimensions for an MR clutch that best suits the requirements of each individual design.

ICRA Conference 2014 Conference Paper

Linear torque actuation using FPGA-controlled Magneto-Rheological actuators

  • Wenjun Li
  • Peyman Yadmellat
  • Mehrdad R. Kermani

In recent years, Magneto-Rheological (MR) clutches have been increasingly used for realizing compliant actuation. One difficulty in using MR clutches is the existence of nonlinear hysteretic behaviors between the input current and output torque of an MR clutch. In this paper, a new closed-loop, Field-Programmable Gate Array (FPGA) based control scheme to linearize an MR clutch's input-output relationship is presented. The feedback signal used in this control scheme is the magnetic field acquired from Hall sensors within the MR clutch. The FPGA board uses this feedback signal to compensate for the nonlinear behavior of the MR clutch using an estimated model of the clutch magnetic field. The local use of an FPGA board will dramatically simplify the use of MR clutches for torque actuation. The effectiveness of the proposed technique is validated using an experimental platform that includes an MR clutch as part of a compliant actuation mechanism. The results clearly demonstrate that the use of the FPGA-based closed-loop control scheme can effectively eliminate hysteretic behaviors of the MR clutch, allowing for linear actuators with predictable behaviors.