Arrow Research search

Author name cluster

Wenjun Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2025 Conference Paper

Marginal Benefit Driven RL Teacher for Unsupervised Environment Design

  • Dexun Li
  • Wenjun Li
  • Pradeep Varakantham

Training generally capable agents in complex environments is a challenging task that involves identifying "right" environments at the training stage. Recent research has highlighted the potential of the Unsupervised Environment Design framework, which generates environment instances/levels adaptively at the frontier of the agent’s capabilities using regret measures. While regret approaches have shown great promise in generating feasible environments, they can produce difficult environments that are challenging for an RL agent to learn from. This is because regret represents the best-case (upper bound) learning potential and not the actual learning potential of an environment. To address this limitation, we propose an alternative mechanism that employs marginal benefit, focusing on the improvement (in terms of generalized performance) the agent policy gets for a given environment. The advantage of this new mechanism is that it is agent-focused (and not environment-focused) and generates the "right" environments depending on the agent's policy. Additionally, to improve the generalizability of the agent, we introduce a representative state diversity metric that aims to generate varied experiences for the agent. Finally, we provide detailed experimental results and ablation analysis to showcase the effectiveness of our new methods. We obtain state-of-the-art (SOTA) results among RL-based environment generation methods.
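The marginal-benefit mechanism described in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: `train_on` and `eval_generalization` are hypothetical stand-ins for a short training burst and a generalized-performance evaluation, and the teacher step shown is a simple greedy argmax over candidate environments.

```python
# Illustrative sketch (assumptions noted above): score each candidate
# environment by the improvement in generalized performance it yields,
# then let the teacher pick the environment with the largest improvement.

def marginal_benefit(policy, env, train_on, eval_generalization):
    """Gain in generalized performance from training `policy` on `env`."""
    before = eval_generalization(policy)
    candidate = train_on(policy, env)      # hypothetical short training burst
    after = eval_generalization(candidate)
    return after - before, candidate

def pick_next_env(policy, envs, train_on, eval_generalization):
    """Greedy teacher step: choose the env with the largest marginal benefit."""
    scored = [(marginal_benefit(policy, e, train_on, eval_generalization)[0], e)
              for e in envs]
    return max(scored, key=lambda t: t[0])[1]
```

Note how this differs from regret: the score is the agent's *actual* measured improvement, not an upper bound on what it could learn.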

NeurIPS Conference 2024 Conference Paper

Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

  • Jayden Teoh
  • Wenjun Li
  • Pradeep Varakantham

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to guide curriculum design. Regret-driven methods generate curricula that progressively increase environment complexity for the student but overlook environment novelty — a critical element for enhancing an agent's generalizability. Measuring environment novelty is especially challenging due to the underspecified nature of environment parameters in UED, and existing approaches face significant limitations. To address this, this paper introduces the Coverage-based Evaluation of Novelty In Environment (CENIE) framework. CENIE proposes a scalable, domain-agnostic, and curriculum-aware approach to quantifying environment novelty by leveraging the student's state-action space coverage from previous curriculum experiences. We then propose an implementation of CENIE that models this coverage and measures environment novelty using Gaussian Mixture Models. By integrating both regret and novelty as complementary objectives for curriculum design, CENIE facilitates effective exploration across the state-action space while progressively increasing curriculum complexity. Empirical evaluations demonstrate that augmenting existing regret-based UED algorithms with CENIE achieves state-of-the-art performance across multiple benchmarks, underscoring the effectiveness of novelty-driven autocurricula for robust generalization.
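The coverage-based novelty idea in this abstract can be sketched in miniature. As a simplifying assumption, a single diagonal Gaussian stands in for the full Gaussian Mixture Model used by CENIE, and `past` / `candidate` are hypothetical lists of state-action feature vectors; the novelty of a candidate environment is the average negative log-likelihood of its visited points under a density fit to the student's past coverage.

```python
# Toy sketch of coverage-based novelty scoring (one diagonal Gaussian in
# place of a GMM). Points far from previously covered state-action regions
# get low likelihood, hence a high novelty score.
import math

def fit_diag_gaussian(points):
    """Per-dimension mean and variance of visited state-action points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    var = [max(sum((p[i] - mean[i]) ** 2 for p in points) / n, 1e-6)
           for i in range(d)]
    return mean, var

def novelty(past, candidate):
    """Average negative log-likelihood of candidate points under past coverage."""
    mean, var = fit_diag_gaussian(past)

    def nll(p):
        return sum(0.5 * (math.log(2 * math.pi * var[i])
                          + (p[i] - mean[i]) ** 2 / var[i])
                   for i in range(len(p)))

    return sum(nll(p) for p in candidate) / len(candidate)
```

In a fuller version, a multi-component mixture (e.g. fit by EM) would capture multi-modal coverage, which is the role the GMM plays in the framework.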

AAMAS Conference 2024 Conference Paper

Unifying Regret and State-Action Space Coverage for Effective Unsupervised Environment Design

  • Jayden Teoh
  • Wenjun Li
  • Pradeep Varakantham

Unsupervised Environment Design (UED) employs interactive training between a teacher agent and a student agent to train generally capable student agents. Existing UED methods primarily rely on regret to progressively introduce curriculum complexity for the student but often overlook the importance of environment novelty — a critical element for enhancing an agent’s exploration and generalization capabilities. The effects of environment novelty in UED remain largely uninvestigated. This paper addresses this gap by introducing the GMM-based Evaluation of Novelty In Environments (GENIE) framework. GENIE quantifies environment novelty within the UED paradigm by using Gaussian Mixture Models. To assess GENIE’s effectiveness in quantifying novelty and driving exploration, we integrate it with ACCEL, the state-of-the-art UED algorithm. Empirical results demonstrate the superior zero-shot performance of this extended approach over existing UED algorithms, including its predecessor. By providing a means to quantify environment novelty, GENIE lays the groundwork for future UED algorithms to unify novelty-driven exploration and regret-driven exploitation in curriculum generation.

AAAI Conference 2024 Conference Paper

Unsupervised Training Sequence Design: Efficient and Generalizable Agent Training

  • Wenjun Li
  • Pradeep Varakantham

To train generalizable Reinforcement Learning (RL) agents, researchers recently proposed the Unsupervised Environment Design (UED) framework, in which a teacher agent creates a very large number of training environments and a student agent trains on the experiences in these environments to be robust against unseen testing scenarios. For example, to train a student to master the “stepping over stumps” task, the teacher will create numerous training environments with varying stump heights and shapes. In this paper, we argue that UED neglects training efficiency and that its need for a very large number of environments (henceforth referred to as infinite horizon training) makes it less suitable for training robots and non-expert humans. In real-world applications where either creating new training scenarios is expensive or training efficiency is of critical importance, we want to maximize both the learning efficiency and learning outcome of the student. To achieve efficient finite horizon training, we propose a novel Markov Decision Process (MDP) formulation for the teacher agent, referred to as Unsupervised Training Sequence Design (UTSD). Specifically, we encode salient information from the student policy (e.g., behaviors and learning progress) into the teacher's state space, enabling the teacher to closely track the student's learning progress and consequently discover the optimal training sequences with finite lengths. Additionally, we explore the teacher's efficient adaptation to unseen students at test time by employing the context-based meta-learning approach, which leverages the teacher's past experiences with various students. Finally, we empirically demonstrate our teacher's capability to design efficient and effective training sequences for students with varying capabilities.

IJCAI Conference 2023 Conference Paper

Generalization through Diversity: Improving Unsupervised Environment Design

  • Wenjun Li
  • Pradeep Varakantham
  • Dexun Li

Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with a high potential to learn) thereby making agent training redundant on all but one of those environments. To that end, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in literature.
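The redundancy problem described here (many similar high-potential environments) is often handled with diversity-aware selection. The following is a hedged sketch, not the paper's method: `dist` is a hypothetical stand-in for the paper's environment distance measure (here, plain Euclidean distance between environment parameter vectors), and the selection rule is a generic farthest-point (max-min) greedy heuristic.

```python
# Greedy max-min diverse selection: repeatedly add the candidate environment
# farthest from everything already chosen, so near-duplicate environments
# (which would make training redundant) are skipped.

def select_diverse(envs, k, dist):
    """Pick k mutually distant environments from `envs` under `dist`."""
    chosen = [envs[0]]                      # seed with the first candidate
    while len(chosen) < k:
        # maximize distance to the *nearest* already-chosen environment
        best = max((e for e in envs if e not in chosen),
                   key=lambda e: min(dist(e, c) for c in chosen))
        chosen.append(best)
    return chosen
```

In practice such a diversity criterion would be combined with a learning-potential score (e.g. GAE-based), so the curriculum stays both informative and varied.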

IJCAI Conference 2019 Conference Paper

Resolution and Domination: An Improved Exact MaxSAT Algorithm

  • Chao Xu
  • Wenjun Li
  • Yongjie Yang
  • Jianer Chen
  • Jianxin Wang

We study the Maximum Satisfiability problem (MaxSAT). Particularly, we derive a branching algorithm of running time O*(1.2989^m) for the MaxSAT problem, where m denotes the number of clauses in the given CNF formula. Our algorithm considerably improves the previous best result O*(1.3248^m) by Chen and Kanj [2004] published 15 years ago. For our purpose, we derive improved branching strategies for variables of degrees 3, 4, and 5. The worst case of our branching algorithm is at variables of degree 4 which occur twice both positively and negatively in the given CNF formula. To serve the branching rules and shrink the size of the CNF formula, we also propose a variety of reduction rules which can be exhaustively applied in polynomial time and, moreover, some of them solve a bottleneck of the previous best algorithm.
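The branch-on-a-variable structure that the paper's refined rules improve can be shown with a deliberately naive sketch. This is a brute-force O*(2^n) recursion, nothing like the paper's O*(1.2989^m) algorithm, but it makes the branching skeleton concrete: pick an unassigned variable, try both truth values, keep the better outcome.

```python
# Naive MaxSAT by exhaustive branching. A clause is a tuple of int literals:
# a positive literal v means variable v is true; -v means it is false.

def max_sat(clauses, assignment=None):
    """Return the maximum number of clauses any assignment can satisfy."""
    assignment = assignment or {}
    unassigned = {abs(l) for c in clauses for l in c} - set(assignment)
    if not unassigned:
        # base case: count clauses satisfied by the complete assignment
        return sum(any(assignment[abs(l)] == (l > 0) for l in c)
                   for c in clauses)
    v = next(iter(unassigned))
    # branch on v: try both truth values, keep the better branch
    return max(max_sat(clauses, {**assignment, v: True}),
               max_sat(clauses, {**assignment, v: False}))
```

The paper's contribution sits exactly at this branching step: smarter case analysis for variables of degrees 3-5, plus polynomial-time reduction rules that shrink the formula before branching.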

ICRA Conference 2014 Conference Paper

Design optimization and comparison of magneto-rheological actuators

  • Wenjun Li
  • Peyman Yadmellat
  • Mehrdad R. Kermani

In this paper, an optimization method for designing MR clutches is studied. The proposed method optimizes the geometrical dimensions of an MR clutch, hence its mass, for given output torque and electrical input power. The main idea behind this optimization is that the input power and output torque are two parameters that are normally known to the designer prior to the design of an MR clutch and considering these parameters in the optimization as fixed values has a practical significance. Having presented the optimization method, we compare the characteristics of three different MR clutch configurations in order to demonstrate the effectiveness of the proposed method. A comparison between the drum, single-disk and multi-disk configurations of MR clutches is performed. Using the proposed method one can select a suitable configuration as well as the geometrical dimensions for an MR clutch that best suits the requirements of each individual design.

ICRA Conference 2014 Conference Paper

Linear torque actuation using FPGA-controlled Magneto-Rheological actuators

  • Wenjun Li
  • Peyman Yadmellat
  • Mehrdad R. Kermani

In recent years, Magneto-Rheological (MR) clutches have been increasingly used for realizing compliant actuation. One difficulty in using MR clutches is the existence of nonlinear hysteretic behaviors between the input current and output torque of an MR clutch. In this paper, a new closed-loop, Field-Programmable Gate Array (FPGA) based control scheme to linearize an MR clutch's input-output relationship is presented. The feedback signal used in this control scheme is the magnetic field acquired from Hall sensors within the MR clutch. The FPGA board uses this feedback signal to compensate for the nonlinear behavior of the MR clutch using an estimated model of the clutch magnetic field. The local use of an FPGA board will dramatically simplify the use of MR clutches for torque actuation. The effectiveness of the proposed technique is validated using an experimental platform that includes an MR clutch as part of a compliant actuation mechanism. The results clearly demonstrate that the use of the FPGA-based closed-loop control scheme can effectively eliminate hysteretic behaviors of the MR clutch, allowing for linear actuators with predictable behaviors.