Arrow Research search

Author name cluster

Miao Xin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

TMLR 2025 Journal Article

Uncertainty-aware Reward Design Process

  • Yang Yang
  • Xiaolu Zhou
  • Bosong Ding
  • Miao Xin

Designing effective reward functions is a cornerstone of reinforcement learning (RL), yet it remains challenging due to the inefficiencies and inconsistencies inherent in conventional reward engineering. Recent advances have explored leveraging large language models (LLMs) to automate reward function design. However, LLMs’ insufficient numerical optimization capabilities often result in suboptimal reward hyperparameter tuning, while non-selective validation of candidate reward functions leads to substantial computational overhead. To address these challenges, we propose the Uncertainty-aware Reward Design Process (URDP), a novel framework that integrates large language models to streamline reward function design and evaluation. URDP quantifies the uncertainty of candidate reward functions via self-consistency analysis, enabling simulation-free identification of ineffective reward components while discovering novel ones. Furthermore, we introduce uncertainty-aware Bayesian optimization (UABO), which incorporates uncertainty estimation to improve hyperparameter configuration. Finally, we construct a bi-level optimization framework by decoupling reward component optimization from hyperparameter tuning. URDP thus combines the reward-logic reasoning of LLMs with the numerical optimization strengths of Bayesian optimization. We conduct a comprehensive evaluation of URDP across 35 diverse tasks spanning three benchmark environments: IsaacGym, Bidexterous Manipulation, and ManiSkill2. Our experimental results demonstrate that URDP not only generates higher-quality reward functions but also significantly improves the efficiency of automated reward design compared to existing approaches. We open-source all code at https://github.com/Yy12136/URDP.
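The self-consistency idea in the abstract can be sketched in a few lines: sample several candidate reward functions from the LLM, measure how often each reward component recurs across the samples, and treat rarely recurring components as high-uncertainty candidates that can be filtered without running a simulation. The component names and the set-based representation below are illustrative assumptions, not the paper's actual parsing of generated reward code.

```python
from collections import Counter

def component_uncertainty(candidates):
    """Estimate per-component uncertainty via self-consistency.

    candidates: list of sets, each holding the reward components named
    in one LLM sample (a simplified, hypothetical representation).
    Returns {component: uncertainty}, where uncertainty = 1 - frequency
    across samples; rare components are candidates for simulation-free
    filtering.
    """
    counts = Counter(c for cand in candidates for c in cand)
    n = len(candidates)
    return {c: 1.0 - counts[c] / n for c in counts}

# Four hypothetical LLM samples of reward components for one task.
samples = [
    {"distance_to_goal", "action_penalty", "velocity_bonus"},
    {"distance_to_goal", "action_penalty"},
    {"distance_to_goal", "action_penalty", "contact_bonus"},
    {"distance_to_goal", "action_penalty", "velocity_bonus"},
]
u = component_uncertainty(samples)
# "distance_to_goal" appears in every sample, so its uncertainty is 0.0;
# "contact_bonus" appears once in four samples, giving 0.75.
```

Components whose uncertainty exceeds a threshold would be dropped before any environment rollout, which is where the simulation-free savings come from.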

AAAI 2018 Conference Paper

Asynchronous Doubly Stochastic Sparse Kernel Learning

  • Bin Gu
  • Miao Xin
  • Zhouyuan Huo
  • Heng Huang

Kernel methods have achieved tremendous success over the past two decades. In the current big-data era, data collection has grown tremendously, yet existing kernel methods do not scale well at either the training or the prediction step. To address this challenge, we first introduce a general sparse kernel learning formulation based on the random feature approximation, where the loss functions may be non-convex. We then propose a new asynchronous parallel doubly stochastic algorithm for large-scale sparse kernel learning (AsyDSSKL). To the best of our knowledge, AsyDSSKL is the first algorithm to combine asynchronous parallel computation with doubly stochastic optimization. We also provide a comprehensive convergence guarantee for AsyDSSKL. Importantly, experimental results on various large-scale real-world datasets show that AsyDSSKL is significantly more computationally efficient at both the training and prediction steps than existing kernel methods.
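The random feature approximation underlying the formulation above can be illustrated with random Fourier features: an RBF kernel is replaced by an explicit, finite-dimensional feature map, so a linear (and optionally sparse) model on the mapped data approximates kernel learning. This is a minimal NumPy sketch of the approximation alone, not of the asynchronous doubly stochastic solver; the feature count and kernel bandwidth are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features=2000, gamma=1.0, seed=0):
    """Map X to z(X) such that z(x) @ z(y) approximates the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2).

    For this kernel, frequencies are drawn w ~ N(0, 2*gamma*I) and
    phases b ~ Uniform[0, 2*pi]; the cosine features then give an
    unbiased Monte Carlo estimate of k.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the approximate and exact kernel matrices on toy data.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
# K_approx approaches K_exact entrywise as n_features grows.
```

Because z(X) is an explicit feature matrix, training reduces to a linear model over n_features coordinates, which is what makes coordinate-wise sparsity and doubly stochastic updates (sampling both data points and coordinates) natural in this setting.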