Author name cluster

YaFei Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
3 author rows

Possible papers

EAAI Journal 2026 Journal Article

Parallel self-learning adaptive strategy based on model predictive control for trajectory tracking of autonomous vehicles

  • Feixiang Xu
  • Junkang Feng
  • Yafei Wang
  • Xiaoyi Wang
  • Chuanwang Shen
  • Chen Zhou

A significant obstacle to reliable autonomous driving is maintaining accurate trajectory tracking on variable and critical road surfaces. To mitigate the nonlinear and uncertain dynamics of autonomous vehicles operating on low-adhesion and variable-adhesion roads, a novel parallel self-learning adaptive strategy based on model predictive control is proposed in this paper. Within the strategy, model predictive control and a parallel self-learning adaptive strategy are integrated to achieve trajectory tracking on roads with different adhesion coefficients. Under stable adhesion levels, the model-free adaptive control method from the proposed parallel strategy is applied to capture the nonlinearity from historical data, compensating for system nonlinearities and ensuring accurate control. On the other hand, once a sudden variation of the road adhesion coefficient is detected, the control authority is seamlessly transferred to action-dependent heuristic dynamic programming (ADHDP), enabling rapid adaptation to environmental disturbances. In addition, the stability and convergence of ADHDP are further reinforced through an experience replay mechanism and an adaptive exploration noise scheduler utilizing random vector functional link neural networks. The closed-loop system signals are proven to be uniformly ultimately bounded by the Lyapunov method. Finally, extensive simulation experiments comparing against existing methods are conducted to evaluate the proposed scheme. The results demonstrate that the proposed framework significantly improves environmental adaptability and the trajectory tracking accuracy of the autonomous vehicle under extreme and varying road adhesion conditions.

IROS Conference 2025 Conference Paper

Along-Edge Autonomous Driving on Curvy Roads Based on Frenet Frame: A Stable Hierarchical Planning Framework

  • Hong-Yi Kang
  • Jun-Guo Lu
  • Kai-Xiong Li
  • Qing-Hao Zhang
  • YaFei Wang

Along-edge driving, where an autonomous vehicle follows road edges, is increasingly common in urban environments and particularly challenging on curvy roads due to rapidly changing curvature. This paper presents a hierarchical trajectory planning framework that integrates Cartesian and Frenet frames to optimize along-edge motion. Cartesian planners struggle with nonlinear constraints, while Frenet-based approaches simplify edge-relative motion but often neglect trajectory curvature and suffer from non-convexity in obstacle avoidance. To address these limitations, our method employs an optimization-based planner with curvature constraints for precise along-edge motion and a sampling-based planner for stable lane changes when encountering obstacles. This novel approach maintains an along-edge distance within a precision of 0.1 m, reducing error by 80% (from 0.7 m to 0.1 m). It also ensures smooth trajectory transitions and enhances stability in complex environments. Simulations and real-world experiments validate the framework’s efficiency, achieving an average planning time of 1.22 ms per frame while effectively balancing accuracy, feasibility, and real-time performance.

ICML Conference 2025 Conference Paper

Differentially Private Analysis for Binary Response Models: Optimality, Estimation, and Inference

  • Ce Zhang
  • Yixin Han
  • Yafei Wang
  • Xiaodong Yan
  • Linglong Kong
  • Ting Li
  • Bei Jiang

Randomized response (RR) mechanisms constitute a fundamental and effective technique for ensuring label differential privacy (LabelDP). However, existing RR methods primarily focus on the response labels while overlooking the influence of covariates and often do not fully address optimality. To address these challenges, this paper explores optimal LabelDP procedures using RR mechanisms, focusing on achieving optimal estimation and inference in binary response models. We first analyze the asymptotic behaviors of RR binary response models and then optimize the procedure by maximizing the trace of the Fisher Information Matrix within the $\varepsilon$- and $(\varepsilon, \delta)$-LabelDP constraints. Our theoretical results indicate that the proposed methods achieve optimal LabelDP guarantees while maintaining statistical accuracy in binary response models under mild conditions. Furthermore, we develop private confidence intervals with nominal coverage for statistical inference. Extensive simulation studies and real-world applications confirm that our methods outperform existing approaches in terms of precise estimation, privacy protection, and reliable inference.
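
For readers unfamiliar with randomized response, the following is a minimal sketch of the classical binary mechanism that the abstract builds on. It is not the optimized procedure proposed in the paper: the function names `randomized_response` and `debiased_mean` are hypothetical, and the keep probability $e^{\varepsilon} / (1 + e^{\varepsilon})$ is the standard choice for $\varepsilon$-LabelDP on a binary label.

```python
import math
import random


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """Report the true binary label with probability e^eps / (1 + e^eps), else flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < p_keep else 1 - label


def debiased_mean(private_labels, epsilon: float) -> float:
    """Unbiased estimate of the true label mean from privatized labels.

    If mu is the true mean, the privatized mean has expectation
    p_keep * mu + (1 - p_keep) * (1 - mu); solving for mu gives the formula below.
    """
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(private_labels) / len(private_labels)
    return (observed - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [1] * 6000 + [0] * 14000  # illustrative data, true mean 0.3
    private = [randomized_response(y, 1.0, rng) for y in true_labels]
    print("debiased estimate of the label mean:", debiased_mean(private, 1.0))
```

Smaller values of `epsilon` push the keep probability toward 1/2, which strengthens privacy but inflates the variance of the debiased estimate; this is the trade-off that an optimality analysis of RR mechanisms has to address.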

NeurIPS Conference 2025 Conference Paper

Intrinsic Benefits of Categorical Distributional Loss: Uncertainty-aware Regularized Exploration in Reinforcement Learning

  • Ke Sun
  • Yingnan Zhao
  • Enze Shi
  • Yafei Wang
  • Xiaodong Yan
  • Bei Jiang
  • Linglong Kong

The remarkable empirical performance of distributional reinforcement learning (RL) has garnered increasing attention to understanding its theoretical advantages over classical RL. By decomposing the categorical distributional loss commonly employed in distributional RL, we find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization. This less-studied entropy regularization aims to capture additional knowledge of return distribution beyond only its expectation, contributing to an augmented reward signal in policy optimization. In contrast to the vanilla entropy regularization in MaxEnt RL, which explicitly encourages exploration by promoting diverse actions, the novel entropy regularization derived from categorical distributional loss implicitly updates policies to align the learned policy with (estimated) environmental uncertainty. Finally, extensive experiments verify the significance of this uncertainty-aware regularization from distributional RL for the empirical benefits over classical RL. Our study offers an innovative exploration perspective to explain the intrinsic benefits of distributional learning in RL.

ICML Conference 2024 Conference Paper

Sample Average Approximation for Conditional Stochastic Optimization with Dependent Data

  • Yafei Wang
  • Bo Pan
  • Mei Li
  • Jianya Lu
  • Lingchen Kong
  • Bei Jiang
  • Linglong Kong

Conditional Stochastic Optimization (CSO) is a powerful modelling paradigm for optimization under uncertainty. The existing literature on CSO is mainly based on the independence assumption of data, which shows that the solution of CSO is asymptotically consistent and enjoys a finite sample guarantee. The independence assumption, however, does not typically hold in many important applications with dependence patterns, such as time series analysis, operational control, and reinforcement learning. In this paper, we aim to fill this gap and consider a Sample Average Approximation (SAA) for CSO with dependent data. Leveraging covariance inequalities and the independent block sampling technique, we provide theoretical guarantees of SAA for CSO with dependent data. In particular, we show that SAA for CSO retains asymptotic consistency and a finite sample guarantee under mild conditions. In addition, we establish the sample complexity $O(d / \varepsilon^4)$ of SAA for CSO, which is shown to be of the same order as in the independent case. Through experiments on several applications, we verify the theoretical results and demonstrate that dependence does not degrade the performance of the SAA approach in real data applications.

AAAI Conference 2022 Conference Paper

Sample Average Approximation for Stochastic Optimization with Dependent Data: Performance Guarantees and Tractability

  • Yafei Wang
  • Bo Pan
  • Wei Tu
  • Peng Liu
  • Bei Jiang
  • Chao Gao
  • Wei Lu
  • Shangling Jui

Sample average approximation (SAA), a popular method for tractably solving stochastic optimization problems, enjoys strong asymptotic performance guarantees in settings with independent training samples. However, these guarantees are not known to hold generally with dependent samples, such as in online learning with time series data or distributed computing with Markovian training samples. In this paper, we show that SAA remains tractable when the distribution of unknown parameters is only observable through dependent instances and still enjoys asymptotic consistency and finite sample guarantees. Specifically, we provide a rigorous probability error analysis to derive $1 - \beta$ confidence bounds for the out-of-sample performance of SAA estimators and show that these estimators are asymptotically consistent. We then, using monotone operator theory, study the performance of a class of stochastic first-order algorithms trained on a dependent source of data. We show that approximation error for these algorithms is bounded and concentrates around zero, and establish deviation bounds for iterates when the underlying stochastic process is $\phi$-mixing. The algorithms presented can be used to handle numerically inconvenient loss functions such as the sum of a smooth and non-smooth function or of non-smooth functions with constraints. To illustrate the usefulness of our results, we present several stochastic versions of popular algorithms, such as stochastic proximal gradient descent (S-PGD) and stochastic relaxed Peaceman–Rachford splitting algorithms (S-rPRS), together with numerical experiments.
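
As a rough illustration of what SAA with dependent samples means, the sketch below replaces an expected squared loss with its empirical average over an AR(1) time series. The toy problem, the AR(1) generator, and the names `ar1_samples`, `saa_objective`, and `saa_solution` are assumptions chosen for illustration; none of them come from the paper.

```python
import random


def ar1_samples(n: int, mean: float, rho: float, noise_std: float, rng: random.Random):
    """Generate n dependent samples from an AR(1) process centred on `mean`.

    The chain starts at the mean; consecutive samples are correlated through `rho`.
    """
    x = mean
    samples = []
    for _ in range(n):
        x = mean + rho * (x - mean) + rng.gauss(0.0, noise_std)
        samples.append(x)
    return samples


def saa_objective(decision: float, samples) -> float:
    """Empirical average of the loss (decision - xi)^2, standing in for E[(decision - xi)^2]."""
    return sum((decision - s) ** 2 for s in samples) / len(samples)


def saa_solution(samples) -> float:
    """Minimizer of the empirical squared loss, which is the sample mean."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(1)
    data = ar1_samples(20000, mean=2.0, rho=0.5, noise_std=1.0, rng=rng)
    x_hat = saa_solution(data)
    print("SAA solution:", x_hat, "empirical objective:", saa_objective(x_hat, data))
```

In this toy setting the true minimizer is the process mean, so the SAA solution can be compared against it directly; correlation between samples slows the averaging but, for a mixing process like this one, does not prevent convergence.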

NeurIPS Conference 2021 Conference Paper

Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

  • Ke Sun
  • Yafei Wang
  • Yi Liu
  • Yingnan Zhao
  • Bo Pan
  • Shangling Jui
  • Bei Jiang
  • Linglong Kong

Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms for accelerating convergence and improving the sampling efficiency of deep RL. Despite its heuristic improvement of convergence, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The analysis is rooted in the fixed-point iteration nature of RL. We further propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
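
The MellowMax operator mentioned in the abstract has a compact closed form, $\mathrm{mm}_\omega(x) = \frac{1}{\omega} \log\big(\frac{1}{n} \sum_{i=1}^{n} e^{\omega x_i}\big)$. The sketch below is a minimal, numerically stabilized version of that formula, assuming $\omega > 0$; the function name `mellowmax` is chosen here for illustration, and the code says nothing about how the paper combines the operator with Anderson mixing.

```python
import math


def mellowmax(values, omega: float) -> float:
    """Smooth alternative to max: log-mean-exp of the values, scaled by 1 / omega.

    Subtracting the maximum before exponentiating avoids overflow without
    changing the result.
    """
    if omega <= 0:
        raise ValueError("this sketch assumes omega > 0")
    m = max(values)
    mean_exp = sum(math.exp(omega * (v - m)) for v in values) / len(values)
    return m + math.log(mean_exp) / omega


if __name__ == "__main__":
    q_values = [1.0, 2.0, 3.0]  # illustrative action values
    for omega in (0.001, 1.0, 100.0):
        print(omega, mellowmax(q_values, omega))
```

For small `omega` the output approaches the arithmetic mean of the inputs, and for large `omega` it approaches their maximum, so `omega` acts as a dial between averaging and hard maximization.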