Arrow Research search

Author name cluster

Wei Xiao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI 2026 · Conference Paper

State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning

  • Songlin Li
  • Wei Xiao
  • Hao Wu
  • Xiaodan Zhang
  • Daolong An
  • Shuai Lü

In offline-to-online (O2O) reinforcement learning, achieving efficient performance improvement while maintaining training stability remains a critical challenge for effective fine-tuning. Existing O2O methods usually focus on balancing policy improvement against policy constraint during online fine-tuning, but they often overlook differences among samples, leading to suboptimal performance. To address this challenge, we observe that the effectiveness of policy learning varies significantly across states, and therefore introduce the notion of state proficiency to capture the degree of effective learning in a given state. Building on this notion, we propose State Proficiency-Based Adaptive Fine-Tuning (SPA), a straightforward yet effective method that assigns proficiency-based sample priorities in policy optimization to facilitate effective fine-tuning. Specifically, SPA focuses on low-proficiency samples during policy improvement to enhance sample efficiency, while emphasizing high-proficiency samples during policy constraint to ensure stable training. Extensive empirical results demonstrate that SPA achieves significant improvements over existing methods, attaining state-of-the-art performance on the D4RL benchmark.
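The split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `proficiency` is a made-up per-sample score in [0, 1], and the function name and weighting scheme are hypothetical, not the paper's actual formulation.

```python
# Hypothetical sketch of proficiency-based sample priorities in the spirit
# of SPA: policy improvement emphasizes low-proficiency samples (more left
# to learn), while the policy constraint emphasizes high-proficiency
# samples (for stability).

def spa_weighted_losses(improve_losses, constraint_losses, proficiency):
    """Return (improvement loss, constraint loss), each a weighted mean
    of per-sample losses using proficiency-based priorities."""
    assert len(improve_losses) == len(constraint_losses) == len(proficiency)
    n = len(proficiency)
    # Low proficiency -> high weight in policy improvement.
    improve = sum((1.0 - p) * l for p, l in zip(proficiency, improve_losses))
    # High proficiency -> high weight in the policy constraint.
    constrain = sum(p * l for p, l in zip(proficiency, constraint_losses))
    return improve / n, constrain / n
```

A fully proficient sample (p = 1) thus contributes only to the constraint term, and a fully unproficient one (p = 0) only to the improvement term.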

AAAI 2025 · Conference Paper

Boosting Causal Structure Learning: An Asymmetric Exponential Modulation Gaussian-Based Adaptive Sample Reweighting Framework

  • Wei Xiao
  • Hongbin Wang
  • Ming He
  • Nianbin Wang

Recent advances in differentiable score-based methods for Directed Acyclic Graph (DAG) structure learning have revolutionized combinatorial structure learning by transforming it into a continuous optimization task. Despite their remarkable success, these methods rely on a key assumption that all samples are equally difficult and the data are homogeneous. When this assumption does not hold, causal discovery algorithms built on it inevitably return networks with many spurious edges. Existing remedies, moreover, ignore the presence of outliers among the samples, a limitation that still yields erroneous edges. Inspired by the rapid decay of the Gaussian distribution away from its center, we propose an adaptive sample reweighting framework based on an asymmetric exponential-modulation Gaussian, coined DAG-AEG. DAG-AEG boosts DAG structure learning by analyzing the distribution of sample losses and adaptively reweighting sample attention accordingly, and it can also be adapted to heterogeneous data. We evaluated DAG-AEG with various causal structure learning methods on synthetic and real datasets. The experimental results demonstrate that the proposed framework significantly improves performance across all tested methods, outperforming existing approaches.
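The asymmetry the abstract alludes to can be sketched as a Gaussian-shaped weight over sample losses with a different decay rate on each side of the center. The function name, the default widths, and the exact shape are illustrative assumptions only; the paper's actual modulation function is not reproduced here.

```python
import math

# Hypothetical asymmetric Gaussian-style sample weight: losses near the
# center of the loss distribution get weight ~1, while losses far from it
# decay exponentially, with a faster decay on the high-loss (outlier) side.

def asymmetric_gaussian_weight(loss, center, sigma_low=1.0, sigma_high=0.3):
    """Weight a sample by its loss; sigma_high < sigma_low downweights
    high-loss outliers more aggressively than low-loss samples."""
    sigma = sigma_low if loss < center else sigma_high
    return math.exp(-((loss - center) ** 2) / (2.0 * sigma ** 2))
```

With these widths, a sample whose loss sits one unit above the center receives a far smaller weight than one sitting one unit below it, which is the intended outlier-suppressing behavior.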

NeurIPS 2025 · Conference Paper

DyMoDreamer: World Modeling with Dynamic Modulation

  • Boxuan Zhang
  • Runqing Wang
  • Wei Xiao
  • Weipu Zhang
  • Jian Sun
  • Gao Huang
  • Jie Chen
  • Gang Wang

A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. Model-based reinforcement learning (MBRL) mitigates this by building world models that simulate environmental dynamics and generate synthetic experience, improving sample efficiency. However, conventional world models process observations holistically, failing to decouple dynamic objects and temporal features from static backgrounds. This approach is computationally inefficient, especially for visual tasks where dynamic objects significantly influence rewards and decision-making performance. To address this, we introduce DyMoDreamer, a novel MBRL algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich the temporal information. DyMoDreamer employs differential observations derived from a novel inter-frame differencing mask, explicitly encoding object-level motion cues and temporal dynamics. Dynamic modulation is modeled as stochastic categorical distributions and integrated into a recurrent state-space model (RSSM), enhancing the model's focus on reward-relevant dynamics. Experiments demonstrate that DyMoDreamer sets a new state-of-the-art on the Atari $100$k benchmark with a $156.6\%$ mean human-normalized score, establishes a new record of $832$ on the DeepMind Visual Control Suite, and gains a $9.5\%$ performance improvement after $1$M steps on the Crafter benchmark.
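The inter-frame differencing idea can be illustrated in a few lines. This is a minimal sketch, assuming grayscale frames as nested lists and a simple absolute-difference threshold; the paper's actual mask construction and encoding may differ.

```python
# Minimal sketch of an inter-frame differencing mask, the kind of
# object-level motion cue that differential observations could be built
# from: mark pixels whose intensity changed noticeably between frames.

def frame_diff_mask(prev_frame, frame, threshold=0.1):
    """Return a binary mask (1 = changed pixel) between two same-sized
    grayscale frames given as lists of rows."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_p, row_c)]
        for row_p, row_c in zip(prev_frame, frame)
    ]
```

Static background pixels produce zeros, so only moving objects survive in the mask.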

IROS 2025 · Conference Paper

Integrating Trajectory Optimization and Reinforcement Learning for Quadrupedal Jumping with Terrain-Adaptive Landing

  • Renjie Wang
  • Shangke Lyu
  • Xin Lang
  • Wei Xiao
  • Donglin Wang

Jumping constitutes an essential component of quadruped robots' locomotion capabilities, comprising dynamic take-off and adaptive landing. Existing quadrupedal jumping studies have mainly focused on the stance and flight phases under the assumption of a flat landing ground, which is impractical in many real-world cases. This work proposes a safe landing framework that achieves adaptive landing on rough terrain by combining Trajectory Optimization (TO) and Reinforcement Learning (RL). The RL agent learns to track the reference motion generated by TO in environments with rough terrain. To enable the learning of compliant landing skills on challenging terrain, a reward relaxation strategy is synthesized to encourage exploration during the landing-recovery period. Extensive experiments validate the accurate tracking and safe landing achieved by the proposed method in various scenarios.

NeurIPS 2023 · Conference Paper

Adaptive Online Replanning with Diffusion Models

  • Siyuan Zhou
  • Yilun Du
  • Shun Zhang
  • Mengdi Xu
  • Yikang Shen
  • Wei Xiao
  • Dit-Yan Yeung
  • Chuang Gan

Diffusion models have emerged as a promising approach to data-driven planning, demonstrating impressive performance in robotic control, reinforcement learning, and video planning. Given an effective planner, an important question is replanning: when should generated plans be regenerated in response to action-execution errors and external environment changes? Direct plan execution without replanning is problematic, as errors from individual actions rapidly accumulate and environments are partially observable and stochastic. At the same time, replanning at every timestep incurs substantial computational cost and may prevent successful task execution, since divergent generated plans block consistent progress toward any particular goal. In this paper, we explore how to replan effectively with diffusion models. We propose a principled approach for deciding when to replan, based on the diffusion model's estimated likelihood of existing generated plans. We further present an approach for replanning existing trajectories so that new plans pursue the same goal state as the original trajectory, efficiently bootstrapping off previously generated plans. We show that the combination of our proposed additions significantly improves the performance of diffusion planners, yielding 38\% gains over past diffusion planning approaches on Maze2D, and further enables handling of stochastic and long-horizon robotic control tasks.
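The when-to-replan decision described above can be sketched as a likelihood-triggered loop. Everything concrete here is an assumption: the threshold, the callables (`step`, `log_likelihood`, `plan`), and the control flow are placeholders, not the paper's exact criterion or planner interface.

```python
# Hedged sketch of likelihood-triggered replanning: keep executing the
# current plan while the model's estimated log-likelihood of it stays
# above a threshold, and regenerate the plan otherwise.

def execute_with_replanning(init_plan, step, log_likelihood, plan, horizon,
                            threshold=-5.0):
    """Run a plan for `horizon` steps, regenerating it only when its
    estimated log-likelihood drops below `threshold`.

    `step(plan, t)` executes one action, `log_likelihood(plan, t)` scores
    the remaining plan, and `plan(t)` regenerates from the current state.
    Returns the number of replanning events.
    """
    current = init_plan
    replans = 0
    for t in range(horizon):
        if log_likelihood(current, t) < threshold:
            current = plan(t)   # regenerate only when the plan looks unlikely
            replans += 1
        step(current, t)        # execute the next action of the current plan
    return replans
```

Compared with replanning at every timestep, this only pays the planning cost when the model itself signals that the existing plan has become implausible.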

NeurIPS 2023 · Conference Paper

Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning

  • Mathias Lechner
  • Lianhao Yin
  • Tim Seyde
  • Tsun-Hsuan Johnson Wang
  • Wei Xiao
  • Ramin Hasani
  • Joshua Rountree
  • Daniela Rus

Multi-agent reinforcement learning (MARL) research faces a trade-off: it either uses complex environments requiring large compute resources, which makes it inaccessible to researchers with limited resources, or relies on simpler dynamics for faster execution, which makes the transferability of the results to more realistic tasks challenging. Motivated by these challenges, we present Gigastep, a fully vectorizable MARL environment implemented in JAX, capable of executing up to one billion environment steps per second on consumer-grade hardware. Its design allows for comprehensive MARL experimentation, including a complex, high-dimensional space defined by 3D dynamics, stochasticity, and partial observations. Gigastep supports both collaborative and adversarial tasks, continuous and discrete action spaces, and provides RGB image and feature vector observations, allowing the evaluation of a wide range of MARL algorithms. We validate Gigastep's usability through an extensive set of experiments, underscoring its role in widening participation and promoting inclusivity in the MARL research community.