
IROS 2024

Off-dynamics Conditional Diffusion Planners

Conference Paper · Accepted · Artificial Intelligence · Robotics

Abstract

Offline Reinforcement Learning (RL) offers an attractive alternative to interactive data acquisition by leveraging pre-existing datasets. However, its effectiveness hinges on the quantity and quality of the data samples. This work explores the use of more readily available, albeit off-dynamics, datasets to address the challenge of data scarcity in Offline RL. We propose a novel approach using conditional Diffusion Probabilistic Models (DPMs) to learn the joint distribution of the large-scale off-dynamics dataset and the limited target dataset. To enable the model to capture the underlying dynamics structure, we introduce two contexts for the conditional model: (1) a continuous dynamics score allows for partial overlap between trajectories from both datasets, providing the model with richer information; (2) an inverse-dynamics context guides the model to generate trajectories that adhere to the target environment's dynamics constraints. Empirical results demonstrate that our method significantly outperforms several strong baselines. Ablation studies further reveal the critical role of each dynamics context. Additionally, our model demonstrates that by modifying the context, we can interpolate between source and target dynamics, making it more robust to subtle shifts in the environment.
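The conditioning scheme described in the abstract can be sketched as a standard DDPM forward/reverse loop whose denoiser also receives a context vector: a continuous dynamics score (interpolating between source and target dynamics) concatenated with an inverse-dynamics embedding. The sketch below is purely illustrative and is not the authors' implementation: the class and variable names (`ToyConditionalDenoiser`, `dynamics_score`, `inv_dyn_ctx`), the linear stand-in network, the trajectory dimensions, and the noise schedule are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

# Illustrative sketch only: a minimal conditional DDPM-style forward/reverse
# loop over a flattened (state, action) trajectory. The real model in the
# paper is a learned conditional DPM; here the denoiser is an untrained
# linear stand-in, so the output is structurally, not semantically, meaningful.

rng = np.random.default_rng(0)

T = 50                              # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)  # common linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward process: noise a clean trajectory x0 to diffusion step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * noise

class ToyConditionalDenoiser:
    """Stand-in for the conditional DPM: predicts noise from (x_t, t, ctx).

    ctx packs the two contexts from the abstract: a continuous dynamics
    score and an inverse-dynamics embedding (here simply concatenated).
    """
    def __init__(self, dim, ctx_dim, rng):
        # A single random linear map in place of a trained network.
        self.W = rng.normal(scale=0.1, size=(dim + ctx_dim + 1, dim))

    def __call__(self, x_t, t, ctx):
        feats = np.concatenate([x_t, ctx, [t / T]])
        return feats @ self.W

def p_step(model, x_t, t, ctx, rng):
    """One reverse (denoising) step, conditioned on the dynamics context."""
    eps_hat = model(x_t, t, ctx)
    coef = betas[t] / np.sqrt(1 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.normal(size=x_t.shape)
    return mean

# Usage: a horizon-8 trajectory with 4-D states and 2-D actions (assumed dims).
horizon, state_dim, act_dim = 8, 4, 2
dim = horizon * (state_dim + act_dim)
x0 = rng.normal(size=dim)

dynamics_score = np.array([0.7])      # continuous score: 0 = source, 1 = target
inv_dyn_ctx = rng.normal(size=4)      # stand-in inverse-dynamics embedding
ctx = np.concatenate([dynamics_score, inv_dyn_ctx])

model = ToyConditionalDenoiser(dim, ctx.size, rng)
x_t = q_sample(x0, T - 1, rng.normal(size=dim))   # fully noised trajectory
for t in reversed(range(T)):
    x_t = p_step(model, x_t, t, ctx, rng)

print(x_t.shape)  # (48,)
```

Varying `dynamics_score` between 0 and 1 is how, per the abstract, one would interpolate between source and target dynamics at generation time; the inverse-dynamics context plays the complementary role of keeping consecutive generated states consistent with the target environment's transitions.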

Keywords

  • Interpolation
  • Data acquisition
  • Reinforcement learning
  • Fasteners
  • Diffusion models
  • Robustness
  • Data models
  • Trajectory
  • Intelligent robots
  • Context modeling
  • Continuous Score
  • Target Dataset
  • Target Environment
  • Environmental Shifts
  • Strong Baseline
  • Inverse Dynamics
  • Data Sources
  • Multilayer Perceptron
  • Generative Adversarial Networks
  • Diffusion Model
  • Latent Space
  • Real-world Scenarios
  • Dynamic Information
  • Availability Of Sources
  • Inverse Model
  • Target Domain
  • Transition Dynamics
  • Self-driving
  • Source Domain
  • Source Dataset
  • Discrete Labels
  • Markov Decision Process
  • Behavior Policy
  • Planning Horizon
  • Trajectory Generation
  • Consecutive States
  • Joint Training
  • Unconditional Model

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
591098272546898069