
NeurIPS 2025

A Temporal Difference Method for Stochastic Continuous Dynamics

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

For continuous systems modeled by dynamical equations such as ODEs and SDEs, Bellman's principle of optimality takes the form of the Hamilton-Jacobi-Bellman (HJB) equation, which provides the theoretical target of reinforcement learning (RL). Although recent advances in RL successfully leverage this formulation, existing methods typically assume the underlying dynamics are known a priori, because they need explicit access to the drift and diffusion coefficients to update the value function according to the HJB equation. We address this inherent limitation of HJB-based RL by proposing a model-free approach that still targets the HJB equation, together with the corresponding temporal difference method. We prove exponential stability of the induced continuous-time dynamics and empirically demonstrate the resulting advantages over transition-kernel-based formulations. The proposed formulation paves the way toward bridging stochastic control and model-free reinforcement learning.
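To illustrate the general idea of a model-free temporal difference update on a stochastic continuous system (this is a generic sketch, not the paper's algorithm), the snippet below runs a discretized TD(0) update on sampled transitions of an SDE. The Ornstein-Uhlenbeck dynamics, quadratic reward, discount rate, and radial-basis features are all assumptions made for this example; the key point is that the learner only observes sampled transitions and never the drift or diffusion coefficients.

```python
import numpy as np

# Hedged illustration: model-free, discretized TD(0) on sampled SDE
# transitions. The dynamics, reward, and features below are assumptions
# for this sketch, not the paper's construction.
rng = np.random.default_rng(0)
dt, rho, alpha = 1e-2, 1.0, 0.05        # Euler step, discount rate, learning rate
centers = np.linspace(-2.0, 2.0, 11)    # RBF centers for linear value features

def phi(s):
    # Gaussian radial-basis features of the scalar state s.
    return np.exp(-0.5 * (s - centers) ** 2)

w = np.zeros_like(centers)              # linear value function V(s) = w . phi(s)
s = 0.0
for _ in range(100_000):
    # One Euler-Maruyama step of an (assumed) Ornstein-Uhlenbeck process;
    # the learner sees only (s, r, s'), never the coefficients themselves.
    s_next = s - 0.5 * s * dt + 0.3 * np.sqrt(dt) * rng.standard_normal()
    r = -s ** 2                          # running reward
    # Discrete-time TD error whose dt -> 0 limit recovers the HJB-style
    # residual r - rho*V + (generator of the SDE applied to V).
    delta = r * dt + np.exp(-rho * dt) * (w @ phi(s_next)) - w @ phi(s)
    w += alpha * delta * phi(s)          # semi-gradient TD(0) update
    s = s_next

v0 = float(w @ phi(0.0))                 # learned value estimate at s = 0
```

Because the reward is nonpositive and the process mean-reverts to the origin, the learned value near s = 0 should come out slightly negative; the discretized TD error here plays the role that the HJB residual plays in continuous time.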

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
1093180951274402641