Deterministic Policy Gradient Algorithms

David Silver; Guy Lever; Nicolas Heess; Thomas Degris; Daan Wierstra; Martin A. Riedmiller

ICML 2014

Conference Paper · Cycle 1 Papers · Artificial Intelligence · Machine Learning

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.
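The phrase "expected gradient of the action-value function" can be made concrete with a small numerical sketch. The example below is not the paper's algorithm. It assumes a one-step (bandit-style) problem in which the state distribution does not depend on the policy, a linear deterministic policy, and a critic Q that is known in closed form rather than learned. All names (`q_value`, `policy`, `dpg_estimate`, `target`) are hypothetical and chosen only for illustration. Under those assumptions, the gradient of the objective with respect to the policy parameters is the average, over sampled states, of the policy's Jacobian multiplied by the gradient of Q with respect to the action, evaluated at the action the policy selects.

```python
import numpy as np


def q_value(states, actions, target):
    # Hypothetical closed-form critic: Q(s, a) = -||a - target @ s||^2.
    diff = actions - states @ target.T
    return -np.sum(diff ** 2, axis=1)


def grad_q_wrt_action(states, actions, target):
    # Gradient of the critic above with respect to the action.
    return -2.0 * (actions - states @ target.T)


def policy(theta, states):
    # Linear deterministic policy: a = theta @ s, applied row by row.
    return states @ theta.T


def dpg_estimate(theta, states, target):
    # Sample average of (d mu / d theta) * (dQ / da) at a = mu_theta(s).
    # For a linear policy, d a_i / d theta_ij = s_j, so the chain rule
    # reduces to an outer product of dQ/da with the state.
    actions = policy(theta, states)
    dq_da = grad_q_wrt_action(states, actions, target)  # shape (n, m)
    return dq_da.T @ states / len(states)               # shape (m, d)


def objective(theta, states, target):
    # J(theta) = average of Q(s, mu_theta(s)) over the sampled states.
    return q_value(states, policy(theta, states), target).mean()


rng = np.random.default_rng(0)
states = rng.normal(size=(256, 3))   # sampled states, d = 3
target = rng.normal(size=(2, 3))     # defines the best action for each state
theta = np.zeros((2, 3))             # policy parameters, m = 2 actions

# Plain gradient ascent on J using the deterministic policy gradient.
for _ in range(200):
    theta += 0.05 * dpg_estimate(theta, states, target)
```

No sampling over actions is needed to form this estimate, which is the efficiency argument the abstract makes relative to stochastic policy gradients. In the full sequential setting the critic would have to be learned and exploration would have to come from a separate behaviour policy, neither of which this sketch attempts.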

Context

Venue: International Conference on Machine Learning
Archive span: 1993-2025
Indexed papers: 16471
Paper id: 571081106173339449