Least absolute policy iteration for robust value function approximation

Masashi Sugiyama; Hirotaka Hachiya; Hisashi Kashima; Tetsuro Morimura

ICRA 2009

Conference Paper · Learning and Adaptive Systems - IV · Artificial Intelligence · Robotics

Abstract

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through simulated robot-control tasks.
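
The abstract's central idea, replacing the squared loss with the absolute loss and solving the resulting problem as a linear program, can be illustrated with a generic least-absolute-deviation fit. The sketch below is not the paper's algorithm: it assumes a plain linear model with illustrative regression targets standing in for the quantities a policy-evaluation step would fit, and the names `least_absolute_fit`, `features`, and `targets` are hypothetical. It uses `scipy.optimize.linprog` as one example of standard optimization software.

```python
import numpy as np
from scipy.optimize import linprog


def least_absolute_fit(features, targets):
    """Minimise sum_i |features[i] @ theta - targets[i]| as a linear program."""
    n_samples, n_params = features.shape
    # Decision variables: theta (n_params, unbounded) then slacks b (n_samples, >= 0).
    cost = np.concatenate([np.zeros(n_params), np.ones(n_samples)])
    identity = np.eye(n_samples)
    # Encode -b <= features @ theta - targets <= b as A_ub @ x <= b_ub.
    a_ub = np.block([[features, -identity], [-features, -identity]])
    b_ub = np.concatenate([targets, -targets])
    bounds = [(None, None)] * n_params + [(0, None)] * n_samples
    result = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n_params]


# Toy data (an assumption for illustration): a linear target with one outlier.
x = np.linspace(0.0, 1.0, 11)
features = np.column_stack([np.ones_like(x), x])
true_theta = np.array([1.0, 2.0])
targets = features @ true_theta
targets[5] += 50.0  # a single grossly corrupted observation

theta_abs = least_absolute_fit(features, targets)
theta_sq, *_ = np.linalg.lstsq(features, targets, rcond=None)
print("absolute loss:", theta_abs)
print("squared loss: ", theta_sq)
```

On this toy data the absolute-loss fit recovers the intercept and slope used to generate the clean points, whereas the least-squares intercept is pulled toward the outlier. Each slack variable bounds one residual from above and below, which is what turns the non-smooth absolute loss into linear constraints.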

Keywords

Function approximation
Robot sensing systems
Computational efficiency
Robotics and automation
Learning
Noise robustness
Legged locomotion
Humanoid robots
Linear programming
Software standards
Value Function
Value Function Approximation
Linear Problem
Standard Software
Absolute Loss
Gaussian Kernel
Negation
State Space
Optimal Policy
Sample Length
Environmental Noise
Linker Length
Markov Decision Process
End-effector
Policy Improvement
Positive Reward
Left Movements
Negative Reward
Sum Of Rewards
Noiseless Case
Noisy Case
Centered Gaussian
Distance Sensor
State-action Value Function
Clear Interpretation
Policy Evaluation
Walking Distance
Minimization Problem

Context

Venue: IEEE International Conference on Robotics and Automation
Archive span: 1984-2025
Indexed papers: 30179
Paper id: 555205440560962312