Arrow Research search
Back to AAAI

AAAI 2002

On Policy Iteration as a Newton’s Method and Polynomial Policy Iteration Algorithms

Conference Paper Markov Decision Processes Artificial Intelligence

Abstract

Policy iteration is a popular technique for solving Markov decision processes (MDPs). It is easy to describe and implement, and has excellent performance in practice. But not much is known about its complexity. The best upper bound remains exponential, and the best lower bound is a trivial Ω(n) on the number of iterations, where n is the number of states. This paper improves the upper bounds to a polynomial for policy iteration on MDP problems with special graph structure. Our analysis is based on the connection between policy iteration and Newton’s method for finding the zero of a convex function. The analysis offers an explanation as to why policy iteration is fast. It also leads to polynomial bounds on several variants of policy iteration for MDPs for which the linear programming formulation requires at most two variables per inequality (MDP(2)). The MDP(2) class includes deterministic MDPs under discounted and average reward criteria. The bounds on the run times include O(mn2 log m log W) on MDP(2) and O(mn2 log m) for deterministic MDPs, where m denotes the number of actions and W denotes the magnitude of the largest number in the problem description.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
133046897618921728