Optimally solving Markov decision processes with total expected discounted reward function: Linear programming revisited
Affiliations: 1. INRIA-Saclay, Palaiseau 91192, France; 2. Department of Aeronautics and Astronautics, University of Washington, Seattle, WA 98195, USA; 3. Inria Sophia Antipolis, 2004 Route des Lucioles, B.P. 93, 06902 Sophia Antipolis Cedex, France; 4. Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005 Paris, France
Abstract: We compare the computational performance of linear programming (LP) and the policy iteration algorithm (PIA) for solving discrete-time, infinite-horizon Markov decision process (MDP) models with total expected discounted reward. Using randomly generated test problems as well as a real-life health-care problem, we show empirically that, contrary to previous reports, barrier methods for LP provide a viable tool for optimally solving such MDPs. The dimensions of comparison include transition probability matrix structure, state- and action-space size, and the LP solution method.
Keywords: Markov decision process; MDP; Linear programming; Policy iteration; Total expected discounted reward; Treatment optimization
Indexed in: ScienceDirect and other databases.
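
The abstract contrasts two exact solution methods for discounted MDPs: the primal LP whose optimal solution is the value function, and policy iteration. The following is a minimal sketch of both on a small synthetic instance; the random MDP, the problem sizes, and the use of SciPy's HiGHS interior-point solver (a barrier-type method) are illustrative assumptions, not the paper's actual experimental setup.

import numpy as np
from scipy.optimize import linprog

# Illustrative random MDP (hypothetical instance, not the paper's test set):
# S states, A actions, discount factor gamma.
rng = np.random.default_rng(0)
S, A, gamma = 20, 4, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a, s'] = Pr(s' | s, a)
R = rng.uniform(0.0, 1.0, size=(S, A))      # r(s, a), immediate rewards

# LP formulation: minimize sum_s v(s) subject to
#   v(s) >= r(s, a) + gamma * sum_{s'} P(s' | s, a) v(s')   for all (s, a).
# linprog expects A_ub @ v <= b_ub, so each constraint is rewritten as
#   (gamma * P[s, a, :] - e_s) @ v <= -r(s, a).
A_ub = np.zeros((S * A, S))
b_ub = np.zeros(S * A)
for s in range(S):
    for a in range(A):
        row = gamma * P[s, a]
        row[s] -= 1.0
        A_ub[s * A + a] = row
        b_ub[s * A + a] = -R[s, a]
# "highs-ipm" selects HiGHS's interior-point (barrier-type) solver.
res = linprog(c=np.ones(S), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * S, method="highs-ipm")
v_lp = res.x

# Policy iteration: exact policy evaluation, then greedy improvement,
# repeated until the policy is stable.
pi = np.zeros(S, dtype=int)
while True:
    P_pi = P[np.arange(S), pi]        # S x S transition matrix under pi
    r_pi = R[np.arange(S), pi]
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # evaluate pi
    pi_new = (R + gamma * (P @ v)).argmax(axis=1)        # improve greedily
    if np.array_equal(pi_new, pi):
        break
    pi = pi_new

print(np.max(np.abs(v_lp - v)))  # both methods agree up to solver tolerance

Note that the LP has |S| variables but |S|·|A| constraints, so the structure and sparsity of the transition probability matrices drive solver performance, which matches one of the comparison dimensions the abstract lists.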