The control of a two-level Markov decision process by time aggregation |
| |
Authors: | Yat-wah Wan |
| |
Affiliation: | a Institute of Global Operations Strategy and Logistics Management, National Dong Hwa University, Hualien, Taiwan; b Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong |
| |
Abstract: | The solution of Markov decision processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference between the rates of state changes at the upper and lower levels has been used to obtain limiting or approximate solutions. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one, in which the optimal actions of different states can be related to each other. We then give conditions under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP. A two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and between M and M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943], and the lower-level MDPs are solved by embedded Markov chains. |
| |
Keywords: | Time aggregation; Markov decision processes; Two-level systems; Coupled decisions; Policy iteration; Performance potentials |
|