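A finite-sample relative value iteration Q-learning algorithm and its convergence proof
Cite this article: YIN Chang-Ming, CHEN Huan-Wen, XIE Li-Juan. A finite-sample relative value iteration Q-learning algorithm and its convergence proof [J]. Journal of Computer Research and Development, 2002, 39(9): 1064-1070.
Authors: YIN Chang-Ming, CHEN Huan-Wen, XIE Li-Juan
Affiliation: Department of Mathematics and Computer Science, Changsha University of Electric Power, Changsha 410077
Funding: Supported by the National Natural Science Foundation of China (60075019)
Abstract: A reinforcement learning agent solves problems by learning an optimal policy that maps states to actions. There are generally two routes to an optimal decision: one seeks the maximum reward, the other the optimal cost. This paper takes the optimal cost function route and gives a new Q-learning algorithm. Q-learning is an effective reinforcement learning method for Markov decision problems with incomplete information. Watkins proposed the basic Q-learning algorithm. Although he proved that the Q-value iteration converges under certain conditions, his algorithm does not account for how the choice of initial state and initial action during the iteration affects subsequent learning. The relative value iteration Q-learning algorithm proposed here therefore improves on the original Q-learning algorithm and has comparatively good convergence properties. Starting from the method of solving for the optimal cost function, the paper gives a relative value iteration algorithm for Q-learning. Establishing this method allows many results from dynamic programming (DP) algorithms to be applied directly to the study of Q-learning.

Keywords: relative value iteration; Q-learning algorithm; convergence proof; reinforcement learning; optimal cost function; Markov decision process; artificial intelligence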

A RELATIVE VALUE ITERATION Q-LEARNING ALGORITHM AND ITS CONVERGENCE BASED-ON FINITE SAMPLES
YIN Chang-Ming, CHEN Huan-Wen, and XIE Li-Juan. A RELATIVE VALUE ITERATION Q-LEARNING ALGORITHM AND ITS CONVERGENCE BASED-ON FINITE SAMPLES [J]. Journal of Computer Research and Development, 2002, 39(9): 1064-1070.
Authors: YIN Chang-Ming, CHEN Huan-Wen, and XIE Li-Juan
Abstract: A reinforcement learning agent solves its decision problems by learning an optimal mapping from states to actions. There are generally two approaches to finding an optimal decision: one seeks the maximum reward, the other the optimal cost. This paper proposes a novel Q-learning algorithm based on solving for the optimal cost function. Q-learning is a reinforcement learning method for solving Markovian decision problems with incomplete information. Starting from the optimal cost function, the relative value iteration Q-learning algorithm is proposed. It makes many results from dynamic programming algorithms directly available for the study of Q-learning.
Keywords: reinforcement learning; Q-learning; optimality function; relative value iteration; Markovian decision process
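
The abstract does not state the update rule itself, so the following is only a minimal sketch of a generic relative value iteration style Q-learning update, assuming an average-cost formulation. It should not be read as the authors' exact algorithm. The two-state MDP, the costs, the success probability, the step-size schedule, and the names `step`, `rvi_q_learning`, and `ref` are all hypothetical choices made for illustration. The idea shown is that each update subtracts the Q-value of a fixed reference state-action pair, which is one way the choice of a distinguished state and action can enter the iteration.

```python
import random

# Hypothetical two-state, two-action cost MDP (not taken from the paper).
# Action 0 tries to stay, action 1 tries to move to the other state;
# the intended outcome happens with probability SUCCESS_PROB.
COST = {0: 2.0, 1: 0.5}   # per-step cost depends only on the current state
SUCCESS_PROB = 0.9


def step(state, action, rng):
    """Sample one transition and return (cost, next_state)."""
    intended = state if action == 0 else 1 - state
    next_state = intended if rng.random() < SUCCESS_PROB else 1 - intended
    return COST[state], next_state


def rvi_q_learning(num_steps=100_000, ref=(0, 0), seed=0):
    """Tabular Q-learning with a relative-value (reference-pair) correction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    state = 0
    for _ in range(num_steps):
        action = rng.randrange(2)                      # uniform exploration
        cost, next_state = step(state, action, rng)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action] ** 0.7     # decaying step size
        # Costs are minimized, so the bootstrap term uses min, not max.
        # Subtracting the reference entry is the relative-value correction;
        # at a fixed point of this update, q[ref] equals the optimal
        # average cost per step.
        target = cost + min(q[next_state]) - q[ref[0]][ref[1]]
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


if __name__ == "__main__":
    q = rvi_q_learning()
    greedy = [min(range(2), key=lambda a: q[s][a]) for s in range(2)]
    print("greedy actions per state:", greedy)
    print("estimated average cost:", round(q[0][0], 3))
```

For this toy MDP the best policy is to move out of the expensive state 0 and stay in the cheap state 1, which gives an average cost of 0.65 per step, so the reference entry `q[0][0]` should settle near that value.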
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.