Double Speedy Q-Learning Based on Successive Over Relaxation
Citation: ZHOU Qin, LUO Fei, DING Wei-chao, GU Chun-hua, ZHENG Shuai. Double Speedy Q-Learning Based on Successive Over Relaxation [J]. Computer Science, 2022, 49(3): 239-245.
Authors: ZHOU Qin  LUO Fei  DING Wei-chao  GU Chun-hua  ZHENG Shuai
Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
Funding: Industry-University-Research Project of the Shanghai Automotive Industry Science and Technology Development Foundation; National Natural Science Foundation of China

Abstract: Q-Learning is currently a mainstream reinforcement learning algorithm, but it converges slowly in stochastic environments. A previous study addressed the overestimation problem of Speedy Q-Learning and proposed the Double Speedy Q-Learning algorithm. However, Double Speedy Q-Learning does not account for the self-loop structures found in stochastic environments, i.e., the possibility that an executed action returns the agent to its current state. This hinders the agent's learning in stochastic environments and thus slows the algorithm's convergence. To handle these self-loop structures, the Bellman operator of Double Speedy Q-Learning is improved using the successive over-relaxation technique, yielding Double Speedy Q-Learning based on Successive Over Relaxation (DSQL-SOR), which further accelerates the convergence of Double Speedy Q-Learning. Numerical experiments compare the error between the actual and expected rewards of DSQL-SOR and other algorithms. The results show that the error of the proposed algorithm is 0.6 lower than that of the existing mainstream algorithm SQL and 0.5 lower than that of the successive over-relaxation algorithm GSQL, indicating that DSQL-SOR outperforms the other algorithms. The experiments also test the scalability of DSQL-SOR: as the state space grows from 10 to 1000 states, the average time per iteration increases slowly, remaining on the order of 10^-4, showing that DSQL-SOR scales well.

Keywords: Reinforcement learning  Q-Learning  Markov decision process (MDP)  Successive over relaxation (SOR)  Self-loop structure
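The abstract's core idea, replacing the standard Bellman target with an over-relaxed one, can be sketched as a single tabular update. The sketch below is an illustrative reconstruction based on the general SOR Q-learning literature, not the authors' exact DSQL-SOR (which additionally combines double estimators with speedy updates); the function name and all parameter values are hypothetical.

```python
import numpy as np

def sor_q_update(Q, s, a, r, s_next, alpha, gamma, w):
    """One tabular Q-learning step with an SOR-modified Bellman target.

    The usual target  r + gamma * max_b Q[s_next, b]  is blended with
    max_b Q[s, b] via the relaxation factor w.  Choosing w > 1
    over-relaxes the update, which is what exploits self-loop
    transition probability p(s|s,a) to speed up convergence.
    """
    target = w * (r + gamma * np.max(Q[s_next])) + (1.0 - w) * np.max(Q[s])
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    return Q

# Toy usage on a 2-state, 2-action table (values are illustrative only).
Q = np.zeros((2, 2))
Q = sor_q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9, w=1.2)
# Q[0, 1] moves halfway toward the over-relaxed target 1.2, i.e. to 0.6.
```

In the SOR Q-learning literature, w is typically bounded using the self-loop probability, roughly w <= 1/(1 - gamma * p(s|s,a)), so that the modified operator remains a contraction.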
This article is indexed in databases including Weipu (VIP) and Wanfang Data.