Initialization in reinforcement learning for mobile robot path planning
Citation: SONG Yong, LI Yi-bin, LI Cai-hong. Initialization in reinforcement learning for mobile robot path planning [J]. Control Theory & Applications, 2012, 29(12): 1623-1628.
Authors: SONG Yong, LI Yi-bin, LI Cai-hong
Affiliations: 1. School of Control Science and Engineering, Shandong University, Jinan 250061, China; School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, Weihai 264209, China
2. School of Control Science and Engineering, Shandong University, Jinan 250061, China
3. School of Computer Science and Technology, Shandong University of Technology, Zibo 255012, China
Foundation: Supported by the National Natural Science Foundation of China (61075091, 61174054) and the Young Scientists Fund of the National Natural Science Foundation of China (61105100).
Abstract: To overcome the slow convergence of existing reinforcement-learning algorithms for robot path planning, we propose an initialization method for mobile-robot reinforcement learning based on an artificial potential field (APF). The robot workspace is virtualized as an APF in which the potential energy of each point, determined from prior knowledge, represents the maximum cumulative reward obtainable under the optimal policy: points in obstacle regions have zero potential energy, and the goal point has the global maximum. The initial Q-value is then defined as the immediate reward at the current point plus the maximum discounted cumulative reward of the succeeding point. With this Q-value initialization, the learning process converges faster and more stably than the original algorithm. The improved algorithm is validated on robot paths in a grid workspace; the experimental results show that it raises learning efficiency in the early stage of learning and improves overall performance.
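The initialization rule above lends itself to a short illustration. The following Python sketch is not the authors' implementation: it builds a potential field over a 4-connected grid (zero at obstacles, global maximum at the goal, as the abstract describes) and initializes each Q-value as the immediate reward plus the discounted potential of the successor cell. The grid encoding, reward constants, discount factor, and all function names are assumptions introduced here.

```python
import numpy as np
from collections import deque

# Minimal sketch of APF-based Q-value initialization on a 4-connected grid.
# Grid encoding (0 = free, 1 = obstacle), reward constants, the discount
# factor, and all names below are illustrative assumptions, not the
# authors' exact formulation.

GAMMA = 0.95        # discount factor (assumed)
GOAL_REWARD = 1.0   # immediate reward for entering the goal (assumed)
STEP_REWARD = 0.0   # immediate reward for any other move (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def build_potential_field(grid, goal):
    """Assign every free cell a potential equal to the maximum discounted
    return obtainable from it: obstacles keep zero potential, the goal gets
    the global maximum, and other cells get GOAL_REWARD discounted by their
    BFS shortest-path distance to the goal."""
    rows, cols = grid.shape
    dist = np.full((rows, cols), -1, dtype=int)
    dist[goal] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ACTIONS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr, nc] == 0 and dist[nr, nc] < 0:
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    potential = np.zeros((rows, cols))
    reachable = (grid == 0) & (dist >= 0)
    potential[reachable] = GOAL_REWARD * GAMMA ** dist[reachable]
    return potential

def init_q_from_apf(grid, potential, goal):
    """Initialize Q(s, a) as the immediate reward of taking action a in
    state s plus the discounted potential of the successor cell."""
    rows, cols = grid.shape
    q = np.zeros((rows, cols, len(ACTIONS)))
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == 1:
                continue  # obstacle cells keep zero Q-values
            for a, (dr, dc) in enumerate(ACTIONS):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == 0:
                    reward = GOAL_REWARD if (nr, nc) == goal else STEP_REWARD
                    q[r, c, a] = reward + GAMMA * potential[nr, nc]
    return q

# Example: a hypothetical 5x5 map with a short wall of obstacles.
grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1
goal = (4, 4)
q0 = init_q_from_apf(grid, build_potential_field(grid, goal), goal)
```

Q-learning would then start from q0 instead of an all-zero table, which is what produces the early-stage speed-up reported in the paper.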

Keywords: mobile robots; reinforcement learning; artificial potential field; path planning; Q-value initialization
Received: 2011-10-17
Revised: 2012-07-21

Indexed in: CNKI, Wanfang Data, and other databases.