Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis

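Cite this article:	FANG Min, LI Hao. Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2013, 26(9): 838-844.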

Authors:	FANG Min, LI Hao

Affiliation:	School of Computer Science and Technology, Xidian University, Xi'an 710071

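Funding:	Supported by the National Natural Science Foundation of China (No. 61070143, 61101248) and the Fundamental Research Funds for the Central Universities (No. K5051203003)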

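Abstract (translated from the Chinese original):	Because learning an action policy is time-consuming in reinforcement learning, a heuristic reinforcement learning method based on state backtracking is proposed. Repeated states in the reinforcement learning process are analyzed, and by comparing the selection policies for repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. A new heuristic function is defined by combining action reward with action cost. While this heuristic function emphasizes the importance of actions to speed up learning, it also uses the cost function to compute the cost of action selection and reduce unnecessary exploration, so that learning efficiency improves steadily. The action selection policy based on the cost function is proved. Two simulation scenarios are built, and the algorithm is applied to simulation experiments on robot path planning. The experimental results show that the heuristic reinforcement learning method based on state backtracking balances the rewards obtained against the costs paid and effectively improves the convergence speed of Q-learning.
Keywords:	Cost Analysis, Heuristic Function, State Backtracking, Q-Learning
Received:	2012-08-13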
Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis

FANG Min, LI Hao. Heuristically Accelerated State Backtracking Q-Learning Based on Cost Analysis[J]. Pattern Recognition and Artificial Intelligence, 2013, 26(9): 838-844.

Authors:	FANG Min, LI Hao

Affiliation:	School of Computer Science and Technology, Xidian University, Xi'an 710071

Abstract:	Since learning an action strategy is time-consuming in reinforcement learning, a heuristic reinforcement learning algorithm based on state backtracking is presented. By analyzing the repetitive states and comparing the action policies of the reinforcement learning, a cost function is defined to indicate the importance of repetitive actions. A probability-based heuristic function is presented by combining an action reward with an action cost. The proposed algorithm uses the heuristic function to reinforce the importance of an action and speed up learning, and at the same time uses the cost function to measure the feasibility of an action and reduce unnecessary exploration, so that learning efficiency is steadily improved. This cost-based action strategy is proved to be reasonable. Two simulation scenarios are built, and the experimental results of robot games show that the proposed algorithm can learn by trading off rewards against costs and effectively improve the convergence of Q-learning.

Keywords:	Cost Analysis, Heuristic Function, State Backtracking, Q-Learning
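
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: biasing Q-learning action selection with a cost that penalizes actions already repeated in the same state during the current episode. The cost definition (`repeat_cost`), the weight `xi`, the grid action set, and every function name here are assumptions made for illustration; they are not the paper's heuristic function or cost function.

```python
import random
from collections import defaultdict

# Assumed action set for a grid world: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def repeat_cost(trajectory, state, action):
    """Hypothetical cost: how many times (state, action) already
    appears in the current episode's trajectory."""
    return sum(1 for s, a in trajectory if s == state and a == action)


def select_action(q, state, trajectory, epsilon=0.1, xi=1.0, rng=random):
    """Epsilon-greedy selection over Q(s, a) minus a weighted repetition cost.

    xi is an assumed weight trading off learned value against cost.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    scores = [q[(state, a)] - xi * repeat_cost(trajectory, state, a)
              for a in range(len(ACTIONS))]
    best = max(scores)
    # Break ties at random among the best-scoring actions.
    return rng.choice([a for a, s in enumerate(scores) if s == best])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update (unchanged by the cost term)."""
    target = reward + gamma * max(q[(next_state, a)]
                                  for a in range(len(ACTIONS)))
    q[(state, action)] += alpha * (target - q[(state, action)])


if __name__ == "__main__":
    q = defaultdict(float)
    # The agent has already tried action 0 twice in state (0, 0).
    trajectory = [((0, 0), 0), ((0, 0), 0)]
    action = select_action(q, (0, 0), trajectory, epsilon=0.0)
    print("chosen action index:", action)
```

In this sketch the cost only influences which action is chosen; the Q-value update itself stays the standard rule, which is one common way to add a heuristic without changing what is learned.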
