Motion control method of two-link manipulator based on deep reinforcement learning

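Cite this article:	WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi. Motion control method of two-link manipulator based on deep reinforcement learning[J]. Journal of Computer Applications, 2021, 41(6): 1799-1804.

Authors:	WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi

Affiliation:	School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology, Xi'an 710048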

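Abstract:	To address the motion control problem of a two-link manipulator, a control method based on deep reinforcement learning is proposed. First, a manipulator simulation environment is built, comprising the two-link manipulator, a target object, and an obstacle. Then, three deep reinforcement learning models are established and trained according to the goal setting, state variables, and reward and punishment mechanism of the environment model, and motion control of the two-link manipulator is finally achieved. After a comparative analysis of the three models, the Deep Deterministic Policy Gradient (DDPG) algorithm is selected for further study to improve its applicability, thereby shortening the debugging time of the manipulator model and allowing the manipulator to avoid the obstacle and reach the target. Experimental results show that the proposed deep reinforcement learning method can effectively control the motion of the two-link manipulator, and that the convergence speed of the improved DDPG control model increased by two times, with enhanced stability after convergence. Compared with traditional control methods, the proposed deep reinforcement learning control method is more efficient and more widely applicable.
Keywords:	deep reinforcement learning; two-link manipulator; motion control; reward and punishment mechanism; Deep Deterministic Policy Gradient (DDPG) algorithm
Received:	2020-09-11
Revised:	2020-12-15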
Motion control method of two-link manipulator based on deep reinforcement learning

WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi. Motion control method of two-link manipulator based on deep reinforcement learning[J]. Journal of Computer Applications, 2021, 41(6): 1799-1804.

Authors:	WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi

Affiliation:	School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China

Abstract:	Aiming at the motion control problem of a two-link manipulator, a control method based on deep reinforcement learning was proposed. Firstly, a simulation environment of the manipulator was built, comprising the two-link manipulator, a target, and an obstacle. Then, according to the target setting, state variables, and reward and punishment mechanism of the environment model, three deep reinforcement learning models were established and trained. Finally, the motion control of the two-link manipulator was realized. After comparing and analyzing the three models, the Deep Deterministic Policy Gradient (DDPG) algorithm was selected for further research to improve its applicability, so as to shorten the debugging time of the manipulator model and let the manipulator avoid the obstacle and reach the target smoothly. Experimental results show that the proposed deep reinforcement learning method can effectively control the motion of the two-link manipulator, and that the convergence speed of the improved DDPG control model increased by two times, with enhanced stability after convergence. Compared with traditional control methods, the proposed deep reinforcement learning control method has higher efficiency and stronger applicability.

Keywords:	deep reinforcement learning; two-link manipulator; motion control; reward and punishment mechanism; Deep Deterministic Policy Gradient (DDPG) algorithm
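
The abstract mentions a simulation environment (two-link manipulator, target, obstacle) and a reward and punishment mechanism but gives no formulas. The following is only a minimal sketch of what such an environment's geometry and reward might look like. Everything in it is an assumption for illustration and is not taken from the paper: the function names `forward_kinematics` and `reward`, the unit link lengths, the circular obstacle, and the penalty and bonus values.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Planar two-link arm: return (elbow, tip) positions.

    theta1 is the angle of link 1 from the x-axis; theta2 is the angle of
    link 2 relative to link 1 (both in radians). Link lengths are assumed.
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    tip = (
        elbow[0] + l2 * math.cos(theta1 + theta2),
        elbow[1] + l2 * math.sin(theta1 + theta2),
    )
    return elbow, tip


def reward(tip, target, obstacle, obstacle_radius=0.2, goal_tolerance=0.05):
    """Hypothetical shaped reward, not the paper's actual mechanism.

    Negative distance to the target, a fixed penalty when the tip enters
    the obstacle circle, and a fixed bonus when the tip reaches the target.
    """
    dist_to_target = math.dist(tip, target)
    r = -dist_to_target
    if math.dist(tip, obstacle) < obstacle_radius:
        r -= 10.0  # assumed collision penalty
    if dist_to_target < goal_tolerance:
        r += 10.0  # assumed goal bonus
    return r
```

In a DDPG setup, the joint angles (and possibly the target and obstacle positions) would form the state, and the continuous action would change the joint angles. This sketch checks only the arm tip against the obstacle; a fuller environment would also test the links themselves for collision.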
This article is indexed in Wanfang Data and other databases.