Motion control method of two-link manipulator based on deep reinforcement learning
Citation: WANG Jianping, WANG Gang, MAO Xiaobin, MA Enqi. Motion control method of two-link manipulator based on deep reinforcement learning[J]. Journal of Computer Applications (《计算机应用》), 2021, 41(6): 1799-1804.
Authors: WANG Jianping  WANG Gang  MAO Xiaobin  MA Enqi
Affiliation: School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
Abstract: To address the motion control problem of a two-link manipulator, a control method based on deep reinforcement learning is proposed. First, a simulation environment was built containing the two-link manipulator, a target, and an obstacle. Then, based on the target setting, state variables, and reward and punishment mechanism of the environment model, three deep reinforcement learning models were established and trained to realize motion control of the two-link manipulator. After the three models were compared and analyzed, the Deep Deterministic Policy Gradient (DDPG) algorithm was selected for further study and modified to improve its applicability, which shortened the debugging time of the manipulator model and allowed the manipulator to avoid the obstacle and reach the target. Experimental results show that the proposed deep reinforcement learning method can effectively control the motion of the two-link manipulator, that the convergence speed of the improved DDPG control model increased by two times, and that its stability after convergence improved. Compared with traditional control methods, the proposed deep reinforcement learning control method is more efficient and more widely applicable.

Keywords: deep reinforcement learning  two-link manipulator  motion control  reward and punishment mechanism  Deep Deterministic Policy Gradient (DDPG) algorithm
Received: 2020-09-11
Revised: 2020-12-15
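
The abstract describes the simulation environment (two-link manipulator, target, obstacle) and a reward and punishment mechanism but gives no formulas. The following is a minimal illustrative sketch of how such an environment could compute the arm's end position and a reward signal. It is not the authors' implementation: the link lengths, the circular shapes, the names `Circle`, `forward_kinematics`, and `reward`, and the bonus and penalty values of 10 are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    """Hypothetical circular region used for both the target and the obstacle."""
    x: float
    y: float
    radius: float


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Return (elbow, tip) positions of a planar two-link arm based at the origin.

    theta1 is the shoulder angle from the x-axis; theta2 is the elbow angle
    measured relative to the first link. Link lengths are assumed values.
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    tip = (elbow[0] + l2 * math.cos(theta1 + theta2),
           elbow[1] + l2 * math.sin(theta1 + theta2))
    return elbow, tip


def reward(tip, target, obstacle):
    """Assumed reward shape: negative distance to the target, plus a bonus
    when the tip is inside the target and a penalty when it is inside the
    obstacle. Only the tip is checked here, not the links themselves."""
    dist_target = math.hypot(tip[0] - target.x, tip[1] - target.y)
    r = -dist_target
    if dist_target <= target.radius:
        r += 10.0  # assumed bonus for reaching the target
    if math.hypot(tip[0] - obstacle.x, tip[1] - obstacle.y) <= obstacle.radius:
        r -= 10.0  # assumed penalty for touching the obstacle
    return r
```

In a DDPG setup, a continuous action would typically be the change in the two joint angles, and a function like `reward` would score the resulting tip position at each step.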
This article is indexed in Wanfang Data (万方数据) and other databases.