
Neural Network Q Learning Algorithm Based on Residual Gradient Method
Cite this article: SI Yanna, PU Jiexin, ZANG Shaofei. Neural Network Q Learning Algorithm Based on Residual Gradient Method[J]. Computer Engineering and Applications, 2020, 56(18): 137-142.
Authors: SI Yanna (司彦娜), PU Jiexin (普杰信), ZANG Shaofei (臧绍飞)
Affiliation: School of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023, China
Abstract: To address the control of nonlinear systems with continuous state spaces, a neural network Q-learning algorithm based on the residual gradient method is proposed. The algorithm uses a multi-layer feedforward neural network to approximate the Q-value function and updates the network parameters with the residual gradient method to guarantee convergence. An experience replay mechanism enables mini-batch gradient updates of the network parameters, which effectively reduces the number of iterations and speeds up learning. Momentum optimization is introduced to further improve the stability of the training process. In addition, the Softplus activation function replaces the commonly used ReLU: because ReLU outputs zero for every negative input, some neurons may never be activated and their weights may never be updated, a problem that Softplus avoids. Simulation experiments on the CartPole control task verify the correctness and effectiveness of the proposed algorithm.

Keywords: Q-learning; neural network; value function approximation; residual gradient method; experience replay
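
The update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no network size, hyperparameters, or loss definition, so the one-hidden-layer network, the constants `GAMMA`, `LR`, and `MU`, and every function name below are assumptions made for illustration. The sketch takes the loss to be the mean of 0.5 * delta^2 with delta = r + gamma * max_a' Q(s', a') - Q(s, a), and differentiates through both Q terms, which is what distinguishes the residual gradient from the usual semi-gradient Q-learning update.

```python
import math
import random
from collections import deque

GAMMA, LR, MU = 0.99, 0.01, 0.9  # assumed discount, step size, momentum


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Derivative of softplus; never exactly zero in exact arithmetic,
    # unlike the ReLU derivative on negative inputs.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def init_params(n_in, n_hidden, n_actions, rng):
    # Biases are folded in as an extra column fed by a constant 1.
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(n_hidden, n_in + 1), "W2": mat(n_actions, n_hidden + 1)}


def forward(p, s):
    # Returns augmented input, hidden pre-activations, hidden outputs, Q-values.
    x = list(s) + [1.0]
    z = [sum(w * v for w, v in zip(row, x)) for row in p["W1"]]
    h = [softplus(v) for v in z] + [1.0]
    q = [sum(w * v for w, v in zip(row, h)) for row in p["W2"]]
    return x, z, h, q


def add_q_grad(g, p, s, a, scale):
    # Accumulate scale * dQ(s, a)/dtheta into g.
    x, z, h, _ = forward(p, s)
    for j, hj in enumerate(h):
        g["W2"][a][j] += scale * hj
    for j, zj in enumerate(z):
        back = scale * p["W2"][a][j] * sigmoid(zj)
        for i, xi in enumerate(x):
            g["W1"][j][i] += back * xi


def loss_and_grad(p, batch):
    # Mean of 0.5 * delta^2 over the batch and its full (residual) gradient.
    g = {k: [[0.0] * len(row) for row in m] for k, m in p.items()}
    loss, n = 0.0, len(batch)
    for s, a, r, s2, done in batch:
        q, q2 = forward(p, s)[3], forward(p, s2)[3]
        a2 = max(range(len(q2)), key=q2.__getitem__)  # greedy next action
        delta = r + (0.0 if done else GAMMA * q2[a2]) - q[a]
        loss += 0.5 * delta * delta / n
        add_q_grad(g, p, s, a, -delta / n)
        if not done:
            # Residual gradient: the bootstrapped target is differentiated too.
            add_q_grad(g, p, s2, a2, GAMMA * delta / n)
    return loss, g


def momentum_step(p, vel, g):
    # Classical momentum: v <- MU * v - LR * g, then theta <- theta + v.
    for k in p:
        for pr, vr, gr in zip(p[k], vel[k], g[k]):
            for i, gi in enumerate(gr):
                vr[i] = MU * vr[i] - LR * gi
                pr[i] += vr[i]


def train_on_replay(p, vel, buffer, batch_size, rng):
    # Sample a mini-batch of stored transitions and apply one update.
    batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
    loss, g = loss_and_grad(p, batch)
    momentum_step(p, vel, g)
    return loss
```

In use, transitions `(s, a, r, s2, done)` collected from the environment would be appended to a `deque` with a fixed `maxlen`, and `train_on_replay` would be called once per environment step with a zero-initialised `vel` of the same shape as the parameters. Dropping the second `add_q_grad` call would turn the sketch back into the ordinary semi-gradient update.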
This article is indexed in Wanfang Data (万方数据) and other databases.