Reinforcement Learning Algorithm Based on One-dimensional Convolutional Recurrent Network
Cite this article: CHANG Xin, LI Yanbin, TIAN Miao, CHEN Suyi, DU Yufeng, ZHAO Yan. Reinforcement Learning Algorithm Based on One-dimensional Convolutional Recurrent Network[J]. Computer Measurement & Control, 2022, 30(1): 258-265.
Authors: CHANG Xin, LI Yanbin, TIAN Miao, CHEN Suyi, DU Yufeng, ZHAO Yan
Affiliation: The 54th Research Institute of China Electronics Technology Group Corporation (CETC54), Shijiazhuang 050081, China; Hebei Key Laboratory of Electromagnetic Spectrum Cognition and Control, The 54th Research Institute of China Electronics Technology Group Corporation (CETC54), Shijiazhuang 050081, China; School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Received: 2021/10/9
Revised: 2021/11/15

Abstract: Existing deep reinforcement learning algorithms have difficulty converging in environments with high-dimensional state spaces. To address this problem, a reinforcement learning algorithm based on a one-dimensional convolutional recurrent network, which extracts features along the time dimension, is proposed. First, a deep reinforcement learning system is built on the deep Q network (DQN). Then, a one-dimensional convolutional layer is added to the neural network architecture of the deep recurrent Q network (DRQN) to extract features along the time dimension before the long short-term memory (LSTM) layer. Finally, the new algorithm is trained and tested in a time-dependent (sequential) environment. The experimental results show that this modification improves the agent's decision-making and gives deep reinforcement learning algorithms better performance in time-dependent environments whose inputs are not images.
Keywords: reinforcement learning; deep learning; long short-term memory (LSTM); convolutional neural network; deep Q network (DQN)
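The abstract describes the network only at block level. The following is a minimal NumPy sketch of that idea, assuming an observation window of shape (time steps, features), a "valid" 1-D convolution with ReLU sliding along the time axis, a single-layer LSTM whose final hidden state feeds a linear Q-value head, and the standard DQN bootstrap target. The class and function names (`Conv1dLSTMQNet`, `dqn_targets`), all layer sizes, and the random initialisation are hypothetical; the paper's actual hyperparameters and training procedure are not given in the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class Conv1dLSTMQNet:
    """Hypothetical DRQN-style Q-network with a 1-D convolution in front of the LSTM.

    Forward pass only; sizes and random initialisation are illustrative assumptions.
    """

    def __init__(self, n_features, n_actions, n_filters=16, kernel_size=3,
                 hidden_size=32, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel_size = kernel_size
        self.hidden_size = hidden_size
        # Conv kernels slide along the time axis: (filters, kernel_size, features).
        self.conv_w = rng.normal(0.0, 0.1, (n_filters, kernel_size, n_features))
        self.conv_b = np.zeros(n_filters)
        # LSTM weights for the input, forget, candidate and output gates, stacked row-wise.
        self.w_x = rng.normal(0.0, 0.1, (4 * hidden_size, n_filters))
        self.w_h = rng.normal(0.0, 0.1, (4 * hidden_size, hidden_size))
        self.b = np.zeros(4 * hidden_size)
        # Linear head: last hidden state -> one Q-value per action.
        self.w_q = rng.normal(0.0, 0.1, (n_actions, hidden_size))
        self.b_q = np.zeros(n_actions)

    def conv1d(self, x):
        """(T, features) -> (T - kernel_size + 1, filters), 'valid' padding + ReLU."""
        steps = x.shape[0] - self.kernel_size + 1
        out = np.empty((steps, self.conv_w.shape[0]))
        for t in range(steps):
            window = x[t:t + self.kernel_size]  # (kernel_size, features)
            # Contract over kernel positions and features -> one value per filter.
            out[t] = np.tensordot(self.conv_w, window, axes=([1, 2], [0, 1])) + self.conv_b
        return np.maximum(out, 0.0)

    def lstm(self, seq):
        """Run a single-layer LSTM over seq and return the final hidden state."""
        H = self.hidden_size
        h = np.zeros(H)
        c = np.zeros(H)
        for x_t in seq:
            z = self.w_x @ x_t + self.w_h @ h + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
            g = np.tanh(z[2 * H:3 * H])
            c = f * c + i * g
            h = o * np.tanh(c)
        return h

    def q_values(self, obs_seq):
        """Q(s, a) for every action, given an observation window of shape (T, features)."""
        features = self.conv1d(np.asarray(obs_seq, dtype=float))
        return self.w_q @ self.lstm(features) + self.b_q


def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """Standard DQN target y = r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal steps."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)


if __name__ == "__main__":
    net = Conv1dLSTMQNet(n_features=8, n_actions=4)
    obs = np.random.default_rng(1).normal(size=(10, 8))  # 10 time steps, 8 features each
    q = net.q_values(obs)
    print("Q-values:", q, "greedy action:", int(np.argmax(q)))
```

In this sketch, placing the convolution before the LSTM means each LSTM step sees a summary of `kernel_size` consecutive observations rather than a single raw vector, and the sequence the LSTM unrolls over shrinks from T to T - kernel_size + 1 steps.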
This article is indexed in VIP and other databases.