Video frame prediction based on deep convolutional long short-term memory neural network
Citation: ZHANG Dezheng, WENG Liguo, XIA Min, CAO Hui. Video frame prediction based on deep convolutional long short-term memory neural network[J]. Journal of Computer Applications, 2019, 39(6): 1657-1662.
Authors: ZHANG Dezheng  WENG Liguo  XIA Min  CAO Hui
Affiliation: Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (Nanjing University of Information Science & Technology), Nanjing Jiangsu 210044, China
Funding: National Natural Science Foundation of China (61503192, 61773219); Natural Science Foundation of Jiangsu Province (BK20161533); Qing Lan Project of Jiangsu Province
Abstract: Concerning the difficulty of accurately predicting spatial structure details in video frame prediction, a deep convolutional Long Short-Term Memory (LSTM) neural network method was proposed by improving the convolutional LSTM neural network. Firstly, the input image sequence was fed into an encoding network composed of two deep convolutional LSTM channels, which learned the position-change features and the spatial-structure-change features of the input sequence. Then, the learned change features were fed into a decoding network whose channels correspond to those of the encoding network, and the decoding network output the next predicted frame. Finally, the predicted frame was fed back into the decoding network to predict the following frame, and all predicted frames were output after a preset number of loop iterations. In experiments on the Moving-MNIST dataset with the same number of training steps, compared with the convolutional LSTM neural network, the proposed method not only retained accurate prediction of position information but also represented spatial structure details more faithfully. Moreover, after the convolutional layers of the convolutional Gated Recurrent Unit (GRU) neural network were deepened, the same improvement in spatial structure detail representation was obtained, verifying the generality of the proposed idea.

Keywords: video frame prediction  Convolutional Neural Network (CNN)  Long Short-Term Memory (LSTM) neural network  encoding prediction  convolutional Gated Recurrent Unit (GRU)
Received: 2018-12-26
Revised: 2019-03-17
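
The abstract describes an architecture built from two parallel channels of stacked (deep) convolutional LSTM layers that encode the observed frames, a decoding stage that emits the next frame, and a feedback loop that re-inputs each predicted frame for a preset number of steps. The record carries no code, so the sketch below is only a minimal PyTorch illustration of that data flow; the class names (ConvLSTMCell, DeepConvLSTMChannel, VideoFramePredictor), layer counts and channel widths are assumptions made for the example, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a convolutional LSTM cell, a stack of
# such cells forming one "deep" channel, and a two-channel encoder-decoder that
# predicts future frames recursively. All sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are computed with 2-D convolutions instead of
    fully connected layers, so spatial structure is preserved in the state."""

    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        # A single convolution over [input, hidden] yields all four gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)          # new cell state
        h = o * torch.tanh(c)                  # new hidden state
        return h, c

    def init_state(self, batch, height, width, device):
        zeros = torch.zeros(batch, self.hid_ch, height, width, device=device)
        return zeros, zeros.clone()


class DeepConvLSTMChannel(nn.Module):
    """One channel of the encoder: several ConvLSTM cells stacked in depth."""

    def __init__(self, in_ch, hid_chs=(64, 64)):
        super().__init__()
        chs = (in_ch,) + tuple(hid_chs)
        self.cells = nn.ModuleList(
            [ConvLSTMCell(chs[k], chs[k + 1]) for k in range(len(hid_chs))])

    def forward(self, x, states):
        new_states = []
        for cell, st in zip(self.cells, states):
            h, c = cell(x, st)
            new_states.append((h, c))
            x = h                              # each layer feeds the next
        return x, new_states

    def init_states(self, batch, height, width, device):
        return [cell.init_state(batch, height, width, device) for cell in self.cells]


class VideoFramePredictor(nn.Module):
    """Two parallel deep ConvLSTM channels encode the input sequence; their
    outputs are fused into the next frame, which is fed back in repeatedly."""

    def __init__(self, img_ch=1, hid_chs=(64, 64)):
        super().__init__()
        self.chan_a = DeepConvLSTMChannel(img_ch, hid_chs)  # e.g. position dynamics
        self.chan_b = DeepConvLSTMChannel(img_ch, hid_chs)  # e.g. spatial structure
        self.to_frame = nn.Conv2d(2 * hid_chs[-1], img_ch, kernel_size=1)

    def forward(self, frames, n_future):
        # frames: (batch, time, channels, height, width), pixel values in [0, 1]
        b, t, _, hgt, wid = frames.shape
        dev = frames.device
        sa = self.chan_a.init_states(b, hgt, wid, dev)
        sb = self.chan_b.init_states(b, hgt, wid, dev)
        for k in range(t):                     # encoding pass over observed frames
            fa, sa = self.chan_a(frames[:, k], sa)
            fb, sb = self.chan_b(frames[:, k], sb)
        preds = []
        x = torch.sigmoid(self.to_frame(torch.cat([fa, fb], dim=1)))
        for _ in range(n_future):              # recursive prediction loop
            preds.append(x)
            fa, sa = self.chan_a(x, sa)
            fb, sb = self.chan_b(x, sb)
            x = torch.sigmoid(self.to_frame(torch.cat([fa, fb], dim=1)))
        return torch.stack(preds, dim=1)


# Example: predict 10 future 64x64 frames from 10 observed frames
# (random tensors stand in for a Moving-MNIST-style clip).
model = VideoFramePredictor()
clip = torch.rand(2, 10, 1, 64, 64)
future = model(clip, n_future=10)              # shape: (2, 10, 1, 64, 64)
```

In the paper, the two channels are credited with capturing position changes and spatial structure changes respectively, and the reported gains also carried over when the convolutional layers of a convolutional GRU were deepened in the same way; the sketch above only fixes the overall data flow, not the training details or loss used in the experiments.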
