Attention fusion network based video super-resolution reconstruction
Citation: BIAN Pengcheng, ZHENG Zhonglong, LI Minglu, HE Yiran, WANG Tianxiang, ZHANG Dawei, CHEN Liyuan. Attention fusion network based video super-resolution reconstruction[J]. Journal of Computer Applications, 2021, 41(4): 1012-1019.
Authors: BIAN Pengcheng  ZHENG Zhonglong  LI Minglu  HE Yiran  WANG Tianxiang  ZHANG Dawei  CHEN Liyuan
Affiliation: College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
Funding: National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province
Abstract: Video super-resolution methods based on deep learning mainly focus on the intra-frame and inter-frame spatio-temporal relationships in a video, but previous methods fall short in the feature alignment and fusion of video frames, with problems such as inaccurate motion estimation and insufficient feature fusion. To address these problems, a video super-resolution model based on an Attention Fusion Network (AFN) was constructed using the back-projection principle combined with multiple attention mechanisms and fusion strategies. First, at the feature extraction stage, to handle the various motions between neighboring frames and the reference frame, a back-projection architecture was used to obtain error feedback on the motion information. Then, a temporal, spatial, and channel attention fusion module was used to perform multi-dimensional feature mining and fusion. Finally, at the reconstruction stage, the resulting high-dimensional features were passed through convolutional layers to reconstruct high-resolution video frames. By learning different weights for features within and between video frames, the model fully exploits the correlations between frames, and an iterative network structure processes the extracted features progressively from coarse to fine. Experimental results on two public benchmark datasets show that AFN can effectively handle videos containing multiple motions and occlusions, and achieves large improvements in quantitative metrics over several mainstream methods. For example, on the 4x reconstruction task, the Peak Signal-to-Noise Ratio (PSNR) of frames produced by AFN is 13.2% higher than that of the Frame-Recurrent Video Super-Resolution network (FRVSR) on the Vid4 dataset, and 15.3% higher than that of the Video Super-Resolution network using Dynamic Upsampling Filters (VSR-DUF) on the SPMCS dataset.
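As a concrete illustration of the error-feedback idea described in the abstract, below is a minimal PyTorch sketch of a DBPN-style up-projection unit: low-resolution features are up-sampled, projected back down, and the residual against the original features serves as an error signal. This is not the authors' released code; all layer names and sizes are illustrative assumptions for the 4x case.

import torch
import torch.nn as nn

class UpProjectionUnit(nn.Module):
    """DBPN-style up-projection with error feedback (illustrative sketch)."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        k, s, p = 8, scale, 2  # kernel/stride/padding chosen for the 4x case
        self.up1 = nn.ConvTranspose2d(channels, channels, k, stride=s, padding=p)
        self.down = nn.Conv2d(channels, channels, k, stride=s, padding=p)
        self.up2 = nn.ConvTranspose2d(channels, channels, k, stride=s, padding=p)
        self.act = nn.PReLU()

    def forward(self, lr_feat):
        hr0 = self.act(self.up1(lr_feat))   # project LR features up to HR space
        lr0 = self.act(self.down(hr0))      # re-project back down to LR space
        err = lr0 - lr_feat                 # error feedback in LR space
        hr1 = self.act(self.up2(err))       # up-sample the residual error
        return hr0 + hr1                    # error-corrected HR features

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)          # a 32x32 LR feature map
    print(UpProjectionUnit()(x).shape)      # torch.Size([1, 64, 128, 128])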

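Similarly, the following is a minimal sketch of what a temporal-spatial-channel attention fusion module can look like, assuming T neighbor-frame features already aligned to the reference frame. The exact AFN layout is not given in this abstract, so the module names and dimensions here are illustrative, not the paper's implementation.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weights T aligned frame features along time, channel, and space."""
    def __init__(self, channels=64, num_frames=5):
        super().__init__()
        self.temporal = nn.Conv2d(channels, 1, 3, padding=1)        # per-frame score map
        self.merge = nn.Conv2d(num_frames * channels, channels, 1)  # collapse the time axis
        self.channel = nn.Sequential(                               # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Conv2d(channels, 1, 3, padding=1)         # spatial attention map

    def forward(self, feats):
        # feats: (B, T, C, H, W), neighbor features aligned to the reference frame
        b, t, c, h, w = feats.shape
        scores = torch.stack([self.temporal(feats[:, i]) for i in range(t)], dim=1)
        weights = torch.softmax(scores, dim=1)                  # soft weights over frames
        fused = self.merge((feats * weights).view(b, t * c, h, w))
        fused = fused * self.channel(fused)                     # re-weight channels
        fused = fused * torch.sigmoid(self.spatial(fused))      # re-weight pixels
        return fused

if __name__ == "__main__":
    x = torch.randn(2, 5, 64, 32, 32)       # batch of 2, 5 frames, 64 channels
    print(AttentionFusion()(x).shape)       # torch.Size([2, 64, 32, 32])
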
Keywords: super-resolution  attention mechanism  feature fusion  back-projection  video reconstruction
Received: 2020-08-24
Revised: 2020-09-18
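For reference, the 13.2% and 15.3% figures quoted in the abstract are relative gains in PSNR, which for 8-bit frames is defined in dB as:

\mathrm{PSNR}(\hat{I}, I) = 10 \log_{10} \frac{255^{2}}{\mathrm{MSE}(\hat{I}, I)}, \qquad
\mathrm{MSE}(\hat{I}, I) = \frac{1}{HW} \sum_{x=1}^{H} \sum_{y=1}^{W} \left( \hat{I}(x,y) - I(x,y) \right)^{2}

Read this way, "13.2% higher" means PSNR_AFN = 1.132 x PSNR_FRVSR in dB on Vid4 (and analogously 1.153 x PSNR_VSR-DUF on SPMCS); since PSNR is logarithmic, even a modest percentage gain in dB corresponds to a substantially lower MSE.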
