
Human motion recognition based on ConvGRU and attention feature fusion
Citation: CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xinfei, YANG Shuang. Human motion recognition based on ConvGRU and attention feature fusion [J]. Journal of Optoelectronics·Laser, 2023, 34(12): 1298-1306.
Authors: CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xinfei, YANG Shuang
Affiliation: College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China (all authors)
Funding: Science and Technology Foundation of Guizhou Province (Qiankehe Foundation-ZK[2021] Key 001)
Abstract: In action recognition, fully learning and exploiting the correlation between the spatial and temporal features of a video is critical to the final recognition result. Traditional action recognition methods tend to ignore this spatio-temporal correlation as well as small-scale features, which degrades recognition accuracy. To address this, this paper proposes a human action recognition method based on a convolutional gated recurrent unit (ConvGRU) and attentional feature fusion (AFF). First, the Xception network serves as the spatial feature extraction backbone for video frames, and a spatial-temporal excitation (STE) module and a channel excitation (CE) module are introduced to strengthen temporal action modeling while spatial features are extracted. In addition, the traditional long short-term memory (LSTM) network is replaced with a ConvGRU network, whose convolutions further mine the spatial features of video frames while temporal features are extracted. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), which strengthens the recognition of small-scale features and raises model accuracy. Experimental results show recognition accuracies of 95.66% on the UCF101 dataset and 69.82% on the HMDB51 dataset. The algorithm captures more complete spatio-temporal features and compares favorably with current mainstream models.

Keywords: action recognition, attention mechanism, ConvGRU, feature fusion
Received: 2023-03-21
Revised: 2023-06-15
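
The abstract states that the traditional LSTM is replaced by a ConvGRU, whose convolutions keep mining spatial structure while the recurrence models time. For reference, below is a minimal PyTorch sketch of a single ConvGRU cell; the kernel size, hidden width, and gating convention are assumptions, since the abstract does not specify them, and all identifiers (ConvGRUCell, in_ch, hid_ch) are illustrative rather than the paper's own.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    # GRU gating where every matrix product is replaced by a 2D convolution,
    # so the hidden state keeps its spatial layout (B, C, H, W).
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # update (z) and reset (r) gates, computed jointly from [input, hidden]
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)
        # candidate hidden state, computed from [input, reset-gated hidden]
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage over a clip: feats has shape (B, T, C, H, W); the hidden state starts
# at zero and is updated frame by frame.
#   h = feats.new_zeros(B, hid_ch, H, W)
#   for t in range(T):
#       h = cell(feats[:, t], h)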
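The improved classifier introduces MCAM-AFF, a feature fusion module based on improved multi-scale channel attention. The abstract does not describe the improvement itself, so the sketch below shows only the standard attentional feature fusion (AFF) scheme it builds on, in which a multi-scale channel attention module (MS-CAM) combines a globally pooled branch and a point-wise local branch into a fusion weight for two input feature maps; every name and hyperparameter here is an assumption.

import torch
import torch.nn as nn

class MSCAM(nn.Module):
    # Multi-scale channel attention: a global branch (channel context after
    # global average pooling) plus a local branch (per-position channel context).
    def __init__(self, ch, reduction=4):
        super().__init__()
        mid = max(ch // reduction, 1)
        def branch(pool):
            layers = ([nn.AdaptiveAvgPool2d(1)] if pool else []) + [
                nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch),
            ]
            return nn.Sequential(*layers)
        self.global_att = branch(pool=True)
        self.local_att = branch(pool=False)

    def forward(self, x):
        # broadcast the (B, C, 1, 1) global term over the (B, C, H, W) local term
        return torch.sigmoid(self.global_att(x) + self.local_att(x))

class AFF(nn.Module):
    # Fuse two same-shaped feature maps with an attention weight in [0, 1].
    def __init__(self, ch):
        super().__init__()
        self.att = MSCAM(ch)

    def forward(self, x, y):
        w = self.att(x + y)
        return w * x + (1 - w) * y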

