
Human motion recognition based on ConvGRU and attention feature fusion
Citation: CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xinfei, YANG Shuang. Human motion recognition based on ConvGRU and attention feature fusion [J]. Journal of Optoelectronics·Laser, 2023, 34(12): 1298-1306.
Authors: CHENG Nana, ZHANG Rongfen, LIU Yuhong, LIU Yuan, LIU Xinfei, YANG Shuang
Affiliation: College of Big Data and Information Engineering, Guizhou University, Guiyang, Guizhou 550025, China (all authors)
Funding: Science and Technology Foundation of Guizhou Province (Qiankehe Foundation-ZK[2021] Key 001)
Abstract: In action recognition, fully learning and exploiting the correlation between the spatial and temporal features of a video is critical to the final recognition result. Traditional action recognition methods tend to ignore this spatio-temporal correlation as well as small-scale features, which degrades recognition accuracy. To address this, this paper proposes a human action recognition method based on a convolutional gated recurrent unit (ConvGRU) and attentional feature fusion (AFF). First, the Xception network serves as the spatial feature extraction backbone for video frames, and a spatial-temporal excitation (STE) module and a channel excitation (CE) module are introduced to strengthen temporal action modeling while spatial features are extracted. In addition, the traditional long short-term memory (LSTM) network is replaced with a ConvGRU network, whose convolutions further mine the spatial features of video frames while temporal features are extracted. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), which strengthens the recognition of small-scale features and raises model accuracy. Experimental results show recognition accuracies of 95.66% on the UCF101 dataset and 69.82% on the HMDB51 dataset. The algorithm captures more complete spatio-temporal features and compares favorably with current mainstream models.

Keywords: action recognition, attention mechanism, ConvGRU, feature fusion
Received: 2023-03-21
Revised: 2023-06-15
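
The abstract states that the traditional LSTM is replaced by a ConvGRU, whose convolutions keep mining spatial structure while the recurrence models time. For reference, below is a minimal PyTorch sketch of a single ConvGRU cell; the kernel size, hidden width, and gating convention are assumptions, since the abstract does not specify them, and all identifiers (ConvGRUCell, in_ch, hid_ch) are illustrative rather than the paper's own.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    # GRU gating where every matrix product is replaced by a 2D convolution,
    # so the hidden state keeps its spatial layout (B, C, H, W).
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # update (z) and reset (r) gates, computed jointly from [input, hidden]
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)
        # candidate hidden state, computed from [input, reset-gated hidden]
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage over a clip: feats has shape (B, T, C, H, W); the hidden state starts
# at zero and is updated frame by frame.
#   h = feats.new_zeros(B, hid_ch, H, W)
#   for t in range(T):
#       h = cell(feats[:, t], h)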
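The improved classifier introduces MCAM-AFF, a feature fusion module based on improved multi-scale channel attention. The abstract does not describe the improvement itself, so the sketch below shows only the standard attentional feature fusion (AFF) scheme it builds on, in which a multi-scale channel attention module (MS-CAM) combines a globally pooled branch and a point-wise local branch into a fusion weight for two input feature maps; every name and hyperparameter here is an assumption.

import torch
import torch.nn as nn

class MSCAM(nn.Module):
    # Multi-scale channel attention: a global branch (channel context after
    # global average pooling) plus a local branch (per-position channel context).
    def __init__(self, ch, reduction=4):
        super().__init__()
        mid = max(ch // reduction, 1)
        def branch(pool):
            layers = ([nn.AdaptiveAvgPool2d(1)] if pool else []) + [
                nn.Conv2d(ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, ch, 1), nn.BatchNorm2d(ch),
            ]
            return nn.Sequential(*layers)
        self.global_att = branch(pool=True)
        self.local_att = branch(pool=False)

    def forward(self, x):
        # broadcast the (B, C, 1, 1) global term over the (B, C, H, W) local term
        return torch.sigmoid(self.global_att(x) + self.local_att(x))

class AFF(nn.Module):
    # Fuse two same-shaped feature maps with an attention weight in [0, 1].
    def __init__(self, ch):
        super().__init__()
        self.att = MSCAM(ch)

    def forward(self, x, y):
        w = self.att(x + y)
        return w * x + (1 - w) * y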

