Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network

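Cite this article:	QIAN Huimin, CHEN Shi, HUANGFU Xiaoying. Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network[J]. 电子与信息学报 (Journal of Electronics & Information Technology), 2024, 46(3): 1100-1108. doi: 10.11999/JEIT230168

Authors:	钱惠敏 (QIAN Huimin), 陈实 (CHEN Shi), 皇甫晓瑛 (HUANGFU Xiaoying)

Affiliation:	河海大学 (Hohai University), Nanjing 211100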

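Abstract:	Three-dimensional convolutional neural networks (3D CNN) and two-stream convolutional neural networks (two-stream CNN) are common architectures in research on human activity recognition in video, and each has its own advantages. This paper aims to develop a human activity recognition model that combines the two architectures while keeping complexity low and recognition accuracy high. Specifically, it proposes a Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network based on channel Pruning (TPNLST-ResCNN). The network adopts a two-stream architecture, uses a Spatial Temporal Residual Convolution Neural Network (ST-ResCNN) in both the temporal-stream subnetwork and the spatial-stream subnetwork, and fuses the recognition results of the two subnetworks with a mean fusion algorithm. Furthermore, to reduce network complexity, the paper proposes a channel pruning scheme for ST-ResCNN that compresses the model while largely preserving its recognition accuracy. To help the compressed network better learn the long-range spatiotemporal dependencies of human motion in the input video, and thereby improve recognition accuracy, a non-local block is introduced before the first residual spatiotemporal convolution block of the pruned network. Experimental results show that the proposed model achieves recognition accuracies of 98.33% and 74.63% on the public datasets UCF101 and HMDB51, respectively. Compared with existing methods, the proposed model has fewer parameters and higher recognition accuracy.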
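The abstract says the final prediction comes from fusing the two subnetworks' outputs with a mean fusion algorithm. The sketch below shows one plausible reading of that step, assuming each stream produces per-class logits, that these are turned into softmax probabilities, and that "mean fusion" is an unweighted average of those probabilities. The paper's exact formulation is not given here. Conventionally the spatial stream sees RGB frames and the temporal stream sees motion input such as optical flow, but that is also an assumption. The names `mean_fusion` and `predict` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_fusion(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float]) -> List[float]:
    """Average the class probabilities of the spatial and temporal streams.

    Assumption: both streams score the same classes in the same order and
    are weighted equally (an unweighted mean of softmax probabilities).
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [(a + b) / 2.0 for a, b in zip(p_spatial, p_temporal)]


def predict(spatial_logits: Sequence[float],
            temporal_logits: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = mean_fusion(spatial_logits, temporal_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem where the two streams disagree.
    spatial = [2.0, 1.0, 0.1]
    temporal = [0.5, 2.5, 0.2]
    print([round(p, 3) for p in mean_fusion(spatial, temporal)])
    print(predict(spatial, temporal))
```

With these made-up logits, the spatial stream on its own favours class 0 and the temporal stream favours class 1. Averaging the probabilities selects class 1 because the temporal stream is more confident. Averaging probabilities rather than raw logits keeps each stream's contribution on the same [0, 1] scale.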
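Keywords:	Human activity recognition; Two-stream convolutional neural network; 3D convolutional neural network; Network pruning; Non-local block
Received:	2023-03-16
Revised:	2023-07-05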
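The abstract does not state which criterion the channel pruning scheme for ST-ResCNN uses. The sketch below illustrates the general idea under one common assumption, magnitude-based (L1-norm) filter pruning. Each output channel of a 3D convolution is scored by the L1 norm of its filter weights, and the strongest channels are kept. The input channels of the following layer are then dropped to match. The code also counts parameters to show why pruning shrinks two layers at once. All function and type names are hypothetical.

```python
from typing import List, Sequence, Tuple

Kernel = List[float]            # flattened kT*kH*kW weights of one 3D kernel
Layer = List[List[Kernel]]      # layer[out_channel][in_channel] -> kernel


def conv3d_params(c_in: int, c_out: int,
                  kernel: Tuple[int, int, int] = (3, 3, 3),
                  bias: bool = True) -> int:
    """Number of weights (plus biases) in one 3D convolution layer."""
    kt, kh, kw = kernel
    return c_out * c_in * kt * kh * kw + (c_out if bias else 0)


def l1_channel_scores(layer: Layer) -> List[float]:
    """L1 norm of all weights that produce each output channel."""
    return [sum(abs(w) for kernel in out_ch for w in kernel) for out_ch in layer]


def select_channels(layer: Layer, keep_ratio: float) -> List[int]:
    """Indices (in original order) of the output channels to keep.

    Assumption: channels with a larger L1 norm matter more. At least one
    channel is always kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = l1_channel_scores(layer)
    n_keep = max(1, int(len(layer) * keep_ratio))
    ranked = sorted(range(len(layer)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:n_keep])


def prune_pair(layer: Layer, next_layer: Layer,
               keep: Sequence[int]) -> Tuple[Layer, Layer]:
    """Drop pruned output channels of `layer` and matching inputs of `next_layer`."""
    pruned = [layer[c] for c in keep]
    pruned_next = [[out_ch[c] for c in keep] for out_ch in next_layer]
    return pruned, pruned_next


if __name__ == "__main__":
    # Toy layer: 4 output channels, 1 input channel, kernels of 2 weights.
    layer = [[[0.5, -0.5]], [[0.01, 0.02]], [[1.0, 0.0]], [[-0.1, 0.1]]]
    # Next layer: 2 output channels, each reading the 4 channels above.
    next_layer = [[[1.0], [2.0], [3.0], [4.0]], [[5.0], [6.0], [7.0], [8.0]]]
    keep = select_channels(layer, keep_ratio=0.5)
    print(keep)
    print(prune_pair(layer, next_layer, keep))
    # Two stacked 3x3x3 layers, 64 -> 64 -> 64, before and after halving the middle width.
    print(conv3d_params(64, 64) + conv3d_params(64, 64))
    print(conv3d_params(64, 32) + conv3d_params(32, 64))
```

Halving the middle width of the two stacked layers roughly halves their combined parameter count, because each convolution's weight count scales with both its input and output channels. In a residual block, the channels that feed the identity shortcut must stay aligned with the block's output. A common choice is therefore to prune only the inner convolution of each block. The abstract does not say whether the paper does this.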

QIAN Huimin, CHEN Shi, HUANGFU Xiaoying. Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network[J]. Journal of Electronics & Information Technology, 2024, 46(3): 1100-1108. doi: 10.11999/JEIT230168

Authors:	QIAN Huimin, CHEN Shi, HUANGFU Xiaoying

Affiliation:	Hohai University, Nanjing 211100, China

Abstract:	Three-Dimensional Convolution Neural Networks (3D CNN) and two-stream Convolution Neural Networks (two-stream CNN) are commonly used architectures for human activity recognition in video, and each has its own advantages. This paper proposes a human activity recognition model that combines the two architectures while keeping complexity low and recognition accuracy high. Specifically, it proposes a Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network based on channel Pruning (TPNLST-ResCNN), in which Spatial Temporal Residual Convolution Neural Networks (ST-ResCNN) are used in both the temporal-stream subnetwork and the spatial-stream subnetwork. The final recognition result is obtained by fusing the outputs of the two subnetworks with a mean fusion algorithm. Furthermore, to reduce network complexity, a channel pruning scheme for ST-ResCNN is presented that compresses the model while largely preserving its recognition accuracy. To enable the compressed network to better learn the long-range spatiotemporal dependencies of human activity changes, and thereby improve its recognition accuracy, a non-local block is introduced before the first residual spatial temporal convolution block of the pruned network. Experimental results show that the proposed model achieves recognition accuracies of 98.33% and 74.63% on the public datasets UCF101 and HMDB51, respectively. Compared with existing methods, the proposed model has fewer parameters and higher recognition accuracy.

Keywords:	Human activities recognition; Two-stream Convolution Neural Network (two-stream CNN); 3D Convolution Neural Network (3D CNN); Network pruning; NonLocal block
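The abstract places a non-local block before the first residual spatiotemporal convolution block of the pruned network but does not say which pairwise function it uses. The sketch below assumes the embedded-Gaussian form from the original non-local networks formulation:

y_i = (1/C(x)) * sum_j exp(theta(x_i)^T phi(x_j)) * g(x_j),  with residual output  z_i = W_z y_i + x_i

The 1x1x1 convolutions theta, phi, g and W_z are treated as plain matrices acting on the T*H*W flattened positions. Every output position therefore attends to every other position in space and time, which is how the block captures long-range dependencies. Plain Python lists keep the example dependency-free; a real model would use a tensor library. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) list-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax: the exp(.) / C(x) normalisation of the embedded-Gaussian form."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def non_local_block(x: Matrix, w_theta: Matrix, w_phi: Matrix,
                    w_g: Matrix, w_z: Matrix) -> Matrix:
    """Embedded-Gaussian non-local block with a residual connection.

    x:                   N x C features, N = T*H*W flattened positions.
    w_theta, w_phi, w_g: C x C' projections (stand-ins for 1x1x1 convs).
    w_z:                 C' x C projection back to the input width.
    """
    theta = matmul(x, w_theta)                          # N x C'
    phi = matmul(x, w_phi)                              # N x C'
    g = matmul(x, w_g)                                  # N x C'
    attn = softmax_rows(matmul(theta, transpose(phi)))  # N x N, each row sums to 1
    y = matmul(attn, g)                                 # N x C', weighted sum over all positions
    z = matmul(y, w_z)                                  # N x C
    return [[zi + xi for zi, xi in zip(z_row, x_row)]   # z_i = W_z y_i + x_i
            for z_row, x_row in zip(z, x)]


if __name__ == "__main__":
    # Hypothetical 4 positions (e.g. T=1, H=2, W=2) with C=2 channels and C'=1.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    w_theta, w_phi, w_g = [[1.0], [0.5]], [[0.2], [1.0]], [[1.0], [1.0]]
    # With W_z = 0 the block reduces to an identity mapping.
    print(non_local_block(x, w_theta, w_phi, w_g, [[0.0, 0.0]]) == x)
    print(non_local_block(x, w_theta, w_phi, w_g, [[1.0, 1.0]]))
```

Because of the residual term, setting W_z to zero makes the block an identity mapping. That property lets the block be inserted into an already trained (here, pruned) network without changing its initial output. The original non-local networks work zero-initialises a scale parameter after W_z for the same reason.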
