
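Human Activity Recognition Based on a Two-stream Non-local Spatiotemporal Residual Convolutional Neural Network
Cite this article: QIAN Huimin, CHEN Shi, HUANGFU Xiaoying. Human activity recognition based on a two-stream non-local spatiotemporal residual convolutional neural network[J]. Journal of Electronics & Information Technology (电子与信息学报), 2024, 46(3): 1100-1108. doi: 10.11999/JEIT230168
Authors: QIAN Huimin (钱惠敏), CHEN Shi (陈实), HUANGFU Xiaoying (皇甫晓瑛)
Affiliation: Hohai University (河海大学), Nanjing 211100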
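Abstract: Three-dimensional convolutional neural networks (3D CNNs) and two-stream convolutional neural networks (two-stream CNNs) are common architectures for recognizing human activities in video, and each has its own advantages. This paper seeks a human activity recognition model that combines the two architectures while keeping complexity low and recognition accuracy high. Specifically, it proposes a Two-stream NonLocal Spatial Temporal Residual Convolutional Neural Network based on channel Pruning (TPNLST-ResCNN). The network adopts a two-stream architecture, uses a Spatial Temporal Residual Convolutional Neural Network (ST-ResCNN) in both the temporal-stream and the spatial-stream subnetworks, and fuses the recognition results of the two subnetworks with a mean fusion algorithm. To reduce network complexity, a channel pruning scheme for ST-ResCNN is proposed that compresses the model while largely preserving its recognition accuracy. To help the compressed network learn the long-range spatiotemporal dependencies of human activity changes in the input video, and thereby improve recognition accuracy, a non-local block is inserted before the first residual spatiotemporal convolution block of the pruned network. Experimental results show that the proposed model achieves recognition accuracies of 98.33% on the public UCF101 dataset and 74.63% on HMDB51. Compared with existing methods, the proposed model has fewer parameters and higher recognition accuracy.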
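The abstract does not say exactly what "mean fusion" averages. A minimal sketch, assuming each stream outputs one raw score (logit) per action class and that fusion averages the two streams' softmax probabilities with equal weights; the function names `softmax`, `mean_fusion` and `predict` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one clip's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_fusion(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float]) -> List[float]:
    """Average the class probabilities of the spatial and temporal streams.

    Assumption: equal weights and fusion after softmax; the paper may differ.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [(a + b) / 2.0 for a, b in zip(p_spatial, p_temporal)]


def predict(spatial_logits: Sequence[float],
            temporal_logits: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    fused = mean_fusion(spatial_logits, temporal_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical 3-class scores for one clip: the two streams disagree.
    spatial = [2.0, 1.0, 0.1]   # spatial stream mildly prefers class 0
    temporal = [0.5, 3.0, 0.2]  # temporal stream strongly prefers class 1
    print(predict(spatial, temporal))  # -> 1
```

Because the temporal stream is far more confident in this toy case, its vote dominates the average: the fused probabilities are roughly [0.37, 0.56, 0.08], so class 1 wins.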

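Keywords: Human activity recognition; Two-stream convolutional neural network; 3D convolutional neural network; Network pruning; Non-local block
Received: 2023-03-16
Revised: 2023-07-05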

Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network
QIAN Huimin, CHEN Shi, HUANGFU Xiaoying. Human Activities Recognition Based on Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network[J]. Journal of Electronics & Information Technology, 2024, 46(3): 1100-1108. doi: 10.11999/JEIT230168
Authors: QIAN Huimin, CHEN Shi, HUANGFU Xiaoying
Affiliation: Hohai University, Nanjing 211100, China
Abstract: Three-Dimensional Convolutional Neural Networks (3D CNNs) and two-stream Convolutional Neural Networks (two-stream CNNs) are commonly used for human activity recognition, and each has its own advantages. By combining the two architectures, this paper proposes a human activity recognition model with low complexity and high recognition accuracy. Specifically, it proposes a Two-stream NonLocal Spatial Temporal Residual Convolution Neural Network based on channel Pruning (TPNLST-ResCNN), in which Spatial Temporal Residual Convolution Neural Networks (ST-ResCNNs) are used in both the temporal-stream and the spatial-stream subnetworks. The final recognition result is obtained by fusing the outputs of the two subnetworks with a mean fusion algorithm. Furthermore, to reduce the complexity of the network, a channel pruning scheme for ST-ResCNN is presented that compresses the model while largely preserving its recognition accuracy. To enable the compressed network to better learn the long-range spatiotemporal dependencies of human activity changes, and thus improve its recognition accuracy, a non-local block is introduced before the first residual spatiotemporal convolution block of the pruned network. Experimental results show that the proposed model achieves recognition accuracies of 98.33% and 74.63% on the public datasets UCF101 and HMDB51, respectively. Compared with existing algorithms, the proposed model has fewer parameters and higher recognition accuracy.
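The abstract does not state the pruning criterion. As an illustration only, the sketch below swaps in a widely used heuristic, ranking a convolution's output channels by the L1 norm of their filters, and counts how pruning the inner convolution of a two-layer 3D residual block shrinks its parameter count. The helper names, the 3×3×3 kernel and the channel widths are assumptions, not values from the paper.

```python
from typing import List, Sequence, Tuple


def conv3d_params(c_in: int, c_out: int,
                  kernel: Tuple[int, int, int] = (3, 3, 3),
                  bias: bool = True) -> int:
    """Number of weights (plus biases) in one 3D convolution layer."""
    kt, kh, kw = kernel
    return c_out * (c_in * kt * kh * kw + (1 if bias else 0))


def block_params(c_in: int, c_mid: int, c_out: int) -> int:
    """Two stacked 3D convolutions c_in -> c_mid -> c_out, as inside a residual block."""
    return conv3d_params(c_in, c_mid) + conv3d_params(c_mid, c_out)


def channels_to_keep(filters: Sequence[Sequence[float]], keep_ratio: float) -> List[int]:
    """Indices of the output channels with the largest filter L1 norms.

    filters[i] is the flattened kernel of output channel i. Pruning channel i
    removes filters[i] from this layer AND input channel i from the next layer.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = [sum(abs(w) for w in f) for f in filters]
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])  # surviving channels, in original order


if __name__ == "__main__":
    # Toy layer with 4 output channels, each filter flattened to 2 weights.
    toy_filters = [[0.1, -0.2], [1.0, 0.5], [-0.05, 0.0], [0.3, -0.9]]
    print(channels_to_keep(toy_filters, 0.5))  # -> [1, 3]

    # Only the inner width c_mid is pruned, so the block's input and output
    # widths (and hence the identity shortcut) are unchanged.
    before = block_params(64, 64, 64)
    after = block_params(64, 32, 64)
    print(before, after, round(after / before, 3))  # -> 221312 110688 0.5
```

Pruning only the inner width is one common way to keep the shortcut's shapes aligned in a residual block; in this toy configuration, halving `c_mid` roughly halves the block's parameters.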
Keywords: Human activities recognition; Two-stream Convolution Neural Network (two-stream CNN); 3D Convolution Neural Network (3D CNN); Network pruning; NonLocal block
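The non-local block is not detailed on this page. A minimal sketch, assuming the embedded-Gaussian form from the original non-local neural networks design (Wang et al., 2018): every position of the flattened T×H×W feature map attends to every other position, and a residual connection adds the result back to the input. The weight matrices stand in for 1×1×1 convolutions; all names and shapes are hypothetical, and plain lists replace real tensors for readability.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix) -> List[float]:
    """A 1x1x1 convolution acts on each position as a linear map; w is (out, in)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def nonlocal_block(x: Matrix, w_theta: Matrix, w_phi: Matrix,
                   w_g: Matrix, w_z: Matrix) -> Matrix:
    """Embedded-Gaussian non-local operation with a residual connection.

    x has N rows, one per spatiotemporal position (N = T*H*W after flattening),
    each a C-dimensional feature vector. w_theta, w_phi, w_g map C -> C' and
    w_z maps C' -> C, so the output has the same shape as x.
    """
    theta = [linear(p, w_theta) for p in x]
    phi = [linear(p, w_phi) for p in x]
    g = [linear(p, w_g) for p in x]
    out: Matrix = []
    for i, x_i in enumerate(x):
        # Pairwise affinity f(x_i, x_j) = exp(theta_i . phi_j), normalised over all j.
        attn = softmax([sum(a * b for a, b in zip(theta[i], phi_j)) for phi_j in phi])
        # y_i = sum_j attn_ij * g(x_j): a weighted sum over ALL positions,
        # which is what lets the block model long-range dependencies.
        y_i = [sum(attn[j] * g[j][c] for j in range(len(x))) for c in range(len(g[0]))]
        # z_i = W_z y_i + x_i (residual connection).
        out.append([z + xc for z, xc in zip(linear(y_i, w_z), x_i)])
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy: N = 3 positions, C = 2
    # With W_z = 0 the block reduces to the identity mapping.
    print(nonlocal_block(x, identity, identity, identity, zeros) == x)  # -> True
    print(nonlocal_block(x, identity, identity, identity, identity))
```

Zero-initialising `W_z` (the original non-local design achieves this by zero-initialising the scale of a batch-normalisation layer after `W_z`) makes a freshly inserted block start as an identity mapping, which is convenient when adding it to an existing network such as a pruned ST-ResCNN.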