
Human interaction recognition based on RGB and skeleton data fusion model
JI Xiaofei, QIN Linlin, WANG Yangyang. Human interaction recognition based on RGB and skeleton data fusion model[J]. Journal of Computer Applications, 2019, 39(11): 3349-3354. DOI: 10.11772/j.issn.1001-9081.2019040633
Authors: JI Xiaofei, QIN Linlin, WANG Yangyang
Affiliation: College of Automation, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Funding: National Natural Science Foundation of China (61602321); Liaoning Provincial Department of Education Scientific Research Project for Serving Local Development (L201708); Liaoning Provincial Department of Education Youth Scientific Research Project (L201745)
Received: 2019-04-15
Revised: 2019-07-26
Abstract: Significant progress has been made in human interaction recognition based on RGB video sequences, but because RGB video lacks depth information, such methods remain inaccurate for complex interactions. Depth sensors such as the Microsoft Kinect can effectively improve the tracking accuracy of whole-body joints and provide accurate three-dimensional joint data that capture human motion and its changes. Exploiting the respective characteristics of RGB video and joint data, a Convolutional Neural Network (CNN) model based on two-stream fusion of RGB and joint information was proposed. Firstly, the region of interest of the RGB video in the time domain was obtained with the ViBe algorithm, and key frames were extracted and mapped into RGB space to obtain a spatio-temporal map representing the video; this map was fed into a CNN to extract features. Then, vectors were constructed in every frame of the joint sequence to extract Cosine Distance (CD) and Normalized Magnitude (NM) features; the per-frame CD and joint features were concatenated in the temporal order of the joint sequence and fed into a CNN to learn higher-level temporal features. Finally, the softmax recognition probability matrices of the two information sources were fused to obtain the final recognition result. The experimental results show that combining RGB video information with joint information effectively improves human interaction recognition, achieving recognition rates of 92.55% and 80.09% on the public SBU Kinect Interaction and NTU RGB+D datasets respectively, which verifies the effectiveness of the proposed model for two-person interaction recognition.
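
To make the joint-stream step concrete, the following is a minimal NumPy sketch of per-frame CD and NM feature extraction. The choice of inter-person joint vectors, the pairwise cosine computation, and the max-normalization are illustrative assumptions; the paper defines the exact vector construction.

    import numpy as np

    def frame_features(joints_a, joints_b):
        # joints_a, joints_b: (J, 3) arrays of 3D joint coordinates, one per person.
        # Assumption: vectors run from each joint of person A to the matching
        # joint of person B; the paper's exact pairing may differ.
        vectors = joints_b - joints_a                    # (J, 3) inter-person joint vectors
        norms = np.linalg.norm(vectors, axis=1)          # vector magnitudes

        # Normalized Magnitude (NM): each magnitude scaled by the frame maximum.
        nm = norms / (norms.max() + 1e-8)

        # Cosine Distance (CD): cosine of the angle between every pair of vectors.
        unit = vectors / (norms[:, None] + 1e-8)
        cosines = unit @ unit.T                          # (J, J) pairwise cosines
        iu = np.triu_indices(len(vectors), k=1)          # keep each unordered pair once
        return np.concatenate([cosines[iu], nm])         # 1-D per-frame descriptor

    def sequence_features(seq_a, seq_b):
        # Stack per-frame descriptors in temporal order, giving the 2-D map
        # that the abstract describes feeding into the CNN.
        return np.stack([frame_features(a, b) for a, b in zip(seq_a, seq_b)])

Stacking the per-frame descriptors row by row preserves the temporal order of the joint sequence, so a CNN applied to the resulting map can learn the higher-level temporal features mentioned above.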
Keywords: RGB video; skeleton data; Convolutional Neural Network (CNN); softmax; fusion; human interaction recognition
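
The final step can likewise be sketched as score-level (late) fusion of the two streams' softmax outputs. The convex weighted average below is an assumption for illustration; the abstract states only that the two probability matrices are fused.

    import numpy as np

    def fuse_predictions(p_rgb, p_skeleton, w=0.5):
        # p_rgb, p_skeleton: (N, C) softmax probability matrices over C classes
        # for N test samples, one matrix per information source.
        # Assumption: a convex weighted average; the paper's rule may differ.
        fused = w * p_rgb + (1.0 - w) * p_skeleton   # combined class probabilities
        return fused.argmax(axis=1)                  # predicted class per sample

With w = 0.5 this reduces to equal-weight averaging of the two streams; w could be tuned on validation data to favor the stronger information source.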