
Human interaction recognition based on RGB and skeleton data fusion model
JI Xiaofei, QIN Linlin, WANG Yangyang. Human interaction recognition based on RGB and skeleton data fusion model[J]. Journal of Computer Applications, 2019, 39(11): 3349-3354. DOI: 10.11772/j.issn.1001-9081.2019040633
Authors: JI Xiaofei, QIN Linlin, WANG Yangyang
Affiliation: College of Automation, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
Funding: National Natural Science Foundation of China (61602321); Liaoning Provincial Department of Education Scientific Research Project for Serving Local Development (L201708); Liaoning Provincial Department of Education Youth Scientific Research Project (L201745)
Received: 2019-04-15
Revised: 2019-07-26
Abstract: Significant progress has been made in human interaction recognition based on RGB video sequences, but because RGB video lacks depth information, such methods remain inaccurate for complex interactions. Depth sensors such as the Microsoft Kinect can effectively improve the tracking accuracy of whole-body joints and provide accurate three-dimensional joint data that capture human motion and its changes. Exploiting the respective characteristics of RGB video and joint data, a Convolutional Neural Network (CNN) model based on two-stream fusion of RGB and joint information was proposed. Firstly, the region of interest of the RGB video in the time domain was obtained with the ViBe algorithm, and key frames were extracted and mapped into RGB space to obtain a spatio-temporal map representing the video; this map was fed into a CNN to extract features. Then, vectors were constructed in every frame of the joint sequence to extract Cosine Distance (CD) and Normalized Magnitude (NM) features; the per-frame CD and joint features were concatenated in the temporal order of the joint sequence and fed into a CNN to learn higher-level temporal features. Finally, the softmax recognition probability matrices of the two information sources were fused to obtain the final recognition result. The experimental results show that combining RGB video information with joint information effectively improves human interaction recognition, achieving recognition rates of 92.55% and 80.09% on the public SBU Kinect Interaction and NTU RGB+D datasets respectively, which verifies the effectiveness of the proposed model for two-person interaction recognition.
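
To make the joint-stream step concrete, the following is a minimal NumPy sketch of per-frame CD and NM feature extraction. The choice of inter-person joint vectors, the pairwise cosine computation, and the max-normalization are illustrative assumptions; the paper defines the exact vector construction.

    import numpy as np

    def frame_features(joints_a, joints_b):
        # joints_a, joints_b: (J, 3) arrays of 3D joint coordinates, one per person.
        # Assumption: vectors run from each joint of person A to the matching
        # joint of person B; the paper's exact pairing may differ.
        vectors = joints_b - joints_a                    # (J, 3) inter-person joint vectors
        norms = np.linalg.norm(vectors, axis=1)          # vector magnitudes

        # Normalized Magnitude (NM): each magnitude scaled by the frame maximum.
        nm = norms / (norms.max() + 1e-8)

        # Cosine Distance (CD): cosine of the angle between every pair of vectors.
        unit = vectors / (norms[:, None] + 1e-8)
        cosines = unit @ unit.T                          # (J, J) pairwise cosines
        iu = np.triu_indices(len(vectors), k=1)          # keep each unordered pair once
        return np.concatenate([cosines[iu], nm])         # 1-D per-frame descriptor

    def sequence_features(seq_a, seq_b):
        # Stack per-frame descriptors in temporal order, giving the 2-D map
        # that the abstract describes feeding into the CNN.
        return np.stack([frame_features(a, b) for a, b in zip(seq_a, seq_b)])

Stacking the per-frame descriptors row by row preserves the temporal order of the joint sequence, so a CNN applied to the resulting map can learn the higher-level temporal features mentioned above.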
Keywords: RGB video; skeleton data; Convolutional Neural Network (CNN); softmax; fusion; human interaction recognition
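
The final step can likewise be sketched as score-level (late) fusion of the two streams' softmax outputs. The convex weighted average below is an assumption for illustration; the abstract states only that the two probability matrices are fused.

    import numpy as np

    def fuse_predictions(p_rgb, p_skeleton, w=0.5):
        # p_rgb, p_skeleton: (N, C) softmax probability matrices over C classes
        # for N test samples, one matrix per information source.
        # Assumption: a convex weighted average; the paper's rule may differ.
        fused = w * p_rgb + (1.0 - w) * p_skeleton   # combined class probabilities
        return fused.argmax(axis=1)                  # predicted class per sample

With w = 0.5 this reduces to equal-weight averaging of the two streams; w could be tuned on validation data to favor the stronger information source.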