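With the continuing development of computer vision, human action recognition has shown broad application prospects and research value in many fields, such as video surveillance, video retrieval, and human-computer interaction. Human action recognition involves understanding image content, and progress in practical applications has been slow because human poses are complex and varied and because of background occlusion. This paper comprehensively reviews the development of human action recognition and examines in depth the research methods in the field, including traditional methods based on hand-crafted features, deep learning-based methods, and the recently popular methods based on graph convolutional networks (GCN), and it systematically organizes these methods by the type of data they use. In addition, for each data type it introduces several popular action recognition datasets and compares the performance of the various methods on them. Finally, it summarizes the work and discusses future research directions for human action recognition.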

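Centered on video-based human action recognition, this paper first comprehensively reviews algorithms for traditional RGB action recognition, including traditional algorithms and deep learning-based algorithms. Action recognition based on RGB video is easily affected by background and illumination, so its recognition accuracy is limited, but it carries rich color and appearance information. The paper then analyzes and summarizes algorithms for RGB-D action recognition in three areas: depth sequences, skeletons, and multi-feature fusion. RGB-D video has multiple modalities that provide more information for action recognition and can compensate for the shortcomings of RGB video, but it also brings new challenges. Finally, the paper discusses commonly used datasets and possible future directions.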

Ongoing human action recognition is a challenging problem with many applications, such as video surveillance, patient monitoring, and human–computer interaction. This paper presents a novel framework for recognizing streamed actions using Motion Capture (MoCap) data. Unlike the after-the-fact classification of completed activities, this work aims at early recognition of ongoing activities. The proposed method is time-efficient because it is based on histograms of action poses, extracted from MoCap data, that are computed according to the Hausdorff distance. The histograms are then compared using the Bhattacharyya distance and warped by a dynamic time warping process to achieve their optimal alignment. This process, implemented by our dynamic programming-based solution, allows some stretching flexibility to accommodate possible changes in action length. We demonstrate the effectiveness of our solution by testing it on large datasets and comparing it with several state-of-the-art methods. In particular, it achieves recognition rates that outperform many well-known methods.
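
The comparison step in the abstract above can be illustrated with a minimal sketch. It assumes each action is already represented as a sequence of normalized pose histograms; how the histograms are built with the Hausdorff distance is not reproduced here. The function names and the classic full-sequence DTW recurrence are illustrative assumptions, not the authors' implementation.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance -ln(BC) between two normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp so disjoint histograms give a large finite value and rounding
    # error cannot push the coefficient above 1.
    bc = min(max(bc, 1e-12), 1.0)
    return -math.log(bc)


def dtw_cost(seq_a, seq_b, dist=bhattacharyya_distance):
    """Classic dynamic-programming DTW cost between two histogram sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Allow a match, or stretching either sequence by one step.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]
```

A stretched copy of a sequence (for example, one histogram repeated) aligns at near-zero cost, which is the "stretching flexibility" the abstract refers to. Early recognition of a partial action would need an open-ended variant of this recurrence, which this sketch does not cover.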

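Action recognition is an important research topic in video understanding within computer vision. Accurately extracting features of human actions from video and recognizing those actions can provide important information for fields such as healthcare and security, which makes this a very promising direction. From a data-driven perspective, this paper comprehensively introduces the development of action recognition technology and systematically describes representative action recognition methods and models. Action recognition data are divided into RGB, depth, skeleton, and fused modalities. The paper first introduces the main pipeline of action recognition and the public datasets for the different data modalities. Then, organized by data modality, it reviews action recognition methods based on traditional hand-crafted features and on deep learning for the RGB, depth, and skeleton modalities, as well as, under multimodal fusion, methods that fuse the RGB and depth modalities and methods that fuse other modalities. Traditional hand-crafted feature methods include methods based on spatio-temporal volumes and spatio-temporal interest points (RGB modality), methods based on motion change and appearance (depth modality), and methods based on skeleton features (skeleton modality). Deep learning methods mainly involve convolutional networks, graph convolutional networks, and hybrid networks; the review focuses on their improvements, characteristics, and model innovations. The different action recognition techniques are compared and analyzed according to the datasets of each modality. After comparison from two perspectives, within categories and across categories, the paper identifies the advantages, disadvantages, and applicable scenarios of the different modalities, the differences between hand-crafted feature methods and deep learning methods, and the advantages of multimodal fusion...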

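Fitness action recognition is the core component of intelligent fitness systems. To improve the accuracy and speed of fitness action recognition algorithms and to reduce the influence of whole-body displacement during fitness actions on the recognition results, a fitness action recognition method based on human skeleton feature encoding is proposed. The method consists of three steps: first, a simplified human skeleton model is constructed, and human pose estimation is used to extract the coordinates of each joint in the skeleton model; second, a human-center projection method is used to extract the action feature region in order to...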

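Human pose in video is currently often represented and computed by 3D reconstruction, but such methods usually require multiple cameras, which imposes many constraints and incurs high computational complexity. To address this, a human pose estimation algorithm based on head-shoulder segmentation is proposed. The algorithm first localizes the head and shoulders of the person in the video. It then computes the head pose from the planar imaging characteristics of the human head and, at the same time, computes the torso pose from the contour variation of the shoulders. Finally, it combines the head and torso poses to estimate the pose of the moving human body. Experimental results show that the algorithm is effective and superior.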

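Deep learning has achieved good results in human action recognition, but the appearance and motion information of people in video still needs to be fully exploited. To use the spatial and temporal information in video to recognize human actions, a spatio-temporal two-stream model for human action recognition in video is proposed. The model first uses two convolutional neural networks to extract spatial and temporal features, respectively, from video action clips. It then fuses the two networks and extracts mid-level spatio-temporal features. Finally, the extracted mid-level features are fed into a 3D convolutional neural network to recognize the human actions in the video. Human action recognition experiments were conducted on the UCF101 and HMDB51 datasets. The results show that the proposed spatio-temporal two-stream 3D convolutional neural network model can effectively recognize human actions in video.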

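To address the problem of how to make effective use of the three-dimensional information of human motion in action recognition, a new feature extraction and recognition method based on depth video sequences is proposed. The method first uses a motion energy model (MEM) to represent human dynamic features: the entire depth video sequence is projected onto three orthogonal Cartesian planes, the video sequence on each projection plane is divided into temporal sub-sequences of equal energy, and the depth motion map energy of each sub-sequence is computed to obtain the MEM. A local binary pattern (LBP) descriptor is then used to encode the MEM and further extract effective information about human motion. Finally, a […]-norm collaborative representation classifier is used to classify and recognize actions. The method was tested on the MSRAction3D and MSRGesture3D databases, and the experimental results show that it achieves relatively high recognition performance.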
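
The equal-energy temporal split and the depth motion map described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are small 2D lists of depth values for a single projection plane, "energy" is taken to be the summed absolute difference between consecutive frames, and the function names are hypothetical. The paper's exact energy definition and the three-plane projection are not reproduced.

```python
def frame_diff_energy(prev, curr):
    """Summed absolute depth difference between two frames (assumed energy measure)."""
    return sum(abs(c - p) for rp, rc in zip(prev, curr) for p, c in zip(rp, rc))


def equal_energy_boundaries(frames, num_segments):
    """Frame indices that split the sequence into parts of roughly equal motion energy."""
    if len(frames) < 2 or num_segments < 1:
        raise ValueError("need at least two frames and one segment")
    last = len(frames) - 1
    energies = [frame_diff_energy(frames[i], frames[i + 1]) for i in range(last)]
    total = sum(energies)
    if total == 0:
        # No motion at all: fall back to a uniform split.
        return [round(k * last / num_segments) for k in range(num_segments + 1)]
    boundaries, cumulative, target = [0], 0.0, 1
    for i, e in enumerate(energies):
        cumulative += e
        # Close a segment each time another 1/num_segments of the energy is reached.
        while target < num_segments and cumulative >= total * target / num_segments:
            boundaries.append(i + 1)
            target += 1
    boundaries.append(last)
    return boundaries


def depth_motion_map(frames):
    """Accumulate absolute frame-to-frame differences into one motion map."""
    h, w = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * w for _ in range(h)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(h):
            for x in range(w):
                dmm[y][x] += abs(curr[y][x] - prev[y][x])
    return dmm
```

Segments come out shorter where motion is intense and longer where it is slow, so each sub-sequence contributes a comparable amount of motion to its own depth motion map.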

Human–Robot Collaboration is a critical component of Industry 4.0, contributing to a transition towards more flexible production systems that can be adjusted quickly to changing production requirements. This paper aims to increase the natural collaboration level of a robotic engine assembly station by proposing a cognitive system, powered by computer vision and deep learning, that interprets the operator's implicit communication cues. The proposed system, which is based on a residual convolutional neural network with 34 layers and a long short-term memory recurrent neural network (ResNet-34 + LSTM), obtains the assembly context through action recognition of the tasks performed by the operator. The assembly context was then integrated into a collaborative assembly plan capable of autonomously commanding the robot tasks. The proposed model achieved an accuracy of 96.65% and a temporal mean intersection over union (mIoU) of 94.11% for action recognition on the considered assembly. Moreover, a task-oriented evaluation showed that the proposed cognitive system was able to use the recognized human actions to command the appropriate robot actions with near-perfect accuracy. The proposed system was therefore considered successful in increasing the natural collaboration level of the considered assembly station.
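
For readers unfamiliar with the temporal mIoU metric mentioned in the abstract above, here is one common way to compute it from per-frame labels. The abstract does not give the paper's exact definition, so this is an assumed formulation: for each class, the intersection over union of the frames predicted as that class and the frames labeled as that class, averaged over the classes that occur. The function name is hypothetical.

```python
def temporal_miou(predicted, truth):
    """Mean per-class IoU over frame-level label sequences of equal length."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    ious = []
    for label in set(predicted) | set(truth):
        # Frames where both sequences agree on this label.
        inter = sum(1 for p, t in zip(predicted, truth) if p == label and t == label)
        # Frames where either sequence carries this label.
        union = sum(1 for p, t in zip(predicted, truth) if p == label or t == label)
        ious.append(inter / union)
    return sum(ious) / len(ious)
```

Unlike frame accuracy, this measure penalizes misplaced action boundaries for short actions as heavily as for long ones, because every class contributes equally to the mean.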

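To improve the performance of real-time human activity recognition on the Android platform, a method for detecting and segmenting activity changes and transitional actions is proposed. The method characterizes activities by the projection of acceleration onto the gravity direction and the magnitude of its projection onto the horizontal plane, determines activity changes from trends, and segments transitional actions by combining trend change-point detection with the DTW algorithm. Time-domain acceleration features are extracted and a random forest is used to classify nine activities, achieving an average recognition rate of 97.26%, with an average recognition rate of 95.05% for transitional actions.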
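
The acceleration decomposition in the abstract above can be sketched in a few lines. The sketch assumes a gravity vector is already available in the sensor frame (for example from a low-pass filter or a platform gravity sensor, which is an assumption and not stated in the source); the function name is hypothetical.

```python
import math


def split_acceleration(accel, gravity):
    """Return (component along gravity, magnitude of the horizontal remainder)."""
    g_norm = math.sqrt(sum(g * g for g in gravity))
    if g_norm == 0:
        raise ValueError("gravity vector must be non-zero")
    g_hat = [g / g_norm for g in gravity]
    # Scalar projection of the acceleration onto the gravity direction.
    vertical = sum(a * g for a, g in zip(accel, g_hat))
    # What is left after removing the vertical part lies in the horizontal plane.
    horizontal_vec = [a - vertical * g for a, g in zip(accel, g_hat)]
    horizontal = math.sqrt(sum(h * h for h in horizontal_vec))
    return vertical, horizontal
```

Both outputs are independent of how the phone is oriented in the pocket or hand, which is presumably why such a representation suits activity recognition on handheld devices.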

A survey on vision-based human action recognition (total citations: 10; self-citations: 0; citations by others: 10)
Vision-based human action recognition is the process of labeling image sequences with action labels. Robust solutions to this problem have applications in domains such as visual surveillance, video retrieval and human–computer interaction. The task is challenging due to variations in motion performance, recording settings and inter-personal differences. In this survey, we explicitly address these challenges. We provide a detailed overview of current advances in the field. Image representations and the subsequent classification process are discussed separately to focus on the novelties of recent research. Moreover, we discuss limitations of the state of the art and outline promising directions of research.

Heterogeneous architectures have emerged as mainstream computing platforms because they can deliver both high performance and energy efficiency. To fully realize this potential, it is necessary to obtain a good mapping of the computation kernels to processing elements. The search for the best mapping can be very costly when complex applications with different levels of granularity must be evaluated on a heterogeneous computation platform. In this paper we propose a model that uses both the estimated computation time and the power consumption of each application kernel to find the best computing configuration for the whole application. As a case study, our approach is applied to the implementation of an irregular algorithm on a heterogeneous embedded architecture, more precisely an algorithm used in computer vision applications such as human action or gait recognition. We analyze two parallelization versions, a non-pipelined version and a pipelined one, and we use our approach to obtain the mapping with the lowest energy consumption. Finally, we validate our model by comparing the predicted results with the real values obtained for the two implementations of the algorithm.
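
As a rough illustration of the kind of search the abstract above describes, the sketch below enumerates kernel-to-device mappings for a non-pipelined (sequential) execution and picks the one with the lowest estimated energy. Everything here is an assumption for illustration: energy is modeled as time multiplied by power per kernel, data-transfer costs and pipelining are ignored, and the optional deadline parameter and all names are hypothetical rather than taken from the paper.

```python
from itertools import product


def best_mapping(time_s, power_w, deadline_s=None):
    """Exhaustively find the mapping with minimum total energy (joules).

    time_s[kernel][device] is the estimated run time in seconds,
    power_w[kernel][device] the estimated power draw in watts.
    Returns (energy, {kernel: device}) or None if no mapping meets the deadline.
    """
    kernels = sorted(time_s)
    best = None
    for choice in product(*(sorted(time_s[k]) for k in kernels)):
        total_time = sum(time_s[k][d] for k, d in zip(kernels, choice))
        if deadline_s is not None and total_time > deadline_s:
            continue  # mapping is too slow for the assumed deadline
        energy = sum(time_s[k][d] * power_w[k][d] for k, d in zip(kernels, choice))
        if best is None or energy < best[0]:
            best = (energy, dict(zip(kernels, choice)))
    return best
```

Exhaustive enumeration grows exponentially with the number of kernels, which reflects the abstract's remark that the mapping search can be very costly. A predictive model is useful because it replaces measuring every mapping with evaluating estimates.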

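Action recognition (AR) is a research hotspot in computer vision, with broad application prospects in security surveillance, autonomous driving, production safety, and other fields. First, the scope of action recognition is analyzed and the technical challenges it faces are set out. Second, the working principles of action recognition are analyzed and compared from three perspectives: temporal feature extraction, efficiency optimization, and long-term feature capture. The performance of 43 benchmark AR methods from the past decade is compared on the UCF101, HMDB51, Something-Something, and Kinetics400 datasets, which helps in selecting a suitable AR model for different application scenarios. Finally, future directions for the field are identified. The results can provide a theoretical reference and technical support for video feature extraction and visual content understanding.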

Wang Ping, Pang Wenhao. 《计算机应用》 (Journal of Computer Applications), 2019, 39(7): 2081-2086
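To address the low recognition rate of the original spatio-temporal two-stream convolutional neural network (CNN) model on actions in long, complex videos, an action recognition method based on a video-segment spatio-temporal two-stream CNN is proposed. First, the video is divided into multiple equal-length, non-overlapping segments, and each segment is randomly sampled to obtain a frame image representing the static features of the video and stacked optical flow images representing its motion features. These two kinds of images are then fed into the spatial and temporal CNNs, respectively, for feature extraction, and in each stream the features of the video segments are fused to obtain spatial and temporal class prediction features. Finally, the prediction features of the two streams are combined to obtain the video action recognition result. Experiments examine several data augmentation methods and transfer learning schemes for addressing the overfitting caused by insufficient training samples, and analyze the effects of the number of segments, the pretrained network, the segment feature fusion scheme, and the two-stream ensemble strategy on recognition performance. The results show that the proposed model achieves an action recognition accuracy of 91.80% on the UCF101 dataset, 3.8 percentage points higher than the original two-stream model. On the HMDB51 dataset its accuracy also improves on the original model, reaching 61.39%. This indicates that the proposed model can better learn and represent human action features in long, complex videos.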
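
The segment sampling and fusion steps in the abstract above can be sketched without any deep learning framework. The sketch assumes per-segment class scores have already been produced by the spatial and temporal networks. Averaging across segments and a weighted sum across streams are assumed here as one plausible choice among the fusion schemes the paper compares; the function names and the `temporal_weight` parameter are hypothetical.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=random):
    """Pick one random frame index from each equal-length, non-overlapping segment."""
    seg_len = num_frames // num_segments
    if seg_len == 0:
        raise ValueError("video has fewer frames than segments")
    # Frames left over after the last full segment are ignored.
    return [k * seg_len + rng.randrange(seg_len) for k in range(num_segments)]


def average_scores(segment_scores):
    """Fuse per-segment class scores of one stream by averaging (assumed scheme)."""
    n = len(segment_scores)
    return [sum(column) / n for column in zip(*segment_scores)]


def fuse_two_streams(spatial_scores, temporal_scores, temporal_weight=1.0):
    """Weighted sum of the two streams' fused scores; returns the predicted class index."""
    spatial = average_scores(spatial_scores)
    temporal = average_scores(temporal_scores)
    fused = [s + temporal_weight * t for s, t in zip(spatial, temporal)]
    return max(range(len(fused)), key=fused.__getitem__)
```

Sampling one snippet per segment lets the prediction draw on the whole duration of a long video at a fixed cost, rather than on a single short clip.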

This paper first introduces common wearable sensors, smart wearable devices, and their key application areas. Because a multi-sensor system is defined by the presence of more than one model or channel (e.g. visual, audio, environmental, and physiological signals), fusion methods for multi-modality and multi-location sensors are proposed. Although several works have reviewed the state of the art in information fusion or deep learning, each tackled only one aspect of sensor fusion applications, which leaves the topic without a comprehensive treatment. We therefore take a more holistic approach to provide a more suitable starting point from which to develop a full understanding of fusion methods for wearable sensors. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of multi-sensor applications for human activity recognition, including those recently added to the field for unsupervised learning and transfer learning. Finally, the open research issues that need further research and improvement are identified and discussed.

In this paper, we propose a hierarchical discriminative approach for human action recognition. It consists of feature extraction with mutual motion pattern analysis and discriminative action modeling in a hierarchical manifold space. A Hierarchical Gaussian Process Latent Variable Model (HGPLVM) is employed to learn the hierarchical manifold space in which motion patterns are extracted. A cascade CRF is also presented to estimate the motion patterns in the corresponding manifold subspace, and a trained SVM classifier predicts the action label for the current observation. Using motion capture data, we test our method and evaluate how individual body parts affect human action recognition. Results on our test set of synthetic images are also presented to demonstrate robustness.

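To address the large parameter counts and overly deep, heavy networks of existing deep learning-based human action recognition models, a lightweight two-stream fusion deep neural network model is proposed and applied to human action recognition. The model combines a shallow multi-scale network with a deep network, which substantially reduces the number of parameters and avoids the problem of excessive network depth. In experiments on the UCF101 and HMDB51 datasets, the model achieved recognition accuracies of 94.0% and 69.4%, respectively, with ImageNet pretraining. The experiments show that, compared with most existing deep learning-based human action recognition models, the model greatly reduces the number of parameters while retaining high recognition accuracy.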

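To address the low recognition rate for subtle actions, a human action recognition method is proposed that uses multi-layer depth motion maps combined with a new projection strategy and energy-equalized video segmentation. First, a new projection strategy is proposed that projects depth images onto three orthogonal Cartesian planes to retain more action information. Second, because multi-layer depth motion maps computed over the whole video reflect the overall motion but ignore many details, an energy-equalization-based video segmentation method is used to divide the video into multiple sub-sequences, which characterizes action details more fully. Finally, to describe the texture details of the multi-layer depth motion maps, local binary patterns are used as the action feature descriptor and combined with a kernel extreme learning machine classifier for action recognition. Experimental results show that on the public action recognition dataset MSRAction3D and the gesture recognition dataset MSRGesture3D, the proposed algorithm achieves accuracies of 94.55% and 95.67%, respectively, higher than many existing algorithms.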
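
The local binary pattern descriptor used in the abstract above can be illustrated with the basic 3x3 variant. This is a generic sketch, assuming the input is a 2D list of intensities such as a depth motion map; the paper's LBP variant (radius, number of sampling points, block layout) is not specified in the abstract, and the function names are hypothetical.

```python
def lbp_codes(image):
    """Basic 8-neighbor LBP code for every interior pixel of a 2D list."""
    # Neighbors visited clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(image), len(image[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(image):
    """256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    return hist
```

In a pipeline like the one described, such histograms would typically be computed per block of each depth motion map and concatenated before classification. That arrangement is a common convention and is not confirmed by the abstract.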

