Similar Documents
20 similar documents found (search time: 390 ms)
1.
2.
Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system to recognize both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of joints belonging to a human object, through actions of multiple cycles, are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) for dimensionality reduction and increased discrimination. For further dimensionality reduction, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train CHMMs separately for different types of actions, based on the Baum–Welch re-estimation algorithm. For recognition of continuous actions concatenated from several distinct types of actions, a designed graphical model is used to systematically concatenate the separately trained CHMMs. The experimental results show the effective performance of the proposed system on both single and continuous action recognition problems.
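The quantization step in the abstract above, clustering geometrical relational features into discrete symbols that an HMM can consume, can be sketched as follows. This is a minimal illustration with toy 2-D feature vectors and a hand-rolled, deterministically initialized k-means; the variable names and the initialization rule are illustrative, not from the paper.

```python
def kmeans(points, k, iters=10):
    """Cluster feature vectors; return one cluster label per point."""
    # Deterministic init: evenly spaced input points as starting centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centroids[c])))
        # Update: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return labels

# Toy 2-D GRF vectors from two well-separated poses.
grfs = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
symbols = kmeans(grfs, k=2)
```

In a real pipeline each frame's GRF vector would be mapped to its cluster index, yielding the discrete observation sequence fed to Baum–Welch training.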

3.
4.
3D skeleton sequences contain more effective and discriminative information than RGB video and are better suited to human action recognition. Accurate extraction of human skeleton information is key to high recognition accuracy. Considering the correlations between joints, this work first proposes a skeleton feature extraction method based on complex networks: the relationships between skeleton points in each frame are encoded as a network, and the changes of an action over time are described by a time-series network composed of skeleton points. Network topology attributes are used as feature vectors, and complex-network coding is combined with an LSTM to recognize human actions. The method was verified on the NTU RGB+D 60, MSR Action3D, and UTKinect-Action3D datasets and achieved good performance on all three, showing that extracting skeleton features with complex networks can properly distinguish different actions. By considering temporal information and the relationships between skeleton joints at the same time, the method supports accurate recognition of human actions.
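A minimal sketch of encoding a skeleton frame as a network and reading off topology attributes as features. The edge rule (connect joints closer than a distance threshold) and the chosen attributes (degree sequence plus edge count) are assumptions for illustration; the abstract does not specify them.

```python
import math

def frame_to_network(joints, radius=1.5):
    """Connect every pair of joints closer than `radius` (illustrative rule)."""
    n = len(joints)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(joints[i], joints[j]) < radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def topology_features(adj):
    """Feature vector = sorted degree sequence plus the total edge count."""
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    return degrees + [sum(degrees) // 2]

joints = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (3, 3, 3)]  # last joint is isolated
feat = topology_features(frame_to_network(joints))
```

A per-frame feature vector like `feat` would then be stacked over time and fed to the LSTM.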

5.
刘强  张文英  陈恩庆 《信号处理》2020,36(9):1422-1428
Human action recognition has many applications in human-computer interaction, video content retrieval, and other areas, and is an important research direction in multimedia information processing. Most existing two-stream action recognition methods apply the same convolutional network to both the RGB and optical-flow streams, underusing the multimodal information and tending to cause network redundancy and confusion between similar actions. In recent years depth video has also been used increasingly for action recognition, but most methods exploit only the spatial information of actions in depth video, not the temporal information. To address these problems, this paper proposes a multimodal action recognition method based on heterogeneous multi-stream networks. The method first obtains a temporal feature representation of the action from the depth video, namely depth optical flow, then selects suitable heterogeneous networks to extract and classify spatio-temporal action features, and finally performs multimodal fusion of the recognition results from the RGB data, the optical flow extracted from RGB, the depth video, and the depth optical flow. Experiments on the large, internationally used NTU RGB+D action recognition dataset show that the proposed method outperforms existing state-of-the-art methods.
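The final fusion step described above, combining recognition results from the RGB, RGB optical-flow, depth, and depth optical-flow streams, can be sketched as a weighted late fusion of per-stream class scores. The uniform weights and the score values are illustrative assumptions.

```python
def fuse_scores(stream_scores, weights=None):
    """Weighted average of per-stream class-probability vectors."""
    n_streams = len(stream_scores)
    weights = weights or [1.0 / n_streams] * n_streams
    n_classes = len(stream_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, stream_scores))
            for c in range(n_classes)]

# Class scores from four streams: RGB, RGB flow, depth, depth flow.
scores = [
    [0.7, 0.2, 0.1],   # RGB
    [0.5, 0.3, 0.2],   # RGB optical flow
    [0.6, 0.3, 0.1],   # depth
    [0.4, 0.4, 0.2],   # depth optical flow
]
fused = fuse_scores(scores)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Per-stream weights could also be tuned on a validation set rather than kept uniform.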

6.
In video-based action recognition, viewpoint differences between action videos captured by fixed-view cameras reduce recognition accuracy. Using multi-view video is one way to improve accuracy. This paper proposes a multi-view human action recognition algorithm based on a 3D Residual Network (3D ResNet) and Long Short-Term Memory (LSTM) networks: the 3D ResNet learns fused spatio-temporal features of the action sequences from each view, and a multi-layer LSTM network then learns long-term activity representations of the video stream and mines the temporal information between video frames. Experiments on the NTU RGB+D 120 dataset show that the model reaches 83.2% accuracy on multi-view action sequence recognition.

7.
桑海峰  赵子裕  何大阔 《电子学报》2020,48(6):1052-1061
Action-irrelevant visual information in video frames, such as complex backgrounds and lighting conditions, introduces substantial redundancy and noise into spatial action features and degrades recognition accuracy. To address this, this paper proposes a recurrent region attention unit that captures the action-relevant regional visual information in spatial features and, exploiting the temporal nature of video, a recurrent region attention model. A video-frame attention model is further proposed to highlight the more important frames within an action video sequence and reduce the interference caused by similar temporal context shared between video sequences of different action classes. Finally, an end-to-end trainable network is presented: the Recurrent Region Attention and Video Frame Attention based video action recognition Network (RFANet). Experiments on two video action recognition benchmarks, the UCF101 and HMDB51 datasets, show that the end-to-end RFANet reliably recognizes the action class of a video. Inspired by two-stream architectures, a two-modality RFANet is also constructed, which achieves the best performance on both datasets under the same training conditions.
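The core of a region attention unit like the one described above is a softmax-weighted sum over region features. A minimal sketch, with illustrative region features and relevance scores (the real model learns the scores end to end):

```python
import math

def region_attention(region_feats, scores):
    """Softmax the relevance scores, then take the weighted sum of region features."""
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(region_feats[0])
    attended = [sum(w * f[d] for w, f in zip(weights, region_feats))
                for d in range(dim)]
    return attended, weights

# Two toy region features; the first region scores as more action-relevant.
feats = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = region_attention(feats, scores=[2.0, 0.0])
```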

8.
裴晓敏  范慧杰  唐延东 《红外与激光工程》2018,47(2):203007-0203007(6)
In action recognition from natural-scene images, occlusion, background clutter, and uneven illumination degrade recognition results, whereas methods based on 3D skeleton sequences can overcome these drawbacks. First, considering the spatio-temporal characteristics of human actions, this paper proposes a skeleton-based action recognition method using a deep network with spatio-temporal feature fusion. Second, a view-invariant feature representation is built from the geometric features of the skeleton; a CNN (Convolutional Neural Network) learns local spatial features of the skeleton, a spatial LSTM (Long Short-Term Memory) network learns the correlations between skeleton joints, and a temporal LSTM learns the spatio-temporal correlations of the skeleton sequence. Finally, the method is validated on the NTU RGB+D dataset. Experimental results show improved recognition accuracy and strong robustness to multi-view skeletons.
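A view-invariant skeleton representation of the kind mentioned above is often built by normalizing translation and scale. A minimal sketch that centers the skeleton at a hip joint and scales by the hip-neck distance; the joint indices are assumptions, and rotation alignment (also needed for full view invariance) is omitted.

```python
import math

def view_invariant(joints, hip=0, neck=1):
    """Center the skeleton at the hip joint and scale by hip-neck distance."""
    cx, cy, cz = joints[hip]
    centered = [(x - cx, y - cy, z - cz) for x, y, z in joints]
    torso = math.dist(joints[hip], joints[neck]) or 1.0  # avoid divide-by-zero
    return [(x / torso, y / torso, z / torso) for x, y, z in centered]

# Toy 3-joint pose: hip, neck, and one limb joint.
pose = [(2.0, 3.0, 1.0), (2.0, 5.0, 1.0), (3.0, 4.0, 1.0)]
norm = view_invariant(pose)
```

The normalized coordinates are what the CNN/LSTM stack would consume, so that camera placement and subject size do not dominate the learned features.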

9.
Considering that a human action can be intuitively viewed as a sequence of key poses and atomic motions in a particular order, this paper proposes a human action recognition method using multi-layer codebooks of key poses and atomic motions. Inspired by dynamics models of human joints, normalized relative orientations are computed as features for each limb of the human body. To extract key poses and atomic motions precisely, feature sequences are dynamically segmented into pose feature segments and motion feature segments based on the potential differences of the feature sequences. Multi-layer codebooks for each human action are constructed from the key poses extracted from the pose feature segments and the atomic motions extracted from the motion feature segments between each pair of key poses. The multi-layer codebooks represent the action patterns of each human action and can be used to recognize actions with the proposed pattern-matching method. Three classification methods are employed for action recognition based on the multi-layer codebooks. Two public action datasets, the CAD-60 and MSRC-12 datasets, are used to demonstrate the advantages of the proposed method. The experimental results show that it obtains performance comparable to or better than state-of-the-art methods.
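The dynamic segmentation into pose and motion segments can be sketched by thresholding inter-frame feature differences: low-change runs become pose segments and high-change runs become motion segments. The 1-D features and the threshold value here are illustrative simplifications.

```python
def segment_sequence(frames, threshold=0.5):
    """Label runs of frames: 'pose' where inter-frame change is small, 'motion' elsewhere."""
    diffs = [abs(b - a) for a, b in zip(frames, frames[1:])]
    segments = []
    for i, d in enumerate(diffs):
        kind = "pose" if d < threshold else "motion"
        if segments and segments[-1][0] == kind:
            segments[-1][1].append(i + 1)   # extend the current segment
        else:
            segments.append((kind, [i, i + 1]))  # start a new segment
    return segments

# Still pose, then fast motion, then another still pose.
segs = segment_sequence([0.0, 0.1, 0.1, 2.0, 4.0, 4.1])
```

Key poses would then be extracted from the "pose" segments and atomic motions from the "motion" segments between them.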

10.
邹武合  张雷  戴宁 《半导体光电》2015,36(6):999-1005
This paper proposes a one-shot action recognition method based on key-frame techniques. Each sample action is first modeled as a set of key poses; key frames are then used to segment the test video into individual actions; finally, action-matching probabilities are computed from time-weighted key-pose similarities to recognize actions. Because the method requires only one-shot learning, it is efficient. Experiments on standard depth datasets demonstrate its effectiveness, especially in distinguishing actions that differ only subtly.

11.
Video action recognition with two-stream networks remains a popular research topic in computer vision. However, most current two-stream methods suffer from two redundancy issues: inter-frame redundancy and intra-frame redundancy. To address these problems, a Spatial-Temporal Saliency Action Mask Attention network (STSAMANet) is built for action recognition. First, this paper introduces a key-frame mechanism to eliminate inter-frame redundancy; it selects, for each video sequence, the key frames with the greatest inter-frame differences. Then, Mask R-CNN detection is used to build a saliency attention layer that eliminates intra-frame redundancy by focusing on the salient human body and objects for each action class. Experiments on two public video action datasets, the UCF101 and Penn Action datasets, verify the effectiveness of the method for action recognition.
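A key-frame mechanism that keeps the frames with the greatest inter-frame differences, as described above, can be sketched with 1-D frame features; a real system would difference images or feature maps instead.

```python
def key_frames(frames, k=2):
    """Keep the k frames with the largest change from their predecessor."""
    diffs = [(abs(b - a), i + 1) for i, (a, b) in enumerate(zip(frames, frames[1:]))]
    chosen = sorted(i for _, i in sorted(diffs, reverse=True)[:k])
    return [0] + chosen  # always keep the first frame as a reference

# Two large jumps at indices 2 and 4 mark the informative frames.
kept = key_frames([0.0, 0.1, 3.0, 3.1, 7.0], k=2)
```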

12.
Manifold learning is an efficient approach for recognizing human actions. Most previous embedding methods are learned from the distances between frames as data points. They may thus be efficient in the frame recognition framework, but they are not guaranteed to give optimal results when whole sequences are to be classified, as in action recognition, where temporal constraints convey important information. In the sequence recognition framework, sequences are compared using distances defined between sets of points; among these, the Spatio-temporal Correlation Distance (SCD) is an efficient measure for comparing ordered sequences. This paper proposes a novel embedding that is optimal in the sequence recognition framework with SCD as the distance measure. Specifically, the proposed embedding minimizes the sum of distances between intra-class sequences while seeking to maximize the sum of distances between inter-class points. Action sequences are represented by key poses chosen equidistantly from one action period, with the period computed by a modified correlation-based method. Recognition is achieved by comparing the projected sequences in the low-dimensional subspace using SCD or the Hausdorff distance in a nearest-neighbor framework. Several experiments on three popular datasets show that the method not only classifies actions efficiently, obtaining results comparable to the state of the art on all datasets, but is also robust to additive noise and tolerant of occlusion, deformation, and changes in viewpoint. Moreover, the method outperforms other classical dimension reduction techniques and runs faster by choosing fewer postures.
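As a rough stand-in for SCD, an ordered framewise distance between equal-length key-pose sequences, combined with the nearest-neighbor rule mentioned above, can be sketched as follows; the pose vectors and class labels are illustrative toy data, and the true SCD is correlation-based rather than this plain sum of distances.

```python
import math

def scd(seq_a, seq_b):
    """Ordered framewise distance between two equal-length pose sequences
    (a simplified stand-in for the spatio-temporal correlation distance)."""
    assert len(seq_a) == len(seq_b)
    return sum(math.dist(a, b) for a, b in zip(seq_a, seq_b))

def nearest_neighbor(query, gallery):
    """Return the label of the gallery sequence closest to the query."""
    return min(gallery, key=lambda item: scd(query, item[1]))[0]

gallery = [
    ("wave", [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]),
    ("jump", [(0.0, 0.0), (0.0, 2.0), (0.0, 0.0)]),
]
label = nearest_neighbor([(0.1, 0.0), (0.9, 0.1), (0.0, 0.1)], gallery)
```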

13.
In action recognition, a proper frame sampling method can both reduce redundant video information and improve recognition accuracy. This paper proposes an action-density-based non-isometric frame sampling method, NFS, which discards redundant video information and samples informative frames so that neural networks achieve high accuracy on human action recognition; action density is introduced to indicate the intensity of actions in videos. In particular, NFS comprises an action density determination mechanism, a focused-clips division mechanism, and a reinforcement-learning-based frame sampling (RLFS) mechanism. Evaluations with various neural networks and datasets show that NFS is effective for frame sampling and helps achieve better recognition accuracy than existing methods.
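Non-isometric sampling driven by action density can be sketched as allocating a fixed frame budget across clips in proportion to their motion intensity. The actual NFS method uses reinforcement learning for this; the proportional rule below is a simplified assumption.

```python
def density_sampling(motion, budget):
    """Allocate the frame budget across clips proportionally to action density."""
    total = sum(motion)
    raw = [budget * m / total for m in motion]
    counts = [int(r) for r in raw]
    # Hand out leftover frames to clips with the largest fractional parts.
    leftover = budget - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# The middle clip has the most intense action, so it gets the most frames.
counts = density_sampling(motion=[1.0, 6.0, 3.0], budget=8)
```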

14.
In action recognition, fully learning and exploiting the correlation between a video's spatial and temporal features is critical to the final result. Traditional action recognition methods neglect spatio-temporal feature correlations and fine-grained features, reducing recognition accuracy; to address this, this paper proposes a human action recognition method based on convolutional gated recurrent units (ConvGRU) and attentional feature fusion (AFF). First, an Xception network serves as the spatial feature extractor for video frames, with spatial-temporal excitation (STE) and channel excitation (CE) modules added to strengthen temporal action modeling while extracting spatial features. In addition, the traditional long short-term memory (LSTM) network is replaced with a ConvGRU network, which uses convolution to further mine the spatial features of video frames while extracting temporal features. Finally, the output classifier is improved by introducing a feature fusion module based on improved multi-scale channel attention (MCAM-AFF), strengthening the recognition of fine-grained features and raising the model's accuracy. Experimental results show recognition accuracies of 95.66% on the UCF101 dataset and 69.82% on the HMDB51 dataset; the algorithm captures more complete spatio-temporal features and outperforms current mainstream models.

15.
16.
In video-based action recognition, training a two-stream network on videos with different frame numbers can cause data skew problems. Moreover, extracting key frames from a video is crucial for improving the training and recognition efficiency of action recognition systems. Previous works, however, suffer from information loss and optical-flow interference when handling videos with different frame numbers. In this paper, an augmented two-stream network (ATSNet) is proposed to achieve robust action recognition. A frame-number-unified strategy is first incorporated into the temporal stream network to unify the frame numbers of videos. The grayscale statistics of the optical-flow images are then extracted to filter out invalid optical-flow images and produce dynamic fusion weights for the two branch networks, adapting them to different action videos. Experiments on the UCF101 dataset demonstrate that ATSNet outperforms previous methods, improving recognition accuracy by 1.13%.
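A frame-number-unified strategy can be sketched as resampling every video to a fixed frame count by linear index mapping, which handles both shorter and longer inputs; the rounding rule here is an illustrative choice, not necessarily the paper's.

```python
def unify_frames(frames, target_len):
    """Resample a variable-length frame sequence to a fixed length by index mapping."""
    n = len(frames)
    if target_len == 1:
        return [frames[0]]
    step = (n - 1) / (target_len - 1)
    # int(x + 0.5) rounds half up, avoiding Python's banker's rounding.
    return [frames[int(i * step + 0.5)] for i in range(target_len)]

short = unify_frames(["f0", "f1", "f2"], target_len=5)   # upsample
long_ = unify_frames(list(range(5)), target_len=3)       # downsample
```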

17.
Human action recognition typically requires a large number of training samples, which are often expensive and time-consuming to create. In this paper, we present a novel approach for enhancing human action recognition from a limited number of samples via structural average curve analysis. Our approach first learns an average sequence from each pair of video samples of every action class and then gathers these averages with the original video samples to form a new training set; action modeling and recognition are performed on the resulting set. The technique was evaluated on four benchmark datasets. Our classification results are superior to those obtained with the original training sets, which suggests that the proposed method can be integrated with other approaches to further improve their recognition performance.
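The augmentation idea above, averaging pairs of same-class sequences and adding the averages to the training set, can be sketched as follows for equal-length sequences. The paper's structural average curves handle alignment more carefully; plain element-wise averaging is a simplifying assumption.

```python
def average_sequence(seq_a, seq_b):
    """Element-wise average of two equal-length feature sequences."""
    return [tuple((a + b) / 2 for a, b in zip(fa, fb))
            for fa, fb in zip(seq_a, seq_b)]

def augment(train_set):
    """Append the average of every same-class, same-length pair to the set."""
    augmented = list(train_set)
    for i, (label_i, seq_i) in enumerate(train_set):
        for label_j, seq_j in train_set[i + 1:]:
            if label_i == label_j and len(seq_i) == len(seq_j):
                augmented.append((label_i, average_sequence(seq_i, seq_j)))
    return augmented

train = [("wave", [(0.0,), (2.0,)]),
         ("wave", [(1.0,), (3.0,)]),
         ("jump", [(5.0,), (6.0,)])]
bigger = augment(train)
```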

18.
With the fast evolution of digital video, new technologies are needed to lower the cost of video archiving, cataloging, and indexing, and to improve the efficiency and accessibility of stored video sequences. A number of methods have been researched and proposed to meet these requirements. As one of the most important research topics, video abstraction enables us to quickly browse a large video database and to achieve efficient content access and representation. In this paper, a video abstraction algorithm based on a visual attention model and online clustering is proposed. First, shot boundaries are detected and key frames are extracted in each shot so that consecutive key frames within a shot are equally spaced. Second, a spatial saliency map indicating the saliency value of each region of the image is generated from each key frame, and regions of interest (ROIs) are extracted according to the saliency map. Third, the key frames and their corresponding saliency maps are passed to a specific filter, and several thresholds are used to discard key frames containing little information. Finally, the key frames are clustered with an online clustering method based on the features in the ROIs. Experimental results demonstrate the performance and effectiveness of the proposed video abstraction algorithm.

19.
Human action analysis has been an active research area in computer vision and has many useful applications, such as human-computer interaction. Most state-of-the-art approaches to human action analysis are data-driven and focus on general action recognition. In this paper, we aim to analyze fitness actions from skeleton sequences and propose an efficient and robust fitness action analysis framework. First, fitness actions from 15 subjects are captured and compiled into a fitness action dataset (Fitness-28). Second, skeleton information is extracted and aligned with a simplified human skeleton model. Third, the aligned skeleton information is transformed into a uniform human-centered coordinate system with the proposed spatial-temporal skeleton encoding method. Finally, an action classifier and a local-global geometrical registration strategy are constructed to analyze the fitness actions. Experimental results demonstrate that our method can effectively assess fitness actions and performs well in an artificial-intelligence fitness system.

20.
We propose a region-based method to recognize human actions from video sequences. Unlike other region-based methods, it works with the regions surrounding the human silhouette, termed the negative space. This paper further extends the idea of negative space to cope with changes in viewpoint. It also addresses the problem of long shadows, one of the major challenges of human action recognition. Some systems attempt to suppress shadows during segmentation, but our system takes as input segmented binary images in which the shadow is not suppressed, making it less dependent on the segmentation process. Furthermore, this approach can complement positive-space (silhouette-based) methods to boost recognition. The system consists of hierarchical processing: histogram analysis of the segmented input image, followed by motion and shape feature extraction, pose sequence analysis using Dynamic Time Warping, and finally classification with a nearest-neighbor classifier. We evaluated our system on the most commonly used datasets and achieved higher accuracy than state-of-the-art methods. Our system can also retrieve video sequences from queries of human action sequences.
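The pose sequence analysis step above relies on Dynamic Time Warping. A minimal, self-contained DTW over 1-D feature sequences looks like this; real inputs would be the motion and shape feature vectors extracted earlier, with a vector distance in place of `abs`.

```python
def dtw(seq_a, seq_b):
    """Dynamic Time Warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible alignment moves.
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

DTW tolerates differences in execution speed: `dtw([0, 1, 2], [0, 0, 1, 2])` is 0 because the repeated frame aligns to the same pose at no cost.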
