Similar Documents
20 similar documents retrieved.
1.
In this paper, we focus on recognizing person-person interactions using skeletal data captured from depth sensors. First, we propose a novel and efficient view transformation scheme. The skeletal interaction sequence is re-observed under a new coordinate system, which is invariant to the various setups and capturing views of depth cameras as well as to the exchange of position or facing orientation between the two persons. Second, we propose concise and discriminative interaction representations composed simply of the joint locations of the two persons. The proposed representations efficiently describe both the holistic interactive scene and the individual poses performed by each subject. Third, we introduce graph convolutional networks (GCN) to learn the proposed skeletal interaction representations directly, and we design a multi-GCN model to produce the final class score. Extensive experimental results on three skeletal action datasets, NTU RGB+D 60, NTU RGB+D 120 and SBU, consistently demonstrate the superiority of our interaction recognition method.
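The abstract does not spell out the exact view transformation, so the following is only a minimal sketch of one common person-person normalization: translate both skeletons so the midpoint of their hip-center joints becomes the origin, then rotate about the vertical axis so the line between the two hip centers lies along the x-axis. The joint index and array layout are assumptions, not the paper's verified scheme.

    import numpy as np

    def normalize_interaction(joints_a, joints_b, hip_idx=0):
        # joints_*: (T, J, 3) arrays of 3D joint locations for each person;
        # hip_idx is the assumed index of the hip-center joint.
        hip_a = joints_a[0, hip_idx]          # hip centers in the first frame
        hip_b = joints_b[0, hip_idx]
        origin = (hip_a + hip_b) / 2.0        # new origin: midpoint between the two persons

        # Rotate about the vertical (y) axis so the a->b direction lies on the +x axis.
        d = hip_b - hip_a
        theta = np.arctan2(d[2], d[0])        # angle of the a->b vector in the x-z plane
        c, s = np.cos(theta), np.sin(theta)
        rot_y = np.array([[c, 0.0, s],
                          [0.0, 1.0, 0.0],
                          [-s, 0.0, c]])

        def transform(joints):
            centered = joints - origin        # translation invariance
            return centered @ rot_y.T         # view / facing-direction invariance

        return transform(joints_a), transform(joints_b)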

2.
Drawing on experience from computer vision, an innovative neural network model called InnoHAR (inception neural network for human activity recognition), based on the inception neural network and the recurrent neural network, is proposed. Starting from end-to-end multi-channel sensor waveform data, it applies 1×1 convolutions for a better combination of the multi-channel data, convolutions of various scales to extract waveform characteristics at different scales, and a max-pooling layer to prevent tiny noise disturbances from causing false positives; combined with GRU units for temporal modeling, it makes full use of the characteristics of the data classification task. Compared with state-of-the-art neural network models, the InnoHAR model improves recognition accuracy by 3%, reaching the state of the art on the datasets used, while still guaranteeing real-time prediction on low-power embedded platforms and leaving room for future exploration.
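The abstract names the InnoHAR building blocks but not their exact sizes, so the PyTorch sketch below only illustrates the idea (1×1 and multi-scale 1-D convolutions plus a max-pooling branch feeding a GRU); all channel counts and kernel sizes are assumptions.

    import torch
    import torch.nn as nn

    class InceptionBlock1D(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.b1 = nn.Conv1d(in_ch, out_ch, kernel_size=1)                 # 1x1 channel mixing
            self.b3 = nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1)      # medium-scale waveform features
            self.b5 = nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2)      # larger-scale waveform features
            self.bp = nn.Sequential(nn.MaxPool1d(3, stride=1, padding=1),     # noise-robust pooling branch
                                    nn.Conv1d(in_ch, out_ch, kernel_size=1))

        def forward(self, x):                      # x: (batch, channels, time)
            return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))

    class InnoHARSketch(nn.Module):
        def __init__(self, in_ch, n_classes, width=32, hidden=64):
            super().__init__()
            self.features = InceptionBlock1D(in_ch, width)
            self.gru = nn.GRU(4 * width, hidden, batch_first=True)            # temporal modeling
            self.fc = nn.Linear(hidden, n_classes)

        def forward(self, x):                      # x: (batch, channels, time)
            h = self.features(x).permute(0, 2, 1)  # -> (batch, time, features) for the GRU
            out, _ = self.gru(h)
            return self.fc(out[:, -1])             # classify from the last time step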

3.
To address the security problems of face recognition, strengthen protection against malicious attacks on face recognition systems, and allow face recognition technology to be applied more widely, this paper proposes a system that integrates a deep-neural-network-based lip-reading technique into face recognition. Unlike existing lip-reading techniques, the system mainly recognizes the user's lip-movement habits. With this system, while undergoing face recognition the user reads out the content prompted by the verifying party; during face verification, the content spoken through lip movement is recognized and compared, which effectively raises the security level of face recognition. Experimental results show that, under deliberate attacks on the face recognition system, a system incorporating this technique achieves better recognition accuracy.

4.
Detecting hazardous activity during driving can be useful in curbing roadside accidents. Existing techniques utilizing image based features for encoding such activity can sometimes misclassify crucial scenarios. One particular work by Zhao et al. (2013 [1], 2013 [2], 2011 [3]) suggests an image based feature set that encodes the driver’s pose, which is categorized into one of four activities. We bring more clarity in understanding the activity by proposing a richer, video based feature set that adeptly exploits spatiotemporal information of the driver. Our feature set encodes the driver’s pose, crucial variations in pose and interactions with objects within the vehicle. The feature set is tested on our newly created dataset since the ones used in literature are not publicly available. Our proposed feature set captures a larger number of activities and using standard classifiers and benchmarks it has shown significant improvements over the existing ones.

5.
To address the problems of most existing deep-learning-based Wi-Fi human activity recognition methods, such as weak noise robustness, incompatible signal sizes and insufficient feature extraction, a sequential-image deep-learning-based recognition method is proposed. Following the idea of sequential-image deep learning, a series of image frames is reconstructed from the time-varying Wi-Fi signal to ensure a consistent input size. In addition, a low-rank decomposition method is designed to separate the low-rank activity information merged in noise. Finally, a deep model combining a temporal stream and a spatial stream is proposed to automatically capture spatiotemporal features from image sequences of varying length. The proposed method was extensively tested on the WiAR dataset and a self-collected dataset. The experimental results show that it achieves accuracies of 0.94 and 0.96, respectively, indicating high accuracy and robustness in pervasive environments.
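The abstract does not give the exact low-rank decomposition, so the sketch below shows one generic way to separate a low-rank component from noise with a truncated SVD; the rank is an assumed parameter, and the random matrix only stands in for a reconstructed image sequence.

    import numpy as np

    def lowrank_split(X, rank=3):
        # X: (frames, features) matrix built from the Wi-Fi-derived image sequence.
        # Returns a rank-limited "activity" component and the residual treated as noise.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank activity information
        return L, X - L                            # residual = noise / sparse part

    X = np.random.randn(200, 64)
    activity, noise = lowrank_split(X, rank=5)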

6.
Human activity recognition (HAR) has become effective as a computer vision tool for video surveillance systems. In this paper, a novel biometric system that can detect human activities in 3D space is proposed. In order to implement HAR, joint angles obtained using an RGB‐depth sensor are used as features. Because HAR is operated in the time domain, angle information is stored using the sliding kernel method. Haar‐wavelet transform (HWT) is applied to preserve the information of the features before reducing the data dimension. Dimension reduction using an averaging algorithm is also applied to decrease the computational cost, which provides faster performance while maintaining high accuracy. Before the classification, a proposed thresholding method with inverse HWT is conducted to extract the final feature set. Finally, the K‐nearest neighbor (k‐NN) algorithm is used to recognize the activity with respect to the given data. The method compares favorably with the results using other machine learning algorithms.
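The exact sliding-kernel, thresholding and averaging parameters are not given in the abstract, so this is only a rough sketch of the described pipeline (single-level Haar transform, detail thresholding, inverse transform, dimension reduction by averaging, k-NN) with assumed window sizes and toy data.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def haar_threshold_features(window, thr=0.1, pool=4):
        # window: 1-D array of joint angles collected by the sliding kernel (even length).
        a = (window[0::2] + window[1::2]) / np.sqrt(2)   # Haar approximation coefficients
        d = (window[0::2] - window[1::2]) / np.sqrt(2)   # Haar detail coefficients
        d = np.where(np.abs(d) < thr, 0.0, d)            # thresholding step
        rec = np.empty_like(window, dtype=float)         # inverse Haar transform
        rec[0::2] = (a + d) / np.sqrt(2)
        rec[1::2] = (a - d) / np.sqrt(2)
        return rec.reshape(-1, pool).mean(axis=1)        # dimension reduction by averaging

    # Toy usage with assumed shapes: 100 windows of 64 angle samples, 3 activity classes.
    X = np.random.rand(100, 64)
    y = np.random.randint(0, 3, size=100)
    feats = np.array([haar_threshold_features(w) for w in X])
    clf = KNeighborsClassifier(n_neighbors=3).fit(feats, y)
    print(clf.predict(feats[:5]))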

7.
To address the high confusion rate of current confidence-computation methods for action recognition and their unsuitability for transfer learning, a confidence-computation method based on the contextual information of samples (S-HMM, sliding-window hidden Markov model) is proposed. The method models the sequence of recognition results with hidden Markov model (HMM) theory and takes the probability that the sequence containing a sample is recognized correctly as the confidence of the recognition result, avoiding the dependence of current confidence-computation methods on the distribution of samples in the feature space. Experiments were carried out with data from real scenarios, and the results show that, compared with existing methods, the method reduces the confidence confusion rate by about 37%.
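The paper's exact confidence definition cannot be reproduced from the abstract, so the sketch below only illustrates the general HMM ingredient: a forward-backward pass over a sequence of recognition outputs, with the state posterior used as a context-aware confidence. All matrices are assumed toy values.

    import numpy as np

    def state_posteriors(obs, pi, A, B):
        # obs: sequence of observed classifier outputs (integer symbols)
        # pi: initial state probabilities, A: state transitions, B: emission matrix
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):                       # forward pass
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):              # backward pass
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)   # posterior of each hidden action per frame

    # Toy example: 2 hidden actions, 2 observable recognition outputs.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.9, 0.1], [0.2, 0.8]])
    B = np.array([[0.8, 0.2], [0.3, 0.7]])
    post = state_posteriors([0, 0, 1, 1, 1], pi, A, B)
    confidence = post[np.arange(5), [0, 0, 1, 1, 1]]      # posterior of each predicted label as its confidence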

8.
This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal streams are required for action recognition by a deep CNN, but reducing overfitting and fusing these two streams remain open problems. The existing fusion approach averages the two streams. Here we propose a correlation network with a Shannon fusion for learning a pre-trained CNN. A long-range video may contain spatiotemporal correlations over arbitrary times, which can be captured by forming the correlation network from simple fully connected layers. This approach was found to complement the existing network fusion methods. The importance of multimodal correlation is validated in comparison experiments on the UCF-101 and HMDB-51 datasets. The multimodal correlation enhanced the accuracy of the video recognition results.
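The abstract does not define "Shannon fusion"; one plausible reading is an entropy-weighted combination of the two streams' class scores, which is what the sketch below assumes. It is an illustration of that assumption, not the paper's verified formulation.

    import numpy as np

    def entropy_weighted_fusion(p_spatial, p_temporal, eps=1e-12):
        # p_*: per-class probability vectors from the spatial and temporal streams.
        # Streams with lower Shannon entropy (i.e., more confident) receive higher weight.
        def entropy(p):
            return -np.sum(p * np.log(p + eps))
        w_s = 1.0 / (entropy(p_spatial) + eps)
        w_t = 1.0 / (entropy(p_temporal) + eps)
        fused = w_s * p_spatial + w_t * p_temporal
        return fused / fused.sum()

    fused = entropy_weighted_fusion(np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2]))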

9.
Group activity recognition identifies the joint behaviour of individuals. Group activity is inseparable from the group state and closely related to the spatiotemporal features of the individuals in the group; spatiotemporal information both describes spatial semantics and reflects how the behaviour evolves over time. To extract effective, fine-grained spatiotemporal features, this paper proposes a group activity recognition method based on an attention mechanism and deep spatiotemporal information. First, ShuffleAttention is introduced into a two-stream feature-extraction network to effectively extract individual appearance and motion information. Next, an improved Non-Local network is used to extract deep temporal information. Finally, individual features are fed into a graph convolutional network to model spatial interactions and obtain the group activity recognition result. The accuracy reaches 93.6% and 97.8% on the CAD and CAED datasets, respectively; on CAD the accuracy is 1.2% and 2.6% higher than that of the cohesive group search (CCS) and actor relation graph (ARG) methods, which shows that the proposed method effectively extracts deep spatiotemporal features and improves group activity recognition accuracy.

10.
With modern e-healthcare developments, ambulatory healthcare has become a prominent requirement for the physically or mentally ill, the elderly and children. One of the major challenges in such applications is timing and precision. A potential solution to this problem is the fog-assisted cloud computing architecture. The activity recognition task is performed with the hybrid advantages of deep learning and genetic algorithms. The video frames captured from vision cameras are subjected to a genetic change-detection algorithm, which detects changes in activity between subsequent frames. Consequently, the deep learning algorithm recognizes the activity of the changed frame. This hybrid algorithm runs on top of the fog-assisted cloud framework FogBus, and performance measures including latency, execution time, arbitration time and jitter are observed. Empirical evaluations of the proposed model against three activity datasets show that the proposed deep genetic algorithm infers human activities more accurately than state-of-the-art algorithms.

11.
The diversity of phone placements in different mobile users' daily lives increases the difficulty of recognizing human activities from mobile phone accelerometer data. To solve this problem, a compressed sensing method for recognizing human activities is proposed; it is based on compressed sensing theory and utilizes both raw mobile phone accelerometer data and phone placement information. First, an over-complete dictionary matrix is constructed from sufficient raw tri-axial acceleration data labeled with phone placement information. Then, the sparse coefficient is evaluated for the samples to be tested by solving an L1 minimization. Finally, residual values are calculated and the minimum value is selected as the indicator to obtain the recognition result. Experimental results show that this method achieves a recognition accuracy of 89.86%, which is higher than that of a recognition method that does not use phone placement information; the recognition accuracy of the proposed method is effective and satisfactory.
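The abstract outlines a sparse-representation pipeline (over-complete dictionary, L1 minimization, class-wise residuals) without implementation details, so this is a generic sketch of that pattern using scikit-learn's Lasso as the L1 solver; the dictionary construction, sizes and random data are assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    def src_classify(D, labels, y, alpha=0.01):
        # D: (n_features, n_atoms) over-complete dictionary of labeled training samples
        # labels: (n_atoms,) activity label of each dictionary atom
        # y: (n_features,) test sample; returns the label with the smallest residual.
        x = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(D, y).coef_
        residuals = {}
        for c in np.unique(labels):
            xc = np.where(labels == c, x, 0.0)       # keep only the coefficients of class c
            residuals[c] = np.linalg.norm(y - D @ xc)
        return min(residuals, key=residuals.get)

    # Toy usage with random data standing in for acceleration features.
    D = np.random.randn(60, 200)
    labels = np.random.randint(0, 5, size=200)
    y = D[:, 10] + 0.01 * np.random.randn(60)        # test sample close to atom 10
    print(src_classify(D, labels, y))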

12.
Mobile phones are equipped with a rich set of sensors, such as accelerometers, magnetometers, gyroscopes, photometers, orientation sensors, and gravity sensors. These sensors can be used for human activity recognition in the ubiquitous computing domain. Most reported studies consider acceleration signals collected from a known, fixed device location and orientation. This paper describes how more accurate results for basic activity recognition can be achieved with transformed accelerometer data. Based on the rotation matrix (Euler angle conversion) derived from the orientation angles of the gyroscope and orientation sensors, we transform the input signals into a reference coordinate system. The advantage of the transformation is that it allows activity classification and recognition to be carried out independently of the orientation of the sensors. We consider five user activities: staying, walking, running, ascending stairs, and descending stairs, with the phone placed in the subject's hand, in a pants pocket, or in a handbag. The results show that an overall orientation-independent accuracy of 84.77% is achieved, an improvement of 17.26% over classification without input transformation.
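The abstract names a rotation-matrix (Euler angle) conversion but not the axis convention, so the sketch below assumes azimuth/pitch/roll angles and a Z-X-Y composition; the convention is an assumption, not the paper's verified formula.

    import numpy as np

    def rot_x(a): return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    def rot_y(a): return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
    def rot_z(a): return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])

    def to_reference_frame(acc, azimuth, pitch, roll):
        # acc: (N, 3) accelerometer samples in the device frame.
        # Assumed composition: undo roll (y), then pitch (x), then azimuth (z).
        R = rot_z(azimuth) @ rot_x(pitch) @ rot_y(roll)
        return acc @ R.T          # rotate every sample into the reference coordinate system

    acc = np.random.randn(100, 3)
    world_acc = to_reference_frame(acc, np.radians(30), np.radians(-5), np.radians(10))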

13.
The accurate and efficient classification of Internet traffic is the first and key step towards accurate traffic management, network security and traffic analysis. The classic ways to identify flows are either inaccurate or inefficient and are not suitable for real-time online classification. In this paper, after analyzing in detail the distribution of payload signatures among the packets of a flow, we present an early recognition method named Early Recognition Based on Deep Packet Inspection (ERBDPI). The basic concept of ERBDPI is to classify flows based on the payload signatures of their first few packets, so that traffic can be identified at the beginning of a flow connection. We compared the performance of ERBDPI with that of traditional sampling methods both synthetically and using real-world traffic traces. The results show that ERBDPI achieves a higher classification accuracy with a lower packet sampling rate, which makes it suitable for accurate real-time classification on high-speed links.
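The paper's actual signature model is not given in the abstract, so the following is only a toy illustration of the early-recognition idea: match byte-pattern signatures against the payloads of the first few packets of a flow and stop as soon as an application matches. The signature table and packet representation are invented for the example.

    # Hypothetical signature table: application -> byte pattern expected in early payloads.
    SIGNATURES = {
        "http": b"GET ",
        "tls":  b"\x16\x03",
        "smtp": b"EHLO",
    }

    def early_classify(first_packets, max_packets=4):
        # first_packets: list of raw payload bytes for the first packets of a flow.
        for payload in first_packets[:max_packets]:
            for app, pattern in SIGNATURES.items():
                if pattern in payload:
                    return app            # classified at the beginning of the connection
        return "unknown"

    print(early_classify([b"\x16\x03\x01\x00\xc8...", b"..."]))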

14.
15.
To address the limitations of recognizing human actions from micro-Doppler features alone, a deep learning approach to human action recognition based on frequency-modulated continuous-wave (FMCW) radar is adopted and a feature-fusion convolutional neural network architecture is proposed. From the human action echo data sampled by the FMCW radar, time-range feature maps and micro-Doppler feature maps are constructed, and these two kinds of feature maps are fed as input data respectively …
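The abstract is truncated before the fusion details, so the sketch below only shows a generic two-branch CNN that processes a time-range map and a micro-Doppler map and concatenates their features; all layer sizes are assumptions.

    import torch
    import torch.nn as nn

    def branch(in_ch=1, width=16):
        # Small convolutional branch; both feature maps get the same structure.
        return nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(width, width * 2, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    class TwoBranchFusion(nn.Module):
        def __init__(self, n_classes=6):
            super().__init__()
            self.time_range = branch()       # branch for the time-range map
            self.micro_doppler = branch()    # branch for the micro-Doppler map
            self.fc = nn.Linear(2 * 32, n_classes)

        def forward(self, tr_map, md_map):
            f = torch.cat([self.time_range(tr_map), self.micro_doppler(md_map)], dim=1)
            return self.fc(f)

    model = TwoBranchFusion()
    logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))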

16.
Explicit reasoning over a spatial substrate, i.e., the space–time information structures underlying a spatial problem, simplifies reasoning. Diagrammatic reasoning makes use of diagrams for exploiting such underlying structures. This paper proposes a novel approach combining diagrammatic reasoning with qualitative spatial and temporal reasoning techniques to visualize and perceive spatio-temporal relations among objects in a video. The hybrid techniques explore information over the spatial substrate for relational extraction. Different relations among objects in transition define short-term activities. Mealy machines are learned over patterns of short-term activities as activity recognizers. The proposed representation and recognition mechanism is validated by conducting experiments on video activity recognition from the DARPA Mind’s Eye and J-HMDB datasets.
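The abstract does not show how the learned Mealy machines are encoded, so this is only a minimal sketch of a Mealy machine run over a sequence of short-term activity symbols; the states, alphabet and transitions are invented for the example.

    class MealyMachine:
        def __init__(self, start, transitions, outputs):
            # transitions: (state, symbol) -> next state; outputs: (state, symbol) -> output
            self.start, self.transitions, self.outputs = start, transitions, outputs

        def run(self, symbols):
            state, emitted = self.start, []
            for s in symbols:
                emitted.append(self.outputs[(state, s)])
                state = self.transitions[(state, s)]
            return emitted

    # Toy recognizer: "approach" followed by "stop" is reported as a "meet" activity.
    m = MealyMachine(
        start="idle",
        transitions={("idle", "approach"): "near", ("near", "stop"): "idle", ("near", "approach"): "near"},
        outputs={("idle", "approach"): None, ("near", "stop"): "meet", ("near", "approach"): None},
    )
    print(m.run(["approach", "approach", "stop"]))   # [None, None, 'meet']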

17.
18.
In this paper we introduce a novel method for action/movement recognition in motion capture data. The joints orientation angles and the forward differences of these angles in different temporal scales are used to represent a motion capture sequence. Initially K-means is applied on training data to discover the most representative patterns on orientation angles and their forward differences. A novel K-means variant that takes into account the periodic nature of angular data is applied on the former. Each frame is then assigned to one or more of these patterns and histograms that describe the frequency of occurrence of these patterns for each movement are constructed. Nearest neighbour and SVM classification are used for action recognition on the test data. The effectiveness and robustness of this method is shown through extensive experimental results on four standard databases of motion capture data and various experimental setups.
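The angular K-means variant is not specified beyond being periodic-aware, so the sketch below only shows the two standard ingredients such a variant would need: a wrap-around distance between angles and a circular mean for recomputing centroids. It is an illustration, not the authors' exact algorithm.

    import numpy as np

    def circular_distance(a, b):
        # Smallest absolute difference between two angles in degrees, respecting wrap-around.
        d = np.abs(a - b) % 360.0
        return np.minimum(d, 360.0 - d)

    def circular_mean(angles_deg):
        # Mean direction of a set of angles, computed on the unit circle.
        rad = np.radians(angles_deg)
        return np.degrees(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())) % 360.0

    # 350 deg and 10 deg are only 20 deg apart, and their circular mean is about 0 deg.
    print(circular_distance(350.0, 10.0))           # 20.0
    print(circular_mean(np.array([350.0, 10.0])))   # ~0.0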

19.
This work includes the design and implementation of both conventional and neural network approaches to recognizing speaker templates, which are introduced to the system via a voice master card and preprocessed before extracting the features used in the recognition. The conclusion is that the performance of the neural network system is better than that of the conventional one, achieving a smooth degradation when dealing with noisy patterns and higher performance when dealing with noise-free patterns.

20.
Several alternate linear prediction parametric representations are experimentally compared as to their vowel recognition performance. The speech data used for this purpose consist of 900 utterances of 10 different vowels spoken by 3 speakers in a /b/-vowel-/b/ context. The cepstral coefficient representation is found to be the best linear prediction parametric representation.
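The abstract does not reproduce the conversion itself, so as background the sketch below implements the standard LPC-to-cepstrum recursion for an all-pole model H(z) = G / (1 - sum_k a_k z^-k); it is textbook material rather than the paper's own code.

    import numpy as np

    def lpc_to_cepstrum(a, n_ceps):
        # a: LPC coefficients a_1..a_p of H(z) = G / (1 - sum_k a_k z^-k)
        # Returns cepstral coefficients c_1..c_{n_ceps} via the standard recursion.
        p = len(a)
        c = np.zeros(n_ceps)
        for n in range(1, n_ceps + 1):
            acc = a[n - 1] if n <= p else 0.0
            for k in range(1, n):
                if n - k <= p:
                    acc += (k / n) * c[k - 1] * a[n - k - 1]
            c[n - 1] = acc
        return c

    print(lpc_to_cepstrum(np.array([0.9, -0.2]), 5))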
