Similar Documents
20 similar documents found (search time: 663 ms)
1.
罗元, 李丹, 张毅. 《半导体光电》 (Semiconductor Optoelectronics), 2020, 41(3): 414-419
Sign language recognition is widely used in communication between deaf-mute and hearing people. To address the low recognition rate caused by insufficient spatio-temporal feature extraction in sign language recognition, a novel spatio-temporal attention-based sign language recognition model is proposed. First, a spatial attention module based on a residual 3D convolutional network (Residual 3D Convolutional Neural Network, Res3DCNN) is proposed to automatically attend to salient regions in space; then, a temporal attention module based on convolutional long short-term memory (Convolutional Long Short-Term Memory, ConvLSTM) is proposed to weigh the importance of video frames. The key of the proposed algorithm is attending to salient regions in space while automatically selecting key frames in time. Finally, the effectiveness of the algorithm is verified on the CSL sign language dataset.
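The frame-weighting idea behind the temporal attention module above can be illustrated with a minimal plain-Python sketch (the scoring weights are hypothetical; the paper's actual module is a ConvLSTM, which is not reproduced here):

```python
import math

def temporal_attention(frame_features, score_w):
    """Weight per-frame feature vectors with softmax attention scores.

    frame_features: list of per-frame feature vectors (lists of floats)
    score_w: weight vector used to score each frame (dot product)
    Returns (attention weights, attention-pooled video descriptor).
    """
    # Score each frame, then normalize the scores with a softmax.
    scores = [sum(w * f for w, f in zip(score_w, feat)) for feat in frame_features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of frame features = attended video descriptor.
    dim = len(frame_features[0])
    pooled = [sum(a * feat[i] for a, feat in zip(alphas, frame_features))
              for i in range(dim)]
    return alphas, pooled
```

Frames scoring higher receive larger weights, which is how "key frames" dominate the pooled descriptor.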

2.
To address the low sign language recognition rates caused by traditional classification methods extracting overly simple features or using overly simple classifier structures, this paper combines a deep convolutional neural network classifier with a multi-feature fusion algorithm, achieving effective recognition by combining texture features with shape features. The texture features are obtained via LBP, a convolutional neural network, and the gray-level co-occurrence matrix, while the shape feature vector consists of Hu invariant moments and Fourier descriptors. To avoid overfitting, the deep convolutional neural network is trained with dropout. On the "hand" database, this multi-feature-fusion method based on a deep convolutional neural network achieves a recognition rate of 97.73% over 32 hand gestures. Compared with common sign language recognition methods, it is more robust and achieves a higher recognition rate.
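As one concrete example of the texture features mentioned above, the basic 8-neighbour LBP code of a pixel can be computed as follows (a minimal sketch; the paper's full pipeline also uses a CNN and gray-level co-occurrence matrices, which are not shown):

```python
def lbp_code(patch):
    """Basic 8-neighbour Local Binary Pattern code for the centre pixel
    of a 3x3 grayscale patch (list of 3 rows of 3 intensities)."""
    c = patch[1][1]
    # Clockwise neighbour order starting at the top-left pixel.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, col) in enumerate(order):
        # Set the bit when the neighbour is at least as bright as the centre.
        if patch[r][col] >= c:
            code |= 1 << bit
    return code
```

A histogram of these codes over an image region is the LBP texture descriptor.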

3.
This study offers an enhanced yolov4-tiny traffic sign recognition method for easy deployment on mobile or embedded devices, addressing the large parameter counts, low recognition accuracy, and poor real-time performance of traffic sign recognition models in complex scenarios. The yolov4-tiny network serves as the model's foundation. First, Octave Convolution is incorporated into the backbone network to eliminate low-frequency feature redundancy, lowering the number of parameters in the model and enhancing computational efficiency. Second, the convolutional block attention module is employed to improve recognition accuracy on small and medium-sized targets by strengthening the weights of traffic sign regions and suppressing those of invalid features. Finally, in the feature fusion stage, the Feature Pyramid Network structure is replaced with a Simplified Path Aggregation Network structure to better fuse shallow feature information with deep semantic information and further lower the missed-detection rate. Experimental results on the TT100K and CCTSDB datasets suggest that our technique achieves good recognition performance. With a 16 MB model size, our solution improves mean average precision by 3.5 percentage points and the frame rate by 12.5 f/s compared with the yolov4-tiny algorithm. Our method outperforms yolov4-tiny in both recognition accuracy and detection speed, and it can readily meet the real-time requirements of traffic sign recognition.
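The channel half of the convolutional block attention module re-weights feature channels from pooled descriptors. A much-simplified plain-Python sketch (a per-channel scalar weight stands in for CBAM's shared MLP, which is an assumption of this sketch):

```python
import math

def channel_attention(fmaps, w):
    """Simplified CBAM-style channel attention over 2D feature maps.

    fmaps: list of C feature maps, each a list of rows of floats
    w: hypothetical per-channel weight of a 1-layer shared transform
    Returns per-channel gates in (0, 1) and the re-weighted maps.
    """
    gates = []
    for ch, fm in enumerate(fmaps):
        flat = [v for row in fm for v in row]
        avg_pool = sum(flat) / len(flat)      # average-pooled descriptor
        max_pool = max(flat)                  # max-pooled descriptor
        # Shared transform on both descriptors, squashed by a sigmoid.
        z = w[ch] * (avg_pool + max_pool)
        gates.append(1.0 / (1.0 + math.exp(-z)))
    weighted = [[[g * v for v in row] for row in fm]
                for g, fm in zip(gates, fmaps)]
    return gates, weighted
```

Channels with strong responses get gates near 1; weak ("invalid") channels are suppressed toward 0.5 or below once the transform learns negative weights.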

4.
To address the poor real-time performance caused by the ever-deeper networks built to raise traffic sign recognition accuracy, a lightweight YOLOv5 traffic sign recognition method is proposed. First, a genetic algorithm and K-means clustering are used to determine anchor boxes suited to traffic sign recognition; then a Stem module and the basic units of ShuffleNetV2 are introduced to replace the YOLOv5 backbone. Compared with the YOLOv5 model on the Chinese traffic sign detection dataset, the lightweight YOLOv5 model maintains a recognition accuracy of 95.9% while reducing the parameter count by 95.4% and the actual memory footprint by 93.9%, and it runs 79.7% faster on GPU and 75% faster on CPU, greatly improving the real-time performance of traffic sign recognition and making it better suited for deployment in the perception systems of autonomous vehicles.
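Anchor selection with K-means over ground-truth box shapes conventionally uses 1 − IoU as the distance; a minimal sketch under that assumption (the paper additionally applies a genetic algorithm, which is not shown):

```python
def iou_wh(a, b):
    """IoU of two boxes given as (w, h), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, centroids, iters=10):
    """K-means on box shapes with 1 - IoU as the distance, as commonly
    used to pick detector anchors. boxes/centroids: lists of (w, h)."""
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU.
        clusters = [[] for _ in centroids]
        for b in boxes:
            best = max(range(len(centroids)), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        # Recompute centroids as the mean shape of each cluster.
        centroids = [
            (sum(b[0] for b in cl) / len(cl), sum(b[1] for b in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids
```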

5.
Conventional face image generation using generative adversarial networks (GANs) is limited in the quality of the generated images because the generator and discriminator use the same backpropagation network. In this paper, we discuss algorithms that can improve the quality of generated images, that is, high-quality face image generation. To stabilize the network, we replace the MLP with a convolutional neural network (CNN) and remove the pooling layers. We conduct comprehensive experiments on the LFW and CelebA datasets, and the experimental results show the effectiveness of the proposed method.

6.
To address the insufficient accuracy and efficiency of residential area extraction from high-resolution remote sensing images, a residential area extraction method for "GaoFen-1" (GF-1) remote sensing imagery based on an improved fully convolutional network is proposed. First, a large number of residential area training samples are prepared through professional visual interpretation. Then, a pretrained deep convolutional neural network is converted into a fully convolutional network, and the convolutional layers converted from fully connected layers are replaced with Inception modules having multi-scale convolution kernels, reducing the number of network parameters while increasing the feature representation capability. Finally, the prepared high-resolution remote sensing residential area dataset is used for training and validation, yielding a fully convolutional network that can directly extract residential area information. Experimental results show that the improved fully-convolutional-network method achieves accurate and effective residential area extraction, with a Kappa coefficient above 94%.
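The Kappa coefficient reported above is the standard Cohen's Kappa computed from a confusion matrix; a minimal sketch:

```python
def cohens_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[r][j] for r in range(k)) for j in range(k)]
    observed = sum(confusion[i][i] for i in range(k)) / n    # overall accuracy
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Unlike raw accuracy, Kappa discounts the agreement a random classifier would achieve from the class marginals, which is why it is preferred for land-cover extraction results.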

7.
To overcome the limitations of micro-Doppler features in human action recognition, a feature-fusion convolutional neural network architecture is proposed that applies deep learning to human action recognition with frequency modulated continuous wave (FMCW) radar. Time-range feature maps and micro-Doppler feature maps are constructed from the human action echo data sampled by the FMCW radar, and these two kinds of feature maps are used as input data respectively...

8.
Currently, video-based sign language recognition (SLR) has been extensively studied using deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In addition, using a multi-view attention mechanism along with CNNs is an appealing way to make the machine interpretation process immune to finger self-occlusions. The proposed multi-stream CNN mixes spatially and motion-modelled video sequences to create a low-dimensional feature vector at multiple stages in the CNN pipeline. Hence, we recast the view-invariance problem as a video classification problem using attention-model CNNs. For superior network performance during training, the signs are learned through a motion attention network, focusing on the parts that play a major role in generating view-based paired pooling with a trainable view pair pooling network (VPPN). The VPPN pairs views to produce maximally distributed discriminating features from all the views for improved sign recognition. The results show increased recognition accuracies on 2D video sign language datasets. Similar results were obtained on benchmark action datasets such as NTU RGB-D, MuHAVi, WEIZMANN and NUMA, as there is no multi-view sign language dataset except ours.

9.
To address the low accuracy of surface electromyography (sEMG) gesture recognition caused by convolutional neural networks (CNNs) extracting insufficient features and ignoring temporal information, this paper proposes a novel memory network model for sEMG gesture recognition that fuses two attention layers with a multi-stream convolutional neural network (MS-CNN). First, sEMG images generated by a sliding window serve as the model's input. Then, a channel attention module (CAM) is embedded in the MS-CNN to weaken irrelevant information so that the network can focus more on effective sEMG features. Next, a long short-term memory (LSTM) network excites the input features along the time axis, attending to more of the sEMG's temporal information and giving the network stronger learning capability in the time dimension. Finally, a time-sequence attention (TSA) layer attends to the LSTM states so that important muscle information is learned better, improving gesture recognition accuracy. On the NinaPro dataset...
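The sliding-window step that turns a continuous multi-channel sEMG recording into image-like model inputs can be sketched as follows (window and step sizes are illustrative assumptions):

```python
def sliding_windows(signal, win, step):
    """Split a multi-channel sEMG recording into overlapping windows.

    signal: list of per-channel sample lists (all the same length)
    win: window length in samples; step: hop between window starts
    Each returned window (channels x win samples) can be rendered as
    one input 'sEMG image' for the network.
    """
    n = len(signal[0])
    return [
        [ch[start:start + win] for ch in signal]
        for start in range(0, n - win + 1, step)
    ]
```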

10.
Driver distraction is currently a global issue causing a dramatic increase in road accidents and casualties. However, recognizing distracted driving remains a challenging task in computer vision, since the inter-class variations between different driver action categories are quite subtle. To overcome this difficulty, this paper proposes a novel deep-learning-based approach to extract fine-grained feature representations for image-based driver action recognition. Specifically, we improve the existing convolutional neural network in two ways: (1) we employ multi-scale convolutional blocks with receptive fields of different kernel sizes to generate hierarchical feature maps, and adopt a maximum selection unit to adaptively combine multi-scale information; (2) we incorporate an attention mechanism to learn pixel saliency and channel saliency between convolutional features, guiding the network to intensify local detail information and suppress global background information. We evaluate the designed architecture on multiple driver action datasets. The quantitative results show that the proposed multi-scale attention convolutional neural network (MSA-CNN) achieves state-of-the-art performance in image-based driver action recognition.
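The maximum selection unit that adaptively combines the multi-scale branches can be illustrated as an element-wise max over same-shape feature maps (a minimal sketch; the real unit operates on CNN tensors):

```python
def maximum_selection(branches):
    """Element-wise maximum over same-shape 2D feature maps coming from
    multiple multi-scale branches; the strongest response at each
    position is kept, whichever receptive field produced it."""
    return [
        [max(vals) for vals in zip(*rows)]   # max across branches per pixel
        for rows in zip(*branches)           # iterate rows in lock-step
    ]
```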

11.
EEG signals have long been regarded as the "gold standard" for fatigue detection, and a driver's mental state can be obtained by analyzing them. However, because EEG signals are nonlinear, non-stationary, and of low spatial resolution, traditional machine learning methods for EEG-based fatigue detection still suffer from low recognition rates and cumbersome feature extraction. This paper therefore proposes a driver fatigue detection method based on deep transfer learning over electrode-frequency distribution maps of EEG signals: a deep convolutional neural network is built and pretrained on the SEED EEG emotion dataset, then applied to driver fatigue detection via transfer learning. Experimental results show that the convolutional neural network model extracts fatigue-related feature information from the electrode-frequency distribution maps well and achieves good recognition performance. Moreover, the transfer learning strategy allows the trained deep network model to be transferred to other recognition tasks, helping to promote the application of EEG signals in driver fatigue detection systems.

12.
Because the recapture of LCD screen content often involves copyright issues, recaptured screen image identification has attracted considerable attention in image source forensics. This paper analyzes the characteristics of convolutional neural networks (CNNs) and vision transformers (ViTs) in feature extraction and proposes a cascaded network structure that combines local-feature and global-feature extraction modules to distinguish recaptured screen images, with or without a demoiréing operation, from original images. We first extract the local features of the input images with five convolutional layers and feed them into the ViT to enhance the local perception capability of the ViT module, which then further extracts the global features of the input images. Thorough experiments show that our method achieves a detection accuracy of 0.9691 on our generated dataset and 0.9940 on the existing mixture dataset, the best performance among the compared methods.

13.
Video anomaly detection (VAD) refers to identifying abnormal events in surveillance video. Reconstruction-based VAD techniques typically employ convolutional autoencoders with a limited number of layers, which extract insufficient features and lead to improper network training. To address this challenge, an end-to-end unsupervised feature enhancement network, the Bi-Residual Convolutional AutoEncoder (Bi-ResCAE), is proposed that learns normal events with low reconstruction error and detects anomalies by their high reconstruction error. The Bi-ResCAE network incorporates long-short residual connections to enhance feature reusability and stabilize training. In addition, we formulate a novel VAD model that extracts appearance and motion features by fusing the Bi-ResCAE network and an optical flow network in the objective function to recognize anomalous objects in video. Extensive experiments on three benchmark datasets validate the effectiveness of the model: it achieves an AUC (area under the ROC curve) of 84.7% on Ped1, 97.7% on Ped2, and 86.71% on the Avenue dataset, performing better than state-of-the-art techniques.
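The reconstruction-error scoring that separates normal frames from anomalous ones can be sketched as follows (mean squared error and a fixed threshold are simplifying assumptions; the paper's objective also fuses an optical-flow term):

```python
def anomaly_scores(frames, recons):
    """Per-frame anomaly score = mean squared reconstruction error.

    frames/recons: lists of equally sized flattened pixel lists;
    a well-trained autoencoder reconstructs normal frames closely,
    so large errors indicate anomalies.
    """
    return [
        sum((x - y) ** 2 for x, y in zip(f, r)) / len(f)
        for f, r in zip(frames, recons)
    ]

def detect(scores, threshold):
    """Flag frames whose score exceeds the threshold as anomalous."""
    return [s > threshold for s in scores]
```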

14.
15.
In recent years, convolutional neural networks (CNNs) have been widely applied to synthetic aperture radar (SAR) target recognition. Since the training datasets for SAR targets are usually small, CNN-based SAR image target recognition is prone to overfitting. A generative adversarial network (GAN) is an unsupervised training network in which a game between the generator and the discriminator makes the generated images hard for the discriminator to distinguish from real ones. This paper proposes a SAR target recognition method based on an improved convolutional neural network (ICNN) and an improved generative adversarial network (IGAN): the IGAN is first pretrained on the training samples without supervision, the trained IGAN discriminator parameters are used to initialize the ICNN, the ICNN is then fine-tuned on the training samples, and finally the trained ICNN classifies the test samples. MSTAR experiments show that the proposed method not only achieves a recognition rate of 96.37% with the training set reduced to 30% of its original size, but is also more robust to noise than using the ICNN directly.

16.
Unconstrained face verification aims to verify whether two given images contain the same person. In this paper, we propose a deep Bayesian convolutional neural network (DBCNN) framework to extract facial features and measure their similarity for face verification under unconstrained conditions. Specifically, we design a deep convolutional neural network and construct a Bayesian probabilistic model by transforming the Bayesian likelihood ratio function into a linear decision function. By training a decision line rather than finding a suitable threshold, we further enlarge the margin between inter-class and intra-class distances in unconstrained environments. Finally, we comprehensively evaluate our method on the LFW, CACD-VS and MegaFace datasets. The test results on LFW and CACD-VS show that our method shrinks intra-class variations significantly, and its performance on MegaFace proves that it achieves performance comparable to state-of-the-art face verification methods with relatively little training data and only a single network.

17.
吴鹏, 林国强, 郭玉荣, 赵振兵. 《信号处理》 (Journal of Signal Processing), 2019, 35(10): 1747-1752
Channel pruning is one of the main methods of deep model compression. In a densely connected convolutional network, every layer receives the output feature maps of all preceding convolutional layers as input, yet not every later layer needs the features of all earlier layers, so the network contains considerable redundancy. This paper proposes a self-learning method for pruning the redundant channels of densely connected networks, yielding a sparse densely connected convolutional neural network. First, a measure is proposed of how much each input feature map of each convolutional layer contributes to the output feature maps; input feature maps with small contributions are the redundant ones. Second, a training procedure is described in which the network prunes redundant channels in stages through self-learning, producing a sparse densely connected convolutional network with the redundant channels removed, fewer parameters, and lower storage and computation costs. Finally, to verify the effectiveness of the method, experiments on the CIFAR-10/100 image classification datasets show that model redundancy is reduced without sacrificing accuracy.
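The prune-by-contribution idea can be sketched as ranking input channels by their measured contribution and keeping the top fraction (the contribution measure is the paper's; the fixed keep-ratio policy here is an illustrative assumption):

```python
def prune_channels(contributions, keep_ratio):
    """Return the sorted indices of input channels to keep, dropping
    the lowest-contribution channels (the redundant dense connections).

    contributions: per-channel contribution scores (floats)
    keep_ratio: fraction of channels to retain, in (0, 1]
    """
    k = max(1, int(len(contributions) * keep_ratio))
    # Rank channels by contribution, highest first, and keep the top k.
    ranked = sorted(range(len(contributions)),
                    key=lambda i: contributions[i], reverse=True)
    return sorted(ranked[:k])
```

In the staged procedure described above, this selection would be re-run after each self-learning phase so the contribution scores reflect the already-pruned network.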

18.
To solve the problem of low sign language recognition rates with small samples, a simple and effective static gesture recognition method based on an attention mechanism is proposed. The method enhances the features of both the details and the subject of the gesture image, taking as its input the intermediate feature map generated by the original network. The proposed convolutional model is a lightweight general module that can be seamlessly integrated into any CNN (convolutional neural network) architecture and achieves significant performance gains with minimal overhead. Experiments on two different datasets show that the proposed method is effective, improving the sign language recognition accuracy of the benchmark model beyond that of existing methods.

19.
Target recognition is an important step in the interpretation of synthetic aperture radar (SAR) images. Given the excellent performance of convolutional neural networks (CNNs) in natural image classification, CNN-based SAR image target recognition has become a current research hotspot. The scattering features of SAR targets often exist at multiple scales, and SAR images contain inherent speckle noise and redundant information, making intelligent SAR target recognition a challenge. To address these problems, this paper proposes a multi-scale attention convolutional neural network that combines multi-scale feature extraction with an attention mechanism, designing an attention-based multi-scale residual feature extraction module to achieve high-accuracy SAR remote sensing image target recognition. The method achieves an overall accuracy of 99.84% on the 10-class MSTAR target recognition task, clearly outperforming other algorithms. After four variant types are added to the test set, the overall accuracy on the 10-class task reaches 99.28%, verifying the method's effectiveness under complex conditions.

20.
齐悦, 董云云, 王溢琴. 《红外与激光工程》 (Infrared and Laser Engineering), 2022, 51(12): 20220176-1-20220176-8
To address the low accuracy of multi-scale rotated face detection under complex conditions such as large pose variation and large rotation-in-plane (RIP) face angles, a rotated face detection method based on aggregated cascaded convolutional neural networks (CNNs) is proposed. Using a coarse-to-fine cascade strategy, several shallow convolutional neural networks are aggregated and cascaded on multiple feature layers of an SSD backbone, progressively performing face/non-face detection, face bounding box refinement, and face RIP angle estimation. The method achieves good detection results on the Rotate FDDB and Rotate Sub-WIDER FACE datasets: a detection accuracy of 87.1% at 100 false positives on Rotate Sub-WIDER FACE at 45 FPS, demonstrating accurate rotated face detection at low time cost.
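The coarse-to-fine cascade can be sketched as successive stages that each reject low-scoring candidate windows, so later (more expensive) stages only see survivors (the score functions and thresholds here are illustrative assumptions):

```python
def cascade_detect(candidates, stages):
    """Run coarse-to-fine detection stages over candidate windows.

    candidates: list of candidate windows (any representation the
    stage score functions understand)
    stages: list of (score_fn, threshold) pairs, coarse first;
    each stage drops candidates scoring below its threshold.
    Returns the candidates surviving every stage.
    """
    for score_fn, thr in stages:
        candidates = [c for c in candidates if score_fn(c) >= thr]
    return candidates
```

In the method above, the per-stage work would also include bounding-box refinement and RIP-angle estimation, omitted here for brevity.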
