Similar Documents
18 similar documents found (search time: 128 ms)
1.
To improve the accuracy of dysarthria recognition, a dysarthric speech recognition method based on combined multi-feature sets is proposed. A genetic algorithm performs feature selection over a pool built from five feature categories (prosodic features, spectral features, auditory features, voice-quality features, and vocal-tract model features), choosing the subset with the highest classification accuracy, and the selected features are recognized with an SVM classifier. Simulation experiments on different speech stimulus types in the TORGO acoustic and articulatory database show that the proposed method achieves an average accuracy of 97.52% over the database's three stimulus types, outperforming existing recognition methods.
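The wrapper-style selection loop described above can be sketched as a simple genetic algorithm over feature-subset bitmasks. This is a minimal illustration, not the paper's code: in the paper the fitness of a mask would be the SVM's cross-validation accuracy on the selected features, while here a made-up stand-in fitness marks the first five features as informative.

```python
import random

def ga_select(n_feats, fitness, pop=20, gens=30, p_mut=0.05, seed=0):
    """Toy genetic algorithm over feature-subset bitmasks.

    `fitness(mask)` stands in for SVM cross-validation accuracy
    in the paper's setting; any callable works here.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop // 2]                    # truncation selection (elitist)
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feats)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (rng.random() < p_mut) for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Stand-in fitness: features 0-4 are informative, the rest only add noise.
fit = lambda m: sum(m[:5]) - 0.2 * sum(m[5:])
best = ga_select(12, fit)
```

Because the parents survive unmutated each generation, the best mask never regresses, so the search reliably converges on the informative features in this toy setting.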

2.
To address the random initialization of basis-function centers and radii in traditional radial basis function (RBF) neural networks for speech recognition, and motivated by the brain's hierarchical processing of speech, this work replaces random initialization with unsupervised pre-training on large amounts of unlabeled data. A deep autoencoder network serves as the acoustic model, and the noise robustness of speaker-independent, small-vocabulary, isolated-word recognition is analyzed under Mel-frequency cepstral coefficient (MFCC) features and Gammatone-filter-based frequency cepstral coefficient (GFCC) features. Experimental results show that the deep autoencoder is more noise-robust than the RBF network under MFCC features, and that GFCC features yield a 1.87% relative improvement in average recognition rate over classic MFCC features under the deep autoencoder.
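The pre-training objective behind the abstract above is unsupervised reconstruction. A minimal one-layer tied-weight autoencoder sketch follows; it illustrates only the reconstruction objective (the paper stacks such layers into a deep network and pre-trains on unlabeled speech). Finite-difference gradients keep the sketch short; real code would backpropagate, and the random matrix here merely stands in for MFCC/GFCC frames.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # stand-in for MFCC/GFCC frames
W = rng.normal(scale=0.1, size=(8, 4))   # 8-dim frame -> 4-dim code

def loss(W):
    H = 1.0 / (1.0 + np.exp(-X @ W))     # sigmoid encoder
    R = H @ W.T                          # tied-weight linear decoder
    return float(np.mean((R - X) ** 2))  # reconstruction error

def num_grad(W, eps=1e-5):
    """Central-difference gradient of the reconstruction loss."""
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            G[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)
    return G

loss0 = loss(W)
for _ in range(60):                      # plain gradient descent
    W = W - 0.1 * num_grad(W)
loss1 = loss(W)
```

After pre-training, the learned encoder weights would initialize the recognition network in place of random centers and radii.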

3.
To improve the robustness of speech recognition systems, an acoustic feature extraction method based on a spectro-temporal Gabor filter bank (GBFB) is proposed. The high-dimensional GBFB features are reduced with a block-wise PCA algorithm, and their noise robustness is compared against the commonly used GFCC, MFCC, and LPCC features under identical noise conditions. The GBFB features raise the recognition rate by 5.35% over GFCC and by 7.05% over MFCC, and reach the LPCC recognition baseline at a signal-to-noise ratio 9 dB lower. The experimental results show that, in noisy environments, GBFB features are more robust than the traditional GFCC, MFCC, and LPCC features.
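The dimensionality-reduction step can be sketched as follows. This assumes "block-wise PCA" means PCA applied independently to contiguous blocks of the feature vector and the reduced blocks concatenated (an assumption about the paper's algorithm); the random matrix stands in for high-dimensional GBFB features.

```python
import numpy as np

def block_pca(X, block, k):
    """Reduce each `block`-sized slice of columns to `k` principal components."""
    parts = []
    for s in range(0, X.shape[1], block):
        B = X[:, s:s + block]
        B = B - B.mean(axis=0)                      # centre the block
        # principal directions from the thin SVD of the centred block
        _, _, Vt = np.linalg.svd(B, full_matrices=False)
        parts.append(B @ Vt[:k].T)                  # project onto top-k directions
    return np.hstack(parts)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 12))   # stand-in for high-dimensional GBFB features
Y = block_pca(X, block=4, k=2)  # 3 blocks of 4 dims -> 3 x 2 = 6 dims
```

Working block by block keeps each SVD small, which is the usual motivation for block-wise PCA on very high-dimensional features.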

4.
To improve the performance of speaker recognition systems at low signal-to-noise ratios, a speaker recognition algorithm combining a Gammatone filter bank with improved-spectral-subtraction speech enhancement is proposed. Improved spectral subtraction serves as a pre-processor to further raise the SNR of the speech signal; the enhanced speech is then passed through a Gammatone filter bank to extract the speaker's GFCC feature parameters, which are used in the speaker recognition algorithm. Simulation experiments are carried out on a Gaussian mixture model recognition system. The results show that applying this algorithm to a speaker recognition system clearly improves both its recognition rate and its robustness.
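The enhancement pre-processor can be sketched with basic magnitude spectral subtraction (the paper's "improved" variant presumably refines this baseline in ways the abstract does not detail): estimate the noise spectrum from the first few frames, subtract it from every frame, and floor the result at a small fraction of the noise estimate to avoid negative magnitudes.

```python
import numpy as np

def spectral_subtract(mag, n_noise_frames=5, beta=0.01):
    """Basic spectral subtraction. mag: (frames, bins) magnitude spectrogram."""
    noise = mag[:n_noise_frames].mean(axis=0)  # noise estimate from leading frames
    clean = mag - noise                        # plain magnitude subtraction
    return np.maximum(clean, beta * noise)     # spectral floor

# Synthetic spectrogram: constant noise, "speech" energy in the later frames.
frames = np.full((20, 4), 1.0)
frames[10:] += 3.0
out = spectral_subtract(frames)
```

Noise-only frames collapse to the spectral floor while the speech region keeps its energy above the subtracted noise level, which is exactly the SNR gain the GFCC extraction then benefits from.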

5.
Pharyngeal fricatives are a common compensatory articulation error in cleft-palate speech, and their automatic detection has important clinical significance for assessing velopharyngeal function. This work studies automatic detection algorithms for pharyngeal fricatives in cleft-palate speech, proposing spectral feature extraction that combines Piecewise Exponent Compression Gammatone Filters (PECGTFs) with a Softsign-based Multi-Channel (SSMC) model, followed by a KNN classifier. Experiments on 306 speech samples compare different Gammatone filter variants and the effect of Difference-of-Gaussian (DoG) enhancement versus SSMC enhancement on detection results. The results show that the combined PECGTF and SSMC algorithm reaches a detection accuracy of 94.95% for pharyngeal fricatives in cleft-palate speech, providing a useful reference for clinical diagnosis.

6.
Intrusion detection is a challenging and important task in network security. A single classifier can introduce classification bias; ensemble learning generalizes better and achieves higher precision than a single classifier, but tuning the weights of the base classifiers takes considerable time. To address this, a heterogeneous-ensemble intrusion detection algorithm with Bagging-based feature reduction (Double-Bagging) is proposed. The algorithm ensembles five feature selection algorithms and uses a Bagging voting mechanism to choose the optimal feature subset, achieving efficient and accurate dimensionality reduction. Pairwise diversity measures from ensemble learning are introduced to choose the best heterogeneous ensemble from the candidate base-classifier combinations, and the weighting function combines precision and AUC values as classifier weights. Experimental results show that the proposed algorithm reaches a precision of 99.94%, with a system error rate of 0.03% and a detection rate of 99.55%, outperforming current mainstream intrusion detection algorithms.
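The Bagging-style voting over feature selectors can be sketched as follows: each selector proposes a feature subset, and features winning a majority of votes survive. The five selector outputs below are invented for illustration; in the paper they would come from five distinct feature selection algorithms run on the intrusion-detection data.

```python
from collections import Counter

def vote_features(subsets, threshold=0.5):
    """Keep features chosen by more than `threshold` of the selectors."""
    n = len(subsets)
    votes = Counter(f for s in subsets for f in s)
    return sorted(f for f, v in votes.items() if v / n > threshold)

# Hypothetical outputs of five feature selectors on flow-level features.
proposals = [
    {"dur", "src_bytes", "flag"},
    {"dur", "src_bytes", "service"},
    {"src_bytes", "flag", "count"},
    {"dur", "src_bytes"},
    {"flag", "src_bytes", "dur"},
]
kept = vote_features(proposals)
```

Features proposed by only one selector are dropped as likely noise, which is the redundancy-removal effect the abstract attributes to the Bagging vote.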

7.
For speech emotion recognition, a multi-classifier fusion method using decision templates is proposed, in which sub-classifiers are built on subsets of different types of acoustic features. The differing subsets substantially raise the "diversity" measure among the sub-classifiers, a prerequisite for multi-classifier fusion to succeed. The method achieves better recognition results than majority-vote fusion and a support vector machine. The reasons for its good performance are further examined from the perspective of diversity-measure analysis.
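Decision-template fusion works by stacking every sub-classifier's class posteriors into a decision profile, averaging the profiles per class over training data to form templates, and labelling a new utterance by its nearest template. A minimal sketch with invented numbers (two sub-classifiers, two emotion classes):

```python
import numpy as np

def nearest_template(templates, profile):
    """Label a decision profile by the nearest class template (Euclidean)."""
    dists = {c: np.linalg.norm(t - profile) for c, t in templates.items()}
    return min(dists, key=dists.get)

# Templates: mean decision profiles per class from training data
# (rows: sub-classifiers, columns: class posteriors). Values are invented.
templates = {
    "angry":   np.array([[0.8, 0.2], [0.7, 0.3]]),
    "neutral": np.array([[0.2, 0.8], [0.3, 0.7]]),
}
# Decision profile produced by the sub-classifiers for one test utterance.
profile = np.array([[0.75, 0.25], [0.6, 0.4]])
label = nearest_template(templates, profile)
```

Unlike majority voting, the template comparison uses the full soft outputs of every sub-classifier, which is where the diversity among feature subsets pays off.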

8.
Accurate recognition of horn sounds is key to deploying automatic vehicle-horn capture systems. To overcome the limited representational power of a single feature and improve recognition accuracy, this paper fuses Mel-frequency cepstral coefficients (MFCC) with Gammatone-frequency cepstral coefficients (GFCC) into an M-GFCC feature, which is classified separately with a support vector machine (SVM) and a back-propagation (BP) neural network. Experimental results show that, compared with the single MFCC feature, the BP network's horn-recognition effectiveness improves by 10.4% and the SVM's by 4.4%; compared with the single GFCC feature, the BP network improves by 6.6% and the SVM by 4.2%, demonstrating that the fused feature improves horn-sound recognition accuracy.
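A plausible reading of the fusion step is per-feature z-normalisation followed by concatenation along the feature axis; this is an assumption (the paper may weight or select dimensions rather than plainly concatenate), and the random matrices stand in for real MFCC/GFCC frames.

```python
import numpy as np

def fuse(mfcc, gfcc, eps=1e-8):
    """Z-normalise each feature stream per dimension, then concatenate."""
    z = lambda F: (F - F.mean(axis=0)) / (F.std(axis=0) + eps)
    return np.hstack([z(mfcc), z(gfcc)])

rng = np.random.default_rng(2)
mfcc = rng.normal(size=(30, 13))   # stand-in MFCC frames
gfcc = rng.normal(size=(30, 13))   # stand-in GFCC frames
m_gfcc = fuse(mfcc, gfcc)          # fused "M-GFCC" frames, 26-dim
```

Normalising each stream before concatenation keeps either cepstral representation from dominating the fused vector purely through scale.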

9.
To automatically identify hypernasality grades in cleft-palate speech, an automatic recognition algorithm based on cepstral features of wavelet decomposition coefficients is proposed, following a combined study of wavelet processing and feature extraction for speech signals. Existing studies of cleft-palate speech mostly rely on MFCC, Teager energy, Shannon energy, and similar features, with low recognition accuracy and heavy computation. Here, wavelet-decomposition-coefficient cepstral features are extracted from 1,789 vowel /a/ samples covering four hypernasality grades, and a KNN classifier identifies the four grades; the results are compared with five classic acoustic features (MFCC, LPCC, pitch period, formants, and short-time energy), and an SVM classifier is likewise compared with KNN. The results show that the wavelet-based features outperform the classic acoustic features, and that KNN outperforms SVM: the wavelet features reach 91.67% with KNN and 87.60% with SVM, while the classic features range from 21.69% to 84.54% with KNN and from 30.61% to 78.24% with SVM.
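The decomposition at the heart of the feature above can be sketched with a single-level Haar wavelet step (the paper likely uses a deeper multi-level decomposition with a different mother wavelet, then takes cepstra of the coefficients; this shows only the decomposition step):

```python
import math

def haar_step(x):
    """One level of the Haar wavelet transform: averages and differences
    of adjacent sample pairs, scaled to preserve signal energy."""
    approx = [(a + b) / math.sqrt(2) for a, b in zip(x[::2], x[1::2])]
    detail = [(a - b) / math.sqrt(2) for a, b in zip(x[::2], x[1::2])]
    return approx, detail

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # toy signal frame
a, d = haar_step(x)
```

The approximation band would be decomposed again for deeper levels; energy is preserved across bands, so no information is lost before the cepstral step.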

10.
Emotion feature extraction is key to accurate speech emotion recognition. Traditional methods use either a single feature, which cannot fully reflect emotional variation in speech, or simple feature combinations, which introduce large amounts of redundancy among features and harm the recognition result. To improve the speech emotion recognition rate, an ant-colony-based method for intelligent speech emotion recognition is proposed. A weighted objective function combines recognition accuracy with feature-subset dimensionality; an ant colony algorithm then searches for the optimal speech feature subset, eliminating redundant information. Simulation tests on Mandarin and Danish emotional speech corpora show that the improved method removes redundant and useless features, reduces feature dimensionality, and raises the speech emotion recognition rate, making it an effective method for intelligent speech emotion recognition.
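A tiny ant-colony feature-selection sketch follows: each feature carries a pheromone level that sets its selection probability, each "ant" samples a subset, a toy objective scores it, and pheromone is reinforced on the best ant's features. The objective here is a made-up stand-in for the paper's weighted mix of recognition accuracy and subset dimensionality, and the update rule is a simplification of standard ACO pheromone dynamics.

```python
import random

def aco_select(n_feats, score, ants=15, iters=40, evap=0.1, seed=3):
    rng = random.Random(seed)
    tau = [1.0] * n_feats                          # pheromone per feature
    for _ in range(iters):
        trails = []
        for _ in range(ants):
            # sample a subset: feature i chosen with probability tau/(tau+1)
            m = [int(rng.random() < t / (t + 1)) for t in tau]
            trails.append((score(m), m))
        best_s, best_m = max(trails)
        # evaporate, then pull pheromone toward the best ant's choices
        tau = [(1 - evap) * t + evap * (2.0 if g else 0.5)
               for t, g in zip(tau, best_m)]
    return [int(t / (t + 1) > 0.5) for t in tau]   # features with p > 0.5

# Toy objective: first 4 features informative, the rest penalised.
score = lambda m: sum(m[:4]) - 0.3 * sum(m[4:])
mask = aco_select(10, score)
```

Pheromone accumulates on features that repeatedly appear in high-scoring subsets, which is how the colony "forgets" the redundant dimensions.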

11.
Recognition of speech and non-speech sounds plays an important role in the development of many systems, such as security surveillance, healthcare, and modern audio-visual conferencing. Although the vast majority of sound signals have distinctive production mechanisms, systematic and effective methods for extracting features from them are often lacking. Exploiting the inherent characteristics of different audio signals, a class-dependent feature selection method is used to extract audio features for classification. Simulation experiments on speech and two non-speech sounds (coughing and the breaking of cups and plates) show that, compared with conventional feature selection methods, the proposed method achieves better classification with fewer features.

12.
A Phonetic-Knowledge-Based Classification Method for Mandarin Consonants (cited by 3)
This paper proposes a framework for improving Mandarin consonant recognition, under which a multi-level classifier based on acoustic-phonetic analysis is built to classify all Mandarin consonants without overlap, and the effect of combining the classification results with a probabilistic statistical model is tested. Several feature-parameter extraction techniques for Mandarin consonant classification and their experimental results are discussed in detail. The extracted features include non-voiced-segment duration (DUP) and normalized effective-band energy trends, drawing on time-domain, frequency-domain, and wavelet-domain analysis; the feature parameters are simple and effective, and are largely independent of the following vowel and of the speaker. The classifier divides the 21 Mandarin consonants into five classes, {m, n, l, r}, {b, d, g}, {p, t, k, f, h}, {zh, ch, sh}, and {z, c, s, j, q, x}, with classification accuracies of 97.21%, 97.10%, 97.70%, 93.31%, and 94.80%, respectively. The speech corpus used in the experiments comprises isolated-syllable Mandarin consonant recordings from 21 speakers.

13.
To address the poor recognition performance and insufficient feature extraction of current speech lie detection, a deceptive-speech recognition network based on an attention mechanism is proposed. First, a bidirectional long short-term memory (BiLSTM) network is combined with frame-level acoustic features, whose dimensionality varies with utterance length, to extract acoustic features effectively. Second, a temporal-attention-enhanced convolutional BiLSTM model serves as the classification algorithm, letting the classifier learn task-relevant deep information from the input and improving recognition performance. Finally, a skip-connection mechanism feeds the low-level output of the temporal-attention-enhanced convolutional BiLSTM directly into the fully connected layer, making full use of the learned features and avoiding the vanishing-gradient problem. In the experiments, the proposed model outperforms LSTM and the other baseline models. The simulation results further suggest that the model offers a useful reference for the speech lie detection field and for improving recognition rates.
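The temporal-attention pooling the model relies on can be sketched in isolation: score each frame's hidden state against a query vector, softmax the scores over time, and pool the sequence into one utterance vector by the attention weights. Shapes and values below are invented; in the paper this operates on the states of a convolutional BiLSTM rather than random vectors.

```python
import numpy as np

def attention_pool(H, q):
    """Temporal attention. H: (T, d) frame states; q: (d,) query vector."""
    scores = H @ q                     # one relevance score per frame
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w = w / w.sum()                    # weights over time, sum to 1
    return w, w @ H                    # weights and pooled utterance vector

rng = np.random.default_rng(4)
H = rng.normal(size=(6, 5))            # 6 frames, 5-dim hidden states
q = rng.normal(size=5)                 # learned query (random stand-in)
w, pooled = attention_pool(H, q)
```

The pooled vector is what the fully connected layer consumes; the skip connection described above would concatenate lower-level outputs alongside it.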

14.
Feature subsets and hidden Markov model (HMM) parameters are the two major factors that affect the classification accuracy (CA) of the HMM-based classifier. This paper proposes a genetic-algorithm-based approach for simultaneously optimizing both feature subsets and HMM parameters, with the aim of obtaining the best HMM-based classifier. Experimental data extracted from three spontaneous speech corpora were used to evaluate the effectiveness of the proposed approach and of the three other approaches (i.e. single optimization of feature subsets, single optimization of HMM parameters, and no optimization of either) adopted in previous work for discrimination between speech and non-speech events (e.g. filled pause, laughter, applause). The experimental results show that the proposed approach obtains a CA of 91.05%, while the three other approaches obtain CAs of 86.11%, 87.05%, and 83.16%, respectively. The results suggest that the proposed approach is superior to the previous approaches.

15.
Feature selection of very high-resolution (VHR) images is a key prerequisite for supervised classification. However, it is always difficult to acquire the features with the highest correlation to the land-cover type for improving classification accuracy. To address this problem, this paper proposes a feature selection methodology that uses the results of multiple segmentation via a genetic algorithm (GA) and correlation feature selection (CFS), integrating a sparse auto-encoder (SAE). Firstly, 61 features, including spectral and spatial features, are extracted from the results of multi-scale segmentation over a WorldView-2 image in Xicheng District, Beijing. Secondly, 40-dimensional and 30-dimensional feature sets are derived from selection with GA+CFS and optimization with SAE, respectively. Thirdly, the final classification is achieved by logistic regression (LR) based on the different subsets of features extracted from the WorldView-2 image. It is found that feature selection increases intra-species separation and reduces inner-species variability, while adding extra lower-ranked features appears to reduce classification accuracy. The results indicate that the overall classification accuracy with 30-dimensional features reached 87.56%, an increase of 5.61% over the results with 61-dimensional features. For the two kinds of optimized features, the Z-test values are all greater than 1.96, which implies that feature dimensionality reduction and feature-space optimization can significantly improve the accuracy of image land-cover classification. The texture features in the wavelet domain are the most important features for the study area in the WorldView-2 image classification. Adding wavelet and grey-level co-occurrence matrix (GLCM) information, especially GLCM features in the wavelet domain, appeared not to improve classification accuracy. The SAE-based method can produce feature subsets that improve mapping accuracy more efficiently.

16.
Dysarthria is a neurological impairment of control over the motor speech articulators that compromises the speech signal. Automatic Speech Recognition (ASR) can be very helpful for speakers with dysarthria because the disabled persons are often physically incapacitated. Mel-Frequency Cepstral Coefficients (MFCCs) have been proven to be an appropriate representation of dysarthric speech, but the question of which MFCC-based feature set represents dysarthric acoustic features most effectively has not been answered. Moreover, most current dysarthric speech recognisers are either speaker-dependent (SD) or speaker-adaptive (SA), and they perform poorly in terms of generalisability as a speaker-independent (SI) model. First, by comparing the results of 28 dysarthric SD speech recognisers, this study identifies the best-performing set of MFCC parameters for representing dysarthric acoustic features in Artificial Neural Network (ANN)-based ASR. Next, this paper studies the application of ANNs as a fixed-length isolated-word SI ASR for individuals who suffer from dysarthria. The results show that recognisers trained on the conventional 12-coefficient MFCC features, without delta and acceleration features, provided the best accuracy, and the proposed SI ASR recognised the speech of the unforeseen dysarthric evaluation subjects with a word recognition rate of 68.38%.
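For context on what the study found unnecessary, the delta (velocity) coefficients it omits are conventionally computed with the standard regression formula over a window of neighbouring frames. A sketch of that formula (window half-width N = 2, edge padding):

```python
import numpy as np

def deltas(feats, N=2):
    """Standard regression deltas. feats: (T, d) static MFCCs -> (T, d)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom

# A linear ramp has constant slope, so interior deltas should equal 1.
X = np.tile(np.arange(5, dtype=float)[:, None], (1, 3))
D = deltas(X)
```

The cited result suggests that for dysarthric speech these appended dynamics did not help, so the 12 static coefficients alone fed the ANN.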

17.
In this paper we propose a feature normalization method for speaker-independent speech emotion recognition. The performance of a speech emotion classifier largely depends on the training data, and a large number of unknown speakers may cause a great challenge. To address this problem, first, we extract and analyse 481 basic acoustic features. Second, we use principal component analysis and linear discriminant analysis jointly to construct the speaker-sensitive feature space. Third, we classify the emotional utterances into pseudo-speaker groups in the speaker-sensitive feature space by using fuzzy k-means clustering. Finally, we normalize the original basic acoustic features of each utterance based on its group information. To verify our normalization algorithm, we adopt a Gaussian mixture model based classifier for recognition test. The experimental results show that our normalization algorithm is effective on our locally collected database, as well as on the eNTERFACE’05 Audio-Visual Emotion Database. The emotional features achieved using our method are robust to the speaker change, and an improved recognition rate is observed.
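The final normalization step above can be sketched as group-wise z-normalisation: each utterance's features are standardised with the statistics of its pseudo-speaker group rather than global statistics. The clustering into groups is assumed done elsewhere (the paper uses fuzzy k-means in a speaker-sensitive space); the data below is invented.

```python
import numpy as np

def group_normalize(X, groups, eps=1e-8):
    """Z-normalise each row with its group's statistics.
    X: (n, d) utterance features; groups: (n,) group labels."""
    Z = np.empty_like(X, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        mu, sd = X[idx].mean(axis=0), X[idx].std(axis=0)
        Z[idx] = (X[idx] - mu) / (sd + eps)
    return Z

# Two pseudo-speaker groups with very different feature scales.
X = np.array([[1.0, 10.0], [3.0, 14.0], [100.0, 0.0], [104.0, 2.0]])
groups = np.array([0, 0, 1, 1])
Z = group_normalize(X, groups)
```

After normalisation, the two groups occupy the same scale, so speaker-dependent offsets no longer masquerade as emotional differences.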

18.

Speaker verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients, their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients, Perceptual Linear Predictive, and Relative Spectral Transform Perceptual Linear Predictive. In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional support vector machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: linear kernel and Gaussian radial basis function kernel with a logistic regression classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.



