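Compared with Mel-frequency cepstral coefficients (MFCC), power-normalized cepstral coefficients (PNCC), a sound feature based on energy-bias removal and a power-function nonlinearity, are more robust to noise. First, PNCC and MFCC are combined into a hybrid feature matrix, and the hybrid feature is compared experimentally with the conventional features under hidden Markov models (HMM), Gaussian mixture models (GMM), and support vector machines (SVM). Second, the HMM, which gave the better experimental results, is used to filter the test samples; GMM and SVM are then each used for a second-stage classification, and the recognition accuracy of the two resulting two-layer models is measured. The results show that, in noisy environments, the HMM/GMM two-layer model combined with the hybrid feature achieves good recognition performance.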

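To address the high algorithmic complexity and low feature recognition rate of current abnormal-sound recognition algorithms, a hybrid feature combining Mel-frequency cepstral coefficients (MFCC) with short-time energy is applied to an abnormal-sound recognition system. With this hybrid feature, a Gaussian mixture model (GMM) classifier achieves better classification performance than with MFCC features and their differential MFCCs. The concrete steps of the system implementation are given, and simulation experiments demonstrate the effectiveness of the algorithm: the classifier's average recognition rate exceeds 90%, and the computational complexity is low.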
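
The hybrid feature described in this abstract can be illustrated with a minimal sketch. The abstract does not give the frame length, hop size, or energy definition, so the sum-of-squares energy and the function names below (`frame_signal`, `short_time_energy`, `hybrid_features`) are assumptions made only for illustration; the MFCC vectors are taken as already computed by some other tool.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Short-time energy, assumed here to be the sum of squared samples in a frame."""
    return sum(x * x for x in frame)


def hybrid_features(mfcc_frames, frames):
    """Append each frame's short-time energy to its precomputed MFCC vector."""
    if len(mfcc_frames) != len(frames):
        raise ValueError("need exactly one MFCC vector per frame")
    return [list(m) + [short_time_energy(f)]
            for m, f in zip(mfcc_frames, frames)]


# Toy usage: two frames, each with a made-up 2-dimensional MFCC vector.
frames = frame_signal([1, 2, 3, 4, 5, 6], frame_len=4, hop=2)
features = hybrid_features([[0.1, 0.2], [0.3, 0.4]], frames)
```

Each row of `features` would then be one observation passed to the GMM classifier; in practice the energy term is often log-compressed or normalized so that its scale matches the cepstral coefficients, which the abstract does not specify.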

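张赛花, 赵兆, 许志勇, 张怡. 《计算机应用》 (Journal of Computer Applications), 2017, 37(4): 1111-1115
To address species classification from bird vocalizations in complex natural acoustic environments, an automatic bird-sound recognition method based on Mel sub-band parametric features is proposed. A Gaussian mixture model (GMM) is fitted to the log-energy distribution of the framed continuous acoustic monitoring data, and frames with high likelihood ratios are selected to form candidate sound events, which completes the automatic segmentation. In the spectrogram domain, each segment is filtered with a Mel band-pass filter bank; the time-varying energy sequence output by each sub-band is then modeled with an autoregressive (AR) model, yielding parametric features that describe the time-frequency characteristics of different bird vocalizations. Finally, a support vector machine (SVM) classifier performs the recognition. Automatic segmentation and recognition experiments on the vocalizations of 11 bird species recorded in natural field environments show that the precision, recall, and F1 measure of the proposed method are all at least 89% for every species, clearly better than an existing texture-feature-based method, which makes the method better suited to automatic data analysis in continuous acoustic monitoring of wild birds.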
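
The AR modeling step for one sub-band energy sequence can be sketched as follows. The abstract does not say how the AR parameters are estimated, so this sketch assumes the Yule-Walker equations solved with the Levinson-Durbin recursion on a mean-removed sequence; the function names and the choice of model order are hypothetical.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations.

    Returns (a, err) where x[t] is predicted as sum(a[k] * x[t - 1 - k])
    and err is the final prediction-error power.
    """
    a, err = [], r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / err  # reflection coefficient for order m + 1
        a = [a[j] - k * a[m - 1 - j] for j in range(m)] + [k]
        err *= 1.0 - k * k
    return a, err


def ar_features(energy_seq, order):
    """AR coefficients of one sub-band energy sequence (assumed feature layout)."""
    mean = sum(energy_seq) / len(energy_seq)
    centered = [e - mean for e in energy_seq]
    r = autocorr(centered, order)
    if r[0] == 0:  # constant sequence: no dynamics to model
        return [0.0] * order
    return levinson_durbin(r, order)[0]
```

Under these assumptions, the per-segment feature vector would be the concatenation of `ar_features(...)` over all Mel sub-bands, which is then handed to the SVM.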

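Classification of natural environmental sounds based on an SVM model   Total citations: 1 (self-citations: 0, citations by others: 1)
A method for classifying natural environmental sounds based on a support vector machine (SVM) model is proposed. First, Mel-frequency cepstral coefficients (MFCCs) are extracted to analyze the sound signal; second, an SVM model is built on the MFCC feature set of the natural environmental sounds; finally, cross-validation is used to obtain the classification results of the SVM algorithm. The SVM model classifies 50 classes of natural environmental sounds with an accuracy of up to 99.5704%, clearly outperforming the K-nearest neighbor (KNN) and ensembles of nested dichotomies (END) algorithms.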
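
The cross-validation protocol mentioned in this abstract can be sketched independently of the classifier. The number of folds is not stated in the abstract, and `train_fn` / `predict_fn` below are hypothetical stand-ins for whatever SVM training and prediction routines are used; samples are assumed to be shuffled beforehand.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(features, labels, k, train_fn, predict_fn):
    """Fraction of samples classified correctly when each is tested exactly once."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(features), k):
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i]
                       for i in test_idx)
    return correct / len(features)


# Toy usage with a majority-class "classifier" standing in for the SVM.
def majority_train(xs, ys):
    return max(set(ys), key=ys.count)


def majority_predict(model, x):
    return model


acc = cross_val_accuracy([[0.0]] * 4, ["a", "a", "a", "a"], 2,
                         majority_train, majority_predict)
```

Because every sample lands in exactly one test fold, the returned value is a single pooled accuracy, which is one common way a figure such as the one quoted above could be obtained.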

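As a powerful theoretical and computational tool, the support vector machine has been applied successfully in many areas of pattern recognition. This paper studies a theoretical framework for applying the support vector machine (SVM) model to language identification, proposes applying the Louradour sequence kernel to language identification, and improves it by using a Gaussian mixture model (GMM) to construct a universal background model (UBM), which leads to an SVM-UBM-based language identification system. Experimental results show that the recognition rate of this system is higher than that of the classical Gaussian mixture model (GMM) and of the support vector machine model based on the generalized linear discriminant sequence (GLDS) kernel.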

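Vehicle sound recognition underlies research such as vehicle sound-source localization and is important for traffic-accident assessment, criminal evidence, and crime-scene reconstruction. Existing vehicle sound recognition algorithms suffer from high algorithmic complexity and relatively low recognition rates. To address these problems, a vehicle sound recognition algorithm that uses Mel-frequency cepstral coefficient (MFCC) features together with the variance of the autocorrelation function (ACF) as a hybrid feature is applied to a vehicle sound recognition system. The algorithm uses a Gaussian mixture model (GMM) for vehicle sound modeling and recognition, and achieves better recognition results than the hybrid feature composed of MFCC features and their first-order differences. Simulation experiments demonstrate the effectiveness of the algorithm.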
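
One plausible reading of the "ACF variance" feature is sketched below. The abstract does not define the lag range or the normalization, so the normalized autocorrelation over lags 0..max_lag and the population variance used here are assumptions, as are the function names.

```python
def autocorrelation(frame, max_lag):
    """Normalized autocorrelation R[k] / R[0] for k = 0..max_lag."""
    n = len(frame)
    r0 = sum(x * x for x in frame)
    if r0 == 0:  # silent frame: define the ACF as all zeros
        return [0.0] * (max_lag + 1)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) / r0
            for k in range(max_lag + 1)]


def acf_variance(frame, max_lag):
    """Population variance of the normalized ACF values of one frame."""
    acf = autocorrelation(frame, max_lag)
    mean = sum(acf) / len(acf)
    return sum((v - mean) ** 2 for v in acf) / len(acf)
```

Under this reading, the scalar returned by `acf_variance` would be appended to each frame's MFCC vector before GMM training; a strongly periodic frame gives a widely varying ACF and hence a larger value than a noise-like frame.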

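陶志勇, 刘晓芳, 王和章. 《计算机应用》 (Journal of Computer Applications), 2018, 38(12): 3433-3437
Because the Gaussian mixture model (GMM) clustering algorithm is sensitive to its initial values and easily falls into local minima, the strong global search capability of the density peaks (DP) algorithm is used to optimize the initial cluster centers of the GMM algorithm, and a GMM clustering algorithm fused with DP (DP-GMMC) is proposed. First, cluster centers are found with the DP algorithm, giving the initial parameters of the mixture model; second, the expectation-maximization (EM) algorithm iteratively estimates the parameters of the mixture model; finally, the data points are clustered according to the Bayesian posterior probability criterion. On the Iris dataset, the clustering accuracy of DP-GMMC reaches 96.67%, an improvement of 33.6 percentage points over the traditional GMM algorithm, which resolves the dependence on the initial cluster centers. The experimental results show that DP-GMMC clusters low-dimensional datasets well.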
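
The first step, choosing initial centers with density peaks, can be sketched as below. This is a generic cutoff-kernel version of density-peaks selection and is not claimed to match the paper's exact variant; the cutoff distance `d_c` and the ranking by the product of density and distance are assumptions.

```python
import math


def density_peak_centers(points, d_c, n_centers):
    """Pick points with high local density that lie far from any denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest strictly denser point; the densest points
        # get their maximum distance instead. Density ties are not broken.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in ranked[:n_centers]]


# Toy usage: two well-separated groups, each with a clear central point.
pts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
       (10, 10), (11, 10), (9, 10), (10, 11)]
centers = density_peak_centers(pts, d_c=1.2, n_centers=2)
```

The selected centers would serve as the initial component means for EM, with initial weights and covariances derived, for example, from the points nearest each center.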

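陈黎, 徐东平. 《计算机工程》 (Computer Engineering), 2011, 37(14): 172-174
A support vector machine-Gaussian mixture model (SVM-GMM) is built to improve the recognition rate of open-set speaker identification. The basic idea is to confirm the SVM classification result with a GMM. Because the SVM has good classification performance while the GMM describes within-class similarity well, the two models complement each other and together achieve good recognition results. Experimental results show that the SVM-GMM model effectively improves the recognition rate of open-set speaker identification.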

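Detecting the cough segments in recorded audio quickly and accurately is important for the clinical diagnosis of many respiratory diseases. Mel-frequency cepstral coefficients (MFCC) are used as feature parameters to analyze the sound signal, and several groups of training data are used to build two Gaussian mixture models (GMMs) for each class of data in the recordings, such as coughs, speech, laughter, and throat clearing. The two GMMs obtained for each class are linearly combined to give the final probability model representing that class, which enables detection of the cough segments. On this basis, wavelet denoising is introduced: each data segment is denoised and then subjected to endpoint detection. Simulation results show that the proposed method effectively improves the recognition performance of the system.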

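Because traditional virtual machine consolidation (VMC) methods have difficulty keeping host workloads stable over the long term, an efficient virtual machine consolidation method based on the Gaussian mixture model (GMM-VMC) is proposed. To predict the trend of host load accurately, a Gaussian mixture model (GMM) is first fitted to the workload history of each active physical host. Then, the overload probability of a host is computed from the GMM of its workload and its own resource configuration, and this probability determines whether the host is at risk of overload. For a physical host at risk of overload, the virtual machines to migrate are selected according to two indicators: the contribution of each virtual machine deployed on that host to reducing the overload risk, and the time required to migrate it. Finally, the GMM is used to estimate the effect of the virtual machines to be migrated on the overload risk of each candidate target host, and the host affected least is chosen as the target. The GMM-VMC method is simulated on the CloudSim platform and compared with existing consolidation methods in terms of energy consumption, quality of service (QoS), consolidation efficiency, and other metrics. The experimental results show that GMM-VMC effectively reduces data-center energy consumption and improves quality of service.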
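
The overload-probability step can be illustrated with a small sketch. It assumes, purely for illustration, that host load is a single scalar (for example CPU utilization) modeled by a one-dimensional GMM and that "overload" means the load exceeding a capacity threshold; the abstract does not give the actual formula, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def overload_probability(weights, means, sigmas, capacity):
    """P(load > capacity) under a 1-D Gaussian mixture fitted to load history."""
    return sum(w * (1.0 - normal_cdf(capacity, m, s))
               for w, m, s in zip(weights, means, sigmas))


# Toy usage: a host that idles around 30% and bursts around 80% utilization.
p = overload_probability([0.5, 0.5], [0.3, 0.8], [0.1, 0.1], capacity=0.8)
```

A host would then be flagged as at risk when `p` exceeds some chosen risk threshold, and the same function could be re-evaluated on a candidate target host's mixture after shifting it by the migrated virtual machine's expected load.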

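Matching pursuit based on the glowworm swarm optimization algorithm for ecological sound recognition   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the interference of background noise with sound recognition in ecological environments, a method that optimizes matching pursuit with the glowworm swarm optimization algorithm is proposed for ecological sound recognition. Matching pursuit (MP) is used to decompose the sound signal sparsely, and the signal is reconstructed while preserving its main structure, which reduces the influence of noise. The glowworm swarm optimization (GSO) algorithm is used to optimize the search for the best-matching atom, enabling fast MP decomposition. Mel-frequency cepstral coefficients (MFCCs), MP time-frequency features, and the pitch frequency are extracted from the reconstructed signal. Combined with a support vector machine (SVM), 56 kinds of ecological sounds are classified under different environments and signal-to-noise ratios. Experimental results show that, compared with the traditional MFCC-plus-SVM method, the proposed method improves recognition performance to varying degrees at different signal-to-noise ratios and has good noise robustness; it is especially suitable for noisy conditions with a low signal-to-noise ratio (below 30 dB).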
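
The matching pursuit decomposition and reconstruction can be sketched as follows. Note that this sketch searches the dictionary exhaustively at every iteration, whereas the abstract replaces that search with GSO to speed it up; the dictionary of unit-norm atoms and the function names are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy MP over unit-norm atoms.

    Returns (picks, reconstruction, residual), where picks is a list of
    (atom_index, coefficient) pairs in the order they were selected.
    """
    residual = list(signal)
    recon = [0.0] * len(signal)
    picks = []
    for _ in range(n_iter):
        scores = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        c = scores[best]
        picks.append((best, c))
        recon = [r + c * a for r, a in zip(recon, atoms[best])]
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return picks, recon, residual


# Toy usage: with a trivial dictionary, two iterations keep the two dominant
# components and leave the small one in the residual.
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
picks, recon, residual = matching_pursuit([3, 0.1, -2], atoms, n_iter=2)
```

Stopping after a few iterations is what discards low-energy content: the reconstruction keeps the main structure of the signal, and features are then extracted from `recon` rather than from the noisy input.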

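A study of SVM-based speaker-independent tone recognition   Total citations: 2 (self-citations: 0, citations by others: 2)
On the basis of a newly built speaker-independent speech database of the four Mandarin tones, Mel-frequency cepstral coefficients (MFCCs) are used to extract feature parameters from the speech data, and a support vector machine (SVM) is used to train on and recognize the four tones in speech. Experimental results show that the combination of MFCCs and SVM reaches an average recognition rate of 97.6%.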

The mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) feature extraction methods typically used for automatic speech recognition (ASR) employ several principles that have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask whether one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks, in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy that is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
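
For context, cepstral mean subtraction (CMS), one of the baselines named in this abstract, can be sketched in a few lines. This is the common utterance-level form, assumed here for illustration; the abstract does not say which CMS variant was evaluated.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Toy usage: two frames of 2-dimensional cepstra.
normalized = cepstral_mean_subtraction([[1.0, 4.0], [3.0, 8.0]])
```

This utterance-level form needs every frame before it can emit the first normalized one, which is the kind of look-ahead a causal strategy avoids.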

This paper presents a new fingerprint recognition method based on mel-frequency cepstral coefficients (MFCCs). In this method, cepstral features are extracted from a group of fingerprint images, which are first transformed to 1-D signals by lexicographic ordering. MFCCs and polynomial shape coefficients are extracted from these 1-D signals or their transforms to generate a database of features, which can be used to train a neural network. Fingerprint recognition can then be performed by extracting features from any new fingerprint image with the same method used in the training phase and testing these features with the neural network. Different domains are tested and compared for efficient feature extraction from the lexicographically ordered 1-D signals. Experimental results show the success of the proposed cepstral method for fingerprint recognition at low as well as high signal-to-noise ratios (SNRs). Results also show that the discrete cosine transform (DCT) is the most appropriate domain for feature extraction.
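
The image-to-signal step and the DCT domain mentioned in this abstract can be sketched as below. Lexicographic ordering is assumed here to mean row-by-row flattening, and the transform shown is a plain unnormalized DCT-II; both choices, and the function names, are assumptions rather than details taken from the paper.

```python
import math


def lexicographic_order(image):
    """Flatten a 2-D image (list of rows) into a 1-D signal, row by row."""
    return [pixel for row in image for pixel in row]


def dct_ii(x):
    """Unnormalized DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


# Toy usage: a 2x2 "image" becomes a length-4 signal, then its DCT.
signal = lexicographic_order([[1, 2], [3, 4]])
coeffs = dct_ii(signal)
```

The MFCC extraction would then be applied to `coeffs` (or to `signal` directly, for the untransformed domain), treating the 1-D sequence as if it were an audio waveform.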

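Multi-class audio example recognition based on fractional Brownian motion and Ada Boosting   Total citations: 2 (self-citations: 0, citations by others: 2)
An audio feature extraction and recognition method based on fractional Brownian motion is proposed. The method uses a fractional Brownian motion model to compute the fractal dimension of an audio example and takes it as the example's fractal feature. Because the audio fractal features follow a Gaussian distribution, the Ada Boosting algorithm is used for feature reduction. An Ada-weighted Gaussian classifier and a support vector machine are then each used to classify the audio with the reduced features, and a multi-class classification model is constructed on top of the two-class classifiers. Experiments show that, after feature reduction, the audio fractal features outperform other audio features in both music and speech classification.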

Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients (MFCCs) are the features most widely used in state-of-the-art ASR systems; they are derived from the logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC, there is no consensus on the spacing and number of filters to use in various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front end outperforms the conventional MFCC technique. All investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
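
The filterbank parameters that this abstract proposes to optimize can be made concrete with a sketch of the conventional baseline layout. The mel formula with constants 2595 and 700 is one widely used convention and is assumed here; the abstract does not state which convention, band limits, or filter count the authors start from, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mels (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 frequencies equally spaced on the mel scale.

    Triangular filter i spans edges[i]..edges[i + 2] with its center at
    edges[i + 1], so these values are the side and central frequencies.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


# Toy usage: three filters between 0 Hz and 4 kHz.
edges = mel_filter_edges(3, 0.0, 4000.0)
```

An optimizer such as PSO or a GA would treat a vector like `edges` as the candidate solution and score it by recognition accuracy, instead of fixing it to the equal-mel spacing computed here.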

Content-based music genre classification is a key component for next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial neural networks (ANNs), specifically multi-layered perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which are used as input vectors to a parallel neural architecture that performs the classification. This paper examines a combination of linear predictive coding (LPC), mel-frequency cepstrum coefficients (MFCCs), Haar wavelet, Daubechies wavelet, and Symlet coefficients as feature sets for the proposed audio classifier. In parallel with the MLP, a Gaussian radial basis function (GRBF) based ANN is also implemented and analyzed. The obtained prediction accuracy of 87.3% in determining the audio genres supports the efficiency of the proposed architecture. The ANN prediction values are processed by a rule-based inference engine (IE) that presents the final decision.

Speaker recognition has been one of the interesting issues in signal and speech processing over the last few decades. Feature selection is one of the main parts of a speaker recognition system and can improve its performance. In this paper, we propose two methods for finding the MFCC feature vectors with the highest similarity, which are applied to a text-independent speaker identification system. These feature vectors capture the individual properties of each person's vocal tract that recur most often. They are used to build the speaker's model and to specify the decision boundary. We used the MFCCs of each window over the main signal as a feature vector and used clustering to obtain the feature vectors with the highest similarity. The speaker identification experiments are performed using the ELSDSR database, which consists of 22 speakers (12 male and 10 female), and a neural network is used as the classifier. The effect of three main parameters is considered in the two proposed methods. Experimental results indicate that the performance of the speaker identification system improves in terms of both accuracy and time consumption.


This paper proposes a speaker recognition system using acoustic features that are based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed by rate (in Hz) and scale (in cycles/octave). The rate and scale specify the temporal response and spectral response, respectively. This paper uses the proposed STRF-based feature to perform speaker recognition. First, the energy of each scale is calculated using the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the performance of the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rates by 3.85% and 18.49% on clean and noisy speech, respectively. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.

