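A speech-signal feature that combines the Mellin transform with Mel-frequency analysis, the MMCC feature, is proposed. The feature exploits the scale invariance of the Mellin transform to suppress the influence of speaker-to-speaker vocal-tract variation on the feature parameters. It also incorporates the human auditory characteristics captured by the Mel frequency scale, which improves robustness and makes the feature suitable for speaker-independent recognition systems. Simulation results show that a speaker-independent speech recognition system using MMCC features recognizes speech better than systems using LPCC, MFCC, or MMTLS features.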
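The abstract does not give the MMCC formulas. The sketch below only illustrates the principle the feature relies on. Once a spectrum is sampled on a logarithmic frequency axis, a uniform frequency scaling (such as one caused by a different vocal-tract length) becomes a shift along that axis. The magnitude of a Fourier transform taken along the axis does not change under such a shift, which is the discrete idea behind the scale invariance of the Mellin transform. The toy spectrum, the direct DFT, and the circular shift with zero padding are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform, computed by direct O(N^2) summation."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def circular_shift(x, s):
    """Shift a sequence to the right by s samples, wrapping around the end."""
    s %= len(x)
    return list(x[-s:]) + list(x[:-s]) if s else list(x)


# Toy spectrum sampled on a logarithmic frequency grid (hypothetical values).
# Scaling the frequency axis moves the pattern along this grid; the trailing
# zeros stand in for padding so that the shift does not wrap real content.
log_spectrum = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
scaled = circular_shift(log_spectrum, 2)

print([round(v, 6) for v in dft_magnitude(log_spectrum)])
print([round(v, 6) for v in dft_magnitude(scaled)])  # same magnitudes
```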

Lin Xin, Chen Hua, Wang Kaizhi, Wang Jicheng  Computer Engineering (《计算机工程》), 2007, 33(17): 237-238
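Ten basic mouth shapes are defined. Mel-frequency cepstral coefficients (MFCC) serve as the speech features, and an SVM classifier recognizes the vowels a, i, and u. Based on the corresponding quantized speech energy, the results are mapped to a mouth-shape sequence. That sequence is median-filtered to remove "singular points" (isolated outliers). The algorithm has worked well in a speech-driven facial animation system.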
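The post-processing step can be sketched as a median filter over the frame-by-frame mouth-shape sequence. This is a minimal sketch that assumes an odd window width, replicated edge values, and integer shape labels 0 to 9. The label sequence is hypothetical, and the paper does not state its window length.

```python
import statistics


def median_filter(seq, width=3):
    """Median-filter an integer label sequence, replicating edge values.

    An isolated label that differs from its neighbours (a 'singular point')
    is replaced by the value that surrounds it.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [statistics.median_low(padded[i:i + width]) for i in range(len(seq))]


# Hypothetical mouth-shape indices (0..9) for consecutive frames.
shapes = [2, 2, 7, 2, 2, 5, 5, 5, 1, 5]
print(median_filter(shapes))  # [2, 2, 2, 2, 2, 5, 5, 5, 5, 5]
```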

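Extracting speech information accurately and efficiently in noisy environments is a key difficulty in speech recognition, and doing so on embedded systems is of practical research interest. The paper first compares and analyzes two traditional speech feature extraction methods: linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. It then proposes a new method that extracts speech features by combining MFCCs with first-order difference MFCCs (MFCC + ΔMFCC). Endpoints are detected with the double-threshold method, and an HMM performs model matching. The system hardware and software were designed around an ARMSX2410 core. Compared with traditional methods, this approach improves the system's robustness, recognition accuracy, and efficiency, and it is suitable for speech recognition in noisy environments.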
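The ΔMFCC part of the feature vector can be sketched with the regression formula commonly used for delta coefficients (for example in HTK). The half-width n = 2 and edge replication are assumptions, since the abstract does not say how the first-order difference is computed.

```python
def delta(frames, n=2):
    """First-order delta features via the regression formula
    d_t = sum_{k=1..n} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..n} k^2),
    replicating the first and last frames at the edges."""
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1

    def at(i):
        return frames[min(max(i, 0), last)]

    return [[sum(k * (at(t + k)[d] - at(t - k)[d]) for k in range(1, n + 1)) / denom
             for d in range(len(frames[t]))]
            for t in range(len(frames))]


def append_delta(frames, n=2):
    """Concatenate static and delta coefficients frame by frame: [c_t, delta c_t]."""
    return [list(c) + d for c, d in zip(frames, delta(frames, n))]


# Toy 2-dimensional "MFCC" frames whose first coefficient rises linearly.
mfcc = [[float(t), 1.0] for t in range(6)]
features = append_delta(mfcc)
print(features[2])  # [2.0, 1.0, 1.0, 0.0]
```

In interior frames the rising coefficient gets a delta equal to its slope (1.0), and the constant coefficient gets 0.0.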

An Effective Content-Based Audio Feature Extraction Method   Total citations: 1 (self: 1, others: 0)
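Audio feature extraction is the basis of audio classification, and good features can substantially improve classification accuracy. Alongside the frequency-domain Mel-frequency cepstral coefficients (MFCC), a discrete wavelet transform is applied to each frame to extract wavelet-domain features. The frequency-domain and wavelet-domain features are then combined and their statistics computed. An SVM model is used to build audio templates that classify pure speech, music, and speech with background music, and the method achieves high classification accuracy.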
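The abstract does not name the wavelet or the statistics used. The sketch below assumes a Haar wavelet, three decomposition levels, log sub-band energies per frame, and per-clip mean and standard deviation. The resulting clip vector would then be concatenated with MFCC statistics before the SVM. The sinusoidal test frames are hypothetical.

```python
import math
import statistics


def haar_dwt(frame):
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(frame) - 1, 2)
    approx = [s * (frame[i] + frame[i + 1]) for i in pairs]
    detail = [s * (frame[i] - frame[i + 1]) for i in pairs]
    return approx, detail


def wavelet_log_energies(frame, levels=3):
    """Log energy of each detail band and of the final approximation band."""
    feats, approx = [], list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(math.log(sum(v * v for v in detail) + 1e-12))
    feats.append(math.log(sum(v * v for v in approx) + 1e-12))
    return feats


def clip_statistics(frame_features):
    """Per-feature mean and population standard deviation over all frames of a clip."""
    columns = list(zip(*frame_features))
    return ([statistics.fmean(c) for c in columns]
            + [statistics.pstdev(c) for c in columns])


# Hypothetical clip: five 16-sample frames of a sinusoid.
frames = [[math.sin(0.3 * n + f) for n in range(16)] for f in range(5)]
clip_vector = clip_statistics([wavelet_log_energies(fr) for fr in frames])
print(len(clip_vector))  # 8: 4 band energies x (mean, std)
```

Because the Haar transform is orthonormal, each level preserves signal energy across the approximation and detail bands.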

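Gaussian mixture models (GMMs) are used to recognize speaker gender from speech. The paper first outlines feature extraction, the recognition method, and the gender recognition procedure. It then speeds up recognition by reducing the number of speech frames used for feature extraction and lowering the number of mixture components in the GMM. Finally, it combines the test-sample posterior probabilities obtained from Mel-frequency cepstral coefficient (MFCC) features and from pitch (fundamental frequency) features into a new way of computing the test-sample posterior. Experiments show that this posterior effectively improves recognition accuracy.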
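The abstract does not give the rule used to combine the MFCC and pitch posteriors, so this sketch scores only one stream. It evaluates a scalar pitch value against a male and a female 1-D GMM and converts the log-likelihoods into posteriors with Bayes' rule. All mixture parameters are hypothetical toy values, not values from the paper.

```python
import math


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture (log-sum-exp)."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(s - top) for s in logs))


def posterior(logliks, priors):
    """Combine class log-likelihoods and priors into normalised posteriors."""
    scores = [ll + math.log(p) for ll, p in zip(logliks, priors)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-component pitch models in Hz: (weights, means, variances).
male = ([0.5, 0.5], [110.0, 140.0], [300.0, 400.0])
female = ([0.5, 0.5], [200.0, 240.0], [500.0, 600.0])

pitch = 125.0
p_male, p_female = posterior([gmm_loglik(pitch, *male), gmm_loglik(pitch, *female)],
                             [0.5, 0.5])
print(round(p_male, 4), round(p_female, 4))
```

Fewer mixture components, as the abstract suggests, shortens the inner sum in `gmm_loglik` and therefore speeds up scoring.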

Research on an Improved DTW Algorithm for Speech Recognition   Total citations: 1 (self: 0, others: 1)
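Dynamic time warping (DTW) is a classic algorithm in speech recognition. For this algorithm, the paper proposes an improved endpoint detection algorithm and uses Mel-frequency cepstral coefficients (MFCC) for feature extraction. An improved DTW algorithm with relatively low computational cost matches the speech parameters against templates. Together these support isolated-word, speaker-dependent, small-vocabulary speech recognition, and the algorithms were simulated in Matlab. Experimental results show that the improved algorithm effectively raises the system's recognition rate.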
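The paper's lower-cost DTW variant is not described in the abstract. The sketch below is the baseline O(nm) DTW it would improve on, without the slope constraints or search window that such variants typically add. Frames are compared with Euclidean distance, and the short template sequences are hypothetical stand-ins for MFCC frames.

```python
import math


def dtw_distance(a, b, dist=math.dist):
    """Classic DTW: minimum cumulative frame distance over monotonic alignments."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(test, templates):
    """Return the label of the template closest to test under DTW."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))


# Hypothetical 1-dimensional feature sequences standing in for MFCC frames.
templates = {"yes": [[1.0], [2.0], [3.0]], "no": [[3.0], [1.0], [0.0]]}
utterance = [[1.0], [2.0], [2.0], [3.0]]
print(recognize(utterance, templates))  # yes
```

The repeated frame `[2.0]` in the utterance is absorbed by the warping path, so its distance to the "yes" template is zero.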

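Speech produced under different emotions is markedly non-stationary, yet conventional MFCCs reflect only the static characteristics of speech. Empirical mode decomposition (EMD), by contrast, can characterize this non-stationarity in fine detail. To extract the non-stationary features of emotional speech, EMD decomposes the signal into a series of intrinsic mode function (IMF) components. These components are passed through a Mel filterbank, their log energies are taken, and an inverse DCT is applied to obtain improved MFCCs as a new feature for emotion recognition. A support vector machine recognizes four emotions: happiness, anger, boredom, and fear. Simulation results show that the improved MFCCs reach a recognition rate of 77.17%, and that under different signal-to-noise ratios the recognition rate improves by up to 3.26%.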
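The EMD sifting procedure is too long for a short example, so this sketch covers only the final steps: taking the log of the Mel-band energies of one IMF component and applying a cosine transform. It uses the orthonormal DCT-II, the transform conventionally used in the MFCC pipeline. The abstract's "inverse DCT" may refer to the same step viewed as an inverse transform of the log spectrum, but that is an assumption. The band energies are hypothetical, and how the paper merges the coefficients of different IMFs is not stated.

```python
import math


def dct_ii(x, num_coeffs=None):
    """Orthonormal DCT-II, the transform conventionally used for MFCCs."""
    n = len(x)
    num_coeffs = n if num_coeffs is None else num_coeffs
    coeffs = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                                  for i in range(n)))
    return coeffs


def cepstrum_from_band_energies(band_energies, num_coeffs=12, floor=1e-10):
    """Log Mel-band energies followed by a DCT-II (energies floored to avoid log(0))."""
    return dct_ii([math.log(max(e, floor)) for e in band_energies], num_coeffs)


# Hypothetical Mel-band energies for one IMF component of one frame.
imf_band_energies = [0.2, 0.9, 1.5, 1.1, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
print([round(c, 3) for c in cepstrum_from_band_energies(imf_band_energies, 6)])
```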

A Knowledge-Based Acoustic Target Detection and Recognition System   Total citations: 1 (self: 0, others: 1)
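Passive acoustic target detection is widely used for battlefield target recognition and for fault detection in automated equipment. Building on short-time processing of acoustic target signals, an acoustic target detection and recognition system is designed with the reconfigurable technology of field-programmable gate array (FPGA) devices. A knowledge base of acoustic target information is built from sub-band Mel cepstral coefficient (MFCC) features. The zeroth-order Mel cepstral coefficient (MFCC0) is used for spectral energy analysis to locate the start and end points of the signal. The MFCC features of each acoustic target are then mapped to binary images for template-matching recognition. The recognizer's output control commands go to an industrial PC or directly to related intelligent systems, enabling battlefield acoustic target recognition or acoustic fault detection in automated equipment.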
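With an orthonormal DCT, the zeroth cepstral coefficient is a scaled sum of the log Mel-band energies, so its contour tracks frame loudness and can drive endpoint detection. The abstract does not give the detection rule. This sketch assumes a double-threshold scheme applied to such a per-frame energy contour. A segment is confirmed by a run of frames above a high threshold and then extended while energy stays above a low threshold. The thresholds, `min_len`, and the energy values are hypothetical.

```python
def detect_endpoints(energy, high, low, min_len=3):
    """Double-threshold endpoint detection on a per-frame energy contour.

    A segment is confirmed when energy exceeds `high` for at least `min_len`
    consecutive frames; its boundaries are then extended outwards while
    energy stays above `low`. Returns (start, end) frame pairs, end inclusive.
    """
    segments = []
    t, n = 0, len(energy)
    while t < n:
        if energy[t] <= high:
            t += 1
            continue
        run_end = t
        while run_end + 1 < n and energy[run_end + 1] > high:
            run_end += 1
        if run_end - t + 1 < min_len:
            t = run_end + 1  # too short: treat as a noise burst
            continue
        start, end = t, run_end
        while start > 0 and energy[start - 1] > low:
            start -= 1
        while end + 1 < n and energy[end + 1] > low:
            end += 1
        segments.append((start, end))
        t = end + 1
    return segments


# Hypothetical per-frame energies (e.g. an MFCC0 contour shifted to be non-negative).
energy = [0, 0, 1, 5, 6, 7, 5, 1, 0, 0, 6, 0]
print(detect_endpoints(energy, high=4, low=0.5))  # [(2, 7)]
```

The single loud frame at index 10 is rejected because it is shorter than `min_len`.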

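A speech emotion recognition method based on fusing sample entropy with Mel-frequency cepstral coefficients (MFCC) is proposed. Support vector machines process the sample-entropy statistics and the MFCCs separately and compute the probability that an utterance belongs to each of four emotions: happiness, anger, boredom, and fear. The emotion probabilities are then fused with the sum rule and the product rule to obtain the recognition result. Simulation results show that the method achieves a high recognition rate.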
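The sum and product fusion rules named above can be sketched as follows. The per-classifier probabilities are hypothetical SVM outputs for a single utterance, and the fused scores are renormalised so that they again sum to one.

```python
import math


def fuse(prob_lists, rule="sum"):
    """Combine per-classifier class probabilities with the sum or product rule."""
    num_classes = len(prob_lists[0])
    if rule == "sum":
        scores = [sum(p[c] for p in prob_lists) for c in range(num_classes)]
    elif rule == "product":
        scores = [math.prod(p[c] for p in prob_lists) for c in range(num_classes)]
    else:
        raise ValueError("rule must be 'sum' or 'product'")
    total = sum(scores)
    return [s / total for s in scores]


EMOTIONS = ["happiness", "anger", "boredom", "fear"]
# Hypothetical SVM probability outputs for one utterance.
p_entropy = [0.30, 0.40, 0.10, 0.20]  # sample-entropy statistics
p_mfcc = [0.50, 0.20, 0.10, 0.20]     # MFCC features

for rule in ("sum", "product"):
    fused = fuse([p_entropy, p_mfcc], rule)
    print(rule, EMOTIONS[fused.index(max(fused))], [round(v, 3) for v in fused])
```

The product rule rewards classes that both classifiers support and penalises disagreement more sharply than the sum rule.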

An Improved Algorithm for Computing Speech MFCC Features   Total citations: 1 (self: 0, others: 1)
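An improved algorithm for computing Mel-frequency cepstral coefficient (MFCC) features is proposed. The algorithm uses the warped discrete Fourier transform (WDFT) to increase the spectral resolution of the low-frequency part of the speech signal, bringing it closer to the characteristics of the human auditory system. It also applies weighted filter bank analysis (WFBA) to improve the robustness of the MFCCs. Phoneme recognition results on the DR1 subset of the TIMIT continuous speech corpus show that the proposed algorithm achieves a better recognition rate than the conventional MFCC algorithm.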
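A WDFT is commonly defined by replacing the unit delay with a first-order all-pass section (z^-1 - lam) / (1 - lam z^-1). The spectrum is then sampled where the all-pass phase is uniformly spaced. This sketch assumes that definition. The abstract does not give the paper's warping factor or how the WDFT is combined with the Mel filterbank and WFBA, so `lam = 0.5` is purely illustrative.

```python
import cmath
import math


def warp_frequency(omega, lam):
    """Phase mapping of the first-order all-pass (z^-1 - lam) / (1 - lam z^-1)."""
    return omega + 2.0 * math.atan2(lam * math.sin(omega), 1.0 - lam * math.cos(omega))


def wdft_frequencies(n, lam):
    """Frequencies where the all-pass phase equals 2*pi*k/n (inverse map uses -lam).

    For lam > 0 these points crowd towards low frequencies (near 0 and 2*pi).
    """
    return [warp_frequency(2.0 * math.pi * k / n, -lam) for k in range(n)]


def wdft(x, lam):
    """Direct O(N^2) WDFT: the z-transform of x evaluated at the warped points."""
    return [sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(x))
            for w in wdft_frequencies(len(x), lam)]


freqs = wdft_frequencies(8, 0.5)
print([round(f, 3) for f in freqs])
```

With `lam = 0` the mapping is the identity and `wdft` reduces to the ordinary DFT.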

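Based on the spectral characteristics of speech signals, this paper improves the Mel cepstral parameters used in speaker recognition. Experiments implemented in Microsoft Visual C++ 6.0 show that, at low signal-to-noise ratios, the improved Mel cepstral parameters raise the correct identification rate of a speaker recognition system.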

An Improved Mel Filter for Speaker Recognition   Total citations: 1 (self: 0, others: 1)
Xiang Yaojie, Yang Jun'an, Li Jinhui, Lu Jun  Computer Engineering (《计算机工程》), 2013, (11): 214-217, 222
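Mel cepstral coefficients (MFCC) concentrate on the low-frequency information in speech, describe the spectral distribution of the signal incompletely, and cannot effectively distinguish speaker-specific information. To address this, the paper analyzes how much speaker-specific information each frequency band of speech contains. It combines the different high- and low-frequency characteristics of the Mel filter and the inverse Mel filter into an improved Mel filter suited to speaker recognition. Experimental results show that features extracted with the improved Mel filter recognize speakers better than conventional Mel cepstral coefficients and inverse Mel cepstral coefficients (IMFCC). The improved filter adds almost no time overhead to training or recognition.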
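The contrast between the two filterbanks can be sketched by their centre frequencies. Mel filters are equally spaced on the Mel scale and therefore dense at low frequencies. The inverse-Mel layout used for IMFCC is commonly obtained by mirroring the Mel layout, so it is dense at high frequencies. The sketch assumes the common 2595·log10(1 + f/700) Mel formula, 20 filters, and a 4 kHz upper edge. The paper's rule for combining the two banks is not given in the abstract.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(num_filters, f_max):
    """Centre frequencies (Hz) of filters equally spaced on the Mel scale in (0, f_max)."""
    m_max = hz_to_mel(f_max)
    return [mel_to_hz(m_max * (i + 1) / (num_filters + 1)) for i in range(num_filters)]


def inverse_mel_centers(num_filters, f_max):
    """Inverse-Mel centres: the Mel layout reflected via f -> f_max - f."""
    return [f_max - f for f in reversed(mel_centers(num_filters, f_max))]


mel = mel_centers(20, 4000.0)
imel = inverse_mel_centers(20, 4000.0)
print("Mel spacing (low, high):", round(mel[1] - mel[0], 1), round(mel[-1] - mel[-2], 1))
print("IMel spacing (low, high):", round(imel[1] - imel[0], 1), round(imel[-1] - imel[-2], 1))
```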

Automatic recognition of children's speech is a challenging topic in computer-based speech recognition. The conventional Mel-frequency cepstral coefficient (MFCC) feature extraction method is not efficient for children's speech recognition. This paper proposes a novel fuzzy-based discriminative feature representation for recognizing Malay vowels uttered by children. Because acoustic speech parameters vary with age, the performance of automatic speech recognition (ASR) systems degrades on children's speech. To solve this problem, this study addresses the representation of relevant and discriminative features for children's speech recognition. The methods include extracting MFCCs with a narrower filter bank, followed by a fuzzy-based feature selection method. The proposed feature selection yields relevant, discriminative, and complementary features. Doing so requires satisfying conflicting objective functions that measure the goodness of the features. To this end, a fuzzy formulation of the problem and fuzzy aggregation of the objectives are used to handle the uncertainties involved. The proposed method can reduce dimensionality without compromising the speech recognition rate. To assess the method, the study analyzed six Malay vowels from recordings of 360 children aged 7 to 12. After feature extraction, two well-known classifiers, MLP and HMM, were used for the recognition task, with the parameters of each classifier tuned for the experiments. The experiments were conducted in a speaker-independent manner. The proposed method outperformed conventional MFCCs and a number of conventional feature selection methods on the children's speech recognition task.
The fuzzy-based feature selection allowed flexible selection of the MFCCs with the best discriminative ability, enhancing the difference between the vowel classes.
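The abstract does not give the objective functions or membership shapes. This sketch assumes a common form of fuzzy multi-objective selection: each objective is mapped to a linear membership degree in [0, 1], and the degrees are aggregated with the min operator (fuzzy AND), so a feature ranks high only if it satisfies every objective. The two objectives, "relevance" (maximise) and "redundancy" (minimise), and their values are hypothetical.

```python
def linear_membership(value, worst, best):
    """Degree (0..1) to which value satisfies an objective, linear between worst and best."""
    if best == worst:
        return 1.0
    mu = (value - worst) / (best - worst)
    return min(1.0, max(0.0, mu))


def fuzzy_rank(features):
    """Rank features by the min (fuzzy AND) of their objective memberships.

    Each feature maps to a dict with hypothetical objectives: 'relevance'
    (to be maximised) and 'redundancy' (to be minimised).
    """
    rel = [f["relevance"] for f in features.values()]
    red = [f["redundancy"] for f in features.values()]
    scores = {}
    for name, f in features.items():
        mu_rel = linear_membership(f["relevance"], min(rel), max(rel))
        mu_red = linear_membership(f["redundancy"], max(red), min(red))
        scores[name] = min(mu_rel, mu_red)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores for three MFCC dimensions.
candidates = {
    "c1": {"relevance": 0.9, "redundancy": 0.2},
    "c2": {"relevance": 0.8, "redundancy": 0.9},
    "c3": {"relevance": 0.3, "redundancy": 0.1},
}
print(fuzzy_rank(candidates))
```

Here `c2` is highly relevant but fully redundant, so the min operator gives it a score of zero despite its relevance.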

In a recent study, we introduced the problem of identifying cell phones from recorded speech and showed that speech signals convey information about the source device, making it possible to identify the source with some accuracy. In this paper, we consider recognizing source cell-phone microphones using the non-speech segments of recorded speech. Taking an information-theoretic approach, we use a Gaussian mixture model (GMM) trained with maximum mutual information (MMI) to represent device-specific features. Experiments use Mel-frequency and linear-frequency cepstral coefficients (MFCC and LFCC). They show that features from the non-speech segments carry more mutual information and yield higher recognition rates than features from the speech portions or the whole utterance. When non-speech parts are used to extract features, the identification rate improves from 96.42% to 98.39% and the equal error rate (EER) falls from 1.20% to 0.47%. Recognition results are reported for classical GMMs trained with both the maximum likelihood (ML) and MMI criteria, as well as for support vector machines (SVMs). Identification under additive noise is also considered, and it is shown that identification rates drop dramatically in that case.
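The equal error rate quoted above is the operating point where the false-acceptance rate (FAR) equals the false-rejection rate (FRR). This sketch approximates it by sweeping every observed score as a threshold and taking the point where FAR and FRR are closest. The score lists are hypothetical, and the paper does not state which EER estimation procedure it used.

```python
def far_frr(genuine, impostor, threshold):
    """False-acceptance and false-rejection rates when accepting score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: average of FAR and FRR at the threshold where they are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


# Hypothetical verification scores for true-device and other-device trials.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine, impostor))  # 0.25
```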

A Channel Compensation Method for Telephone Speech Recognition Based on MFCC Filtering   Total citations: 4 (self: 0, others: 4)
Han Jiqing, Gao Wen  Chinese Journal of Computers (《计算机学报》), 1998, 21(12): 1125-1130
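This paper proposes RMFCC, a channel compensation method based on filtering MFCCs. It performs well and is computationally simple, reducing computational cost without sacrificing accuracy. RMFCC also outperforms CMS and two-stage CMS. The discussion shows that many methods for suppressing channel noise are essentially filtering methods. We also confirm that suppressing very low modulation frequencies is an effective route to robust telephone speech recognition.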
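The abstract does not give RMFCC's filter. The sketch below shows cepstral mean subtraction (CMS), the baseline it is compared against, together with a generic first-order high-pass filter run along each cepstral coefficient's time trajectory. Both remove a constant channel offset, that is, the zero modulation frequency. The high-pass also attenuates very slow changes. The pole value 0.94 is an assumption chosen for illustration.

```python
import statistics


def cepstral_mean_subtraction(frames):
    """CMS: subtract the utterance-level mean of each cepstral coefficient."""
    means = [statistics.fmean(col) for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


def highpass_trajectories(frames, pole=0.94):
    """First-order high-pass y[t] = x[t] - x[t-1] + pole * y[t-1], applied
    independently to each coefficient's trajectory over time (pole is assumed)."""
    out = []
    prev_x = list(frames[0])
    prev_y = [0.0] * len(frames[0])
    for frame in frames:
        y = [x - px + pole * py for x, px, py in zip(frame, prev_x, prev_y)]
        out.append(y)
        prev_x, prev_y = list(frame), y
    return out


# Hypothetical cepstral frames with a constant channel offset on both coefficients.
frames = [[5.0 + (t % 2), -3.0] for t in range(10)]
normalized = cepstral_mean_subtraction(frames)
filtered = highpass_trajectories(frames)
print(normalized[0], [round(v, 3) for v in filtered[1]])
```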

This paper presents a feature analysis and the design of compensators for speaker recognition under stressed speech conditions. Any condition that causes a speaker to vary his or her speech production from the normal (neutral) condition is called a stressed speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration, and environmental noise. Under stress, the characteristics of the speech signal differ from those under normal or neutral conditions, and these changes can degrade the performance of a speaker recognition system. First, six speech features widely used for speaker recognition are analyzed to evaluate their characteristics under stress: mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients, linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sine reflection coefficients (ARC), and log-area ratios (LAR). Second, a vector quantization (VQ) classifier and a Gaussian mixture model (GMM) are used to evaluate speaker recognition results with the different features; this analysis helps select the best feature set for speaker recognition under stress. Finally, four novel VQ-based compensation techniques are proposed and evaluated for improving speaker recognition under stress: speaker and stressed information based compensation (SSIC), compensation by removal of stressed vectors (CRSV), cepstral mean normalization (CMN), and combination of MFCC and sinusoidal amplitude (CMSA) features. Speech data from the SUSAS database for four speaking conditions (Angry, Lombard, Question, and Neutral) are used for the analysis.
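Several of the compared features come out of a single linear-prediction analysis. The Levinson-Durbin recursion on a frame's autocorrelation yields the LP coefficients and the reflection coefficients (RC), from which ARC and LAR follow directly. This sketch assumes the convention A(z) = 1 + sum a_i z^-i and LAR_i = log((1 - k_i) / (1 + k_i)); sign conventions differ between texts. A typical call would be `levinson_durbin(autocorrelation(frame, 10), 10)` for a windowed frame.

```python
import math


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a (windowed) frame."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Return LP coefficients a_1..a_p, reflection coefficients k_1..k_p, and the final error."""
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return a[1:], k, err


def arcsine_reflection(k):
    """ARC features: arcsin of each reflection coefficient."""
    return [math.asin(v) for v in k]


def log_area_ratios(k):
    """LAR features: log((1 - k_i) / (1 + k_i)), valid for |k_i| < 1."""
    return [math.log((1.0 - v) / (1.0 + v)) for v in k]


# Autocorrelation of an ideal first-order AR process with coefficient 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, k, err)
```

For this first-order process the second reflection coefficient is zero, which shows that a single predictor tap already captures the correlation.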

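A design method for a real-time speech recognition system is presented. It uses Texas Instruments' floating-point TMS320C6713 DSP as the core processor and an MSP430 microcontroller as the peripheral controller. The core algorithm extracts Mel-frequency cepstral coefficients as feature parameters and uses dynamic time warping (DTW) for pattern matching. Programming and debugging show that the system is flexible and runs in real time, with clear improvements in noise immunity, robustness, and recognition rate. The design can serve as a practical reference in many fields.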
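The abstract does not give the front-end parameters that precede MFCC extraction. The sketch below assumes common choices: pre-emphasis with alpha = 0.97, and 256-sample Hamming-windowed frames with a 128-sample hop, which is 32 ms and 16 ms at an 8 kHz sampling rate. Each frame would then go through an FFT, the Mel filterbank, a log, and a DCT before DTW matching. DSP-specific concerns such as buffering are left out.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], boosting high frequencies before analysis."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    count = 1 + (len(x) - frame_len) // hop if len(x) >= frame_len else 0
    return [[x[i * hop + n] * window[n] for n in range(frame_len)] for i in range(count)]


signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(1000)]
frames = frame_signal(pre_emphasis(signal))
print(len(frames), len(frames[0]))  # 6 256
```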

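Extracting emotional features is an important part of speech emotion recognition. Traditional signal-processing methods have limitations, so the traditional acoustic features they extract, especially frequency-domain features, are inaccurate. Such features do not characterize the emotional properties of speech well, and emotion recognition rates remain low. Here the Hilbert-Huang transform (HHT) is applied to emotional speech to obtain its Hilbert marginal energy spectrum. Comparing the marginal energy spectra of different emotions on the Mel scale leads to a set of new emotional features: Mel-frequency marginal energy coefficients (MFEC), Mel-frequency sub-band spectral centroid (MSSC), and Mel-frequency sub-band spectral flatness (MSSF). A support vector machine (SVM) recognizes five emotions: sadness, happiness, boredom, anger, and calm. Experimental results show that the new emotional features extracted with this method give good recognition performance.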
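The centroid and flatness features can be sketched for any non-negative spectrum on a frequency grid. In the paper the spectrum is the Hilbert marginal spectrum and the band edges come from the Mel scale. Here both the spectrum and the edges are hypothetical inputs, and each band is assumed to contain at least one bin. Flatness is taken as the geometric mean divided by the arithmetic mean, the usual definition.

```python
import math


def subband_centroid(freqs, power):
    """Power-weighted mean frequency of one sub-band."""
    total = sum(power)
    return sum(f * p for f, p in zip(freqs, power)) / total


def subband_flatness(power, floor=1e-12):
    """Geometric mean / arithmetic mean of the power values (1.0 = perfectly flat)."""
    p = [max(v, floor) for v in power]
    geo = math.exp(sum(math.log(v) for v in p) / len(p))
    return geo / (sum(p) / len(p))


def subband_features(freqs, power, edges):
    """(centroid, flatness) for each band [edges[i], edges[i+1]) of a spectrum."""
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, f in enumerate(freqs) if lo <= f < hi]
        band_f = [freqs[i] for i in idx]
        band_p = [power[i] for i in idx]
        feats.append((subband_centroid(band_f, band_p), subband_flatness(band_p)))
    return feats


# Hypothetical spectrum on a 100 Hz grid and hypothetical band edges (Hz).
freqs = [100.0 * i for i in range(1, 9)]
power = [1.0, 1.0, 1.0, 1.0, 4.0, 0.5, 0.1, 0.1]
print(subband_features(freqs, power, [100.0, 500.0, 900.0]))
```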

Speaker recognition remains a major research challenge across many languages. Automatic speaker recognition systems are usually trained on normal speech, and shouting then creates a mismatch between enrollment and test. Because shouting involves extreme vocal effort, this mismatch reduces identification performance. The main problems are the time required to classify the data, the accuracy achieved, and the root-mean-square error. The objective of this work is to develop an efficient speaker recognition system. First, an improved Wiener filter algorithm is applied for better noise reduction. Mel-frequency cepstral coefficients are then extracted from the denoised signals to obtain the essential feature vectors. Next, probabilistic principal component analysis reduces the dimensionality of these features, and the reduced features form the input samples. Finally, a bidirectional long short-term memory (BiLSTM) recurrent neural network performs the classification to improve prediction accuracy. The proposed method is compared with existing methods in terms of accuracy, sensitivity, and error rate, and it achieves an accuracy of 95.77%.
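The abstract does not describe the "improved" Wiener filter. The sketch below shows only the basic spectral-domain Wiener gain it builds on, G = SNR / (1 + SNR), with the a priori SNR crudely estimated by power subtraction. The noise power per bin is assumed to be known and positive, for example estimated from leading non-speech frames. The `floor` parameter is an illustrative addition that limits how far any bin is attenuated.

```python
def wiener_gains(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain G = SNR / (1 + SNR), with SNR = max(Py - Pn, 0) / Pn."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        snr = max(py - pn, 0.0) / pn
        gains.append(max(snr / (1.0 + snr), floor))
    return gains


def enhance(noisy_spectrum, noise_power, floor=0.0):
    """Apply the gains to the complex STFT bins of one frame."""
    gains = wiener_gains([abs(x) ** 2 for x in noisy_spectrum], noise_power, floor)
    return [g * x for g, x in zip(gains, noisy_spectrum)]


print(wiener_gains([1.0, 2.0, 5.0], [1.0, 1.0, 1.0]))  # [0.0, 0.5, 0.8]
```

Bins whose power does not exceed the noise estimate are suppressed entirely unless a floor is set.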
