共查询到19条相似文献,搜索用时 93 毫秒
1.
提出了一种结合Mellin变换和Mel频率分析的语音信号特征--MMCC特征.该特征利用Mellin变换的尺度不变性质,抑制了特征参数受不同说话人声道变化的影响,同时结合Mel频率的人耳听觉特性,改善了特征的鲁棒性,适合于非特定人识别系统的应用.仿真结果表明,采用MMCC特征的非特定人语音识别系统,其识别效果优于采用LPCC特征、MFCC特征和MMTLS特征的非特定人语音识别系统. 相似文献
2.
3.
在噪声环境下能准确有效地提取语音信息是语音识别的重点难点,将其应用于嵌入式系统中,有一定的研究意义.通过比较分析传统的语音特征参数提取的方法:线性预测倒谱系数,Mel频率倒谱系数,提出了一种新的方法,采用Mel频率倒谱系数与一阶差分Mel频率倒谱系数(MFCC+ A MFCC)相结合的方法提取语音特征参数,结合双门限检测法进行端点检测和HMM模型进行模型匹配,并进行了以ARMSX2410为核心硬件与软件的系统设计.该方法较传统方法提高了系统的鲁棒性、识别的准确率和系统效率,适用于噪声环境下的语音识别. 相似文献
4.
音频特征提取是音频分类的基础,好的特征将会有效提高分类精度。在提取频域特征Mel频率倒谱系数(MFCC)的同时,对每一帧信号做离散小波变换,提取小波域特征,把频域和小波域特征相结合计算其统计特征。通过SVM模型建立音频模板,对纯语音、音乐及带背景音乐的语音进行分类识别,取得了较高的识别精度。 相似文献
5.
6.
7.
人在不同情感下的语音信号其非平稳性尤为明显,传统的MFCC只能反映语音信号的静态特征,经验模态分解能够精细地刻画语音信号的非平稳特性。为提取情感语音的非平稳特征,用经验模态分解将情感语音信号分解为一系列固有模态函数分量,通过Mel滤波器后取其对数能量,进行DCT反变换后得到改进的MFCC作为情感识别的新特征,采用支持向量机对高兴、生气、厌烦和恐惧等四种语音情感识别。仿真实验结果表明:改进的MFCC识别率达到77.17%,在不同的信噪比下,识别率最大可提高3.26%。 相似文献
8.
基于知识的声目标探测识别系统 总被引:1,自引:0,他引:1
被动声目标探测广泛应用于战场目标识别或自动设备的故障探测;通过对声目标的短时信号处理,使用现场可编程门阵列器件的可重构技术对声目标探测识别系统设计;提出通过提取子带Mel倒谱系数(MFCC)参数特征构建声目标信息的知识库,并使用0阶Mel倒谱系数(MFCC0)进行频谱能量分析,找寻信号起止端点,将声目标Mel倒谱系数(MFCC)特征参数映射为二值图像进行模板匹配识别;将声目标识别输出的控制指令传送给工控机或直接输出控制相关的智能系统,实现战场声目标识别或自动设备的声故障探测. 相似文献
9.
10.
语音MFCC特征计算的改进算法 总被引:1,自引:0,他引:1
提出了一种计算Mel频倒谱参数(Mel frequency cepstral coefficient,MFCC)特征的改进算法,该算法采用了加权滤波器分析(Wrapped discrete Fourier transform,WDFT)技术来提高语音信号低频部分的频谱分辨率,使之更符合人类听觉系统的特性。同时还运用了加权滤波器分析(Weighted filter bank analysis,WFBA)技术,以提高MFCC的鲁棒性。对TIMIT连续语音数据库中DR1集的音素识别结果表明,本文提出的改进算法比传统MFCC算法具有更好的识别率。 相似文献
11.
基于语音信号的频谱特性,本文对说话人识别技术中Mel倒谱参数做了改进,并通过Microsoft Visual C 6.0验证了在低信噪比时使用改进后的Mel倒谱参数可以提高说话人识别系统的正确识别率. 相似文献
12.
一种适用于说话人识别的改进Mel滤波器 总被引:1,自引:0,他引:1
Mel倒谱系数(MFcc)侧重提取语音信号的低频信息,对语音信号的频谱分布特性描述不充分,不能有效区分说话人个性信息。为此,通过分析语音信号各频段所含说话人个性信息的不同,结合Mel滤波器和反Mel滤波器在高低频段的不同特性,提出一种适于说话人识别的改进Mel滤波器。实验结果表明,改进Mel滤波器提取的新特征能够获得比传统Mel倒谱系数以及反Mel倒谱系数(IMFCC)更好的识别效果,并且基本不增加说话人识别系统训练和识别的时间开销。 相似文献
13.
Automatic recognition of the speech of children is a challenging topic in computer-based speech recognition systems. Conventional feature extraction method namely Mel-frequency cepstral coefficient (MFCC) is not efficient for children's speech recognition. This paper proposes a novel fuzzy-based discriminative feature representation to address the recognition of Malay vowels uttered by children. Considering the age-dependent variational acoustical speech parameters, performance of the automatic speech recognition (ASR) systems degrades in recognition of children's speech. To solve this problem, this study addresses representation of relevant and discriminative features for children's speech recognition. The addressed methods include extraction of MFCC with narrower filter bank followed by a fuzzy-based feature selection method. The proposed feature selection provides relevant, discriminative, and complementary features. For this purpose, conflicting objective functions for measuring the goodness of the features have to be fulfilled. To this end, fuzzy formulation of the problem and fuzzy aggregation of the objectives are used to address uncertainties involved with the problem.The proposed method can diminish the dimensionality without compromising the speech recognition rate. To assess the capability of the proposed method, the study analyzed six Malay vowels from the recording of 360 children, ages 7 to 12. Upon extracting the features, two well-known classification methods, namely, MLP and HMM, were employed for the speech recognition task. Optimal parameter adjustment was performed for each classifier to adapt them for the experiments. The experiments were conducted based on a speaker-independent manner. The proposed method performed better than the conventional MFCC and a number of conventional feature selection methods in the children speech recognition task. The fuzzy-based feature selection allowed the flexible selection of the MFCCs with the best discriminative ability to enhance the difference between the vowel classes. 相似文献
14.
In a recent study, we have introduced the problem of identifying cell-phones using recorded speech and shown that speech signals convey information about the source device, making it possible to identify the source with some accuracy. In this paper, we consider recognizing source cell-phone microphones using non-speech segments of recorded speech. Taking an information-theoretic approach, we use Gaussian Mixture Model (GMM) trained with maximum mutual information (MMI) to represent device-specific features. Experimental results using Mel-frequency and linear frequency cepstral coefficients (MFCC and LFCC) show that features extracted from the non-speech segments of speech contain higher mutual information and yield higher recognition rates than those from speech portions or the whole utterance. Identification rate improves from 96.42% to 98.39% and equal error rate (EER) reduces from 1.20% to 0.47% when non-speech parts are used to extract features. Recognition results are provided with classical GMM trained both with maximum likelihood (ML) and maximum mutual information (MMI) criteria, as well as support vector machines (SVMs). Identification under additive noise case is also considered and it is shown that identification rates reduces dramatically in case of additive noise. 相似文献
15.
基于MFCCs滤波的电话语音识别的通道补偿方法 总被引:4,自引:0,他引:4
本文提出一种基于MFCCs滤波的通道补偿方法RMFCC。它具有性能良好和运算简单的优点,在不失精度的前题下减少了计算代价。RMFCC的性能也优于CMS和二级CMS。通过讨论发现许多抑制通道噪声的方法从本质上说都是采用滤波的方法,我们也证实了抑制非常低的调制频率是进行顽健的电话语音识别的有效途径。 相似文献
16.
This paper presents the feature analysis and design of compensators for speaker recognition under stressed speech conditions.
Any condition that causes a speaker to vary his or her speech production from normal or neutral condition is called stressed
speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration and environmental noise.
In stressed condition, the characteristics of speech signal are different from that of normal or neutral condition. Due to
changes in speech signal characteristics, performance of the speaker recognition system may degrade under stressed speech
conditions. Firstly, six speech features (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients,
linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC) and log-area
ratios (LAR)), which are widely used for speaker recognition, are analyzed for evaluation of their characteristics under stressed
condition. Secondly, Vector Quantization (VQ) classifier and Gaussian Mixture Model (GMM) are used to evaluate speaker recognition
results with different speech features. This analysis help select the best feature set for speaker recognition under stressed
condition. Finally, four VQ based novel compensation techniques are proposed and evaluated for improvement of speaker recognition
under stressed condition. The compensation techniques are speaker and stressed information based compensation (SSIC), compensation
by removal of stressed vectors (CRSV), cepstral mean normalization (CMN) and combination of MFCC and sinusoidal amplitude
(CMSA) features. Speech data from SUSAS database corresponding to four different stressed conditions, Angry, Lombard, Question
and Neutral, are used for analysis of speaker recognition under stressed condition. 相似文献
17.
18.
情感特征的提取是语音情感识别的重要方面。由于传统信号处理方法的局限,使得提取的传统声学特征特别是频域特征并不准确,不能很好地表征语音的情感特性,因而对情感识别率不高。利用希尔伯特黄变换(HHT)对情感语音进行处理,得到情感语音的希尔伯特边际能量谱;通过对不同情感语音的边际能量谱基于Mel尺度的比较分析,提出了一组新的情感特征:Mel频率边际能量系数(MFEC)、Mel频率子带频谱质心(MSSC)、Mel频率子带频谱平坦度(MSSF);利用支持向量机(SVM)对5种情感语音即悲伤、高兴、厌倦、愤怒和平静进行了识别。实验结果表明,通过该方法提取的新的情感特征具有较好的识别效果。 相似文献
19.
Speaker recognition is a major challenge in various languages for researchers. For programmed speaker recognition structure prepared by utilizing ordinary speech, shouting creates a confusion between the enlistment and test, henceforth minimizing the identification execution as extreme vocal exertion is required during shouting. Speaker recognition requires more time for classification of data, accuracy is optimized, and the low root-mean-square error rate is the major problem. The objective of this work is to develop an efficient system of speaker recognition. In this work, an improved method of Wiener filter algorithm is applied for better noise reduction. To obtain the essential feature vector values, Mel-frequency cepstral coefficient feature extraction method is used on the noise-removed signals. Furthermore, input samples are created by using these extracted features after the dimensions have been reduced using probabilistic principal component analysis. Finally, recurrent neural network-bidirectional long-short-term memory is used for the classification to improve the prediction accuracy. For checking the effectiveness, the proposed work is compared with the existing methods based on accuracy, sensitivity, and error rate. The results obtained with the proposed method demonstrate an accuracy of 95.77%. 相似文献