首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
针对语音信号中存在加性噪声使MFCC的鲁棒性和识别系统的性能下降的问题,基本谱减法的引入在增强MFCC抗噪性上取得的效果有限,为了使MFCC具有更好的抗噪性,提出了一种改进算法,在谱减法的基础上引入谱熵的思想,利用谱熵值的分布逐帧进行噪声估计,可更精确地谱减去噪;实验结果表明,当语音中含有加性噪声时,与基本谱减法相比,改进谱减法的说话人识别系统抗噪性与鲁棒性更好。  相似文献   

2.
尹许梅  何选森 《计算机工程》2011,37(11):192-194
为提高低信噪比环境下语音的鲁棒性,提出一种改进的Mel频率倒谱系数(MFCC)特征提取方法。在传统MFCC特征提取的基础上,引入更适应人耳听觉系统的Bark子波变换,在快速傅里叶变换之前对语音进行预处理,并在MFCC提取方法中代替离散余弦变换;在语音预处理阶段,利用改进的Lanczos窗函数抑制旁瓣以提高语音鲁棒性。实验表明,与传统MFCC方法相比,在噪声环境下,改进方法具有更高的说话人识别率。  相似文献   

3.
为提高说话人识别中语音特征参数对噪声的鲁棒性,本文提出在对语音进行小波包分解基础上,分析噪声的特性,在不同子带内进行谱减并设立权重,提出了一种新的语音特征参数多层美尔倒谱系数.仿真实验表明,与MFCC特征参数相比,ML-MFCC在噪声环境下具有更好的抗噪性能和说话人识别率.  相似文献   

4.
语音倒谱特征的研究   总被引:24,自引:1,他引:24  
语音倒谱特征是语音识别中最常用的特征参数,它表征了人类的听觉特征。该文在研究基于线性预测倒谱和非线性MEL刻度倒谱特征的基础上,研究了LPCC和MFCC参数提取的算法原理及提取算法,提出了一级、二级差分倒谱特征参数的提取算法。识别实验验证了MFCC参数的鲁棒性优于LPCC参数。  相似文献   

5.
基于加权Mel倒谱系数的说话人识别   总被引:2,自引:0,他引:2  
说话人识别中的首要问题是从语音信号中提取能唯一表现说话人个性特征的有效而稳定可靠的特征参数.把感知加权技术应用到Mel倒谱分析中,通过对基于心理声学模型计算得到的信号掩蔽比插值获得权重函数,并将权重函数应用到Mel倒谱分析中获得加权Mel倒谱系数(WMCEP),以此为特征进行说话人识别.实验结果表明,WMCEP比MFCC和Mel倒谱系数(MCEP)能更好地逼近说话人的谱包络,在噪声环境下的鲁棒性更好,因此其识别性能要优于MFCC和MCEP.  相似文献   

6.
为减少维纳滤波在语音增强中残留的"音乐噪声",将多窗谱估计和改进的维纳滤波方法结合,并进行语音合成。设计了基于多窗谱估计的改进维纳滤波语音增强方法,该方法采用多窗谱估计噪声功率谱,改进维纳滤波降噪得到增强语音,以及重叠相加语音合成,并给出仿真对比验证。结果表明,基于多窗谱估计的改进维纳滤波方法在抑制噪声,减少音乐噪声方面优于基于维纳滤波的增强算法和基于多窗谱估计的改进谱减法的增强算法。  相似文献   

7.
在噪声环境中,基于笔记本电脑录音的情况下,采用特征参数加窗的方法,以提高系统的噪声鲁棒性.在 Matlab 环境下,建立了基于高斯混合模型 (GMM) 的说话人辨认系统,并进行实验.通过对多种窗口的正识率比较,发现对美尔倒谱(MFCC)高阶参数的加窗提升,可以改善系统的鲁棒性.实验结果表明,采用加窗后的系统识别率得到了明显改善.  相似文献   

8.
语音信号窗函数具有减少频谱能量泄露的作用,针对传统的语音加窗函数旁瓣衰减速度慢,信号频谱能量泄露大,不利于说话人识别特征参数提取的缺点,采用一种汉明自卷积窗函数取代汉明窗函数对语音信号预处理.为了进一步提高说话人系统的识别率,文章提出一种基于汉明自卷积窗的的一阶、二阶差分梅尔倒谱系数(MFCC)改进的动态组合特征参数方法.用高斯混合模型进行仿真实验,实验结果证明,用该方法提取的特征参数运用于说话人识别系统,相比于传统的MFCC说话人识别系统,其识别率大大提高.  相似文献   

9.
为了提高低信噪比下说话人识别系统的性能,提出一种Gammatone滤波器组与改进谱减法的语音增强相结合的说话人识别算法。将改进的谱减法作为预处理器,进一步提高语音信号的信噪比,再通过Gammatone滤波器组,对增强后的说话人语音信号进行处理,提取说话人语音信号的特征参数GFCC,进而将特征参数GFCC用于说话人识别算法中。仿真实验在高斯混合模型识别系统中进行。实验结果表明,采用这种算法应用于说话人识别系统,系统的识别率及鲁棒性都有明显的提高。  相似文献   

10.
针对梅尔频率倒谱系数(MFCC)参数在噪声环境中语音识别率下降的问题,提出了一种基于耳蜗倒谱系数(CFCC)的改进的特征参数提取方法.提取具有听觉特性的CFCC特征参数;运用改进的线性判别分析(LDA)算法对提取出的特征参数进行线性变换,得到更具有区分性的特征参数和满足隐马尔可夫模型(HMM)需要的对角化协方差矩阵;进行均值方差归一化,得到最终的特征参数.实验结果表明:提出的方法能有效地提高噪声环境中语音识别系统的识别率和鲁棒性.  相似文献   

11.
Endpoint detection of speech has been shown prosperous for speech recognition and speech enhancement. But the traditional endpoint detection methods lose efficiency in either low signal-to-noise ratio (SNR) environments or nonstationary noise environments. To improve the accuracy of speech endpoint detection in low SNR environments, an endpoint detection method based on an adaptive algorithm for thresholds adjustment is put forward in this paper. The spectral subtraction of multitaper spectrum estimation is performed to enhance the speech. During the process of detection, the cepstral distance of Mel frequency cepstrum coefficient (MFCC) is utilized and the thresholds are adaptively adjusted to different environments. Simulation experiments indicate that in different noise environments with different SNRs, our algorithm has a better endpoint detection accuracy compared with other detection algorithms. Besides that, the algorithm also exhibits strong robustness in low SNR environments.  相似文献   

12.
In this paper, we provide a comparative study of spectral front-end features used as representations for speech signals by processing multitaper magnitude and phase spectra, for speaker verification with expressive speech. In particular, the multitaper modified group delay function (MT-MOGDF) and multitaper magnitude (MT-MAG) spectra of the speech signals are employed to obtain low variance estimates of speech spectra. We observe that the cues that aid in representation of expressive speech are evident in the MT-MOGDF spectrum than the MT-MAG spectrum in terms of mean Formant value and Formant bandwidth. Our extensive experimental study on a speaker verification system with a Gaussian mixture model based universal background model classifier on expressive speech using the IITKGP-SESC and EMODB databases show that MT-MOGDF performs better than MT-MAG technique, in terms of equal error rate and minimum decision cost function. This improvement due to MT-MOGDF is owed to a better representation and a low-variance estimate of the speech spectrum. Our results highlight the utility of MT-MOGDF as a potential alternative for MT-MAG representation for speaker verification problems in general.  相似文献   

13.
基于基音周期与清浊音信息的梅尔倒谱参数   总被引:1,自引:0,他引:1  
提出一种在浊音部分不固定帧长的梅尔倒谱参数(Mel-cepstrum)提取的方法。针对浊音和清音所包含信息量不同,对浊音进行双倍的加权,从而将基音与清浊音信息融合进梅尔倒谱参数。将这种动态的梅尔倒谱参数应用在说话人确认中,在混合高斯模型(Gaussian mixture models,GMM)的情况下,取得了比常用的梅尔刻度式倒频谱参数(Mel-frequency cepstral coefficient,MFCC)更高的识别率,在NIST 2002年测试数据库中,512个混合高斯下能够将等错误率(EER)由9.4%降低到8.3%,2 048个混合高斯下能够将等错误率由7.8%降低到6.9%。  相似文献   

14.
In this paper an online text-independent speaker verification system developed at IIT Guwahati under multivariability condition for remote person authentication is described. The system is developed on a voice server accessible via telephone network using an interactive voice response (IVR) system in which both enrollment and testing can be done online. The speaker verification system is developed using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and Gaussian Mixture Model—Universal Background Model (GMM-UBM) for modeling. The performance of the system under multi-variable condition is evaluated using online enrollments and testing from the subjects. The evaluation of the system helps in understanding the impact of several well known issues related to speaker verification such as the effect of environment noise, duration of test speech, robustness of the system against playing recorded speech etc. in an online system scenario. These issues need to be taken care for the development and deployment of speaker verification system in real life applications.  相似文献   

15.
Investigating Speaker Verification in real-world noisy environments, a novel feature extraction process suitable for suppression of time-varying noise is compared with a fine-tuned spectral subtraction method. The proposed feature extraction process is based on approximating the clean speech and the noise spectral magnitude with a mixture of Gaussian probability density functions (pdfs) by using the Expectation-Maximization algorithm (EM). Subsequently, the Bayesian inference framework is applied to the degraded spectral coefficients, and by employing Minimum Mean Square Error Estimation (MMSE), a closed form solution for the spectral magnitude estimation task is derived. The estimated spectral magnitude finally is incorporated into the Mel-Frequency Cepstral Coefficients (MFCCs) front-end of a baseline text-independent speaker verification system, based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology of USA) Speaker Recognition Evaluation. A comparative study of the proposed technique for real-world noise types demonstrates a significant performance gain compared to the baseline speech features and to the spectral subtraction enhancement method. Improvements of the absolute speaker verification performance with more than 27% for 0 dB signal-to-noise ratio (SNR), compared to the MFCCs, and with more than 13% for –5 dB SNR, compared to the spectral subtraction version, were obtained in the case of a passing-by aircraft scenario.  相似文献   

16.
针对语音系统受外界强噪声干扰而导致识别精度降低以及通信质量受损的问题,提出一种基于自适应噪声估计的语音增强方法。通过端点检测将语音信号分为语音段与非语音段,对这两种情况的噪声幅度谱分别进行自适应估计,并对谱减法中不具有通用性的假设进行研究从而改进原理公式。实验结果表明,相对于传统谱减法,该方法能更好地抑制音乐噪声,并保持较高清晰度和可懂度,提高了强噪声环境下的语音识别精度和通信质量。  相似文献   

17.
Humans are quite adept at communicating in presence of noise. However most speech processing systems, like automatic speech and speaker recognition systems, suffer from a significant drop in performance when speech signals are corrupted with unseen background distortions. The proposed work explores the use of a biologically-motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and presents an intricate yet computationally-efficient analysis of the speech signal by careful choice of model parameters. Further, the approach takes advantage of an information-theoretic analysis of the message and speaker dominant regions in the speech signal, and defines feature representations to address two diverse tasks such as speech and speaker recognition. The proposed analysis surpasses the standard Mel-Frequency Cepstral Coefficients (MFCC), and its enhanced variants (via mean subtraction, variance normalization and time sequence filtering) and yields significant improvements over a state-of-the-art noise robust feature scheme, on both speech and speaker recognition tasks.  相似文献   

18.
陈迪  龚卫国  杨利平 《计算机应用》2007,27(5):1217-1219
提出了一种可用于改善说话人识别效果的基于基音周期的可变窗长语音MFCC参数提取方法。基本原理是将原始的语音分解为当前基音周期整数倍长度以内部分及其以外部分,并保留前者舍去后者,以减小训练语音与测试语音的频谱失真。通过文本无关的说话人确认实验,验证了该方法能有效提高说话人确认的识别率,并能提高短时语音的稳定性。  相似文献   

19.
罗元  孙龙 《计算机科学》2016,43(8):297-299, 317
为提高说话人确认系统在噪声环境下的鲁棒性,在利用听觉外周模型改进Mel频率倒谱系数(Mel FrequencyCepstral Coefficient,MFCC)的基础上,结合感知线性预测系数(Perceptual Linear Predictive Coefficient,PLPC),以类间区分度为依据,在特征域对两种声纹特征进行融合,提出一种新的声纹特征提取方法,并对基于该特征的说话人确认系统的噪声鲁棒性进行研究。针对不同信噪比的语音信号进行了融合特征与原始特征的对比实验,结果表明,融合特征在模拟餐厅噪声环境中的错误率更低,较MFCC与PLPC分别降低了2.2%和3.1%,说话人确认系统在噪声中的鲁棒性得到提升。  相似文献   

20.
洪晓芬 《计算机工程与设计》2007,28(22):5453-5454,5477
语音增强技术是解决噪声污染的一项强有力的预处理技术.谱减法通过处理后的语音中会留下所谓的"音乐噪声",针对这个问题,提出了一种多带谱相减与感觉加权相结合的语音增强方法.对带噪语音进行多带谱相减,并根据人的听觉掩蔽特性,对多带谱相减后的信号进行感觉加权,从而进一步降低背景噪声.在语音失真和噪声抑制之间取得良好的折中,减少语音的听觉失真,有效地抑制"音乐噪声",提高语音的清晰度.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号