期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

高会贤马全福郑晓势《计算机应用》2010,30(10):2712-2714

为了使说话人识别系统在语音较短和存在噪声的环境下也具有较高的识别率,基于矢量量化识别算法,对提取的特征参数进行研究。把小波变换与美尔频率倒谱系数(MFCC)的提取相结合,并将改进后的特征与谱质心特征进行了组合,建立了一种美尔频率小波变换系数+谱质心(MFWTC+SC)的新的组合特征参数。经实验表明,该组合特征可以有效地提高说话人识别系统的性能。相似文献

2.

基于小波包分解和噪声分析的抗噪说话人识别特征参数

吴峰燕李志华《计算机与现代化》2009,(1)

为提高说话人识别中语音特征参数对噪声的鲁棒性,本文提出在对语音进行小波包分解基础上,分析噪声的特性,在不同子带内进行谱减并设立权重,提出了一种新的语音特征参数多层美尔倒谱系数.仿真实验表明,与MFCC特征参数相比,ML-MFCC在噪声环境下具有更好的抗噪性能和说话人识别率. 相似文献

3.

融合LPC与MFCC的特征参数 总被引：1，自引：1，他引：1

下载免费PDF全文

张学锋王芳夏萍《计算机工程》2011,37(4):216-217

在线性预测系数(LPC)的基础上,借鉴美尔倒谱系数(MFCC)计算方法,对LPC进行美尔倒谱计算,得到一种新的特征参数：线性预测美尔倒谱系数(LPMFCC)。在Matlab7.0平台上实现一个基于隐马尔可夫模型(HMM)的说话人识别系统,分别用LPMFCC及其一阶差分、MFCC及其一阶差分和基于小波包分析的特征参数(WPDC)及其一阶差分作为识别参数进行对比实验。结果表明,以LPMFCC作为特征参数的系统具有较高的识别率。相似文献

4.

基于发声机理与人耳感知特性的说话人识别

杜晓青于凤芹《计算机工程》2013,(11):197-199,204

Mel频率倒谱系数（MFCC）与线性预测倒谱系数（LPCC）融合算法只能反映语音静态特征,且LPCC对语音低频局部特征描述不足。为此,提出将希尔伯特黄变换（HHT）倒谱系数与相对光谱一感知线性预测倒谱系数（RASTA—PLPCC）融合,得到一种既反映发声机理又体现人耳感知特性的说话人识别算法。HHT倒谱系数体现发声机理,能反映语音动态特性,并更好地描述信号低频局部特征,可改进LPCC的不足。PLPCC体现人耳感知特性,识别性能强于MFCC,用3种融合算法对两者进行融合,将融合特征用于高斯混合模型进行说话人识别。仿真实验结果表明,该融合算法较已有的MFCC与LPCC融合算法识别率提高了8．0％。相似文献

5.

说话人识别中基于Fisher比的特征组合方法

谢小娟曾以成熊冰峰《计算机应用》2016,36(5):1421-1425

为了提高说话人识别的准确率,可以同时采用多个特征参数,针对综合特征参数中各维分量对识别结果的影响可能不一样,同等对待并不一定是最优的方案这个问题,提出基于Fisher准则的梅尔频率倒谱系数(MFCC)、线性预测梅尔倒谱系数(LPMFCC)、Teager能量算子倒谱参数(TEOCC)相混合的特征参数提取方法。首先,提取语音信号的MFCC、LPMFCC和TEOCC三种参数;然后,计算MFCC和LPMFCC参数中各维分量的Fisher比,分别选出六个Fisher比高的分量与TEOCC参数组合成混合特征参数;最后,采用TIMIT语音库和NOISEX-92噪声库进行说话人识别实验。仿真实验表明,所提方法与MFCC、LPMFCC、MFCC+LPMFCC、基于Fisher比的梅尔倒谱系数混合特征提取方法以及基于主成分分析(PCA)的特征抽取方法相比,在采用高斯混合模型(GMM)和BP神经网络的平均识别率在纯净语音环境下分别提高了21.65个百分点、18.39个百分点、15.61个百分点、15.01个百分点与22.70个百分点;在30 dB噪声环境下,则分别提升了15.15个百分点、10.81个百分点、8.69个百分点、7.64个百分点与17.76个百分点。实验结果表明,该混合特征参数能够有效提高说话人识别率,且具有更好的鲁棒性。相似文献

6.

基于Fisher比的梅尔倒谱系数混合特征提取方法

《计算机应用》2014,(2)

针对语音识别中梅尔倒谱系数(MFCC)对中高频信号的识别精度不高,并且没有考虑各维特征参数对识别结果影响的问题,提出基于MFCC、逆梅尔倒谱系数(IMFCC)和中频梅尔倒谱系数(MidMFCC),并结合Fisher准则的特征提取方法。首先对语音信号提取MFCC、IMFCC和MidMFCC三种特征参数,分别计算三种特征参数中各维分量的Fisher比,通过Fisher比对三种特征参数进行选择,组成一种混合特征参数,提高语音中高频信息的识别精度。实验结果表明,在相同环境下,新的特征与MFCC参数相比,识别率有一定程度的提高。相似文献

7.

一种改进的语音动态组合特征参数提取方法

钟浩鲍鸿张晶《电脑与信息技术》2017,25(3)

语音信号窗函数具有减少频谱能量泄露的作用,针对传统的语音加窗函数旁瓣衰减速度慢,信号频谱能量泄露大,不利于说话人识别特征参数提取的缺点,采用一种汉明自卷积窗函数取代汉明窗函数对语音信号预处理.为了进一步提高说话人系统的识别率,文章提出一种基于汉明自卷积窗的的一阶、二阶差分梅尔倒谱系数(MFCC)改进的动态组合特征参数方法.用高斯混合模型进行仿真实验,实验结果证明,用该方法提取的特征参数运用于说话人识别系统,相比于传统的MFCC说话人识别系统,其识别率大大提高. 相似文献

8.

基于梅尔频率倒谱系数与翻转梅尔频率倒谱系数的说话人识别方法

胡峰松张璇《计算机应用》2012,32(9):2542-2544

为提高说话人识别系统的识别率,提出了基于梅尔频率倒谱系数(MFCC)与翻转梅尔频率倒谱系数(IMFCC)为特征参数的特征提取新方法。该方法利用Fisher准则将MFCC和IMFCC相结合,构造了一种混合特征参数。实验结果表明,新的混合特征参数与MFCC相比,在纯净语音库及噪声环境中均具有较好的识别性能。相似文献

9.

SMFCC:一种新的语音信号特征提取方法

汪海彬余正涛毛存礼郭剑毅《计算机应用》2016,36(6):1735-1740

针对说话人识别系统中存在的有效语音特征提取以及噪声影响的问题,提出了一种新的语音特征提取方法——基于S变换的美尔倒谱系数(SMFCC)。该方法是在传统美尔倒谱系数(MFCC)的基础上利用S变换的二维时频多分辨率特性,以及奇异值分解(SVD)方法的二维时频矩阵有效去噪性,并结合相关统计分析方法最终获得语音特征。采用TIMIT语音数据库,将所提的特征和现有特征进行对比实验。SMFCC特征的等错误率(EER)和最小检测代价(MinDCF)均小于线性预测倒谱系数(LPCC)、MFCC及其结合方法LMFCC,比MFCC的EER和MinDCF08分别下降了3.6%与17.9%。实验结果表明所提方法能够有效去除语音信号中的噪声,提升局部分辨率。相似文献

10.

基于Fisher比的梅尔倒谱系数混合特征提取方法 总被引：1，自引：0，他引：1

鲜晓东樊宇星《计算机应用》2014,34(2):558-561

针对语音识别中梅尔倒谱系数（MFCC）对中高频信号的识别精度不高,并且没有考虑各维特征参数对识别结果影响的问题,提出基于MFCC、逆梅尔倒谱系数（IMFCC）和中频梅尔倒谱系数（MidMFCC）,并结合Fisher准则的特征提取方法。首先对语音信号提取MFCC、IMFCC和MidMFCC三种特征参数,分别计算三种特征参数中各维分量的Fisher比,通过Fisher比对三种特征参数进行选择,组成一种混合特征参数,提高语音中高频信息的识别精度。实验结果表明,在相同环境下,新的特征与MFCC参数相比,识别率有一定程度的提高。相似文献

11.

语音MFCC特征计算的改进算法 总被引：1，自引：0，他引：1

章熙春曹燕张军韦岗《数据采集与处理》2005,20(2):161-165

提出了一种计算Mel频倒谱参数(Mel frequency cepstral coefficient,MFCC)特征的改进算法,该算法采用了加权滤波器分析(Wrapped discrete Fourier transform,WDFT)技术来提高语音信号低频部分的频谱分辨率,使之更符合人类听觉系统的特性。同时还运用了加权滤波器分析(Weighted filter bank analysis,WFBA)技术,以提高MFCC的鲁棒性。对TIMIT连续语音数据库中DR1集的音素识别结果表明,本文提出的改进算法比传统MFCC算法具有更好的识别率。相似文献

12.

基于音频特征的半脆弱水印方案

寇占奎徐江峰《计算机工程与设计》2012,33(9):3323-3326,3341

提出了一个基于语音特征和均值量化的DWT域半脆弱音频数字水印方案.该方案对语音特征进行提取得到梅勒频率倒谱系数(MFCC),对其进行turbo纠错编码,通过均值量化把该编码作为水印嵌入到载体语音中.实验分析结果表明,该方案既对音频的一般操作有较强的鲁棒性,又对篡改音频操作有很强的敏感性,同时水印提取不需要额外的水印信号,准确度高. 相似文献

13.

MFCC特征改进算法在语音识别中的应用 总被引：2，自引：0，他引：2

下载免费PDF全文

俸云景新幸叶懋《计算机工程与科学》2009,31(12)

本文的目的是阐明一种Mel频率倒谱参数特征的改进算法。该算法是通过线性预测的方法从语音信号中提取出残差相位,同时将残差相位与传统的MFCC相结合,并应用到语音识别系统中。该改进算法比传统的MFCC算法具有更好的识别率。相似文献

14.

一种改进的特征提取方法在语音识别中的应用

陈树于海波《传感器与微系统》2018,(5):154-157

针对梅尔频率倒谱系数(MFCC)参数在噪声环境中语音识别率下降的问题,提出了一种基于耳蜗倒谱系数(CFCC)的改进的特征参数提取方法.提取具有听觉特性的CFCC特征参数;运用改进的线性判别分析(LDA)算法对提取出的特征参数进行线性变换,得到更具有区分性的特征参数和满足隐马尔可夫模型(HMM)需要的对角化协方差矩阵;进行均值方差归一化,得到最终的特征参数.实验结果表明:提出的方法能有效地提高噪声环境中语音识别系统的识别率和鲁棒性. 相似文献

15.

Automatic genre classification of Indian Tamil and western music using fractional MFCC

Betsy Rajesh D. G. Bhalke 《International Journal of Speech Technology》2016,19(3):551-563

相似文献

16.

Speech recognition with improved support vector machine using dual classifiers and cross fitness validation

B.?Kanisha Email author S.?Lokesh Priyan?Malarvizhi?Kumar View author&#;s OrcID profile P.?Parthasarathy Gokulnath?Chandra Babu 《Personal and Ubiquitous Computing》2018,22(5-6):1083-1091

In this research, a new speech recognition method based on improved feature extraction and improved support vector machine (ISVM) is developed. A Gaussian filter is used to denoise the input speech signal. The feature extraction method extracts five features such as peak values, Mel frequency cepstral coefficient (MFCC), tri-spectral features, discrete wavelet transform (DWT), and the difference values between the input and the standard signal. Next, these features are scaled using linear identical scaling (LIS) method with the same scaling method and the same scaling factors for each set of features in both training and testing phases. Following this, to accomplish the training process, an ISVM is developed with best fitness validation. The ISVM consists of two stages: (i) linear dual classifier that finds the same class attributes and different class attributes simultaneously and (ii) cross fitness validation (CFV) method to prevent over fitting problem. The proposed speech recognition method offers 98.2% accuracy. 相似文献

17.

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

Sankaran Panchapagesan Abeer Alwan 《Computer Speech and Language》2009,23(1):42-64

Vocal tract length normalization (VTLN) for standard filterbank-based Mel frequency cepstral coefficient (MFCC) features is usually implemented by warping the center frequencies of the Mel filterbank, and the warping factor is estimated using the maximum likelihood score (MLS) criterion. A linear transform (LT) equivalent for frequency warping (FW) would enable more efficient MLS estimation. We recently proposed a novel LT to perform FW for VTLN and model adaptation with standard MFCC features. In this paper, we present the mathematical derivation of the LT and give a compact formula to calculate it for any FW function. We also show that our LT is closely related to different LTs previously proposed for FW with cepstral features, and these LTs for FW are all shown to be numerically almost identical for the sine-log all-pass transform (SLAPT) warping functions. Our formula for the transformation matrix is, however, computationally simpler and, unlike other previous LT approaches to VTLN with MFCC features, no modification of the standard MFCC feature extraction scheme is required. In VTLN and speaker adaptive modeling (SAM) experiments with the DARPA resource management (RM1) database, the performance of the new LT was comparable to that of regular VTLN implemented by warping the Mel filterbank, when the MLS criterion was used for FW estimation. This demonstrates that the approximations involved do not lead to any performance degradation. Performance comparable to front end VTLN was also obtained with LT adaptation of HMM means in the back end, combined with mean bias and variance adaptation according to the maximum likelihood linear regression (MLLR) framework. The FW methods performed significantly better than standard MLLR for very limited adaptation data (1 utterance), and were equally effective with unsupervised parameter estimation. We also performed speaker adaptive training (SAT) with feature space LT denoted CLTFW. Global CLTFW SAT gave results comparable to SAM and VTLN. By estimating multiple CLTFW transforms using a regression tree, and including an additive bias, we obtained significantly improved results compared to VTLN, with increasing adaptation data. 相似文献

18.

An adaptive speech endpoint detection method in low SNR environments

Linhui Sun Min Su Zhenzhen Yang 《International Journal of Speech Technology》2017,20(3):651-658

Endpoint detection of speech has been shown prosperous for speech recognition and speech enhancement. But the traditional endpoint detection methods lose efficiency in either low signal-to-noise ratio (SNR) environments or nonstationary noise environments. To improve the accuracy of speech endpoint detection in low SNR environments, an endpoint detection method based on an adaptive algorithm for thresholds adjustment is put forward in this paper. The spectral subtraction of multitaper spectrum estimation is performed to enhance the speech. During the process of detection, the cepstral distance of Mel frequency cepstrum coefficient (MFCC) is utilized and the thresholds are adaptively adjusted to different environments. Simulation experiments indicate that in different noise environments with different SNRs, our algorithm has a better endpoint detection accuracy compared with other detection algorithms. Besides that, the algorithm also exhibits strong robustness in low SNR environments. 相似文献

19.

基于MFCC的频谱重构实现音高估计和发声分类

张少华秦会斌《测控技术》2019,38(11):86-89

音高估计和发声分类可以帮助快速检索目标语音,是语音检索中十分重要且困难的研究方向之一,对语音识别领域具有重要的意义。提出了一种新型音高估计和发声分类方法。利用梅尔频率倒谱系数（MFCC）进行频谱重构,并在对数下对重构的频谱进行压缩和过滤。通过高斯混合模型（GMM）对音高频率和滤波频率的联合密度建模来实现音高估计,实验结果在TIMIT数据库上的相对误差为6.62%。基于高斯混合模型的模型也可以完成发声分类任务,经试验测试表明发声分类的准确率超过99%,为音高估计和发声分类提供了一种新的模型。相似文献

20.

Constructing Modulation Frequency Domain-Based Features for Robust Speech Recognition

Jeih-Weih Hung Wei-Yi Tsai 《IEEE transactions on audio, speech, and language processing》2008,16(3):563-577

Data-driven temporal filtering approaches based on a specific optimization technique have been shown to be capable of enhancing the discrimination and robustness of speech features in speech recognition. The filters in these approaches are often obtained with the statistics of the features in the temporal domain. In this paper, we derive new data-driven temporal filters that employ the statistics of the modulation spectra of the speech features. Three new temporal filtering approaches are proposed and based on constrained versions of linear discriminant analysis (LDA), principal component analysis (PCA), and minimum class distance (MCD), respectively. It is shown that these proposed temporal filters can effectively improve the speech recognition accuracy in various noise-corrupted environments. In experiments conducted on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, together with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 40% and 27% when compared with baseline Mel frequency cepstral coefficient (MFCC) processing and CMVN alone, respectively. 相似文献