1.

Noise‐Robust Speaker Recognition Using Subband Likelihoods and Reliable‐Feature Selection

Sungtak Kim Mikyong Ji Hoirin Kim 《ETRI Journal》2008,30(1):89-100

We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination that uses subband likelihoods and a subband reliable‐feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of an unreliable feature by comparing them with those of an adaptive noise model, which is estimated by maximum a posteriori adaptation using noise features obtained directly from the noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the telephone version of TIMIT. The proposed subband feature recombination with subband reliable‐feature selection achieves better performance than the conventional feature recombination system with reliable‐feature selection.
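
As a rough illustration of the reliable-feature selection idea in the abstract above, the following Python sketch floors each frame's speaker log-likelihood at the score of a noise model, so that a few very low scores cannot dominate the accumulated decision score. The flooring rule, the one-dimensional Gaussian models, and the names `gauss_loglik`, `reliable_score`, and `plain_score` are assumptions made for illustration; the abstract does not state the exact adjustment rule or model form.

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of a scalar x under a 1-D Gaussian (illustrative model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def reliable_score(frames, speaker, noise):
    """Accumulate per-frame scores, flooring each at the noise-model score.

    speaker and noise are (mean, variance) pairs. Flooring at the noise
    score is an assumed adjustment rule, not necessarily the paper's.
    """
    total = 0.0
    for x in frames:
        ll_speaker = gauss_loglik(x, *speaker)
        ll_noise = gauss_loglik(x, *noise)
        # A frame better explained by noise is treated as unreliable.
        total += max(ll_speaker, ll_noise)
    return total

def plain_score(frames, speaker):
    """Unadjusted accumulated log-likelihood, for comparison."""
    return sum(gauss_loglik(x, *speaker) for x in frames)

frames = [0.1, -0.2, 0.05, 8.0]   # the last frame is a noise-dominated outlier
speaker = (0.0, 1.0)              # hypothetical speaker model
noise = (6.0, 9.0)                # hypothetical adapted noise model
adjusted = reliable_score(frames, speaker, noise)
unadjusted = plain_score(frames, speaker)
```

In this toy setup only the outlier frame is adjusted, so `adjusted` exceeds `unadjusted` by exactly the gap between the two models' scores on that frame.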

2.

Robust speech features based on wavelet transform with application to speaker identification (total citations: 2; self-citations: 0; citations by others: 2)

Hsieh C.-T. Lai E. Wang Y.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》2002,149(2):108-114

An effective and robust speech feature extraction method is presented. Based on the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and entropy value of the detail channel for each decomposition process are calculated. In addition, an adaptive thresholding technique for each lower resolution is also applied to remove the influence of noise interference. Experimental results show that using this mechanism not only effectively reduces the influence of noise interference but also improves the recognition performance. Finally, the proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Some popular existing methods are also evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than the other existing methods. In addition, the performance of the proposed method is very satisfactory even in a low SNR environment corrupted by Gaussian white noise.

3.

Research on Multifractal Spectrum Clusters and Their Application to Speaker Recognition

周宇欢 张雄伟 付强 徐鑫 王金明 《信号处理》2011,27(12):1914-1919

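Speech is a complex nonlinear signal, which makes it difficult to further improve the performance of traditional speaker recognition techniques developed from linear system theory. This paper proposes a multifractal spectrum cluster analysis method for analyzing the nonlinear characteristics of speech signals and applies it to speaker recognition on short utterances (2 seconds). Simulation experiments on the Cantor set show that different scaling regions reflect the growth patterns of a system at different stages, so a set of continuously varying multifractal spectra can characterize the fractal properties of the system hierarchically; this is the multifractal spectrum cluster analysis method. Then, taking into account the fractal characteristics of speech signals, a method is proposed for extracting the Multifractal Spectrum Cluster Feature (MSCF) of speech. Finally, several nonlinear features are combined with short-time spectral features for speaker recognition. Experiments on 50 speakers from the TIMIT database show that nonlinear features and short-time spectral features are strongly complementary; in particular, combining MSCF with MFCC and LPC features reduces the system's misidentification rate to 0.8%.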

4.

Robust Speaker Identity Recognition Algorithm Based on Adaptive Parallel Model Combination

李聪 葛洪伟 《信号处理》2018,34(7):867-875

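Environmental noise causes the performance of speaker recognition systems to degrade sharply in practical applications. This paper proposes a robust speaker identity recognition method based on the Gaussian mixture model-universal background model (GMM-UBM) and adaptive parallel model combination. Adaptive parallel model combination is a noise-robust feature compensation algorithm that effectively reduces the mismatch between the training and test environments, thereby improving recognition accuracy and noise robustness. The algorithm first estimates noise features from the test speech and then fits a single Gaussian model to them to obtain the noise mean and covariance. Finally, based on the noise mean and covariance, it adjusts the mean vectors and covariance matrices of the trained Gaussian mixture model so that they match the test environment as closely as possible. Experimental results show that the method accurately reconstructs the Gaussian mixture model parameters of clean speech and significantly improves speaker recognition accuracy, especially at low signal-to-noise ratios.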
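
The three steps in the abstract above (estimate noise features, fit a single Gaussian, adjust the trained mixture) can be sketched in Python as follows. This is a deliberately simplified sketch: it assumes diagonal covariances and features in the linear spectral domain, where independent speech and noise add, so means and variances simply sum. Parallel model combination in the cepstral domain needs additional domain transforms that are omitted here, and the function names and data layout are hypothetical.

```python
def fit_single_gaussian(noise_frames):
    """Per-dimension mean and variance of the noise feature vectors."""
    n = len(noise_frames)
    dim = len(noise_frames[0])
    mean = [sum(f[d] for f in noise_frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in noise_frames) / n
           for d in range(dim)]
    return mean, var

def combine(gmm, noise_mean, noise_var):
    """Shift every diagonal-covariance component by the noise statistics.

    Assumption: linear spectral features with additive, independent noise,
    so component means and variances add to the noise mean and variance.
    """
    return [
        {
            "weight": c["weight"],
            "mean": [m + nm for m, nm in zip(c["mean"], noise_mean)],
            "var": [v + nv for v, nv in zip(c["var"], noise_var)],
        }
        for c in gmm
    ]

# Hypothetical noise frames estimated from the test utterance.
noise_frames = [[1.0, 2.0], [3.0, 2.0]]
noise_mean, noise_var = fit_single_gaussian(noise_frames)

# Hypothetical trained two-component model.
clean_gmm = [
    {"weight": 0.5, "mean": [10.0, 20.0], "var": [4.0, 5.0]},
    {"weight": 0.5, "mean": [30.0, 40.0], "var": [2.0, 3.0]},
]
adapted_gmm = combine(clean_gmm, noise_mean, noise_var)
```

The mixture weights are left unchanged because, under this assumption, noise shifts each component without changing how often it is selected.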

5.

Effective-Band Multiresolution Feature Extraction and Speaker Age Recognition

杜先娜 俞一彪 《信号处理》2016,32(9):1101-1107

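For text-independent, speaker-independent age recognition, this paper proposes a statistical recognition method based on effective-band multiresolution features. The input speech is decomposed into effective frequency bands by the wavelet packet transform; the wavelet packet coefficients of the effective bands are then concatenated into a whole, from which Mel-frequency cepstral coefficients are computed, yielding the effective-band multiresolution feature parameter WPMFC (Wavelet Packet Mel-Frequency Cepstrum). Speakers are divided by age into four stages (children, young adults, middle-aged adults, and the elderly), and the speech of each age group is further trained by gender to obtain eight Gaussian mixture models. Test speech is classified according to the maximum likelihood criterion. Experiments compare the proposed method with the traditional short-time spectral statistical analysis method; the results show that the proposed method performs better, with an average in-set recognition rate of 65.17%. The results also indicate that variation in pronunciation characteristics across speakers affects recognition performance more than variation in the spoken text.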
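
The maximum-likelihood decision step mentioned in the abstract above can be sketched in Python as below: each class has a Gaussian mixture model, and the test utterance is assigned to the class whose model gives the highest total log-likelihood. The diagonal-covariance form, the class labels, and the toy parameters are assumptions for illustration; only two of the eight age-and-gender models are shown, and feature extraction (WPMFC) is not included.

```python
import math

def gmm_loglik(frames, gmm):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    gmm is a list of (weight, mean_vector, variance_vector) components.
    """
    total = 0.0
    for x in frames:
        comps = []
        for weight, mean, var in gmm:
            ll = math.log(weight)
            for xi, m, v in zip(x, mean, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        # Log-sum-exp over components, for numerical stability.
        peak = max(comps)
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total

def classify(frames, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: gmm_loglik(frames, models[label]))

# Hypothetical models; a full system would hold eight of these.
models = {
    "young_female": [(0.5, [0.0, 0.0], [1.0, 1.0]),
                     (0.5, [1.0, 1.0], [1.0, 1.0])],
    "elderly_male": [(0.5, [5.0, 5.0], [1.0, 1.0]),
                     (0.5, [6.0, 6.0], [1.0, 1.0])],
}
test_frames = [[5.2, 5.1], [5.9, 6.2]]
decision = classify(test_frames, models)
```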

6.

Speaker Recognition Based on Weighted Feature Compensation (total citations: 3; self-citations: 0; citations by others: 3)

于鹏 徐义芳 曹志刚 《信号处理》2002,18(6):513-517

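Background noise creates a mismatch between the training and test environments of a speaker recognition system, causing a sharp drop in performance. This paper proposes a weighted feature compensation algorithm that removes the noise-induced deviation between the features of noisy speech and those of clean speech, so that the features entering the recognizer are close to those of clean speech. Signal-to-noise-ratio weighting is introduced into the feature compensation process. Experiments show that the method effectively improves the performance of speaker recognition systems.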

7.

Lateral inhibition net and weighted matching algorithms for speech recognition in noise

Yoma N.B. McInnes F. Jack M. 《Vision, Image and Signal Processing, IEE Proceedings -》1996,143(5):324-330

The authors address the problem of speech recognition with signals corrupted by white Gaussian additive noise at moderate SNR. The energy of the noise is not required. A technique based on a lateral inhibition process approximation with a multilayer neural net (the lateral inhibition net (LIN)) and neural net processing efficacy weighting in acoustic pattern matching algorithms is proposed. In the recognition procedure, the local SNR is computed by means of the autocorrelation function and is employed to estimate the efficacy of LIN in noise cancelling, which is taken into account as a weight in a pattern matching algorithm. A general criterion based on weighting the frame influence in decisions according to the reliability in noise reduction is suggested, and modified versions of both HMM and DTW algorithms have been designed. To be more coherent with the conditions that define LIN, a modification in the backpropagation algorithm is also proposed.

8.

Endpoint Detection of Noisy Speech Based on Cepstral Features (total citations: 44; self-citations: 0; citations by others: 44)

胡光锐 韦晓东 《电子学报》2000,28(10):95-97

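One cause of recognition errors in speech recognition systems is inaccurate endpoint detection. At high signal-to-noise ratios, locating the endpoints of speech correctly is not difficult. However, most practical speech recognition systems must operate at low signal-to-noise ratios, where conventional endpoint detection methods, such as energy-based ones, do not work effectively. This paper uses cepstral features to detect speech endpoints and proposes two algorithms for endpoint detection in noisy speech. The first algorithm uses cepstral distance instead of short-time energy as the decision threshold; the second improves hidden Markov model (HMM) based speech detection to adapt to changes in the noise. Experimental results show that the methods achieve highly accurate endpoint detection for noisy speech.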
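
The first algorithm in the abstract above replaces short-time energy with a cepstral distance. A minimal Python sketch of that idea follows. It uses a commonly cited form of the cepstral distance in dB and flags frames that lie far from a noise reference. The choice of this particular formula, the assumption that the leading frames contain only noise, the threshold value, and the function names are all illustrative assumptions; the abstract does not give these details.

```python
import math

def cepstral_distance(c, c_ref):
    """Cepstral distance in dB between two cepstral vectors (index 0 is c0)."""
    head = (c[0] - c_ref[0]) ** 2
    tail = 2.0 * sum((a - b) ** 2 for a, b in zip(c[1:], c_ref[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(head + tail)

def detect_speech(cepstra, noise_frames=3, threshold_db=1.0):
    """Flag frames whose distance from the noise reference exceeds a threshold.

    The reference is the average cepstrum of the first noise_frames frames,
    which are assumed to contain noise only.
    """
    dim = len(cepstra[0])
    ref = [sum(c[d] for c in cepstra[:noise_frames]) / noise_frames
           for d in range(dim)]
    return [cepstral_distance(c, ref) > threshold_db for c in cepstra]

# Toy cepstral vectors: three noise frames, two speech frames, one noise frame.
cepstra = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
           [1.0, 0.5], [1.2, 0.4], [0.0, 0.0]]
flags = detect_speech(cepstra)
```

The endpoints are then the first and last flagged frames; a practical detector would also smooth the flags over time to avoid spurious single-frame decisions.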

9.

Statistical Methods for Spectral Enhancement of Noisy Speech Signals: Research and Outlook

雷广智 梁民 《数字技术与应用》2012,(1):128-139,142

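Single-microphone spectral enhancement of noisy speech signals has long attracted great attention from industry and academia and is widely used in areas such as speech recognition, hearing aids, and hands-free communication terminals. This paper systematically discusses the basic building blocks of a spectral enhancement system for noisy speech and describes in some detail the statistical techniques and methods for blocks such as speech signal estimation, speech presence probability estimation, a priori signal-to-noise ratio (SNR) estimation, and noise power spectrum estimation. It also discusses the choice among spectral enhancement algorithms for noisy speech and outlines possible directions for future research and development.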

10.

An immunological approach based on the negative selection algorithm for real noise classification in speech signals

《AEUE-International Journal of Electronics and Communications》2017

This paper presents a new approach to detect and classify background noise in speech sentences based on the negative selection algorithm and dual-tree complex wavelet transform. The energies of the complex wavelet coefficients across five wavelet scales are used as input features. Afterward, the proposed algorithm identifies whether the speech sentence is, or is not, corrupted by noise. In the affirmative case, the system returns the type of the background noise amongst the real noise types considered. Comparisons with classical supervised learning methods are carried out. Simulation results show that the proposed artificial immune system outperforms classical classifiers in accuracy and capacity of generalization. Future applications of this tool will help in the development of new speech enhancement or automatic speech recognition systems based on noise classification.

11.

A Study of Delta Features for Speaker Recognition

张凯 朱立新 金家宝 《电声技术》2009,33(4):52-55

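Most speaker recognition methods analyze only the static features of the speech signal and ignore the effect of its dynamic features on recognition performance. Delta features are important features that reflect the inter-frame dynamics of the speech signal. This paper works through a concrete example of computing LPC and its Delta features, analyzes the results, and discusses the effectiveness and feasibility of using Delta features in a speaker recognition system.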
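
To make the notion of Delta features in the abstract above concrete, the Python sketch below computes them with the widely used regression formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²), with n running from 1 to N. Whether the paper uses this exact formula, the window size N = 2, and the edge handling (repeating the first and last frames) are assumptions; the input could be LPC or any other static feature vectors.

```python
def delta(frames, N=2):
    """Regression-based delta coefficients for a sequence of feature vectors.

    Frames beyond either end of the sequence are replaced by the nearest
    edge frame (an assumed padding convention).
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                ahead = frames[min(t + n, T - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# A feature that rises by 1.0 per frame has slope 1.0 away from the edges.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(static)
```

In the interior of the ramp the delta equals the true slope; near the edges the padding pulls the estimate toward zero, which is why `deltas[0]` and `deltas[4]` are smaller.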

12.

STATISTICAL FEATURE OF PITCH FREQUENCY DISTRIBUTIONS FOR ROBUST SPEAKER IDENTIFICATION

Zhang Linghua Zheng Baoyu Yang Zhen 《电子科学学刊(英文版)》2005,(4)

This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but it does not perform well for speaker identification on its own, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on a single kind of feature. Experimental results indicate that the hybrid approach gives a marked improvement for text-independent speaker identification in noisy environments corrupted by AWGN.
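
One simple way to combine two feature streams into a single decision, as the abstract above describes for PFD and the cepstrum, is a weighted sum of per-speaker scores. The Python sketch below shows that generic score-level fusion. The linear rule, the weight `alpha`, the speaker labels, and the score values are all assumptions for illustration; the abstract does not say how the two features are actually combined.

```python
def fuse_scores(cepstrum_scores, pfd_scores, alpha=0.7):
    """Weighted sum of two per-speaker score dictionaries.

    alpha weights the cepstrum score and (1 - alpha) the PFD score.
    Returns the best speaker label and the fused scores.
    """
    fused = {
        spk: alpha * cepstrum_scores[spk] + (1.0 - alpha) * pfd_scores[spk]
        for spk in cepstrum_scores
    }
    best = max(fused, key=fused.get)
    return best, fused

# Hypothetical log-likelihood scores for two enrolled speakers.
cepstrum_scores = {"spk_a": -10.0, "spk_b": -9.5}   # noisy cepstrum favors spk_b
pfd_scores = {"spk_a": -2.0, "spk_b": -6.0}         # PFD clearly favors spk_a
best, fused = fuse_scores(cepstrum_scores, pfd_scores)
cepstrum_only = max(cepstrum_scores, key=cepstrum_scores.get)
```

Here the cepstrum alone would pick `spk_b` by a narrow margin, while the fused score picks `spk_a` because the noise-insensitive stream disagrees strongly.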

13.

Robust Speaker Verification Under Additive Noise Conditions (total citations: 1; self-citations: 0; citations by others: 1)

张二华 王明合 唐振民 《电子学报》2019,47(6):1244-1250

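Speech denoising based on nonnegative matrix factorization improves the signal-to-noise ratio of the speech signal but also introduces speech distortion, which degrades the performance of speaker verification systems in noisy environments. This paper proposes a speech denoising method based on Nonnegative Matrix Factorization with Partial Constraints (PCNMF), with the aim of improving the robustness of speaker verification systems under unknown and non-stationary noise. PCNMF constructs a speech dictionary and a noise dictionary separately, subject to the partial constraints. Because speech dictionaries produced by conventional training on speech often contain some noise components, PCNMF generates pitch and overtone spectra from a mathematical model and uses them to imitate the formant structure of the human voice when synthesizing the dictionary, which keeps the speech dictionary clean. To overcome the partial loss of noise information caused by conventional noise-dictionary construction, PCNMF applies framing and the short-time Fourier transform to noise samples separated online and then forms the noise dictionary as frame-wise linear combinations. The performance evaluation covers multiple noise types, and the results show that PCNMF effectively improves the robustness of speaker verification systems; in particular, under unknown and non-stationary noise its equal error rate is on average 5.2% lower than that of the baseline system (Multi-Condition).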

14.

SVM Speaker Verification Based on Classified Feature Mapping

贺庆玮 李辉 许敏强 《通信技术》2010,43(3):147-149

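Text-independent speaker verification involves large amounts of training data, which makes it difficult to build Support Vector Machine (SVM) speaker models. To address this, the paper proposes a speaker verification system based on pitch-classified feature mapping and SVMs. Speech cepstral parameters are first classified in feature space according to the pitch period; feature mapping with a GMM-UBM structure then yields the speaker feature parameters in each feature subspace, from which SVM speaker models are built. Pitch-classified feature mapping not only compresses the sample data greatly but also makes the SVM decision boundaries in the subspaces more discriminative, so the overall system obtained by fusing the scores of the classified subsystems achieves better speaker verification performance. Experiments on the NIST'06 database demonstrate the effectiveness of the method.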

15.

Impostor Detection in Speaker Recognition Using Confusion‐Based Confidence Measures

Kyuhong Kim Hoirin Kim Minsoo Hahn 《ETRI Journal》2006,28(6):811-814

In this letter, we introduce confusion‐based confidence measures for detecting an impostor in speaker recognition, which does not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of an alternative hypothesis. Compared with the conventional Gaussian mixture model–universal background model (GMM‐UBM) scheme, our confusion‐based measures show better performance in noise‐corrupted speech. The additional computational requirements for our methods are negligible when used to detect or reject impostors.

16.

Zero-crossing based spectral analysis and SVD spectral analysis for formant frequency estimation in noise

Sreenivas T.V. Niederjohn R.J. 《Signal Processing, IEEE Transactions on》1992,40(2):282-293

The authors discuss a method for spectral analysis of noise corrupted signals using statistical properties of the zero-crossing intervals. It is shown that an initial stage of filter-bank analysis is effective for achieving noise robustness. The technique is compared with currently popular spectral analysis techniques based on singular value decomposition and is found to provide generally better resolution and lower variance at low signal to noise ratios (SNRs). These techniques, along with three established methods and three variations of these methods, are further evaluated for their effectiveness for formant frequency estimation of noise corrupted speech. The theoretical results predict and experimental results confirm that the zero-crossing method performs well for estimating low frequencies and hence for first formant frequency estimation in speech at high noise levels (~0 dB SNR). Otherwise, J.A. Cadzow's high performance method (1983) is found to be a close alternative for reliable spectral estimation. As expected, the overall performance of all techniques is found to degrade for speech data. The standard autocorrelation-LPC method is found best for clean speech and all methods deteriorate roughly equally in noise.
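
The basic link between zero-crossing intervals and frequency that the abstract above builds on can be illustrated with a short Python sketch: a sinusoid changes sign twice per period, so the mean interval between sign changes gives a frequency estimate. This sketch uses only the mean interval on a single clean band; the filter-bank front end and the fuller interval statistics described in the abstract are not reproduced, and the function name and test signal are assumptions.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate the dominant frequency from the mean zero-crossing interval."""
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    if len(crossings) < 2:
        return 0.0
    # Mean number of samples between consecutive sign changes (half a period).
    mean_interval = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / (2.0 * mean_interval)

# 100 Hz tone sampled at 8 kHz for 0.1 s, with a phase offset so that no
# sample falls exactly on a zero.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * i / sample_rate + 0.3) for i in range(800)]
estimate = zero_crossing_frequency(tone, sample_rate)
```

On noisy input, individual crossings jitter or multiply, which is one reason a band-limiting stage before the crossing count helps.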

17.

A Robust Speaker Recognition Algorithm with an Environment Self-Learning Mechanism

张靖 俞一彪 《通信技术》2020,(3):618-624

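When a speaker recognition system is deployed, its performance degrades sharply whenever the application environment differs from the training environment. Because environmental noise is highly variable, the noise encountered in deployment cannot be predicted at training time. This paper therefore introduces the ideas of environment self-learning and adaptation: an improved Vector Taylor Series (VTS) describes the statistical relationship between the environmental noise model and the speaker speech model, and on this basis a robust speaker recognition algorithm with environment self-learning capability is proposed. Whenever the environment changes during operation, the noise signal captured before the speech input is used to update the noise model parameters iteratively; then, based on the statistical relationship established by VTS, the speaker speech model is adapted to the actual environment to compensate for the environmental mismatch. Speaker identification experiments show that the proposed method significantly improves recognition performance at low signal-to-noise ratios for different types of noise.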
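
For orientation, the standard first-order VTS relationship between a clean-speech Gaussian and a noise Gaussian can be sketched in Python as follows. It assumes log-spectral features, additive noise, no channel distortion, and diagonal covariances, with the mismatch function y = x + log(1 + exp(n − x)). This is the textbook form, not the improved VTS of the paper, whose details the abstract does not give; the function name is hypothetical.

```python
import math

def vts_compensate(mu_x, var_x, mu_n, var_n):
    """Adapt a clean-speech Gaussian to noise with a first-order VTS expansion.

    Expands y = x + log(1 + exp(n - x)) around the means, per dimension.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # g is the partial derivative dy/dx at the expansion point;
        # dy/dn is then 1 - g.
        g = 1.0 / (1.0 + math.exp(mn - mx))
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))
        var_y.append(g * g * vx + (1.0 - g) * (1.0 - g) * vn)
    return mu_y, var_y

# Equal speech and noise levels: the mean rises by log(2).
equal_mu, equal_var = vts_compensate([0.0], [1.0], [0.0], [1.0])

# Noise far below the speech level: the model is left almost unchanged.
quiet_mu, quiet_var = vts_compensate([10.0], [2.0], [-40.0], [1.0])
```

In an adaptive scheme of the kind the abstract describes, `mu_n` and `var_n` would be re-estimated whenever the environment changes and each component of the speaker model would be passed through a function like this one.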

18.

Statistical feature of pitch frequency distributions for robust speaker identification

Zhang Linghua Zheng Baoyu Yang Zhen 《电子科学学刊(英文版)》2005,22(4):437-442

This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but it does not perform well for speaker identification on its own, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on a single kind of feature. Experimental results indicate that the hybrid approach gives a marked improvement for text-independent speaker identification in noisy environments corrupted by AWGN.

19.

Automatic Chinese Dialect Identification Based on DBF

韩军 《电声技术》2017,41(4)

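In Chinese dialect identification, traditional acoustic features are parameterized representations of the spectral characteristics of the speech signal and often contain redundant information such as speaker, channel, and background noise. To address this problem, a deep neural network (DNN) is introduced into feature extraction, and a phoneme-level deep bottleneck feature (DBF) is proposed in an attempt to suppress, at the feature level, the influence of information that is redundant for dialect identification. The experimental section discusses the position of the bottleneck layer and its number of nodes; the results show that deep bottleneck features achieve a higher recognition rate than traditional acoustic features.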

20.

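A New Method for Sub-Component Analysis and Effectiveness Evaluation of Speech Feature Parameters (total citations: 2; self-citations: 0; citations by others: 2)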

俞一彪 许允喜 芮贤义 《信号处理》2007,23(2):188-191

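Speech signals carry two major kinds of characteristics, semantic content and speaker individuality, and extracting and strengthening them effectively is very important for speech recognition and speaker recognition. This paper proposes the 4S method for sub-component analysis and effectiveness evaluation of the semantic and individuality components in speech feature parameters. The method analyzes the proportion of semantic and individuality components and uses quantitative indices to judge how effective a feature parameter is for speech recognition and for speaker recognition. Applying the 4S method to the commonly used feature parameters LPC, LPCC, and MFCC shows that all of them contain more semantic information than speaker individuality information: the ratio factor LIR of semantic to speaker-individuality components is 1.30, 1.44, and 1.61, respectively, and the effectiveness of the three parameters for both speech recognition and speaker recognition increases in that order.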