Similar literature
 20 similar documents found (search time: 218 ms)
1.
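Noisy Speech Endpoint Detection Based on LPC Mel Cepstral Features   Total citations: 2, self-citations: 0, citations by others: 2
Complex noise environments are one reason speech recognition systems perform worse in practical applications. Noisy endpoint detection is a key technique in recognition preprocessing, and its quality determines the recognition rate to some extent. The authors propose a noisy endpoint detection method based on LPC Mel cepstral features: the speech signal is split into high- and low-frequency bands, LPC Mel cepstral features are extracted from each band separately, decisions are made from the Mel cepstral distance, and the noise is estimated adaptively. Experimental results show that the method is computationally efficient and detects endpoints well at low signal-to-noise ratios.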

2.
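Differential Composite Sub-band Speech Recognition in Noise   Total citations: 4, self-citations: 0, citations by others: 4
Jiang Wenjian (蒋文建), Wei Gang (韦岗). Journal on Communications (《通信学报》), 2002, 23(1): 18-24
Sub-band features reflect the local characteristics of a speech signal, while full-band features reflect its overall characteristics. Based on this, the paper proposes a new differential composite sub-band speech recognition method. Spectral differencing is first used to reduce noise interference; the recognition probabilities from the multiple sub-band features are then combined with the full-band recognition probability in a joint decision to obtain the final recognition result. The method is applied to recognition experiments on the ten English digits 0-9 from the TIMIT database and on the E-Set, under NoiseX92 white noise and F16 fighter noise. The experimental results show that the new method recognizes considerably better than traditional methods.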

3.
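Jiang Wenjian (蒋文建), Wei Gang (韦岗). Acta Electronica Sinica (《电子学报》), 2001, 29(Z1): 1829-1832
Combining multi-time-scale analysis with the sub-band approach, this paper proposes a new multi-time-scale composite sub-band method for speech recognition in noisy environments. The method extracts sub-band features and full-band features at different time scales, performs recognition on each separately, and then combines the results at the recognition-probability level to obtain the final recognition result. It inherits the noise robustness of both the multi-time-scale method and the sub-band method. In addition, spectral differencing is introduced to further improve the noise robustness of the speech features. Recognition experiments on the E-SET under NoiseX92 white noise show that the new method is robust to noise.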

4.
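Speaker Recognition Based on Robust Auditory Features   Total citations: 3, self-citations: 0, citations by others: 3
Lin Lin (林琳), Chen Hong (陈虹), Chen Jian (陈建). Acta Electronica Sinica (《电子学报》), 2013, 41(3): 619-624
To improve the performance of speaker recognition systems in noisy environments, this paper proposes a robust auditory feature extraction algorithm and applies it to a speaker recognition system. An adaptive compressive Gammachirp filter bank simulates the auditory characteristics of the human cochlea and performs frequency-domain sub-band filtering of the input speech signal; the resulting log sub-band energies serve as the auditory feature parameters. The discrete cosine transform and kernel principal component analysis are each applied to transform the extracted features, reducing their dimensionality and improving their noise robustness and their ability to express speaker individuality. Experimental results show that, when applied to a speaker recognition system, the new auditory features outperform both Mel cepstral coefficients and Gammatone-based auditory features in robustness and recognition performance.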

5.
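A Speech Detection Algorithm for Speech Recognition Systems-on-Chip   Total citations: 2, self-citations: 0, citations by others: 2
Research on speech recognition has entered the stage of practical application, and reliable speech detection is a key technique in practical speech recognition systems. This paper proposes a real-time speech detection algorithm based on a finite state machine model (FSM-SD). A frame-energy detector with a log maximum-likelihood decision and a zero-crossing-rate detector control the transitions between states. Two different frame-energy computation methods are derived, one for the MFCC (Mel-frequency cepstral coefficient) feature extraction process and one for the LPCC (linear prediction cepstral coefficient) feature extraction process used in speech recognition. FSM-SD is applied to a small-vocabulary Chinese speech recognition system implemented on an OAK DSP, and experiments verify that it effectively safeguards the system's recognition performance and noise robustness.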
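
The state-machine idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the FSM-SD algorithm itself: the four state names, the fixed dB and zero-crossing thresholds, and the `min_speech` and `hangover` frame counts are hypothetical, and plain thresholds stand in for the paper's log maximum-likelihood energy decision.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB (small floor avoids log(0))."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-12)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_active(frame, energy_high=-30.0, energy_low=-45.0, zcr_min=0.3):
    """Hypothetical per-frame decision: loud, or moderately loud with many
    zero crossings (a rough cue for unvoiced speech)."""
    e = frame_energy_db(frame)
    return e > energy_high or (e > energy_low and zero_crossing_rate(frame) > zcr_min)

def detect_speech(frames, min_speech=3, hangover=5):
    """Return a list of (start, end) frame indices, end exclusive.

    Assumed states: SILENCE -> MAYBE_SPEECH -> SPEECH -> MAYBE_SILENCE.
    """
    state, start, count, segments = "SILENCE", 0, 0, []
    for i, frame in enumerate(frames):
        active = is_active(frame)
        if state == "SILENCE":
            if active:
                state, start, count = "MAYBE_SPEECH", i, 1
        elif state == "MAYBE_SPEECH":
            if active:
                count += 1
                if count >= min_speech:      # enough consecutive active frames
                    state = "SPEECH"
            else:
                state = "SILENCE"            # too short: treat as a noise burst
        elif state == "SPEECH":
            if not active:
                state, count = "MAYBE_SILENCE", 1
        elif state == "MAYBE_SILENCE":
            if active:
                state = "SPEECH"             # short pause inside an utterance
            else:
                count += 1
                if count >= hangover:        # pause long enough: close segment
                    segments.append((start, i - count + 1))
                    state = "SILENCE"
    if state == "SPEECH":
        segments.append((start, len(frames)))
    elif state == "MAYBE_SILENCE":
        segments.append((start, len(frames) - count))
    return segments
```

The two intermediate states are what make the detector tolerant of short noise bursts and of brief pauses inside an utterance, which a single per-frame threshold cannot provide.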

6.
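Noise-Robust Speech Recognition Based on Multi-band Spectral Subtraction   Total citations: 1, self-citations: 0, citations by others: 1
To reduce the loss in speech recognition performance caused by the mismatch between test and training conditions in noisy environments, a noise-robust speech recognition system incorporating multi-band spectral subtraction is proposed. The first few frames of the noisy speech are taken as the estimated noise signal, and both the noisy speech and the estimated noise are divided by frequency into M non-overlapping bands. The spectral subtraction parameter for each band is then determined from the signal-to-noise ratio between the noisy speech and the estimated noise within that band. Speech enhancement serves as front-end processing and is cascaded with the speech recognizer to form the noise-robust recognition system. Simulation experiments show that, across different signal-to-noise ratios and noise types, the system based on multi-band spectral subtraction clearly outperforms basic spectral subtraction.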
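
The steps named in this abstract (noise estimate from the leading frames, M non-overlapping bands, a per-band subtraction parameter chosen from the band SNR) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's implementation: the input is assumed to be a list of per-frame power spectra, the piecewise-linear mapping in `oversubtraction` is a commonly used rule that the abstract does not specify, and the spectral floor value is hypothetical.

```python
import math

def band_edges(n_bins, n_bands):
    """Split bin indices 0..n_bins-1 into contiguous, non-overlapping bands."""
    return [(b * n_bins // n_bands, (b + 1) * n_bins // n_bands)
            for b in range(n_bands)]

def oversubtraction(snr_db):
    """Assumed mapping from band SNR (dB) to the subtraction factor:
    subtract more aggressively when the band SNR is low."""
    if snr_db < -5.0:
        return 5.0
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * snr_db

def multiband_subtract(power_frames, n_noise_frames=5, n_bands=4, floor=0.002):
    """Apply per-band spectral subtraction to a list of power spectra."""
    n_bins = len(power_frames[0])
    # Noise estimate: average of the first few frames, assumed speech-free.
    noise = [sum(f[k] for f in power_frames[:n_noise_frames]) / n_noise_frames
             for k in range(n_bins)]
    enhanced = []
    for frame in power_frames:
        clean = [0.0] * n_bins
        for lo, hi in band_edges(n_bins, n_bands):
            signal_power = sum(frame[lo:hi])
            noise_power = sum(noise[lo:hi])
            snr_db = 10.0 * math.log10((signal_power + 1e-12) / (noise_power + 1e-12))
            alpha = oversubtraction(snr_db)
            for k in range(lo, hi):
                # The spectral floor keeps the result positive.
                clean[k] = max(frame[k] - alpha * noise[k], floor * frame[k])
        enhanced.append(clean)
    return enhanced
```

Choosing the factor per band rather than once per frame lets a band dominated by speech keep most of its energy while a noise-dominated band in the same frame is suppressed almost to the floor.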

7.
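Noisy Speech Endpoint Detection Based on Cepstral Features   Total citations: 44, self-citations: 0, citations by others: 44
Hu Guangrui (胡光锐), Wei Xiaodong (韦晓东). Acta Electronica Sinica (《电子学报》), 2000, 28(10): 95-97
One cause of recognition errors in speech recognition systems is inaccurate endpoint detection. At high signal-to-noise ratios, determining speech endpoints correctly is not difficult. Most practical speech recognition systems, however, must work at low signal-to-noise ratios, where conventional endpoint detection methods, such as energy-based ones, do not work effectively in noise. This paper uses cepstral features to detect speech endpoints and proposes two algorithms for noisy speech endpoint detection. The first uses the cepstral distance instead of short-time energy as the decision threshold; the second improves speech detection based on hidden Markov models (HMMs) so that it adapts to changes in the noise. Experimental results show that the method detects endpoints in noisy speech with high accuracy.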
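
The first algorithm's core idea, thresholding a cepstral distance rather than short-time energy, can be sketched briefly. The sketch assumes the cepstral vectors have already been computed for each frame, uses the widely cited form of the cepstral distance with the constant 4.3429 (that is, 10/ln 10), and estimates the noise cepstrum from the leading frames; the threshold value and the noise-estimation rule are hypothetical and not taken from the paper.

```python
import math

def cepstral_distance(c, c_ref):
    """Distance in dB between two cepstral vectors [c0, c1, ..., cp]."""
    d0 = (c[0] - c_ref[0]) ** 2
    rest = sum((a - b) ** 2 for a, b in zip(c[1:], c_ref[1:]))
    return 4.3429 * math.sqrt(d0 + 2.0 * rest)

def speech_flags(cepstra, n_noise_frames=5, threshold=1.0):
    """Mark each frame as speech (True) when its cepstrum is far from the
    noise cepstrum estimated from the first n_noise_frames frames."""
    order = len(cepstra[0])
    noise = [sum(c[k] for c in cepstra[:n_noise_frames]) / n_noise_frames
             for k in range(order)]
    return [cepstral_distance(c, noise) > threshold for c in cepstra]
```

Because the distance compares spectral shape rather than level, a frame whose energy matches the background can still be flagged when its spectrum differs from the noise.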

8.
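A GSC-based speech enhancement algorithm is proposed. The algorithm uses a DFT-modulated sub-band filter bank to decompose the speech signal into sub-bands for adaptive filtering, which yields better enhancement and lower computational complexity. The norm-constrained adaptive filtering (NCAF) algorithm is applied to the adaptive noise canceller (ANC) to reduce speech distortion. To further remove residual noise from the enhanced speech, the algorithm uses an improved Wiener post-filter. Simulation results show that the proposed algorithm achieves a higher output segmental signal-to-noise ratio than full-band GSC-based microphone array speech enhancement and conventional Wiener post-filtering.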

9.
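To address the noise problem in practical speech recognition applications, a new noise-robust feature extraction algorithm is presented. The speech signal is first decomposed into wavelet sub-bands by the wavelet transform; then, following the auditory masking effect of the human ear, spectral compression is applied to the sub-band signals to extract the corresponding speech features. An experimental platform was built in MATLAB, and simulation results show that the features achieve a relatively high recognition rate in noisy environments. The new feature parameters both exploit the noise-robust properties of wavelets and effectively reduce the mismatch between training and recognition environments.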

10.
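Building on the missing data technique and the acoustic backing-off technique, this paper proposes a robust speech recognition method based on fuzzy rules. Fuzzy rules relating the reliability of each feature component to its probability distribution are first established from prior knowledge or assumptions. During recognition, the output probability of an observation vector is obtained from a rule-based fuzzy logic system. A concrete implementation for cepstrum-based recognition systems is also given. Experimental results show that the proposed method performs significantly better than both the missing data technique and the acoustic backing-off technique.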

11.
Automatic speech recognition under adverse noise conditions has been a challenging problem. Under noise conditions where the stationarity assumption is valid, effective techniques have been established that provide excellent recognition accuracies. Under conditions where this assumption does not hold, recognition performance declines rapidly. Missing data (MD) theory is a promising method for robust automatic speech recognition (ASR) under any noise condition. Unfortunately, the choice of feature used in the recognition process is commonly limited to spectral-based representations. The combination-of-recognizers approach to MD ASR allows the use of cepstral-based features within the MD framework through a fusion-of-features mechanism in the pattern recognition stage. It was found that under two types of non-stationary noise conditions, the fusion process increased recognition accuracies substantially over traditional MD and cepstral-based recognizers.

12.
The wavelet transform has been found to be an effective tool for the time-frequency analysis of non-stationary and quasi-stationary signals. Recent years have seen the wavelet transform used for feature extraction in speech recognition applications. In this paper, a sub-band feature extraction technique based on an admissible wavelet transform is proposed, and the features are modified to make them robust to additive white Gaussian noise. The performance of this system is compared with that of conventional mel frequency cepstral coefficients (MFCC) under various signal-to-noise ratios. Recognition performance based on the eight sub-band features is found to be superior to that of MFCC features under noisy conditions.

13.
Wireless Personal Communications - In this paper, we propose novel sub-band spectral centroid weighted wavelet packet cepstral coefficients (W-WPCC) for robust speech emotion recognition. Wavelet...

14.
The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term to the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are then derived from this model. The reference cepstral vectors trained from clean speech are adapted to their appropriate noisy version to best fit the testing speech cepstral vector. The LPC coefficients, the LPC-derived cepstral coefficients, and the distance between test and reference are all regarded as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio as well as the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to simultaneously calculate the LPC coefficients and their derivatives with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 males and 50 females) isolated Mandarin digit recognition demonstrate remarkable performance improvements over a noncompensated method in noisy environments. The results are also compared with the projection-based approach, and experiments show that the proposed method is superior to the projection approach in severely noisy environments.

15.
In this paper, we present a comparison of Khasi speech representations with four different spectral features and a novel extension towards the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstrum coefficients (LPCC), perceptual linear prediction (PLP), and Mel frequency cepstral coefficients (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, different hidden Markov model (HMM) based recognizers with variations in HMM states and different Gaussian mixture models (GMMs) were built. Performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.

16.
It is noted that the success of the articulatory approach to speech coding depends heavily on a good distortion measure between a given speech signal and the entries in a stored codebook of impulse responses and corresponding vocal-tract shapes (articulatory codebook). One promising distortion measure is the weighted cepstral distortion. Since the impulse responses in the articulatory codebook do not include glottal characteristics, the authors derive optimal weighting functions (cepstral lifters) to reduce the influence of a varying glottal source on the cepstral distortion measure. This is done by examining the ensemble of cepstral coefficients of speech produced by an articulatory speech synthesizer that also includes a vocal-cord model. The obtained cepstral lifters are optimal for the given ensemble of cepstral coefficients and for given constraints on the weighting function. They are different for cepstral coefficients derived from the power spectrum (FFT cepstra) and for those derived from LPC (linear predictive coding) coefficients (LPC cepstra). The performances of the obtained cepstral lifters are compared in an articulatory codebook search.

17.
Vích, R. Electronics Letters, 1987, 23(11): 561-562
A new approach is proposed for vector quantisation in linear predictive speech coding. The problem is formulated as speech model recognition by minimising the Euclidean distance measure of real cepstra of models with unit power transmission. The procedure is robust with respect to quantisation of both the cepstral coefficients and operational results.

18.
We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.

19.
A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features that represent contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length and is represented by a fixed-dimensional feature vector formed by the coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. In training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In testing, a frame-based dynamic programming procedure is employed to calculate the matching score between the test utterance and each reference template. The performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusable isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. It is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta-cepstral, and delta-delta cepstral coefficients.

20.
The effect of additive noise on speech cepstral coefficients is investigated. The cepstral vector changes as the level of additive noise increases: it shrinks in norm and converges to the cepstral vector of the noise. This nonlinear behaviour can be approximated by a simple linear expression. Based on this representation, a model adaptation method is developed using deviation vectors. For every model state mean, a deviation vector is calculated according to the extracted noise spectrum and a pre-defined noise-to-signal ratio. During pattern matching, an optimal scaling factor for the deviation vector is determined frame by frame, and the scaled deviation vector is added to the state mean of the speech models so that the clean speech models are adapted to the noisy environment. Experimental results show that the proposed method is effective for white noise and coloured noise. It also outperforms the weighted projection measure method in experiments.
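
The frame-by-frame scaling step in this abstract can be illustrated with a small sketch. The abstract does not say how the optimal scaling factor is computed, so the sketch assumes one natural reading: the factor minimises the squared Euclidean distance between the observed cepstral vector and the adapted mean, which has a closed-form least-squares solution. The function names are hypothetical.

```python
def optimal_scale(x, mean, deviation):
    """Least-squares scale s minimising ||x - (mean + s * deviation)||^2.

    Setting the derivative with respect to s to zero gives
    s = <deviation, x - mean> / <deviation, deviation>.
    """
    num = sum(d * (a - m) for d, a, m in zip(deviation, x, mean))
    den = sum(d * d for d in deviation)
    return num / den if den > 0.0 else 0.0

def adapt_mean(x, mean, deviation):
    """Shift a clean-speech state mean along its deviation vector so that it
    best fits the observed (noisy) cepstral vector x for this frame."""
    s = optimal_scale(x, mean, deviation)
    return [m + s * d for m, d in zip(mean, deviation)]
```

For example, with `mean = [1.0, 2.0]`, `deviation = [-1.0, 0.0]`, and an observed frame `x = [0.5, 2.0]`, the scale comes out as 0.5 and the adapted mean coincides with `x`.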
