
1.

Nonlinear cepstral equalisation method for noisy speech recognition

Lee L.-M. Chen J.-K. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(6):397-402

The authors deal with the problem of automatic speech recognition in the presence of additive white noise. The effect of noise is modelled as an additive term to the power spectrum of the original clean speech. The cepstral coefficients of the noisy speech are then derived from this model. The reference cepstral vectors trained from clean speech are adapted to their appropriate noisy version to best fit the testing speech cepstral vector. The LPC coefficients, LPC-derived cepstral coefficients, and the distance between test and reference are all regarded as functions of the noise ratio (the spectral power ratio of noise to noisy speech). A gradient-based algorithm is proposed to find the optimal noise ratio as well as the minimum distance between the test cepstral vector and the noise-adapted reference. A recursive algorithm based on the Levinson-Durbin recursion is proposed to simultaneously calculate the LPC coefficients and their derivatives with respect to the noise ratio. The stability of the proposed adaptation algorithm is also addressed. Experiments on multispeaker (50 males and 50 females) isolated Mandarin digit recognition demonstrate remarkable performance improvements over the noncompensated method in noisy environments. The results are also compared with the projection-based approach, and experiments show that the proposed method is superior to the projection approach in severely noisy environments.
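The ingredients named in this abstract (white noise as an additive term, the Levinson-Durbin recursion, LPC-derived cepstra) can be illustrated with a minimal sketch. It is not the authors' gradient-based adaptation algorithm and does not compute derivatives with respect to the noise ratio. The function names and the use of the autocorrelation domain are assumptions made for illustration. White noise has a flat power spectrum, so adding it changes only the zero-lag autocorrelation.

```python
def add_white_noise(autocorr, noise_power):
    """Autocorrelation of clean speech plus white noise (hypothetical helper).

    A constant added to the power spectrum only raises the zero-lag term.
    """
    return [autocorr[0] + noise_power] + list(autocorr[1:])


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) with a[0] == 1, where the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order and err is the final
    prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / A(z)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - acc / n
    return c[1:]


# Usage: an AR(1)-like autocorrelation, clean and with white noise added.
clean_r = [1.0, 0.5, 0.25]
a_clean, _ = levinson_durbin(clean_r, 2)
a_noisy, _ = levinson_durbin(add_white_noise(clean_r, 1.0), 2)
cep_clean = lpc_to_cepstrum(a_clean, 3)
cep_noisy = lpc_to_cepstrum(a_noisy, 3)
```

In this toy case the noisy LPC coefficients and cepstra are smaller in magnitude than the clean ones, which is the kind of mismatch a noise-adapted reference is meant to absorb.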

2.

Cepstrum third-order normalisation method for noisy speech recognition

Yong Ho Suk Seung Ho Choi Hwang Soo Lee 《Electronics letters》1999,35(7):527-528

A new cepstrum normalisation method is proposed which can be used to compensate for distortion caused by additive noise. Conventional methods only compensate for the deviation of the cepstral mean and/or variance. However, deviations of higher-order moments also exist in noisy speech signals. The proposed method normalises the cepstrum up to its third-order moment, providing closer probability density functions between clean and noisy cepstra than is possible using conventional methods. Speaker-independent isolated-word recognition experiments show that the proposed method performs better than conventional methods, especially in heavy noise environments.
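The gap this abstract points to can be made concrete with a small sketch of the conventional baseline. It shows only mean and variance normalisation of one cepstral coefficient track, and then measures the third standardised moment (skewness) that such normalisation leaves untouched. It is not the proposed third-order method, whose details the abstract does not give. Function names are hypothetical.

```python
import math


def moments(track):
    """Return (mean, standard deviation, skewness) of a coefficient track."""
    n = len(track)
    mean = sum(track) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in track) / n)
    skew = sum(((v - mean) / std) ** 3 for v in track) / n
    return mean, std, skew


def mean_variance_normalise(track):
    """Conventional normalisation: zero mean and unit variance."""
    mean, std, _ = moments(track)
    return [(v - mean) / std for v in track]


# Usage: a skewed track stays skewed after mean/variance normalisation.
raw = [0.0, 0.0, 0.0, 1.0, 5.0]
normalised = mean_variance_normalise(raw)
mean_after, std_after, skew_after = moments(normalised)
```

Because skewness is invariant to shifting and positive scaling, `skew_after` equals the skewness of `raw`; removing that residual requires a method that acts on the third moment directly.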

3.

Filtering of Filter‐Bank Energies for Robust Speech Recognition

Ho‐Young Jung 《ETRI Journal》2004,26(3):273-276

We propose a novel feature processing technique which can provide a cepstral liftering effect in the log‐spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance‐based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log‐spectral domain that corresponds to cepstral liftering. The proposed method performs high‐pass filtering based on the decorrelation of filter‐bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.

4.

Smoothing approach using forward–backward Kalman filter with Markov switching parameters for speech enhancement

Ki Yong Lee Souhwan Jung JaeYeal Rheem 《Signal processing》2000,80(12):2579

In this paper, a smoothing approach for enhancing speech signals degraded by statistically independent additive nonstationary noise is developed. The autoregressive hidden Markov model (ARHMM) is used for modeling the statistical characteristics of both the clean speech signal and the nonstationary noise process. In this case, the speech enhancement comprises a weighted sum of the conditional mean estimators for the composite states of the models for the speech and noise, where the weights are equal to the posterior probabilities of the composite states, given the noisy speech. The conditional mean estimators use a smoothing approach based on two Kalman filters with Markovian switching coefficients, where one of the filters propagates in the forward-time direction and the other propagates in the backward-time direction with one frame. The proposed method is tested on speech signals degraded by Gaussian colored noise or nonstationary noise at various input signal-to-noise ratios. An improvement of approximately 4.7–5.2 dB in SNR is achieved at input SNRs of 10 and 15 dB. Also, in comparison with the conventional method (Ephraim, IEEE Trans. Signal Process. SP-41 (April 1992) 725–735), the proposed method shows an improvement of about 0.3 dB in SNR.
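As a point of reference for the Kalman filtering mentioned in this abstract, the following is a minimal sketch of a single forward Kalman filter for a scalar AR(1) signal observed in additive white noise. It omits everything that makes the paper's method distinctive: there is no backward filter, no Markov switching, and no HMM state weighting. The function name, parameter names, and the AR(1) model order are assumptions.

```python
def kalman_ar1(observations, ar_coeff, process_var, obs_var,
               initial_state=0.0, initial_var=1.0):
    """Forward Kalman filter for x[t] = ar_coeff * x[t-1] + w, y[t] = x[t] + v.

    process_var is the variance of w and obs_var is the variance of v.
    Returns the filtered state estimates, one per observation.
    """
    state, var = initial_state, initial_var
    estimates = []
    for y in observations:
        # Predict the next state and its uncertainty from the AR model.
        state_pred = ar_coeff * state
        var_pred = ar_coeff * ar_coeff * var + process_var
        # Correct the prediction using the noisy observation.
        gain = var_pred / (var_pred + obs_var)
        state = state_pred + gain * (y - state_pred)
        var = (1.0 - gain) * var_pred
        estimates.append(state)
    return estimates
```

With `obs_var` equal to zero the filter trusts the observations completely and returns them unchanged; as `obs_var` grows, the estimates fall back on the AR prediction.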

5.

Representation of hidden Markov model for noise adaptive speech recognition

Lee L.-M. Wang H.-C. 《Electronics letters》1995,31(8):616-617

The state parameters of the hidden Markov model are represented by the autocorrelation coefficients of a context window, which can be adaptively transformed to cepstral and delta cepstral coefficients according to the environmental noise. Experimental results show that this representation can significantly improve the speech recognition rate in noisy environments.

6.

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

Yasser Shekofteh Farshad Almasganj 《ETRI Journal》2013,35(1):100-108

In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral‐based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior‐probability‐based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well‐known Persian speech corpus. Through the proposed FE method, we gain a 3.11% absolute improvement in phoneme error rate in comparison to the baseline system, which exploits the mel‐frequency cepstral coefficient FE method.
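The first stage described here, embedding a speech frame in a reconstructed phase space, is commonly done by time-delay embedding. The sketch below shows only that step, under the assumption that a standard delay embedding is used; the embedding dimension and lag are free parameters not given in the abstract, and the Gaussian mixture attractor models and posterior features are not reproduced. The function name is hypothetical.

```python
def delay_embed(frame, dimension, lag):
    """Map a 1-D frame to points in a reconstructed phase space.

    Each point is (x[i], x[i + lag], ..., x[i + (dimension - 1) * lag]).
    """
    n_points = len(frame) - (dimension - 1) * lag
    if n_points <= 0:
        raise ValueError("frame is too short for this dimension and lag")
    return [
        tuple(frame[i + d * lag] for d in range(dimension))
        for i in range(n_points)
    ]


# Usage: six samples embedded in three dimensions with a lag of two.
points = delay_embed([0, 1, 2, 3, 4, 5], dimension=3, lag=2)
# points is [(0, 2, 4), (1, 3, 5)]
```

The resulting point cloud is what attractor models would be fitted to or scored against.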

7.

Frame-synchronous noise compensation for hands-free speech recognition in car environments

Chien J.-T. Lin M.-S. 《Vision, Image and Signal Processing, IEE Proceedings -》2000,147(6):508-515

It has become increasingly important to develop hands-free speech recognition techniques for the human-computer interface in car environments. However, severe car noise degrades the speech recognition performance substantially. To compensate for the performance loss, it is necessary to adapt the original speech hidden Markov models (HMMs) to meet changing car environments. A novel frame-synchronous adaptation mechanism for in-car speech recognition is presented. This mechanism is intended to perform unsupervised model adaptation efficiently on a frame-by-frame basis, instead of relying on batch adaptation data and supervision information as a conventional adaptation algorithm does. The proposed adaptation scheme is performed during frame likelihood calculation, where an optimal equalisation factor is first computed to equalise the model mean vector and the input frame vector. This equalisation factor then serves as a reference index to retrieve an additional bias vector for model mean adaptation. As a result, a rapid and flexible algorithm is exploited to establish a new robust likelihood measure. In experiments on hands-free in-car speech recognition with the microphone far from the talker, this framework is found to be effective in terms of recognition rate and computational cost under various driving speeds.

8.

Isolated Mandarin syllable recognition using segmental features

Chang S. Chen S.-H. 《Vision, Image and Signal Processing, IEE Proceedings -》1995,142(1):59-64

A segment-based speech recognition scheme is proposed. The basic idea is to model explicitly the correlation among successive frames of speech signals by using features representing contours of spectral parameters. The speech signal of an utterance is regarded as a template formed by directly concatenating a sequence of acoustic segments. Each constituent acoustic segment is of variable length and is represented by a fixed-dimensional feature vector formed by coefficients of discrete orthonormal polynomial expansions that approximate its spectral parameter contours. In training, an automatic algorithm is proposed to generate several segment-based reference templates for each syllable class. In testing, a frame-based dynamic programming procedure is employed to calculate the matching score between the test utterance and each reference template. Performance of the proposed scheme was examined by simulations on multi-speaker speech recognition for 408 highly confusing isolated Mandarin base-syllables. A recognition rate of 81.1% was achieved using 5-segment, 8-reference template models with cepstral and delta-cepstral coefficients as the recognition features. It is 4.5% higher than that of a well-modelled 12-state, 5-mixture CHMM method using cepstral, delta-cepstral, and delta-delta cepstral coefficients.

9.

Comparison of some noise-compensation methods for speech recognition in adverse environments

Milner B.P. Vaseghi S.V. 《Vision, Image and Signal Processing, IEE Proceedings -》1994,141(5):280-288

A comparative study is presented of three noise-compensation schemes, namely spectral subtraction, Wiener filters, and noise adaptation, for hidden-Markov-model-based speech recognition in adverse environments. The noise-compensation methods are evaluated on a spoken-digit database, in the presence of car noise and helicopter noise at different signal-to-noise ratios. Experimental results demonstrate that the noise-compensation methods achieve a substantial improvement in recognition accuracy across a wide range of signal-to-noise ratios. At a signal-to-noise ratio of -6 dB the recognition accuracy is improved from 11% to 83%. The use of cepstral-time matrices as an improved speech representation is also considered, and their combination with the noise-compensation methods is shown. Experiments show that the cepstral-time matrix is a more robust feature than a vector of identical size, composed of a combination of cepstral and differential cepstral features.
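Two of the schemes compared in this abstract have simple textbook forms that operate on one frame's power spectrum. The sketch below uses those generic forms, not necessarily the variants evaluated in the paper; the spectral floor value, the function names, and the assumption that a noise power estimate is already available are illustrative.

```python
def spectral_subtract(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction with a floor.

    Subtracts the noise estimate bin by bin and never lets a bin fall
    below floor * noisy power, which avoids negative power values.
    """
    return [max(p - n, floor * p) for p, n in zip(noisy_power, noise_power)]


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain, estimated as clean power over noisy power."""
    return [
        max(p - n, 0.0) / p if p > 0 else 0.0
        for p, n in zip(noisy_power, noise_power)
    ]


# Usage: two frequency bins, the second dominated by noise.
cleaned = spectral_subtract([4.0, 1.0], [1.0, 2.0])   # [3.0, 0.01]
gains = wiener_gain([4.0, 1.0], [1.0, 2.0])           # [0.75, 0.0]
```

Spectral subtraction returns an estimated clean power spectrum, while the Wiener gain is a multiplier applied to the noisy spectrum; both rely on the quality of the noise estimate.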

10.

Homomorphic vector quantisation

Vích R. 《Electronics letters》1987,23(11):561-562

A new approach is proposed for vector quantisation in linear predictive speech coding. The problem is formulated as speech model recognition by minimising the Euclidean distance measure of real cepstra of models with unit power transmission. The procedure is robust with respect to quantisation of both the cepstral coefficients and operational results.

11.

Speech enhancement based on HFM and a perceptual post-filter

Zhang Weiwei Feng Dazheng 《电子科技》 (Electronic Science and Technology) 2007,(9):1-5

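The problem of speech enhancement when only the noisy signal is available is studied. The speech signal is treated as an autoregressive (AR) process excited by Gaussian noise, the observation noise is additive white Gaussian noise, and the signal is cast as a state-space model. First, a hidden Markov model (HMM) is used to estimate the AR parameters and the noise variance, which serve as initial values for the Kalman filter; the estimated signal is produced as an intermediate result of the parameter estimation. The estimated signal is then smoothed by a perceptual filter to remove residual noise. Simulation results show that the algorithm performs well.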

12.

Adaptation scheme for hidden Markov models in noisy speech recognition

Tai-Hwei Hwang Hsiao-Chuan Wang 《Electronics letters》1997,33(4):257-258

Shrinkage of the mean vectors and the variances in the HMM due to additive white noise is an important issue for the speech recogniser. By assuming a relation between the adaptation factors for the mean vectors and the variances, an optimal adaptation factor can be found using the maximum likelihood method.

13.

An improved cepstral-domain feature parameter compensation algorithm based on the GARCH model

Jian Zhihua Yang Zhen 《信号处理》 (Signal Processing) 2007,23(3):383-387

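This paper proposes an improved cepstral-domain feature parameter compensation algorithm, GMCSM. In view of the time-varying nature of speech signals, GMCSM uses a Generalized Auto-Regressive Conditional Heteroscedasticity (GARCH) model to model the variance of the speech signal. Experimental data show that, compared with the conventional cepstral subtraction methods CSM and MEMCSM, GMCSM compensates more effectively for the distortion of cepstral feature parameters caused by additive noise and reduces the recognition error rate; its advantage is most pronounced at low signal-to-noise ratios.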
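For readers unfamiliar with GARCH, the variance recursion at the heart of the standard GARCH(1,1) model can be sketched as follows. This shows only the generic recursion for a time-varying variance; how GMCSM fits the model to speech or uses it for cepstral compensation is not described in the abstract and is not reproduced here. The model order, function name, and parameter values are assumptions.

```python
def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances under a GARCH(1,1) model.

    variance[t] = omega + alpha * residuals[t-1]**2 + beta * variance[t-1],
    starting from initial_variance at t = 0. Returns one variance per
    residual.
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + alpha * eps * eps + beta * variances[-1])
    return variances


# Usage: a large residual at t = 1 raises the predicted variance at t = 2.
variances = garch11_variance(
    [1.0, 2.0, 0.0], omega=0.1, alpha=0.2, beta=0.7, initial_variance=1.0
)
```

The recursion lets the variance react to recent signal energy while decaying smoothly, which is why it suits signals whose variance changes over time.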

14.

Spectral magnitude normalisation and cepstral coefficient transform for noisy-Lombard speech recognition

Sang-Mun Chi Yung-Hwan Oh 《Electronics letters》1996,32(19):1761-1763

The authors propose a degradation model which represents the spectral changes of speech signals caused by the Lombard effect and noise contamination in noisy environments. According to this model, spectral magnitude normalisation and cepstral coefficient transforms are used to restore the cepstrum of clean speech from noisy-Lombard speech.

15.

Robust speech features based on wavelet transform with application to speaker identification

Hsieh C.-T. Lai E. Wang Y.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》2002,149(2):108-114

An effective and robust speech feature extraction method is presented. Based on the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristics of an individual speaker, the linear predictive cepstral coefficients of the approximation channel and entropy value of the detail channel for each decomposition process are calculated. In addition, an adaptive thresholding technique for each lower resolution is also applied to remove the influence of noise interference. Experimental results show that using this mechanism not only effectively reduces the influence of noise interference but also improves the recognition performance. Finally, the proposed method is evaluated on the MAT telephone speech database for text-independent speaker identification using the group vector quantisation identifier. Some popular existing methods are also evaluated for comparison, and the results show that the proposed feature extraction algorithm is more effective and robust than the other existing methods. In addition, the performance of the proposed method is very satisfactory even in a low SNR environment corrupted by Gaussian white noise.

16.

A data reconstruction algorithm based on the locally optimal state path of a hidden Markov model

Luo Yu Du Limin 《电子与信息学报》 (Journal of Electronics & Information Technology) 2004,26(5):722-726

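This paper proposes a data reconstruction algorithm based on the locally optimal state path of a hidden Markov model (LOPDI). The algorithm assumes that the speech feature vectors are the output sequence of an L-state hidden Markov model, estimates a suboptimal state sequence that generated the speech feature vectors from the locally optimal state path, and reconstructs the missing vectors according to the maximum a posteriori (MAP) criterion. Experiments show that LOPDI significantly improves the robustness of a speech recognition system to additive noise.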

17.

Downlink Precoding for Multiple Users in FDD Massive MIMO Without CSI Feedback

Ming-Fu Tang Borching Su 《Journal of Signal Processing Systems》2016,82(2):151-161

In this paper, we propose a robust distant-talking speech recognition method that combines a cepstral-domain denoising autoencoder (DAE) and a temporal structure normalization (TSN) filter. As a DAE has a deep structure and nonlinear processing steps, it is flexible enough to model a highly nonlinear mapping between input and output space. We train a DAE to map reverberant and noisy speech features to the underlying clean speech features in the cepstral domain. After applying the DAE in the cepstral domain to suppress reverberation, we apply a post-processing technique based on the TSN filter to reduce the noise and reverberation effects by normalizing the modulation spectra to reference spectra of clean speech. The proposed method was evaluated using speech in simulated and real reverberant environments. By combining a cepstral-domain DAE and TSN, the average word error rate (WER) was reduced from 25.2% for the baseline system to 21.2% in simulated environments and from 47.5% to 41.3% in real environments.

18.

A noise-adaptive multi-stream combined sub-band speech recognition method

Zhang Jun Wei Gang 《电子与信息学报》 (Journal of Electronics & Information Technology) 2006,28(7):1183-1187

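First, to address the limited range of features usable with the marginalisation technique in existing missing-data speech recognition, a method for estimating the reliability of cepstral feature components is proposed, extending marginalisation to commonly used cepstral speech recognition systems. Then, exploiting the complementary performance of marginalisation recognisers based on full-band and sub-band cepstral features under different noises, a noise-adaptive multi-stream combined sub-band speech recognition method is proposed. Experimental results show that the proposed method adaptively selects whichever of the full-band and sub-band data streams is less affected by noise and bases recognition mainly on it, effectively improving the robustness of the recognition system in varying noise environments.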

19.

A robust Kalman filtering speech enhancement algorithm based on adaptive conjugate gradient parameter estimation

Dong Jing Zhao Xiaohui 《通信学报》 (Journal on Communications) 2004,25(8):44-51

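For speech signals corrupted by additive noise, an effective speech enhancement algorithm is proposed that uses an augmented Kalman filter model built from two-channel input and applies an adaptive conjugate gradient method to estimate the parameters of the clean speech model and the coloured noise interference model separately. Because the method estimates the model parameters accurately and quickly, it enhances speech well compared with other Kalman-filter-based speech enhancement methods and shows a degree of robustness. Simulations show that the method remains effective even when the environmental noise is very complex.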

20.

Speech analysis by pole-zero decomposition of short-time spectra

B. Yegnanarayana 《Signal processing》1981,3(1):5-17

A new method for representation of speech spectra based on a pole-zero decomposition technique is proposed in this paper. In this method the parameters of a pole-zero model for the smoothed short-time spectrum of speech are determined by adopting a cepstral matching criterion. The cepstral coefficients of the impulse response of the model are equal to the cepstral coefficients of the signal up to a specified number, which determines the order of the model system. This is analogous to autocorrelation matching in linear prediction analysis. It is shown that the model spectrum represents both peaks and valleys of the smoothed spectrum equally well, unlike the all-pole model of linear prediction analysis, where only the peaks are well represented. The pole and zero parameters are derived in an identical manner by approximately deconvolving the pole and zero contributions in the cepstral domain. The residual from the inverse pole-zero system can be used to obtain information about the excitation signal.