Similar Literature
20 similar documents retrieved; search time: 31 ms
1.
Traditional single-channel subspace-based schemes for speech enhancement rely mostly on linear minimum mean-square error estimators, which are globally optimal only if the Karhunen-Loève transform (KLT) coefficients of the noise and speech processes are Gaussian distributed. In this paper we derive subspace-based nonlinear estimators assuming that the speech KLT coefficients follow a generalized super-Gaussian distribution, which has the Laplacian and the two-sided Gamma distribution as special cases. As with the traditional linear estimators, the derived estimators are functions of the a priori signal-to-noise ratio (SNR) in the subspaces spanned by the KLT vectors. We propose a scheme for estimating these a priori SNRs that generalizes the "decision-directed" approach well known from short-time Fourier transform (STFT)-based enhancement schemes. We show that the proposed a priori SNR estimation scheme leads to a significant reduction of the residual noise level, a conclusion confirmed in extensive objective speech quality evaluations as well as subjective tests. We also show that the derived estimators based on the super-Gaussian KLT coefficient distribution lead to improvements for different noise sources and levels compared to when a Gaussian assumption is imposed.
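The decision-directed rule referenced above is a standard ingredient of such schemes. A minimal numpy sketch of the per-subband a priori SNR update (function and argument names are our own, not from the paper):

```python
import numpy as np

def decision_directed_snr(noisy_psd, noise_psd, prev_clean_psd, alpha=0.98):
    """Estimate the a priori SNR xi per subband with the decision-directed rule.

    noisy_psd      -- |Y_k|^2 of the current frame (per KLT/STFT subband)
    noise_psd      -- estimated noise power lambda_k in each subband
    prev_clean_psd -- |X_hat_k|^2 from the previous enhanced frame
    alpha          -- smoothing constant, typically close to 1
    """
    eps = 1e-12
    gamma = noisy_psd / np.maximum(noise_psd, eps)          # a posteriori SNR
    xi = alpha * prev_clean_psd / np.maximum(noise_psd, eps) \
         + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)     # ML term, floored at 0
    return xi
```

The resulting xi feeds whichever gain function the estimator prescribes (for a linear estimator, e.g., the Wiener gain xi / (1 + xi)).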

2.
In this paper we discuss an unsupervised approach to co-channel speech separation, where two speakers speak simultaneously over the same channel. We propose a two-stage separation process whose initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform. EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions. The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering groups the speakers at this initial stage. In the second stage, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed by dividing the STFT matrix of the mixed speech signal by the STFT matrices of the stage-1 recovered speech signals. A histogram of the resulting ratios is used to estimate an ideal binary mask for each speaker. These masks are applied to the speech mixture to estimate the underlying speakers; because the masks are estimated from the speech mixture, they help impute values missing after the stage-1 grouping. Results obtained show significant improvement in objective measures over other existing single-channel speech separation methods.
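As a rough illustration of the masking step, the sketch below assigns each time-frequency bin of the mixture to the speaker whose stage-1 estimate dominates. This is a deliberate simplification of the histogram-of-ratios procedure in the abstract, and all names are illustrative:

```python
import numpy as np

def binary_masks_from_estimates(mix_stft, est1_stft, est2_stft):
    """Partition the mixture's time-frequency bins between two speakers.

    Simplified stand-in for histogram-based mask estimation: a bin goes to
    the speaker whose stage-1 estimate has the larger magnitude there.
    """
    mask1 = np.abs(est1_stft) >= np.abs(est2_stft)
    mask2 = ~mask1
    # The two binary masks are complementary, so the masked signals sum
    # back to the original mixture.
    return mask1 * mix_stft, mask2 * mix_stft
```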

3.
In this paper, we propose a convolutive transfer function generalized sidelobe canceler (CTF-GSC), an adaptive beamformer designed for multichannel speech enhancement in reverberant environments. Using a complete system representation in the short-time Fourier transform (STFT) domain, we formulate a constrained minimization problem of total output noise power, subject to the constraint that the signal component of the output is the desired signal, up to some prespecified filter. We then employ the generalized sidelobe canceler (GSC) structure to transform the problem into an equivalent unconstrained form by decoupling the constraint and the minimization. The CTF-GSC is obtained by applying a convolutive transfer function (CTF) approximation to the GSC scheme, which is more accurate and less restrictive than a multiplicative transfer function (MTF) approximation. Experimental results demonstrate that the proposed beamformer outperforms the transfer function GSC (TF-GSC) in reverberant environments, achieving both improved noise reduction and reduced speech distortion.

4.
Noise reduction for speech applications is often formulated as a digital filtering problem, where the clean speech estimate is obtained by passing the noisy speech through a linear filter/transform. With such a formulation, the core issue of noise reduction becomes how to design an optimal filter (based on the statistics of the speech and noise signals) that can significantly suppress noise without introducing perceptually noticeable speech distortion. The optimal filters can be designed either in the time domain or in a transform domain. The advantage of working in a transform space is that, if the transform is selected properly, the speech and noise signals may be better separated in that space, thereby enabling better filter estimation and noise reduction performance. Although many different transforms exist, most efforts in the field of noise reduction have focused only on the Fourier and Karhunen–Loève transforms. Even for these two, no formal study has been carried out to investigate which transform outperforms the other. In this paper, we reformulate the noise reduction problem in a more generalized transform domain. We show some of the advantages of working in this generalized domain: 1) different transforms can be substituted for each other without any change to the algorithm (optimal filter) formulation, and 2) it is easier to fairly compare different transforms for their noise reduction performance. We also address how to design different optimal and suboptimal filters in such a generalized transform domain.
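Advantage 1) above can be made concrete with a small sketch: a Wiener-type gain computed in the domain of an orthonormal transform passed in as a parameter, so that swapping the transform (DCT, KLT eigenvectors, etc.) changes the domain but not the algorithm. This is our own minimal illustration, not the paper's filter design:

```python
import numpy as np

def transform_domain_wiener(noisy_frames, noise_frames, T):
    """Apply a Wiener-type gain in the domain of an orthonormal transform T.

    noisy_frames, noise_frames -- (num_frames, frame_len) real arrays
    T -- orthonormal transform matrix (rows are basis vectors)
    """
    Y = noisy_frames @ T.T                                  # analysis
    noise_var = np.mean((noise_frames @ T.T) ** 2, axis=0)  # noise power per coeff
    noisy_var = np.mean(Y ** 2, axis=0)
    speech_var = np.maximum(noisy_var - noise_var, 0.0)     # power subtraction
    gain = speech_var / np.maximum(speech_var + noise_var, 1e-12)
    return (Y * gain) @ T                                   # synthesis (inverse of T.T)
```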

5.
The minimum variance distortionless response (MVDR) beamformer, also known as Capon's beamformer, is widely studied in the area of speech enhancement. It can be used for both speech dereverberation and noise reduction. This paper provides new insights into the MVDR beamformer. Specifically, its local and global behavior is analyzed, and novel forms of the MVDR filter are derived and discussed. Earlier works observed a tradeoff between the amount of speech dereverberation and noise reduction when the MVDR beamformer is used; here, this tradeoff is analyzed thoroughly. The local and global behavior, as well as the tradeoff, is analyzed for different noise fields, for example a mixture of coherent and non-coherent noise fields, entirely non-coherent noise fields, and diffuse noise fields. It is shown that maximum noise reduction is achieved when the MVDR beamformer is used for noise reduction only. The amount of noise reduction sacrificed when complete dereverberation is required depends on the direct-to-reverberation ratio of the acoustic impulse response between the source and the reference microphone. The performance evaluation supports the theoretical analysis and demonstrates the tradeoff between speech dereverberation and noise reduction. When both speech dereverberation and noise reduction are desired, the results also demonstrate that the amount of noise reduction sacrificed decreases as the number of microphones increases.
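For reference, the textbook MVDR solution minimizes output noise power subject to the distortionless constraint w^H d = 1, giving w = R^{-1} d / (d^H R^{-1} d). A minimal sketch (variable names ours):

```python
import numpy as np

def mvdr_weights(noise_cov, d):
    """MVDR filter w = R^{-1} d / (d^H R^{-1} d).

    noise_cov -- (M, M) noise covariance matrix R across M microphones
    d         -- length-M steering (or relative transfer function) vector
    The constraint w^H d = 1 keeps the desired signal component undistorted.
    """
    r_inv_d = np.linalg.solve(noise_cov, d)   # R^{-1} d without explicit inverse
    return r_inv_d / (d.conj() @ r_inv_d)
```

Whether d points at the direct path only or at the full (reverberant) transfer function is precisely what moves the beamformer along the dereverberation/noise-reduction tradeoff discussed above.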

6.
Formant frequencies are estimated using a piecewise linear prediction algorithm: a multichannel filter bank divides the speech signal into frequency bands, a suitable inverse filter is selected to approximate the short-time spectrum of each band, and the formant frequencies are then estimated from that inverse filter. Experimental results show that, compared with conventional methods, this approach improves the resolution and accuracy of formant frequency estimation and is less affected by noise.
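The per-band refinement is beyond a short sketch, but the underlying single-band building block, estimating formant candidates from the roots of an LPC (inverse filter) polynomial, can be sketched as follows (assumptions: autocorrelation-method LPC, illustrative names):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formant_candidates(frame, fs, order=10):
    """Estimate formant candidate frequencies from the roots of an LPC polynomial."""
    frame = frame * np.hamming(len(frame))
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]      # autocorrelation lags 0..order
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])        # solve the normal equations
    roots = np.roots(np.concatenate(([1.0], -a)))      # poles of 1/A(z)
    roots = roots[np.imag(roots) > 0.01]               # keep upper-half-plane poles
    return np.sort(np.angle(roots)) * fs / (2 * np.pi) # pole angles -> frequencies (Hz)
```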

7.
This paper discusses the minimum variance distortionless response (MVDR) modeling method and compares it with linear prediction, finding that the MVDR filter provides a better envelope of the original speech. Building on the principles of independent component analysis (ICA) and the FastICA algorithm, the MVDR feature extraction method is then combined with ICA, and the combination is compared with conventional speech recognition methods under both noisy and noise-free conditions; the recognition rate, computation time, and other results are analyzed. MVDR feature extraction improves the recognition rate of the speech recognition system but increases the average recognition time, while the system using ICA feature transformation shows better robustness.

8.
To address the permutation ambiguity of frequency-domain convolutive blind source separation of speech, a multiband energy-sorting algorithm is proposed. First, a short-time Fourier transform (STFT) of the mixed signals is computed and an instantaneous mixing model is built at each frequency bin for independent component analysis. The permutation ambiguity is then resolved by combining energy-correlation sorting with direction-of-arrival (DOA) sorting, and the scaling ambiguity is resolved with a split-spectrogram method, yielding correctly separated subband signals at each frequency bin. Finally, the separated source signals are obtained by the inverse short-time Fourier transform (ISTFT). Simulation results show that, compared with Murata's sorting algorithm, the improved algorithm achieves higher signal-to-distortion, signal-to-interference, and system-error ratios.

9.
A GUI system for speech processing based on the time-frequency analysis method LPFT
Because speech is non-stationary and time-varying, time-frequency analysis is an important tool for speech signal processing. However, the linear short-time Fourier transform has poor time-frequency concentration, while the bilinear Wigner distribution suffers from cross-term interference when processing multicomponent signals. To overcome the drawbacks of these two methods, the local polynomial Fourier transform (LPFT), an extension of the STFT, is used to process speech signals. A GUI system for speech processing based on the LPFT is built, enabling analysis and comparison of speech in the time, frequency, and time-frequency domains. Speech processing examples are given that verify the advantages of the LPFT over other methods. The system is simple and intuitive, providing a good platform for speech processing.

10.
A speech enhancement algorithm using multitaper spectral estimation under wavelet packet decomposition
查诚, 杨平, 潘平. 《计算机工程》(Computer Engineering), 2012, 38(5): 291-292
Conventional spectral subtraction is a single-resolution algorithm based on the short-time Fourier transform and has large variance. To address this, a speech enhancement algorithm using multitaper spectral estimation under wavelet packet decomposition is proposed. The noisy speech is decomposed into different frequency bands by wavelet packets, multitaper spectral subtraction is performed in each band, and the bands are reconstructed one by one by wavelet packet reconstruction to obtain the denoised speech signal. Simulation results show that the algorithm raises the signal-to-noise ratio of noisy speech and reduces speech distortion.
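Omitting the wavelet packet band-splitting, the multitaper spectral subtraction core can be sketched with sine tapers, which reduce the variance of the PSD estimate relative to a single window. The taper choice, flooring constant, and names here are our own illustrative assumptions:

```python
import numpy as np

def multitaper_spectral_subtraction(noisy, noise_psd, n_tapers=4, floor=0.05):
    """Subtract a noise PSD from a multitaper PSD estimate and resynthesize.

    Averaging over several orthogonal sine tapers lowers the variance of the
    PSD estimate; the spectral floor keeps a fraction of the noisy PSD to
    limit musical noise.
    """
    n = len(noisy)
    k = np.arange(1, n_tapers + 1)[:, None]
    t = np.arange(n)[None, :]
    tapers = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * (t + 1) / (n + 1))
    psd = np.mean(np.abs(np.fft.rfft(tapers * noisy, axis=1)) ** 2, axis=0)
    clean_psd = np.maximum(psd - noise_psd, floor * psd)   # floored subtraction
    gain = np.sqrt(clean_psd / np.maximum(psd, 1e-12))
    return np.fft.irfft(np.fft.rfft(noisy) * gain, n=n)
```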

11.
Noise reduction for speech enhancement is a useful technique, but in general it is a challenging problem. While a single-channel algorithm is easy to use in practice, it inevitably introduces speech distortion to the desired speech signal while reducing noise. Today, the explosive growth in computational power and the continuous drop in the cost and size of acoustic electric transducers are driving interest in employing multiple microphones in speech processing systems, opening new opportunities for noise reduction. In this paper, we present an analysis of three multichannel noise reduction algorithms, namely the Wiener filter, subspace, and spatial-temporal prediction approaches, in a common framework. We investigate whether it is possible for multichannel noise reduction algorithms to reduce noise without speech distortion. Finally, we corroborate the theoretical analyses by simulations using real impulse responses measured in the varechoic chamber at Bell Labs.

12.
This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes, as well as of complex-valued DFT coefficients, based on two classes of generalized gamma distributions under an additive Gaussian noise assumption. The resulting generalized DFT magnitude estimator has as a special case the existing scheme based on a Rayleigh speech prior, while the complex DFT estimators generalize existing schemes based on Gaussian, Laplacian, and Gamma speech priors. Extensive simulation experiments with speech signals degraded by various additive noise sources verify that significant improvements are possible with the more recent estimators based on super-Gaussian priors. The increase in perceptual evaluation of speech quality (PESQ) over the noisy signals is about 0.5 points for street noise and about 1 point for white noise, nearly independent of input signal-to-noise ratio (SNR). The assumptions made in deriving the complex DFT estimators are less accurate than those for the magnitude estimators, leading to a higher maximum achievable speech quality with the magnitude estimators.
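The Rayleigh-prior special case mentioned above is the classical MMSE short-time spectral amplitude gain, which can be written with exponentially scaled Bessel functions to avoid overflow. A sketch of that known gain function (not of the paper's generalized estimators):

```python
import numpy as np
from scipy.special import i0e, i1e

def mmse_stsa_gain(xi, gamma):
    """MMSE spectral-amplitude gain for a Rayleigh (Gaussian) speech prior.

    xi -- a priori SNR, gamma -- a posteriori SNR (per DFT bin).
    i0e/i1e are exponentially scaled modified Bessel functions; the scaling
    absorbs the exp(-v/2) factor of the textbook formula.
    """
    v = xi / (1.0 + xi) * gamma
    return (np.sqrt(np.pi * v) / (2.0 * gamma)) * \
           ((1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))
```

For large v the gain approaches the Wiener gain xi / (1 + xi), a useful sanity check.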

13.
We propose a new method for adaptively removing noise and interference from a signal. In this method, unwanted components are removed from the short-time Fourier transform (STFT) surface, and the clean signal is estimated by integrating the modified STFT with respect to frequency. Isolation of the signal and interference components is facilitated by a concentration process based on the phase of the STFT differentiated with respect to time. The concentrated STFT is a linear representation, free of cross terms, in which signal and interference components are easily recognized because their distributions are more concentrated in frequency. Interference removal may be accomplished by removing unwanted components from the concentrated STFT, and the clean signal may be estimated by integrating the modified concentrated STFT. We demonstrate the advantages of the proposed method over conventional methods.

14.
This paper addresses the problem of extracting a desired speech source from a multispeaker environment in the presence of background noise. A new adaptive beamforming structure is proposed for this speech enhancement problem. The structure incorporates power spectral density (PSD) estimation of the speech sources together with a noise statistics update. An inactive-source detector based on minimum statistics is developed to detect speech presence and to track the noise statistics. Performance of the proposed beamformer is investigated and compared to the minimum variance distortionless response (MVDR) beamformer, with and without a postfilter, in a real hands-free communication environment. Evaluations show that the proposed beamformer offers good interference and noise suppression while maintaining low distortion of the desired source.

15.
Despite great developments in the field of acoustic echo cancellation (AEC), the presence of double-talk remains a difficult problem. The main role of double-talk detection (DTD) is to control adaptation of the filter coefficients by halting their update in double-talk situations. In this paper, we propose a new DTD method based on a time–frequency analysis that uses the Stockwell transform (ST). The ST is a time–frequency spectral localization method that combines characteristics of the short-time Fourier transform and the wavelet transform, providing better time–frequency resolution, especially for non-stationary signals. In the experimental tests, the normalized least mean squares (NLMS) algorithm is used to update the filter coefficients, with speech signals taken from the TIMIT database. The results obtained show better performance than existing methods in terms of misalignment convergence and speech intelligibility enhancement.
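The DTD's role of halting adaptation can be seen in a single NLMS step with a freeze flag; the ST-based detector itself is not reproduced here, and all names are illustrative:

```python
import numpy as np

def nlms_step(w, x, d, mu=0.5, eps=1e-8, double_talk=False):
    """One NLMS update of the echo-path filter w.

    x -- most recent far-end samples (same length as w)
    d -- current microphone sample
    When double_talk is True, the coefficient update is halted (as a DTD
    prescribes) and only the error is computed.
    """
    e = d - w @ x                                  # a priori error
    if not double_talk:
        w = w + mu * e * x / (x @ x + eps)         # normalized gradient step
    return w, e
```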

16.
This paper discusses the problem of single-channel speech enhancement in variable noise-level environments. Commonly used single-channel subtractive-type speech enhancement algorithms assume that the background noise level is fixed or slowly varying; in fact, the background noise level may vary quickly, which usually results in erroneous speech/noise detection and an erroneous enhancement process. To solve this problem, we propose a subtractive-type speech enhancement scheme. This new scheme uses the RTF (refined time-frequency parameter)-based RSONFIN (recurrent self-organizing neural fuzzy inference network) algorithm we developed previously to detect word boundaries under variable background noise levels. In addition, a new parameter (MiFre) is proposed to estimate the varying background noise level. Based on this parameter, the noise-level information used for subtractive-type speech enhancement can be estimated not only during speech pauses but also during speech segments. The new scheme has been tested and found to perform well under both variable and fixed background noise levels.

17.
In this paper, a set of features derived by filtering and spectral peak extraction in the autocorrelation domain is proposed. We focus on the effect of additive noise on speech recognition. Assuming that the channel characteristics and additive noises are stationary, these new features improve the robustness of speech recognition in noisy conditions. In this approach, the autocorrelation sequence of a speech signal frame is first computed. Filtering of the autocorrelation of the speech signal is carried out in the second step, and the short-time power spectrum of speech is then obtained through the fast Fourier transform. The power spectrum peaks are calculated by differentiating the power spectrum with respect to frequency. The magnitudes of these peaks are projected onto the mel scale and passed through the filter bank. Finally, a set of cepstral coefficients is derived from the outputs of the filter bank. The effectiveness of the new features for speech recognition in noisy conditions is shown through a number of speech recognition experiments: a multi-speaker isolated-word recognition task and a multi-speaker continuous speech recognition task with various artificially added noises such as factory, babble, car, and F16, as well as a set of experiments on the Aurora 2 task. Experimental results show significant improvements under noisy conditions in comparison to traditional feature extraction methods. We also report results obtained by applying cepstral mean normalization to these methods to obtain features robust against both additive noise and channel distortion.

18.
We describe an architecture that gives a robot the capability to recognize speech by cancelling ego noise, even while the robot is moving. The system consists of three blocks: (1) a multi-channel noise reduction block, comprising successive stages of microphone-array-based sound localization, geometric source separation, and post-filtering; (2) a single-channel noise reduction block utilizing template subtraction; and (3) an automatic speech recognition block. In this work, we specifically investigate a missing-feature-theory-based automatic speech recognition (MFT-ASR) approach in block (3). This approach uses spectro-temporal elements derived from (1) and (2) to measure the reliability of the acoustic features and generates masks to filter out unreliable features. We evaluated the system on a robot using word correct rates, and we present a detailed analysis of recognition accuracy to determine optimal parameters. The proposed MFT-ASR approach resulted in significantly higher recognition performance than single- or multi-channel noise reduction methods alone.

19.
The S-transform (ST) is a popular linear time-frequency (TF) transform with hybrid characteristics of the short-time Fourier transform (STFT) and the wavelet transform. It enables multi-resolution TF analysis and returns globally referenced local phase information, but its expensive computational requirements often overshadow its other desirable features. In this paper, we develop a fully discrete ST (DST) with a controllable TF sampling scheme based on a filter-bank interpretation. The presented DST splits the analyzed signal into subband channels whose bandwidths increase progressively in a fully controllable manner, providing a frequency resolution that can be varied and made as high as required, a property desirable for processing oscillatory signals that previously presented DSTs lack. Thanks to its flexible sampling scheme, the behavior of the developed transform in the TF domain can be adjusted easily; with specific parameter settings, for example, it samples the TF domain dyadically, while with different settings it may act as an STFT. The spectral partitioning is performed through asymmetric raised-cosine windows whose collective amplitude is unitary over the signal spectrum, ensuring that the transform is easily and exactly invertible. The proposed DST retains all the appealing properties of the original ST, representing a local image of the Fourier transform; it requires low computational complexity and returns a modest number of TF coefficients. To confirm its effectiveness, the developed transform is applied to different tasks using real-world and synthetic signals.

20.
A noise-robust speech recognition method based on MVDR and CCBC
A method suited to noise-robust speech recognition is proposed. Its feature extraction is based on minimum variance distortionless response (MVDR) spectral estimation; the features are frequency-warped to improve their perceptual resolution, and a canonical correlation based compensation (CCBC) spectral-transformation method is applied for adaptation, improving the robustness of the system. In experiments under exhibition-hall, crowd, and car noise, comparison with a system based on conventional Mel-frequency cepstral coefficient (MFCC) features shows that the recognition rate of the speech recognition system using the proposed method is significantly improved.
